Inequality and its consequences are the subject of intense recent debate. Using a simplified model of the economy, we address the relation between inequality and liquidity, the latter understood as the frequency of economic exchanges. Assuming a Pareto distribution of wealth for the agents, that is consistent with empirical findings, we find an inverse relation between wealth inequality and overall liquidity. We show that an increase in the inequality of wealth results in an even sharper concentration of the liquid financial resources. This leads to a congestion of the flow of goods and the arrest of the economy when the Pareto exponent reaches one.
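As a rough illustration of why the Pareto exponent matters, the sketch below evaluates the textbook closed form for the share of total wealth held by the richest fraction p of agents under an idealized continuous Pareto distribution with exponent alpha > 1, namely p^(1 - 1/alpha). It covers wealth concentration only and is not the model of exchanges studied in the paper; the function name `top_share` is a hypothetical helper introduced here.

```python
def top_share(p, alpha):
    """Share of total wealth held by the richest fraction p of agents.

    Assumes an idealized continuous Pareto distribution with exponent
    alpha > 1 (finite mean). This is the standard Lorenz-curve result,
    not the exchange model of the paper.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if alpha <= 1.0:
        raise ValueError("the mean wealth diverges for alpha <= 1")
    return p ** (1.0 - 1.0 / alpha)


if __name__ == "__main__":
    # Concentration sharpens as the exponent approaches one from above.
    for alpha in (3.0, 2.0, 1.5, 1.1, 1.01):
        share = top_share(0.01, alpha)
        print(f"alpha = {alpha:4.2f}: richest 1% hold {share:.1%} of the wealth")
```

As alpha approaches one, the share held by any fixed top fraction tends to one, which is the regime the abstract associates with the arrest of the economy.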

The International School for Advanced Studies (SISSA) was founded in 1978 and was the first institution in Italy to promote post-graduate courses leading to a Doctor Philosophiae (or PhD) degree. A centre of excellence among Italian and international universities, the school has around 65 teachers, 100 post docs and 245 PhD students, and is located in Trieste, in a campus of more than 10 hectares with wonderful views over the Gulf of Trieste.
SISSA hosts a very high-ranking, large and multidisciplinary scientific research output. The scientific papers produced by its researchers are published in high impact factor, well-known international journals, and in many cases in the world's most prestigious scientific journals such as Nature and Science. Over 900 students have so far started their careers in the field of mathematics, physics and neuroscience research at SISSA.
ISSN: 1742-5468
Journal of Statistical Mechanics: Theory and Experiment (JSTAT) is a multi-disciplinary, peer-reviewed international journal created by the International School for Advanced Studies (SISSA) and IOP Publishing (IOP). JSTAT covers all aspects of statistical physics, including experimental work that impacts on the subject.
João Pedro Jerico et al J. Stat. Mech. (2016) 073402
Itai Arad et al J. Stat. Mech. (2016) 033301
Local interactions in many-body quantum systems are generally non-commuting and consequently the Hamiltonian of a local region cannot be measured simultaneously with the global Hamiltonian. The connection between the probability distributions of measurement outcomes of the local and global Hamiltonians will depend on the angles between the diagonalizing bases of these two Hamiltonians. In this paper we characterize the relation between these two distributions. On one hand, we upper-bound the probability of measuring an energy τ in a local region, if the global system is in a superposition of eigenstates with energies . On the other hand, we bound the probability of measuring a global energy in a bipartite system that is in a tensor product of eigenstates of its two subsystems. Very roughly, we show that due to the local nature of the governing interactions, these distributions are identical to what one encounters in the commuting cases, up to exponentially small corrections. Finally, we use these bounds to study the spectrum of a locally truncated Hamiltonian, in which the energies of a contiguous region have been truncated above some threshold energy. We show that the lower part of the spectrum of this Hamiltonian is exponentially close to that of the original Hamiltonian. A restricted version of this result in 1D was a central building block in a recent improvement of the 1D area-law.
Eric W Tramel et al J. Stat. Mech. (2016) 073401
Approximate message passing (AMP) has been shown to be an excellent statistical approach to signal inference and compressed sensing problems. The AMP framework provides modularity in the choice of signal prior; here we propose a hierarchical form of the Gauss–Bernoulli prior which utilizes a restricted Boltzmann machine (RBM) trained on the signal support to push reconstruction performance beyond that of simple i.i.d. priors for signals whose support can be well represented by a trained binary RBM. We present and analyze two methods of RBM factorization and demonstrate how these affect signal reconstruction performance within our proposed algorithm. Finally, using the MNIST handwritten digit dataset, we show experimentally that using an RBM allows AMP to approach oracle-support performance.
Vincent D Blondel et al J. Stat. Mech. (2008) P10008
We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.
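The quality measure named in this abstract, modularity, can be made concrete with a small sketch. The code below computes the standard Newman-Girvan modularity Q of a given partition of an undirected, unweighted graph; it only scores a partition and is not the optimization heuristic of the paper. The function name `modularity` and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges: iterable of (u, v) pairs; each undirected edge listed once,
           no self-loops, unweighted.
    community: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge fraction - expected fraction).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


if __name__ == "__main__":
    # Toy graph: two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    one_group = {node: "all" for node in range(6)}
    print(modularity(edges, two_groups))  # 5/14, the two triangles
    print(modularity(edges, one_group))   # 0.0, the trivial partition
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of comparison a modularity-optimizing heuristic performs repeatedly.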
Anna Dawid and Yann LeCun J. Stat. Mech. (2024) 104011
Current automated systems have crucial limitations that need to be addressed before artificial intelligence can reach human-like levels and bring new technological revolutions. Among others, our societies still lack level-5 self-driving cars, domestic robots, and virtual assistants that learn reliable world models, reason, and plan complex action sequences. In these notes, we summarize the main ideas behind the architecture of autonomous intelligence of the future proposed by Yann LeCun. In particular, we introduce energy-based and latent variable models and combine their advantages in the building block of LeCun's proposal, that is, in the hierarchical joint-embedding predictive architecture.
Preetum Nakkiran et al J. Stat. Mech. (2021) 124003
We show that a variety of modern deep learning tasks exhibit a 'double-descent' phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of train samples actually hurts test performance.
Francesco Ferraro et al J. Stat. Mech. (2025) 023301
We investigate a disordered multi-dimensional linear system in which the interaction parameters are colored noises, varying stochastically in time with defined temporal correlations. We refer to this type of disorder as 'annealed', in contrast to quenched disorder in which couplings are fixed over time. Using generating functional methods, we extend dynamical mean-field theory to accommodate annealed disorder and employ it to find the exact solution of the linear model in the limit of a large number of degrees of freedom. Our analysis yields analytical results for the non-stationary autocorrelation, the stationary variance, the power spectral density, and the phase diagram of the model. Some unexpected features emerge upon changing the correlation time of the interactions. The stationary variance of the system and the critical variance of the disorder are generally found to be non-monotonic functions of the correlation time of the interactions. We also find that a re-entrant phase transition can take place when this correlation time is varied.
Vincent Blondel et al J. Stat. Mech. (2024) 10R001
The Louvain method was proposed 15 years ago as a heuristic method for the fast detection of communities in large networks. During this period, it has emerged as one of the most popular methods for community detection: the task of partitioning vertices of a network into dense groups, usually called communities or clusters. Here, after a short introduction to the method, we give an overview of the different generalizations, modifications and improvements that have been proposed in the literature, and also survey the quality functions, beyond modularity, for which it has been implemented. Finally, we conclude with a discussion on the limitations of the method and perspectives for future research.
Soheli Mukherjee and Naftali R Smith J. Stat. Mech. (2025) 033205
We calculate the steady state distribution of the position of a Brownian particle under an intermittent confining potential that switches on and off with a constant rate γ. We assume the external potential to be smooth and to have a unique global minimum at , and in dimension d > 1 we additionally assume that is central. We focus on the rapid-switching limit . Typical fluctuations follow a Boltzmann distribution , with an effective potential , where D is the diffusion coefficient. However, we also calculate the tails of which behave very differently. In the far tails , a universal behavior emerges, that is independent of the trapping potential. The mean first-passage time to reach position X is given, in the leading order, by . This coincides with the Arrhenius law (for the effective potential ) for , but deviates from it elsewhere. We give explicit results for the harmonic potential. Finally, we extend our results to periodic one-dimensional systems. Here, we find that in the limit of and D → 0, the logarithm of exhibits a singularity which we interpret as a first-order dynamical phase transition (DPT). This DPT occurs in the absence of any external drift. We also calculate the nonzero probability current in the steady state that is a result of the nonequilibrium nature of the system.
Hugo Cui J. Stat. Mech. (2025) 023402
Recent years have been marked by the fast-paced diversification and increasing ubiquity of machine learning (ML) applications. Yet, a firm theoretical understanding of the surprising efficiency of neural networks (NNs) to learn from high-dimensional data still proves largely elusive. In this endeavour, analyses inspired by statistical physics have proven instrumental, enabling the tight asymptotic characterization of the learning of NNs in high dimensions, for a broad class of solvable models. This manuscript reviews the tools and ideas underlying recent progress in this line of work. We introduce a generic model, the sequence multi-index model, which encompasses numerous previously studied models as special instances. This unified framework covers a broad class of ML architectures with a finite number of hidden units (including multi-layer perceptrons, autoencoders and attention mechanisms) and tasks ((un)supervised learning, denoising, contrastive learning), in the limit of large data dimension, and comparably large number of samples. We explicate in full detail the analysis of the learning of sequence multi-index models, using statistical physics techniques such as the replica method and approximate message-passing algorithms. This manuscript thus provides a unified presentation of analyses reported in several previous works, and a detailed overview of central techniques in the field of statistical physics of ML. This review should be a useful primer for ML theoreticians curious about statistical physics approaches; it should also be of value to statistical physicists interested in the transfer of such ideas to the study of NNs.
Hisato Komatsu J. Stat. Mech. (2025) 043401
In recent years, simulations of pedestrians using multi-agent reinforcement learning (MARL) have been studied. This study considers roads in a grid-world environment and implements pedestrians as MARL agents using an echo-state network and the least squares policy iteration method. In this environment, the ability of these agents to learn to move forward by avoiding other agents is investigated. Specifically, we consider two types of tasks: the choice between a narrow direct route and a broad detour, and the bidirectional pedestrian flow in a corridor. The simulation results indicate that the learning is successful when the density of agents is not too high.
M A G Portillo and M G E da Luz J. Stat. Mech. (2025) 043202
In physics, with the advent of topological materials, and in chemistry, within the scope of chemical graph theory, topological invariants and/or indices have been considered to successfully characterize innumerable systems. In particular, strong links have been identified (both numerically and analytically) between properties of the Ising model on a lattice L and features of the so-called spanning trees (STs) of L. Nonetheless, studies exploring this connection tend to address only a handful of cases given the demands of the necessary calculations. But examining only a few instances prevents one from looking for general trends across numerous L's, which could eventually reveal universal traits. In this contribution, we present the most comprehensive investigation to date, analyzing the Ising-ST relation for all the L's belonging to the families of –uniform periodic tiling of the plane, in a total of 1248 lattices. With this goal, we develop optimized protocols (taking advantage of a recently proposed representation for ) to compute for each L its ST constant λ and the Kac–Ward matrix. The determinant of the latter yields the Ising model free energy and consequently the critical temperature . Then, considering the relatively large sample generated, we use machine learning techniques, which disclose a general correlation between the Ising critical temperature and the ST constant, described by a simple quadratic polynomial function P. As a benchmark, we test P for some arbitrary lattices (outside ), finding rather satisfactory fittings. These results point to a useful classification scheme for Ising in 2D, demonstrating that λ can be a relevant topological concept to investigate lattice models. Finally, as a positive 'side-effect' of our computations, for these 's we confirm (and even improve) a conjectured inequality associating λ and the effective coordination number κ of a lattice.
Alberto Bassanoni et al J. Stat. Mech. (2025) 043201
Even in a simple stochastic process, the study of the full distribution of time-integrated observables can be a difficult task. This is the case of a much-studied process such as the Ornstein–Uhlenbeck process where, recently, anomalous dynamical scaling of large deviations of time-integrated functionals has been highlighted. Using the mapping of a continuous stochastic process to a continuous time random walk via the 'excursions technique', we introduce a comprehensive formalism that enables the calculation of the complete distribution of the time-integrated observable , where n is a positive integer and v(t) is the random velocity of a particle following Ornstein–Uhlenbeck dynamics. We reveal an interesting connection between the anomalous rate function associated with the observable A and the statistics of the area under the first-passage functional during an excursion. The rate function of the latter, analyzed here for the first time, exhibits anomalous scaling behavior and a critical point in its dynamics, both of which are explored in detail. The case of the anomalous scaling of large deviations, originally associated with the presence of an instantonic solution in the weak noise regime of a path integral approach, is here produced by a so-called 'big jump effect', in which the contribution to rare events is dominated by the largest excursion. Our approach, which is quite general for continuous stochastic processes, allows us to associate a physical meaning with the anomalous scaling of large deviations through the big jump principle.
Xiaosi Gu and Tomoyuki Obuchi J. Stat. Mech. (2025) 033404
Semi-supervised learning (SSL) is a machine learning methodology that leverages unlabeled data in conjunction with a limited amount of labeled data. Although SSL has been applied in various applications and its effectiveness has been empirically demonstrated, it is still not fully understood when and why SSL performs well. Several existing theoretical studies have attempted to address this issue by modeling classification problems using the so-called Gaussian mixture model (GMM). These studies provide notable and insightful interpretations. However, their analyses are focused on specific purposes, and a thorough investigation of the properties of GMM in the context of SSL has been lacking. In this paper, we conduct a detailed analysis of the properties of the high-dimensional GMM for binary classification in the SSL setting. To this end, we employ the approximate message-passing and state evolution methods, which are widely used in high-dimensional settings and originate from statistical mechanics. We deal with two estimation approaches: the Bayesian one and the -regularized maximum likelihood estimation (RMLE). We conduct a comprehensive comparison of these two approaches, examining aspects such as the global phase diagram, estimation error for the parameters, and prediction error for the labels. A specific comparison is made between the Bayes-optimal (BO) estimator and RMLE, as the BO setting provides the optimal estimation performance and is ideal as a benchmark. Our analysis shows that with appropriate regularizations, RMLE can achieve a near-optimal performance in terms of both the estimation error and prediction error, especially when there is a large amount of unlabeled data. These results demonstrate that the regularization term plays an effective role in estimation and prediction in SSL approaches.
Alessandro Pacco et al J. Stat. Mech. (2025) 033302
We compute the distribution of triplets of stationary points in the energy landscape of the spherical p-spin model, by evaluating the quenched three-point complexity by means of the Kac–Rice formalism. We show the occurrence of transitions in the organization of stationary points in the landscape, identifying regions where local minima and saddles accumulate and cluster around other stationary points, thus displaying the presence of correlations in the landscape. We discuss the implications of these findings for the dynamical exploration of the energy landscape in the activated regime, specifying conditions under which transitions between local minima are expected to exhibit correlated rates and when, conversely, activated jumps are likely to be memoryless.
Annabel L Davies and Tobias Galla J. Stat. Mech. (2022) 11R001
Network meta-analysis (NMA) is a technique used in medical statistics to combine evidence from multiple medical trials. NMA defines an inference and information processing problem on a network of treatment options and trials connecting the treatments. We believe that statistical physics can offer useful ideas and tools for this area, including from the theory of complex networks, stochastic modelling and simulation techniques. The lack of a unique source that would allow physicists to learn about NMA effectively is a barrier to this. In this article we aim to present the 'NMA problem' and existing approaches to it coherently and in a language accessible to statistical physicists. We also summarise existing points of contact between statistical physics and NMA, and describe our ideas of how physics might make a difference for NMA in the future. The overall goal of the article is to attract physicists to this interesting, timely and worthwhile field of research.
Shamik Gupta et al J. Stat. Mech. (2014) R08001
The phenomenon of spontaneous synchronization, particularly within the framework of the Kuramoto model, has been a subject of intense research over the years. The model comprises oscillators with distributed natural frequencies interacting through a mean-field coupling, and serves as a paradigm to study synchronization. In this review, we put forward a general framework in which we discuss in a unified way known results with more recent developments obtained for a generalized Kuramoto model that includes inertial effects and noise. We describe the model from a different perspective, highlighting the long-range nature of the interaction between the oscillators, and emphasizing the equilibrium and out-of-equilibrium aspects of its dynamics from a statistical physics point of view. In this review, we first introduce the model and discuss both for the noiseless and noisy dynamics and for unimodal frequency distributions the synchronization transition that occurs in the stationary state. We then introduce the generalized model, and analyze its dynamics using tools from statistical mechanics. In particular, we discuss its synchronization phase diagram for unimodal frequency distributions. Next, we describe deviations from the mean-field setting of the Kuramoto model. To this end, we consider the generalized Kuramoto dynamics on a one-dimensional periodic lattice on the sites of which the oscillators reside and interact with one another with a coupling that decays as an inverse power-law of their separation along the lattice. For two specific cases, namely, in the absence of noise and inertia, and in the case when the natural frequencies are the same for all the oscillators, we discuss how the long-time transition to synchrony is governed by the dynamics of the mean-field mode (zero Fourier mode) of the spatial distribution of the oscillator phases.
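To make the mean-field coupling described in this abstract concrete, the sketch below integrates the classic noiseless, inertia-free Kuramoto model, dθ_i/dt = ω_i + K r sin(ψ − θ_i), where r e^{iψ} is the average of e^{iθ_j} over all oscillators. It uses a plain explicit Euler step and does not include the inertia and noise terms of the generalized model reviewed in the paper. The function names, step size and parameter values are assumptions chosen for illustration.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Return (r, psi): modulus and phase of the mean of exp(i*theta)."""
    z = sum(cmath.exp(1j * th) for th in phases) / len(phases)
    return abs(z), cmath.phase(z)


def simulate_kuramoto(omegas, coupling, phases, dt=0.05, steps=2000):
    """Explicit Euler integration of the mean-field Kuramoto model.

    omegas: natural frequencies, one per oscillator.
    coupling: coupling constant K.
    phases: initial phases, one per oscillator.
    Returns the list of phases after `steps` steps of size `dt`.
    """
    phases = list(phases)
    for _ in range(steps):
        r, psi = order_parameter(phases)
        phases = [th + dt * (w + coupling * r * math.sin(psi - th))
                  for th, w in zip(phases, omegas)]
    return phases


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    omegas = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unimodal frequencies
    start = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for coupling in (0.5, 1.0, 2.0, 4.0):
        r, _ = order_parameter(simulate_kuramoto(omegas, coupling, start))
        print(f"K = {coupling:3.1f}: r = {r:.2f}")
```

The order parameter r lies between 0 (incoherence) and 1 (full synchrony), so scanning the coupling K in the loop above gives a crude, finite-size view of the synchronization transition.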
Luke Neville J. Stat. Mech. (2025) 033204
Using a path integral approach, we derive and study the hydrodynamic equations and large deviation functions (LDFs) for three active lattice gases. After a review of the path integral for master equations, we first look at a one-dimensional model of motility-induced phase separation (MIPS), rederiving the LDF that was previously found through mapping to the ABC model. After extracting the deterministic hydrodynamic equations from the LDF, we analyse them perturbatively near the MIPS critical point using a weakly non-linear analysis. By doing this, we show that they reduce to equilibrium model B very close to criticality, with non-equilibrium, or active model B terms emerging as we leave the critical region. The same type of weakly non-linear analysis is then applied to the full LDF and we show that the near-critical stationary probability distribution is given by the exponential of a φ⁴ free energy, as expected in ordinary equilibrium phase separation. Similar calculations are then done for the two other lattice gases, one which is another MIPS model and another that models flocking, and in both cases we find analogous results.

H Jürgens and H Boos J. Stat. Mech. (2025) 033104
We present an ansatz of generalising the construction of recursion relations for the correlation functions of the -invariant fundamental exchange model in the thermodynamic limit by Jimbo, Miwa, Smirnov, Takeyama and one of our present authors in 2004 for higher rank. Due to the structure of the correlators as functions of their inhomogeneity parameters, a recursion formula for the reduced density matrix was proven. In the case of , we use the explicit results of Klümper and Ribeiro, and Nirov, Hutsalyuk and one of our present authors for the reduced density matrix of up to operator length three to verify whether it is possible to relate the residues of the density matrix of length n to the density matrix of length smaller than n as in . This is unclear, since the reduced quantum Knizhnik–Zamolodchikov equation splits into two parts for higher rank. In fact, we show two relations, one of which is a straightforward generalisation to the case and one which is completely new. This allows us to construct an analogue of the operator X_k, which we call the snail operator. In the case, this operator has many useful properties, including in particular the fact that only one irreducible representation of the Yangian , the Kirillov–Reshetikhin module W_k, contributes to the residue at . Here, we give an overview of the mathematical background, T-systems, and show a new application of the extended T-systems introduced by Mukhin and Young in 2012 regarding the snail operator.
Valtteri Haavisto et al J. Stat. Mech. (2025) 033301
Predicting the future behavior of complex systems exhibiting critical-like dynamics is often considered to be an intrinsically hard task. Here, we study the predictability of the depinning dynamics of elastic interfaces in random media driven by a slowly increasing external force, a paradigmatic complex system exhibiting critical avalanche dynamics linked to a continuous non-equilibrium depinning phase transition. To this end, we train a variety of machine learning models to infer the mapping from features of the initial relaxed line shape and the random pinning landscape to predict the sample-dependent staircase-like force–displacement curve that emerges from the depinning process. Even if for a given realization of the quenched random medium the dynamics are in principle deterministic, we find that there is an exponential decay of the predictability with the displacement of the line as it nears the depinning transition from below. Our analysis on how the related displacement scale depends on the system size and the dimensionality of the input descriptor reveals that the onset of the depinning phase transition gives rise to fundamental limits to predictability.
Riccardo Travaglino et al J. Stat. Mech. (2025) 033102
We employ the quasiparticle picture of entanglement evolution to obtain an effective description for the out-of-equilibrium entanglement Hamiltonian at the hydrodynamical scale following quantum quenches in free fermionic systems in two or more spatial dimensions. Specifically, we begin by applying dimensional reduction techniques in cases where the geometry permits, building directly on established results from one-dimensional systems. Subsequently, we generalize the analysis to encompass a wider range of geometries. We obtain analytical expressions for the entanglement Hamiltonian valid at the ballistic scale, which reproduce the known quasiparticle picture predictions for the Renyi entropies and full counting statistics. We also numerically validate the results with excellent precision by considering quantum quenches from several initial configurations.
Haggai Bonneau et al J. Stat. Mech. (2025) 033201
Particle–particle correlation functions in ionic systems control many of their macroscopic properties. In this work, we use stochastic density functional theory to compute these correlations, and then we analyze their long-range behavior. In particular, we study the system's response to a rapid change (quench) in the external electric field. We show that the correlation functions relax diffusively toward the non-equilibrium stationary state and that in a stationary state, they present a universal conical shape. This shape distinguishes this system from systems with short-range interactions, where the correlations have a parabolic shape. We relate this temporal evolution of the correlations to the algebraic relaxation of the total charge current reported previously.
M A Korzeniowska and O E Garcia J. Stat. Mech. (2025) 023206
Long-range correlations manifested as power spectral density scaling for frequency f and a range of exponents β are investigated for a superposition of uncorrelated pulses with distributed durations τ. Closed-form expressions for the frequency power spectral density are derived for a one-sided exponential pulse function and several variants of bounded and unbounded power-law distributions of pulse durations with abrupt and smooth cutoffs. The asymptotic scaling relation is demonstrated for in the limit of an infinitely broad distribution . Logarithmic corrections to the frequency scaling are exposed at the boundaries of the long-range dependence regime, β = 0 and β = 2. Analytically demonstrated finite-size effects associated with distribution truncations are shown to reduce the frequency ranges of scale invariance by several decades. The regimes of validity of the relation are clarified.
Maria Refinetti et al J. Stat. Mech. (2025) 024001
The uncanny ability of over-parameterised neural networks to generalise well has been explained using various 'simplicity biases'. These theories postulate that neural networks avoid overfitting by first fitting simple, linear classifiers before learning more complex, non-linear functions. Meanwhile, data structure is also recognised as a key ingredient for good generalisation, yet its role in simplicity bias is not yet understood. Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, such as mean and covariance, and exploit higher-order statistics only later during training. We first demonstrate this distributional simplicity bias (DSB) in a solvable model of a single neuron trained on synthetic data. We then demonstrate DSB empirically in a range of deep convolutional networks and visual transformers trained on CIFAR10, and show that it even holds in networks pre-trained on ImageNet. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of Gaussian universality in learning.