Letter

Open challenges in environmental data analysis and ecological complex systems(a)


Published 2 March 2021 Copyright © 2021 EPLA
Focus Issue: Progress on Statistical Physics and Complexity. Citation: D. T. Hristopulos et al 2020 EPL 132 68001. DOI: 10.1209/0295-5075/132/68001


Abstract

This letter focuses on open challenges in the fields of environmental data analysis and ecological complex systems. It highlights relations between research problems in stochastic population dynamics, machine learning and big data research, and statistical physics. Recent and current developments in statistical modeling of spatiotemporal data and in population dynamics are briefly reviewed. The presentation emphasizes stochastic fluctuations, including their statistical representation, data-based estimation, prediction, and impact on the physics of the underlying systems. Guided by the common thread of stochasticity, a deeper and improved understanding of environmental processes and ecosystems can be achieved by forging stronger interdisciplinary connections between statistical physics, spatiotemporal data modeling, and ecology.


Introduction

Current research in environmental and ecological complex systems faces two important challenges: i) a growing need to identify and understand general mechanisms that underlie the effects of environmental noise on population dynamics [1–4]; ii) the need to process and model a plethora of data pertaining to various aspects of environmental, climate and ecological processes [5–8]. Such data are gathered by means of ground-based stations, sensor networks, and remote-sensing instruments. Whereas environmental data were once scarce and relatively inaccessible, today there is an overwhelming amount of Earth observation data. These data are characterized by complex spatiotemporal (ST) dependences, high volume (large size), high speed (time series of satellite images), high dimensionality (multiple sources, spectral bands, etc.), high uncertainty (due to measurement and registration errors), and non-repeatability (due to non-stationary evolution) [9,10]. In addition, the data often include spatial or temporal gaps.

Modeling complex environmental data and the dynamics of ecological complex systems poses new challenges. Statistical physics can help to address these challenges by: i) providing physically inspired tools for statistical modeling, ii) elucidating through theoretical models the underlying physical processes, and iii) improving our understanding of the role of environmental noise.

Stochastic components are important in environmental and ecological modeling. In statistical physics, stochastic signals are often referred to as "noise". However, "noise" comprises correlated (colored) as well as uncorrelated (white) fluctuations. Correlations, whether short- or long-ranged, carry important physical information that can be used to characterize, simulate and predict the underlying processes. In addition, the combination of stochastic fluctuations with nonlinear dynamics has a significant impact on the climate, the quality of the environment, and the availability of natural resources [11–18]. Accurate statistical physics models of the fluctuations, together with an understanding of their interplay with nonlinearity, are thus crucial for prediction and forecasting. In ecological systems, noise is generated by the continuous and inexorable presence of random fluctuations originating in the environment [19–21]. Consequently, the dynamics of communities, genetics, and epidemics should be described by means of stochastic approaches with multiplicative noise sources [1,2,22–24]. In particular, stochastic fluctuations give rise to phenomena that cannot be explained by deterministic approaches, which treat noise as a mere nuisance [25–27]. For example, in gene expression, stochasticity helps cells adapt to fluctuating environments and respond to sudden stresses. It also contributes to establishing population heterogeneity during cellular differentiation and development [28–32]. Noise can also induce heterogeneity in the fate of cells through adaptation mechanisms. Indeed, the presence of noise can fundamentally change the physics of the system. Thus, it is not surprising that the study of noise in biological and ecological systems has emerged as a "hot" research topic in the last few years [33–40].

The aim of this letter is to review current research and to draw the attention of physicists to open research challenges originating in ecological complex systems and environmental data modeling [17,41–51]. Such challenges involve questions related to stochastic dynamic models for ecological systems, and to data analysis models that can provide detailed information regarding the intrinsic features of environmental and ecological systems. The advent of big environmental and Earth observation data presents many challenges and opportunities for physicists, because the analysis of such data can benefit from physical understanding of the underlying processes.

The paper is organized into two sections: i) "Statistical models for spatiotemporal data"; ii) "Stochastic modeling of population dynamics". In the first section we focus on three topics: flexible mathematical frameworks, regression, and physical understanding. In the second, we focus on recent advances in noisy non-equilibrium processes that are useful for describing the complex dynamics of ecological systems, through three main topics: population dynamics, spatially extended systems, and non-Gaussian noise sources.

Owing to space limitations, the choice of topics is highly subjective and the discussion is non-mathematical.

Statistical models for spatiotemporal data

In this section we focus on statistical models and methods inspired by physics that can be used to represent and analyze environmental, climate and ecological (henceforth, environmental for brevity) data. Typical tasks of statistical modeling involve model construction, parameter inference, model selection (if more than one model is considered), and prediction. Prediction may involve interpolation (filling of spatial or temporal gaps within the data domain), extrapolation (prediction outside the spatial domain of the data), or forecasting (prediction at future times). The probabilistic framework is ideally suited for the development of flexible and accurate models, given the omnipresence of stochastic fluctuations and the data features discussed in the Introduction [6,17,52–54].

Random fields

Environmental data are distributed over spatial domains of varying size and span different time periods. Hence, suitable statistical models should include both space and time dependence. In addition, they should account for the inherent uncertainty of the data and the a priori unknown complex variability of the generating process. A space-time random field $\{ X(\mathbf{s},t, \omega) : \mathbf{s} \in \mathcal{D} \subset \mathbb{R}^{d},\ t \in T \subset \mathbb{R} \}$, where $\mathbf{s}$ denotes the spatial position in the domain $\mathcal{D}$ and $t$ the time instant in the interval $T$, is a scalar, real-valued random function defined on a probability space $(\Omega, F, P)$: $\Omega$ is the sample space, $\omega \in \Omega$ labels the states (realizations), $F$ is the σ-field of subsets of $\Omega$, and $P$ is a probability measure [55].
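As a minimal illustration of this definition, the sketch below draws realizations of a zero-mean Gaussian space-time random field on a small grid ($d = 1$). Each column of the output corresponds to one fixed $\omega$, and averaging across columns approximates the ensemble average over $\Omega$. The separable exponential covariance, its parameters `sigma2`, `xi` (correlation length) and `tau` (correlation time), and the Cholesky-based sampler are illustrative assumptions, not a model proposed in this letter.

```python
import numpy as np

def st_covariance(P, sigma2=1.0, xi=1.0, tau=1.0):
    """Separable exponential space-time covariance for points P[:, 0] = s (d = 1), P[:, 1] = t.
    A product of valid spatial and temporal covariances is itself a valid covariance."""
    ds = np.abs(P[:, None, 0] - P[None, :, 0])
    dt = np.abs(P[:, None, 1] - P[None, :, 1])
    return sigma2 * np.exp(-ds / xi - dt / tau)

def sample_realizations(P, n_real, seed=0, **cov_kw):
    """Return an (N, n_real) array; column j is one realization X(s, t; omega_j)."""
    C = st_covariance(P, **cov_kw)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(P)))  # small jitter for numerical safety
    Z = np.random.default_rng(seed).standard_normal((len(P), n_real))
    return L @ Z  # Cov(L Z) = L L^T = C

# 5 sites x 4 time instants = 20 space-time points
P = np.array([(s, t) for s in range(5) for t in range(4)], dtype=float)
X = sample_realizations(P, n_real=20000)
ensemble_cov = X @ X.T / X.shape[1]  # approaches st_covariance(P) as n_real grows
```

The dense Cholesky factorization used here is exactly what becomes impractical for large $N$, which motivates the sparse alternatives discussed below.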

Random fields were introduced in fluid turbulence studies by Kolmogorov and his students. Owing to their flexible ST dependence, random fields are widely used in various disciplines, including statistical field theory, hydrology, and ST data modeling [11,15,17,52–54,56,57].

The construction of Gaussian random fields follows different paths in statistical physics and in statistics: in the former, the field is often formulated in the Boltzmann-Gibbs (B-G) representation by means of a suitable energy function, while in the latter, the field is defined in terms of its expectation (mean) and covariance function. The B-G framework provides a sparse representation and computational speed for lattice data, e.g., [58,59]. These advantages stem from the explicit expressions that can be obtained for the precision (inverse covariance) matrix. On regular grids, the B-G representation leads to Gauss-Markov random fields [60–62]. In continuum systems, on the other hand, the B-G representation gives rise to Gaussian field theories [57,63,64]. If the latter admit closed-form solutions, they can lead to new spatial covariance functions, e.g., [65,66]. However, space-time field theories do not easily yield explicit expressions for the covariance function [67].
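To make the contrast concrete, the sketch below assumes a simple B-G energy on an $n \times n$ grid, $H(\mathbf{x}) = \tfrac{1}{2} m^2 \sum_i x_i^2 + \tfrac{1}{2} \sum_{\langle i,j \rangle} (x_i - x_j)^2 = \tfrac{1}{2} \mathbf{x}^{\top} Q\, \mathbf{x}$, where the second sum runs over nearest-neighbour pairs. The precision matrix $Q$ is read off directly and is sparse, whereas the covariance $Q^{-1}$ is dense. The energy, the mass-like parameter `m2`, and the helper names are hypothetical choices for illustration; a dense Cholesky factor is used for sampling only because the grid is tiny.

```python
import numpy as np
import scipy.linalg
import scipy.sparse as sp

def grid_precision(n, m2=0.5):
    """Precision Q = m2 * I + (graph Laplacian of an n x n grid with free boundaries)."""
    main = np.full(n, 2.0)
    main[[0, -1]] = 1.0                     # end sites of a row/column have one neighbour
    off = -np.ones(n - 1)
    D1 = sp.diags([off, main, off], [-1, 0, 1], format="csr")  # 1D path-graph Laplacian
    I = sp.identity(n, format="csr")
    lap = sp.kron(I, D1, format="csr") + sp.kron(D1, I, format="csr")
    return (m2 * sp.identity(n * n, format="csr") + lap).tocsr()

def energy(x, Q):
    """B-G energy H(x) = x^T Q x / 2, so that the pdf is proportional to exp(-H(x))."""
    return 0.5 * x @ (Q @ x)

def sample_gmrf(Q, seed=0):
    """Draw x ~ N(0, Q^{-1}) from Q = R R^T by solving R^T x = z (dense here for simplicity)."""
    R = np.linalg.cholesky(Q.toarray())
    z = np.random.default_rng(seed).standard_normal(Q.shape[0])
    return scipy.linalg.solve_triangular(R.T, z, lower=False)

Q = grid_precision(10)
print(Q.nnz)  # 460 stored entries; the dense covariance Q^{-1} has 10,000
x = sample_gmrf(Q)
```

At most five nonzeros per row survive, independent of grid size; with a sparse Cholesky factorization the same recipe scales to large lattices.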

Random field theory provides a powerful toolbox for the interpolation, forecasting and simulation of complex ST processes. The best linear unbiased predictor (BLUP), also known as kriging, is the key tool for prediction purposes. The predictive equations at an unobserved ST point require solving an $N\times N$ linear system (where $N$ is the number of data points) that involves the field's covariance function. For dense covariance matrices, the computational time required to solve such systems scales as $\mathcal{O}(N^3)$ and the memory storage requirements as $\mathcal{O}(N^2)$ [17,53,68,69]. Hence, approximations or alternative approaches are necessary to handle large datasets.
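The sketch below writes out the simple-kriging form of the BLUP (i.e., assuming a known mean) for a separable exponential space-time covariance. The dense `np.linalg.solve` call is where the $\mathcal{O}(N^3)$ time and $\mathcal{O}(N^2)$ memory costs arise. The covariance model, its parameters, and the function names are illustrative assumptions; ordinary kriging (unknown mean) would add a Lagrange-multiplier row and column to the system.

```python
import numpy as np

def st_cov(A, B, sigma2=1.0, xi=1.0, tau=1.0):
    """Separable exponential covariance; the last column of A and B is time, the rest is space."""
    ds = np.linalg.norm(A[:, None, :-1] - B[None, :, :-1], axis=-1)
    dt = np.abs(A[:, None, -1] - B[None, :, -1])
    return sigma2 * np.exp(-ds / xi - dt / tau)

def simple_kriging(P, y, p0, mean=0.0, sigma2=1.0, **cov_kw):
    """BLUP at p0 for a field with known mean; returns (prediction, kriging variance)."""
    C = st_cov(P, P, sigma2=sigma2, **cov_kw)                 # N x N: O(N^2) memory
    c0 = st_cov(P, p0[None, :], sigma2=sigma2, **cov_kw)[:, 0]
    lam = np.linalg.solve(C, c0)                              # dense solve: O(N^3) time
    return mean + lam @ (y - mean), sigma2 - lam @ c0

rng = np.random.default_rng(0)
P = rng.uniform(0.0, 5.0, size=(30, 3))  # 30 points with coordinates (s_1, s_2, t)
y = rng.standard_normal(30)
pred, var = simple_kriging(P, y, np.array([2.5, 2.5, 2.5]))
```

Without a nugget term the predictor interpolates exactly: at a data point it returns the observed value with zero kriging variance, while far from all data it reverts to the mean with variance `sigma2`.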

Open research questions of relevance to statistical physics include the following: 1) The development of ST covariance functions that are not only mathematically permissible but also physically meaningful, e.g., [54,67,70–72]. 2) New interpolation and simulation approaches for big datasets that reduce the computational cost, e.g., methods based on sparse precision matrices [42,44,59]. 3) The construction of more flexible B-G random fields for continuum and lattice spaces. 4) Novel, computationally tractable models for non-Gaussian dependence. For example, one possibility is to use the κ-exponential and κ-logarithm functions [73] to derive κ-lognormal (KLN) random fields by transforming a latent Gaussian random field [17]. These modified lognormal fields have a probability density function (pdf) whose right tail is lighter than that of the lognormal, with a decay controlled by the κ parameter. A κ-deformed Weibull marginal pdf [74] has been successfully applied to fit the long (power-law) tail of earthquake recurrence-time data [75] and Covid-19 mortality data [76]. A joint pdf that generalizes the κ-Weibull marginal to "many-body" dependence would be a welcome addition. The topics outlined above are also pertinent for machine learning (see below).
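The κ-exponential and κ-logarithm have simple closed forms, $\exp_\kappa(x) = \left(\sqrt{1+\kappa^2 x^2} + \kappa x\right)^{1/\kappa}$ and $\ln_\kappa(x) = (x^{\kappa} - x^{-\kappa})/(2\kappa)$, which reduce to $\exp$ and $\ln$ as $\kappa \to 0$. The sketch below implements them through the equivalent $\operatorname{asinh}$ and $\sinh$ forms, which avoid numerical cancellation. It then applies $\exp_\kappa$ pointwise to a latent Gaussian sample as a stand-in for a KLN field, and evaluates a κ-Weibull survival function, assumed here in the form $\exp_\kappa(-\beta t^{\alpha})$, whose tail decays as a power law $\sim t^{-\alpha/\kappa}$. Parameter values and function names are illustrative assumptions, not taken from [17] or [74].

```python
import math
import numpy as np

def exp_kappa(x, kappa):
    """Kaniadakis kappa-exponential: (sqrt(1 + k^2 x^2) + k x)^(1/k) = exp(asinh(k x) / k)."""
    if kappa == 0:
        return np.exp(x)
    return np.exp(np.arcsinh(kappa * np.asarray(x, dtype=float)) / kappa)

def ln_kappa(x, kappa):
    """Kaniadakis kappa-logarithm: (x^k - x^-k) / (2k) = sinh(k ln x) / k; inverse of exp_kappa."""
    if kappa == 0:
        return np.log(x)
    return np.sinh(kappa * np.log(x)) / kappa

def kweibull_survival(t, alpha, beta, kappa):
    """Assumed kappa-Weibull survival exp_kappa(-beta t^alpha); ~ (2 kappa beta t^alpha)^(-1/kappa) for large t."""
    return exp_kappa(-beta * np.asarray(t, dtype=float) ** alpha, kappa)

def tail_prob(x, kappa, sigma=1.0):
    """P(X > x) for X = exp_kappa(Y), Y ~ N(0, sigma^2), i.e. P(Y > ln_kappa(x)); kappa = 0 is lognormal."""
    return 0.5 * math.erfc(float(ln_kappa(x, kappa)) / (sigma * math.sqrt(2.0)))

# pointwise transform of a latent Gaussian sample (stand-in for a latent Gaussian field)
latent = np.random.default_rng(0).standard_normal(1000)
kln = exp_kappa(latent, kappa=0.3)
```

Because $\ln_\kappa(x) > \ln x$ for $x > 1$, `tail_prob(x, kappa)` is smaller than the lognormal value `tail_prob(x, 0.0)`, which is the lighter right tail mentioned above.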

Machine learning (ML) and Gaussian processes (GPs)

ML research has captured the interest of physicists [15,69,77,78]. ML methods can "learn" from the data almost automatically (i.e., with minimal or no interaction with the modeler) and easily adapt (generalize) to new data. Such features are very appealing in the era of big data. ML methods can successfully perform complex classification tasks, and they also provide new methods for the solution of partial differential equations that represent physical processes [78–80].

Gaussian processes generalize Gaussian random fields; they represent functions $\Phi(\mathbf{x}; \omega)$, where $\mathbf{x} \in \mathbb{R}^{D}$ is an input vector in a $D$-dimensional space, not necessarily restricted to the space-time domain. GP regression (GPR) is an ML procedure that provides an optimal estimate of the output variable $y_{\ast}=\Phi(\mathbf{x}_{\ast}; \omega)$ based on a set of input vectors and their respective outputs $\{ \mathbf{x}_{n}, y_{n} \}_{n=1}^{N}$, where $\mathbf{x}_{\ast} \neq \mathbf{x}_{n}, \forall n =1, \ldots, N$. The GPR predictive equations are similar to those of the BLUP. The main difference is that the ST covariance used in the BLUP is replaced in GPR by a kernel function, which measures output correlations in terms of the input vectors (typically, in terms of the distance between them). The Bayesian framework allows including informed (non-flat) prior guesses for GP parameters. Hence, the mathematical machinery developed for random fields carries over nicely to GPs, and open research questions for random fields are also pertinent for GPs. In particular, the development of scalable GP models that can handle massive data is a current priority [45,81–84]. Sparse GPs are approximations that can reduce the computational cost of GPs to $\mathcal{O}(NM^{2})$ and the memory requirements to $\mathcal{O}(NM)$, where $M < N$ is the number of inducing points [85,86]. A somewhat different approach is the development of sparse GPs based on local inverse covariance (precision) operators [17,44,66].
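A minimal sketch of one sparse-GP approximation, the subset-of-regressors (SoR) predictive mean $\mathbf{k}_{\ast u} (\sigma^2 K_{uu} + K_{uf} K_{fu})^{-1} K_{uf} \mathbf{y}$ with $M$ inducing inputs $Z$. Forming $K_{uf} K_{fu}$ costs $\mathcal{O}(NM^2)$ and storing $K_{uf}$ needs $\mathcal{O}(NM)$ memory, whereas the exact mean requires an $N \times N$ solve. The squared-exponential kernel, the noise variance, the synthetic data and the inducing-point locations are assumptions for illustration; SoR is only one of several sparse approximations in the literature.

```python
import numpy as np

def rbf(A, B, ell=1.0, s2=1.0):
    """Squared-exponential kernel between the rows of A (n x D) and B (m x D)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return s2 * np.exp(-0.5 * d2 / ell**2)

def full_gp_mean(X, y, Xs, noise=0.1):
    """Exact GPR predictive mean: N x N solve, O(N^3) time and O(N^2) memory."""
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(Xs, X) @ np.linalg.solve(K, y)

def sor_gp_mean(X, y, Xs, Z, noise=0.1):
    """Subset-of-regressors predictive mean with inducing inputs Z (M x D)."""
    Kuf = rbf(Z, X)                       # M x N: O(NM) memory
    A = noise * rbf(Z, Z) + Kuf @ Kuf.T   # M x M; the product costs O(N M^2)
    return rbf(Xs, Z) @ np.linalg.solve(A, Kuf @ y)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, np.sqrt(0.1), 500)  # synthetic data, noise variance 0.1
Xs = np.linspace(0.5, 9.5, 50)[:, None]
Z = np.linspace(0.0, 10.0, 15)[:, None]                   # M = 15 inducing inputs (hypothetical)
mu_full = full_gp_mean(X, y, Xs)
mu_sor = sor_gp_mean(X, y, Xs, Z)
```

With inducing inputs dense enough relative to the kernel length scale, the SoR mean closely tracks the exact mean at a fraction of the cost.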

Neural networks (NN)

NN are popular ML tools for classification and regression tasks, which makes them natural candidates for environmental data modeling. For example, a GP regression network applied to ground pollution data gave improved statistical validation measures compared with other methods [87]. Strong links exist between NN and GPs (and therefore random fields), which are little known outside the ML community. For example, it has been shown that single-hidden-layer, feed-forward Bayesian NN with an infinite number of hidden units (i.e., infinitely wide NN) and independent, identically distributed priors over the parameters are equivalent to GPs [88]. The kernel (covariance function) of the equivalent GP can be obtained in closed form in this case [17,69]. More recently, it was shown [43,89] that deep, infinitely wide NN are also equivalent to GPs. A computationally efficient recipe for computing the GP covariance corresponding to wide neural networks with a finite number of layers was formulated, and the NN accuracy was found to approach that of the respective GP with increasing layer width. Another recent contribution shows that the output of a (residual) convolutional NN with an appropriate prior over the weights and biases is equivalent to a GP in the limit of infinitely many convolutional filters, and that the equivalent kernel can be computed exactly [90].
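For a fully connected network with ReLU activations, the kernel of the equivalent GP can be built layer by layer. The sketch below assumes weights with prior variance $\sigma_w^2 / n_{\text{in}}$ and biases with variance $\sigma_b^2$. It uses the standard closed-form expectation of products of ReLUs of jointly Gaussian variables (the degree-1 arc-cosine kernel). The activation, the values of `sw2` and `sb2`, and the function name are illustrative assumptions; this is a sketch of the kind of recursion referred to above, not the code of [43,89].

```python
import numpy as np

def nngp_relu(X, depth=3, sw2=2.0, sb2=0.1):
    """GP kernel of an infinitely wide fully connected ReLU network with `depth` hidden layers.
    Assumed priors: weights ~ N(0, sw2 / fan_in), biases ~ N(0, sb2)."""
    K = sb2 + sw2 * (X @ X.T) / X.shape[1]   # covariance of first-layer pre-activations
    for _ in range(depth):
        sd = np.sqrt(np.diag(K))
        norm = np.outer(sd, sd)
        cos_t = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # E[relu(u) relu(v)] for jointly Gaussian (u, v): degree-1 arc-cosine kernel
        K = sb2 + sw2 / (2.0 * np.pi) * norm * (np.sin(theta) + (np.pi - theta) * cos_t)
    return K

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K3 = nngp_relu(X, depth=3)  # 3 x 3 covariance of the network outputs over the prior
```

On the diagonal ($\theta = 0$) the recursion reduces to $K^{(l)}_{ii} = \sigma_b^2 + \tfrac{1}{2}\sigma_w^2 K^{(l-1)}_{ii}$, which is a quick consistency check; for one hidden layer, the covariance of outputs sampled from randomly initialized networks should match the kernel.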

ML applications in environmental data modeling will multiply in coming years. Harnessing connections between ML, statistical physics and statistical learning will lead to new advances in all of these fields.

Stochastic modeling of population dynamics

This section focuses on non-equilibrium processes used to describe the complex dynamics of ecological systems through three main topics: population dynamics, spatially extended systems, and non-Gaussian noise sources. The investigation of stochastic nonlinear effects in various fields of the life sciences [91–95], interdisciplinary physics [96,97], and condensed matter [98,99] attracts considerable attention. The interplay between the nonlinearity of living systems and environmental noise can give rise to new counterintuitive phenomena such as noise-induced phase transitions [19,100–103], stochastic and coherence resonance [25,104], resonant activation [99,105,106], noise-enhanced stability [92,107–113], noise-induced excitability and synchronization [114], transition from order to chaos [115], noise-induced transport [116,117], and stochastic pattern formation [106,118–120]. The characterization of the resulting spatiotemporal patterns and spatial organization is key to the analysis of ecological time series and the modeling of ecosystem dynamics [47]. Such phenomena are also actively studied in neuron models [121]. Moreover, qualitative transformations and ecological shifts caused by random fluctuations, similar to phase transitions, have also been found in population systems [36,46,95,119,122].

Population dynamics

Population dynamics is a specific branch of the dynamics of complex ecological systems, which can be considered as a foundational subfield of non-equilibrium statistical physics. Recently, population dynamics has become a crucial tool for investigating the fundamental puzzle of the emergence and stabilization of biodiversity [123]. Ecological complex systems are open, subject to random environmental perturbations, and involve nonlinear interactions between constituent parts. These systems are very sensitive to the initial conditions, deterministic external perturbations and random fluctuations. The study of far-from-equilibrium stochastic processes is crucial for modeling ecological dynamics and understanding the mechanisms which govern the spatiotemporal dynamics of ecosystems, see ref. [47]. Even low-dimensional systems exhibit a huge variety of noise-driven phenomena, ranging from less to more ordered system dynamics [47,50,51,124,125].

In systems of interacting populations, even small random disturbances can cause opposite effects, such as extinction or explosive population growth. The identification of general laws governing such stochastic phenomena and the development of constructive methods for their mathematical modeling and analysis are important tasks. The diverse and complex behavior of population systems is associated with the nonlinear nature of interacting biological factors: limited ecological niche, age and gender differences, dependence of fertility on population size, interactions with other populations, and environmental influences [126].
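A minimal sketch of how multiplicative noise alone can drive a population to extinction: the stochastic logistic equation $dX = X(r - aX)\,dt + \sigma X\,dW$ in the Itô interpretation, integrated with the Euler-Maruyama scheme applied to $\log X$. For this model, $X \to 0$ almost surely when $r < \sigma^2/2$. For $r > \sigma^2/2$, the population instead fluctuates around a stationary mean $(r - \sigma^2/2)/a$. The model and all parameter values are illustrative assumptions, not taken from [126].

```python
import math
import numpy as np

def stochastic_logistic(r=1.0, a=1.0, sigma=0.5, x0=0.5, T=1000.0, dt=0.01, seed=0):
    """Euler-Maruyama for the Ito SDE dX = X (r - a X) dt + sigma X dW.
    Integrating log X (via Ito's lemma) keeps the population positive."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    dW = rng.normal(0.0, math.sqrt(dt), n)
    logx = np.empty(n + 1)
    logx[0] = math.log(x0)
    for k in range(n):
        x = math.exp(logx[k])
        # d(log X) = (r - sigma^2 / 2 - a X) dt + sigma dW
        logx[k + 1] = logx[k] + (r - 0.5 * sigma**2 - a * x) * dt + sigma * dW[k]
    return np.exp(logx)

extinct = stochastic_logistic(sigma=2.0, T=200.0)     # r < sigma^2 / 2: noise-induced extinction
persists = stochastic_logistic(sigma=0.5, T=1000.0)   # r > sigma^2 / 2: fluctuating persistence
```

The deterministic model ($\sigma = 0$) always converges to $r/a$; the extinction in the first run is produced entirely by the fluctuations.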

Random environmental fluctuations represent a major source of risk for wild populations. Hence, identifying general mechanisms linking global environmental changes with population dynamics is an important research topic. Accordingly, there is an urgent need to understand and predict the temporal and spatial autocorrelation patterns of environmental noise.

Reference [1] shows experimentally that environmental autocorrelation has a significant impact on population dynamics and extinction rates, and that the latter can be accurately predicted if the memory of the past environment is accounted for. The experiment exposed nearly 1000 lines of the microalga Dunaliella salina to randomly fluctuating salinity, with autocorrelation ranging from negative to strongly positive values. The authors observed lower population growth and higher extinction rates for lower autocorrelation values, indicating that non-genetic inheritance is potentially a major driver of population dynamics in randomly fluctuating environments.
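Environmental noise with a prescribed lag-1 autocorrelation, from negative to strongly positive, can be generated with a first-order autoregressive (AR(1)) process. The sketch below is a generic illustration under that assumption; it is not the experimental protocol of [1], and `mean` and `sd` (e.g., a hypothetical salinity level and spread) are placeholders.

```python
import numpy as np

def ar1_environment(rho, n, mean=0.0, sd=1.0, seed=0):
    """Stationary AR(1) series e_t = rho e_{t-1} + sqrt(1 - rho^2) eta_t, rescaled to (mean, sd).
    rho in (-1, 1) is the lag-1 autocorrelation: negative alternates, positive persists."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    e = np.empty(n)
    e[0] = eta[0]  # start in the stationary N(0, 1) state
    for t in range(1, n):
        e[t] = rho * e[t - 1] + np.sqrt(1.0 - rho**2) * eta[t]
    return mean + sd * e

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    d = x - x.mean()
    return (d[:-1] @ d[1:]) / (d @ d)

series = {rho: ar1_environment(rho, 20000, seed=3) for rho in (-0.6, 0.0, 0.8)}
```

The innovation scaling $\sqrt{1 - \rho^2}$ keeps the marginal variance fixed, so only the temporal structure changes as $\rho$ varies, which is the kind of manipulation the experiment in [1] relies on.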

Spatially extended systems

Stochastic population dynamics models in spatially extended systems demonstrate the crucial role of noise and correlations in biological systems [123]. Theoretical approaches typical of non-equilibrium statistical physics capture the noisy kinetics of complex many-body biological systems and have revealed unexpected and intriguing behavior in simple paradigmatic model systems. This behavior ranges from persistent population oscillations stabilized by intrinsic noise, and strong renormalization of the associated kinetic parameters induced by correlations, to the emergence of continuous out-of-equilibrium phase transitions and the spontaneous formation of rich spatiotemporal patterns. The spatial degrees of freedom can drastically extend extinction times through the emergence of noise-stabilized structures, and hence promote ecological stability and species diversity. For example, spatially extended predator-prey systems display noise-stabilized activity fronts that generate persistent correlations, so that the critical steady-state and non-equilibrium relaxation dynamics at the predator extinction threshold are governed by the directed percolation universality class [123]. Ecosystems display complex spatial organization. However, spatial models still present several open problems, which limit the quantitative understanding of spatial biodiversity across different time scales. Indeed, the connection between spatially extended ecological models and the physics of non-equilibrium phase transitions represents an open problem of paramount importance. In this context, translating concepts developed in statistical physics into language and tools applicable to population dynamics is crucial for progress [123,127]. Moreover, changes of external conditions strongly influence the dynamics and the organization of biological systems.
In fact, the characteristic timescales of environmental variations, as well as their correlations, play a fundamental role in how living systems adapt and respond to the variability of environmental parameters. Relevant questions include the stationary population density and the role of random fluctuations in the extinction dynamics within an ecosystem [24].
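The simplest lattice model in the directed percolation (DP) class is 1+1-dimensional site directed percolation, whose absorbing (empty) state plays the role of the extinct phase. The sketch below is a generic DP simulation, not a predator-prey model; the lattice size, number of steps, and update rule are illustrative assumptions. Numerical studies place the critical probability of this model at $p_c \approx 0.7055$: below it activity dies out, above it a finite density survives.

```python
import numpy as np

def site_dp(p, L=200, T=200, seed=0):
    """1+1-dimensional site directed percolation on a ring, starting from a fully occupied lattice.
    Site i is active at t+1 with probability p if site i or i+1 was active at t."""
    rng = np.random.default_rng(seed)
    active = np.ones(L, dtype=bool)
    density = [active.mean()]
    for _ in range(T):
        has_parent = active | np.roll(active, -1)  # at least one active "parent" at time t
        active = has_parent & (rng.random(L) < p)
        density.append(active.mean())
    return np.array(density)

below = site_dp(0.3)   # well below p_c: absorbing (extinct) state is reached
above = site_dp(0.95)  # well above p_c: a finite active density persists
```

Near $p_c$ the density decays as a power law with DP exponents, which is the sense in which extinction thresholds in spatial population models can belong to this universality class.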

The crucial role of random fluctuations is evident in cell biology. It is known that the presence of noise in intracellular processes can limit the performance of cells by preventing optimal concentration of the cell's molecular components. In contrast, noise can be used to create diversity in a clonal population, providing the basis for bet-hedging strategies in fluctuating environments. As a result, noise optimization can lead to substantial selective forces acting on genome evolution [128].

Non-Gaussian noise sources

The environmental fluctuations underlying experimental population dynamics data sampled in natural systems are generally modeled as multiplicative white Gaussian noise [23,129]. Aiming to provide a more realistic description of the stochastic dynamics of natural systems, some recent works investigate intrinsically non-Gaussian noise signals, characterized by sudden random variations [130]. In particular, the environmental fluctuations were modeled by using the archetypical pulse noise source, i.e., a sequence of rectangular pulses with the following properties: i) fixed width; ii) height $h$ distributed according to a probability function $w_h$; iii) times of occurrence $t$ distributed according to a probability function $w_t$. The impact of a pulse noise source, modeled as Poisson white noise, on population dynamics was studied in [131]. More recently, the stability conditions for the dynamics of termite populations have been investigated in the presence of a noise source with different statistical properties, ranging from sub- to super-Poisson processes, in two different cases: i) positive pulses; ii) negative pulses [130]. The effect of noise correlations was evaluated by means of a stochastic differential equation with a noise source modeled as a renewal process with suitable statistics. This work extended previous studies (see ref. [130] and references therein), in which the stability properties of such a model were investigated in the presence of a multiplicative, positive, sub-Poisson pulse process.
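The pulse noise described above can be sketched as a sum of rectangular pulses of fixed width, with waiting times drawn from $w_t$ and heights from $w_h$. Exponential waiting times give Poisson pulses; gamma-distributed waiting times with shape larger (smaller) than 1 give a more regular, sub-Poisson (burstier, super-Poisson) renewal process with the same mean. Here the pulses perturb the growth rate of a generic logistic equation. This is an illustrative stand-in, not the termite model of [130], and all samplers and parameter values are assumptions.

```python
import math
import numpy as np

def pulse_noise(t_grid, width, wait_sampler, height_sampler, rng):
    """Sum of rectangular pulses of fixed width; waiting times ~ w_t, heights ~ w_h (user-supplied)."""
    xi = np.zeros_like(t_grid)
    t = wait_sampler(rng)
    while t < t_grid[-1]:
        xi[(t_grid >= t) & (t_grid < t + width)] += height_sampler(rng)
        t += wait_sampler(rng)
    return xi

def driven_logistic(xi, dt, r=1.0, a=1.0, x0=0.5):
    """dX/dt = X (r - a X + xi(t)): pulses act multiplicatively through the growth rate.
    Explicit Euler on log X keeps X positive."""
    logx = np.empty(len(xi))
    logx[0] = math.log(x0)
    for k in range(len(xi) - 1):
        logx[k + 1] = logx[k] + (r - a * math.exp(logx[k]) + xi[k]) * dt
    return np.exp(logx)

rng = np.random.default_rng(1)
dt = 0.01
t_grid = np.arange(0.0, 1000.0, dt)
poisson_wait = lambda g: g.exponential(5.0)            # Poisson pulses, mean waiting time 5
sub_poisson_wait = lambda g: g.gamma(4.0, 5.0 / 4.0)   # same mean, more regular (sub-Poisson)
negative_height = lambda g: -g.exponential(1.0)        # negative pulses
xi = pulse_noise(t_grid, 0.5, poisson_wait, negative_height, rng)
x = driven_logistic(xi, dt)
```

Passing `sub_poisson_wait` (or a gamma sampler with shape below 1) instead of `poisson_wait` changes the pulse statistics while keeping the mean pulse rate fixed, which isolates the effect of noise correlations.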

This perspective identifies the development of new and effective tools for describing and modeling the dynamics that underlie environmental and biological data as an urgent open problem. Addressing it requires theoretical approaches in which non-equilibrium statistical physics and stochastic processes play a key role in the construction of realistic and effective models that can capture the intrinsically nonlinear and noisy dynamics of natural systems.

Concluding remarks

This letter highlights the interdisciplinary links between the analysis of environmental data and their modeling through statistical physics and machine learning. These topics are linked by the common need for modeling and predicting the spatiotemporal patterns of the inescapable stochastic fluctuations (noise).

While the focus is on environmental data, statistical physics and theoretical ecology, the topics covered herein are of interest to researchers who work in the fields of population dynamics, space-time epidemiology, geography, hydrology, and renewable energy resources.

Non-equilibrium and equilibrium statistical physics represent the main theoretical tools for modeling and forecasting environmental processes. They can lead to new methods for analyzing and reproducing the dynamics underlying experimental data and field observations [132–135]. We also expect such new theoretical approaches originating in non-equilibrium statistical physics to become a driving force for new experiments. The interdisciplinary cross-fertilization between statistical physics and ecology will become increasingly important in the future, as statistical physics is a fundamental tool suitable for a deeper understanding and quantitative description of the organization and functioning of ecosystems.

Footnotes

(a) Contribution to the Focus Issue Progress on Statistical Physics and Complexity edited by Roberta Citro, Giorgio Kaniadakis, Claudio Guarcello, Antonio Maria Scarfone and Davide Valenti.
