1. Introduction
This research note presents a derivation and implementation of efficient and scalable gradient computations using the celerite algorithm for Gaussian Process (GP) modeling. The algorithms are derived in a "reverse accumulation" or "backpropagation" framework, and they can be easily integrated into automatic differentiation frameworks to provide a scalable method for evaluating gradients of the GP likelihood. The algorithms derived in this note use less memory and are more efficient than versions built on general-purpose automatic differentiation, and their cost scales linearly with the number of data points.
GPs (Rasmussen & Williams 2006) are a class of models used extensively in the astrophysical literature to model stochastic processes. The applications are broad-ranging, with examples from time domain variability to the cosmic microwave background (see the references in Foreman-Mackey et al. 2017). In these applications, the calculation and optimization of the GP marginalized likelihood function

    ln L(θ) = −(1/2) r_θ^T K_θ^{−1} r_θ − (1/2) ln det K_θ − (N/2) ln 2π        (1)

is the computational bottleneck, where r_θ is the residual between the data and the mean model, and K_θ is the covariance matrix. For a data set with N data points, every likelihood evaluation requires the log-determinant and inverse of the N × N covariance matrix K_θ. The cost of these operations scales as O(N^3) in general, but the celerite method can be used to compute Equation (1) for a class of models with O(N J^2) scaling, where J is the number of celerite terms (Foreman-Mackey et al. 2017).
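As a point of reference, Equation (1) can be evaluated directly with a dense Cholesky factorization; this is the O(N^3) baseline that celerite is designed to avoid. The following pure-Python sketch (function names are mine, not from the note) makes the computation concrete:

```python
import math

def cholesky(K):
    # Lower-triangular Cholesky factor of a symmetric positive-definite matrix.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L x = b.
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x

def gp_log_likelihood(r, K):
    # ln L = -1/2 r^T K^{-1} r - 1/2 ln det K - N/2 ln 2 pi  (Equation (1))
    n = len(r)
    L = cholesky(K)
    z = solve_lower(L, r)  # r^T K^{-1} r = |L^{-1} r|^2
    quad = sum(zi * zi for zi in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)
```

Both the factorization and the log-determinant here cost O(N^3) and O(N) respectively once the factor is known; celerite replaces the factorization step with the O(N J^2) algorithm below.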
The details can be found in Foreman-Mackey et al. (2017), and the only difference in notation is that all the matrices in what follows are the "pre-conditioned" matrices, indicated with a tilde by Foreman-Mackey et al. (2017). The tilde is not included here for simplicity. I use the symbol P for the (N − 1) × J pre-conditioning matrix that was called ϕ by Foreman-Mackey et al. (2017). The factorization algorithm derived by Foreman-Mackey et al. (2017, their Equation (46)) is as follows:
function celerite_factor(U, P, d, W)
    # Initially d = A and W = V
    S ← zeros(J, J)
    w_1 ← w_1 / d_1
    for n = 2, …, N:
        S ← diag(p_{n−1}) [S + d_{n−1} w_{n−1}^T w_{n−1}] diag(p_{n−1})
        d_n ← d_n − u_n S u_n^T
        w_n ← [w_n − u_n S] / d_n
    return d, W, S
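The factorization above can be transcribed nearly line for line into code. The following pure-Python sketch (argument names a, U, V, P are lists standing in for A, U, V, and P; this is my transcription, not the reference implementation) computes K = L diag(d) L^T, where L is unit lower triangular with L_{nm} = u_n diag(p_{n−1} ∘ ⋯ ∘ p_m) w_m^T:

```python
def celerite_factor(a, U, V, P):
    # a: length-N diagonal A; U, V: N x J; P: (N-1) x J pre-conditioning terms.
    # Overwrites copies of a and V with d and W, following the pseudocode.
    N, J = len(a), len(U[0])
    d = list(a)
    W = [row[:] for row in V]
    S = [[0.0] * J for _ in range(J)]
    W[0] = [w / d[0] for w in W[0]]
    for n in range(1, N):
        p, w = P[n - 1], W[n - 1]
        for j in range(J):
            for k in range(J):
                S[j][k] = p[j] * (S[j][k] + d[n - 1] * w[j] * w[k]) * p[k]
        u = U[n]
        uS = [sum(u[j] * S[j][k] for j in range(J)) for k in range(J)]
        d[n] -= sum(uS[k] * u[k] for k in range(J))
        W[n] = [(W[n][k] - uS[k]) / d[n] for k in range(J)]
    return d, W, S
```

For a 2 × 2, J = 1 example with a = (2, 2), u = v-pairs (0.5, 1.0), and p = 0.8, the covariance is K = [[2, 0.4], [0.4, 2]], and the product of the returned d entries equals det K, as expected for a Cholesky-like factorization.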
Similarly, the algorithm for applying the inverse of K is (Equations (47)–(48) in Foreman-Mackey et al. 2017):
function celerite_solve(U, P, d, W, Z)
    # Initially Z = Y
    F ← zeros(J, Nrhs)
    for n = 2, …, N:
        F ← diag(p_{n−1}) [F + w_{n−1}^T z_{n−1}]
        z_n ← z_n − u_n F
    for n = 1, …, N:
        z_n ← z_n / d_n
    G ← zeros(J, Nrhs)
    for n = N − 1, …, 1:
        G ← diag(p_n) [G + u_{n+1}^T z_{n+1}]
        z_n ← z_n − w_n G
    return Z, F, G
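The three sweeps of the solve can likewise be sketched in pure Python for a single right-hand side (a simplification of the Nrhs-column case; the factorization routine is repeated here so the block is self-contained, and the names are my transcription):

```python
def celerite_factor(a, U, V, P):
    # Factorization from the previous listing: K = L diag(d) L^T.
    N, J = len(a), len(U[0])
    d = list(a)
    W = [row[:] for row in V]
    S = [[0.0] * J for _ in range(J)]
    W[0] = [w / d[0] for w in W[0]]
    for n in range(1, N):
        p, w = P[n - 1], W[n - 1]
        for j in range(J):
            for k in range(J):
                S[j][k] = p[j] * (S[j][k] + d[n - 1] * w[j] * w[k]) * p[k]
        u = U[n]
        uS = [sum(u[j] * S[j][k] for j in range(J)) for k in range(J)]
        d[n] -= sum(uS[k] * u[k] for k in range(J))
        W[n] = [(W[n][k] - uS[k]) / d[n] for k in range(J)]
    return d, W, S

def celerite_solve(U, P, d, W, y):
    # Apply K^{-1} to one right-hand side y: forward substitution,
    # diagonal scaling, then backward substitution.
    N, J = len(d), len(U[0])
    z = list(y)
    F = [0.0] * J
    for n in range(1, N):
        for j in range(J):
            F[j] = P[n - 1][j] * (F[j] + W[n - 1][j] * z[n - 1])
        z[n] -= sum(U[n][j] * F[j] for j in range(J))
    for n in range(N):
        z[n] /= d[n]
    G = [0.0] * J
    for n in range(N - 2, -1, -1):
        for j in range(J):
            G[j] = P[n][j] * (G[j] + U[n + 1][j] * z[n + 1])
        z[n] -= sum(W[n][j] * G[j] for j in range(J))
    return z
```

Each sweep touches every data point once with O(J) work per point, which is the source of the O(N J^2) scaling (for Nrhs columns, O(N J Nrhs) per sweep).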
2. Algorithms
Many numerical inference methods can benefit from efficient calculation of the gradient of Equation (1). This is generally computed following Section 5.4.1 of Rasmussen & Williams (2006), but even with a scalable method for applying K^{−1}, the cost of this method scales as O(N^2). It was recently demonstrated that higher performance can be achieved by directly differentiating the factorization algorithm itself (Murray 2016).
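The standard expression from Rasmussen & Williams (2006, Section 5.4.1) is d lnL / dθ = (1/2) a^T (∂K/∂θ) a − (1/2) tr(K^{−1} ∂K/∂θ) with a = K^{−1} r. A minimal sketch, using dense illustrative inputs (in practice one would never form K^{−1} explicitly; the trace term is what drives the O(N^2) cost even with a fast solver):

```python
def grad_log_likelihood(K_inv, dK, r):
    # d lnL / d theta = 1/2 a^T (dK/dtheta) a - 1/2 tr(K^{-1} dK/dtheta),
    # with a = K^{-1} r.  K_inv and dK are dense for illustration only.
    n = len(r)
    a = [sum(K_inv[i][j] * r[j] for j in range(n)) for i in range(n)]
    quad = sum(a[i] * dK[i][j] * a[j] for i in range(n) for j in range(n))
    trace = sum(K_inv[i][j] * dK[j][i] for i in range(n) for j in range(n))
    return 0.5 * quad - 0.5 * trace
```

For K = θ I the closed-form gradient is (r·r)/(2 θ^2) − N/(2 θ), which the function reproduces.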
Following this reasoning and using the notation of Giles (2008), I present the reverse-mode gradients of the celerite method. After some tedious algebra, the reverse accumulation function for celerite_factor is:
function celerite_factor_rev(U, P, d, W, S, S̄, d̄, W̄)
    # Initially d̄ = ∂L/∂d and W̄ = ∂L/∂W
    Ū ← zeros(N, J);  P̄ ← zeros(N − 1, J)
    for n = N, …, 2:
        w̄_n ← w̄_n / d_n
        d̄_n ← d̄_n − w̄_n w_n^T
        ū_n ← −[w̄_n + 2 d̄_n u_n] S
        S̄ ← S̄ − u_n^T [w̄_n + d̄_n u_n]
        p̄_{n−1} ← diag(S̄ S + S S̄) [diag(p_{n−1})]^{−1}
        S ← [diag(p_{n−1})]^{−1} S [diag(p_{n−1})]^{−1} − d_{n−1} w_{n−1}^T w_{n−1}
        S̄ ← diag(p_{n−1}) S̄ diag(p_{n−1})
        d̄_{n−1} ← d̄_{n−1} + w_{n−1} S̄ w_{n−1}^T
        w̄_{n−1} ← w̄_{n−1} + d_{n−1} w_{n−1} [S̄ + S̄^T]
    w̄_1 ← w̄_1 / d_1
    d̄_1 ← d̄_1 − w̄_1 w_1^T
    return Ū, P̄, d̄, W̄
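The pattern in these reverse functions (run the recursion forward storing the factor, then sweep backward propagating cotangents while "downdating" the running state) can be illustrated on a much simpler factorization. The sketch below differentiates the log-determinant of a symmetric tridiagonal matrix through its Cholesky-like recursion d_n = a_n − b_{n−1}^2 / d_{n−1}; this is an analog chosen for clarity, not the celerite algorithm itself, and all names are mine:

```python
import math

def logdet_tridiag(a, b):
    # Cholesky-like recursion for a symmetric tridiagonal matrix with
    # diagonal a and off-diagonal b: ln det = sum of ln d_n.
    d = [a[0]]
    for n in range(1, len(a)):
        d.append(a[n] - b[n - 1] ** 2 / d[n - 1])
    return sum(math.log(x) for x in d), d

def logdet_tridiag_rev(a, b, d):
    # Reverse accumulation through the recursion, reusing the stored d
    # from the forward pass (no extra O(N) memory beyond the factor).
    N = len(a)
    abar = [0.0] * N
    bbar = [0.0] * (N - 1)
    dbar = [1.0 / x for x in d]  # direct dependence of ln det on each d_n
    for n in range(N - 1, 0, -1):
        abar[n] = dbar[n]
        bbar[n - 1] = -2.0 * b[n - 1] / d[n - 1] * dbar[n]
        dbar[n - 1] += (b[n - 1] ** 2 / d[n - 1] ** 2) * dbar[n]
    abar[0] = dbar[0]
    return abar, bbar
```

As in celerite_factor_rev, the cotangent of each intermediate quantity is consumed in the reverse sweep in exactly the opposite order to which the quantity was produced, so the total cost stays linear in N.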
function celerite_solve_rev(U, P, d, W, Z, F, G, Z̄)
    # Initially Z̄ = ∂L/∂Z
    Ū ← zeros(N, J);  P̄ ← zeros(N − 1, J);  d̄ ← zeros(N);  W̄ ← zeros(N, J)
    Ḡ ← zeros(J, Nrhs)
    for n = 1, …, N − 1:
        w̄_n ← −z̄_n G^T
        z_n ← z_n + w_n G
        Ḡ ← Ḡ − w_n^T z̄_n
        G ← [diag(p_n)]^{−1} G
        p̄_n ← diag(Ḡ G^T)
        Ḡ ← diag(p_n) Ḡ
        ū_{n+1} ← ū_{n+1} + z_{n+1} Ḡ^T
        z̄_{n+1} ← z̄_{n+1} + u_{n+1} Ḡ
        G ← G − u_{n+1}^T z_{n+1}
    for n = 1, …, N:
        d̄_n ← −[z̄_n z_n^T] / d_n
        z̄_n ← z̄_n / d_n
        z_n ← d_n z_n
    F̄ ← zeros(J, Nrhs)
    for n = N, …, 2:
        ū_n ← ū_n − z̄_n F^T
        F̄ ← F̄ − u_n^T z̄_n
        F ← [diag(p_{n−1})]^{−1} F
        p̄_{n−1} ← p̄_{n−1} + diag(F̄ F^T)
        F̄ ← diag(p_{n−1}) F̄
        w̄_{n−1} ← w̄_{n−1} + z_{n−1} F̄^T
        z̄_{n−1} ← z̄_{n−1} + w_{n−1} F̄
        F ← F − w_{n−1}^T z_{n−1}
    # On return, Z̄ = ∂L/∂Y
    return Ū, P̄, d̄, W̄, Z̄
3. Discussion
This research note presents the algorithms needed to efficiently compute gradients of GP models applied to large data sets using the celerite method. These developments increase the performance of inference methods based on celerite and improve the convergence properties of nonlinear optimization routines. Furthermore, the derivation of reverse accumulation algorithms for celerite allows its integration into popular model-building and automatic differentiation libraries like Stan (Carpenter et al. 2015), TensorFlow (Abadi et al. 2016), and others.
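Automatic differentiation frameworks typically consume a hand-derived gradient through a common interface: a primitive that returns its value together with a closure mapping an output cotangent to input cotangents (a vector–Jacobian product). A minimal pure-Python sketch of that interface, using a toy tridiagonal log-determinant recursion as the primitive (an illustrative analog, not celerite itself; all names are mine):

```python
import math

def logdet_vjp(a, b):
    # Forward pass: d_n = a_n - b_{n-1}^2 / d_{n-1}, value = sum ln d_n.
    # Returns (value, vjp), where vjp maps the output cotangent gbar to
    # the input cotangents -- the shape of a custom-gradient primitive.
    d = [a[0]]
    for n in range(1, len(a)):
        d.append(a[n] - b[n - 1] ** 2 / d[n - 1])
    value = sum(math.log(x) for x in d)

    def vjp(gbar):
        N = len(a)
        abar = [0.0] * N
        bbar = [0.0] * (N - 1)
        dbar = [gbar / x for x in d]  # chain rule seeds scaled by gbar
        for n in range(N - 1, 0, -1):
            abar[n] = dbar[n]
            bbar[n - 1] = -2.0 * b[n - 1] / d[n - 1] * dbar[n]
            dbar[n - 1] += (b[n - 1] ** 2 / d[n - 1] ** 2) * dbar[n]
        abar[0] = dbar[0]
        return abar, bbar

    return value, vjp
```

Because the closure accepts an arbitrary cotangent, the primitive composes with any surrounding computation under the chain rule, which is exactly what registering a custom gradient in a framework amounts to.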