DYNAMICAL INFERENCE FROM A KINEMATIC SNAPSHOT: THE FORCE LAW IN THE SOLAR SYSTEM

, , and

Published 2010 February 22 © 2010. The American Astronomical Society. All rights reserved.
, , Citation Jo Bovy et al 2010 ApJ 711 1157 DOI 10.1088/0004-637X/711/2/1157

0004-637X/711/2/1157

ABSTRACT

If a dynamical system is long-lived and non-resonant (that is, if there is a set of tracers that have evolved independently through many orbital times), and if the system is observed at any non-special time, it is possible to infer the dynamical properties of the system (such as the gravitational force or acceleration law) from a snapshot of the positions and velocities of the tracer population at a single moment in time. In this paper, we describe a general inference technique that solves this problem while allowing (1) the unknown distribution function of the tracer population to be simultaneously inferred and marginalized over, and (2) prior information about the gravitational field and distribution function to be taken into account. As an example, we consider the simplest problem of this kind: we infer the force law in the solar system using only an instantaneous kinematic snapshot (valid at 2009 April 1.0) for the eight major planets. We consider purely radial acceleration laws of the form ar = −A [r/r0]−α, where r is the distance from the Sun. Using a probabilistic inference technique, we infer 1.989 < α < 2.052 (95% interval), largely independent of any assumptions about the distribution of energies and eccentricities in the system beyond the assumption that the system is phase-mixed. Generalizations of the methods used here will permit, among other things, inference of Milky Way dynamics from Gaia-like observations.

Export citation and abstract BibTeX RIS

1. INTRODUCTION

The Gaia Satellite (Perryman et al. 2001) will measure positions and velocities for millions to billions of stars at varying precision. One of the principal goals of this mission is to provide the data necessary to infer the dynamical state of the Milky Way. However, there are issues in principle with inference of dynamics from a snapshot or instantaneous set of configuration and velocity measurements: the instantaneous positions and velocities have no necessary relationship with the gravitational potential or accelerations. Indeed, despite considerable literature (for example, Oort 1932; Schwarzschild 1979; Little & Tremaine 1987; Kaasalainen & Binney 1994; Johnston et al. 1999; Beloborodov & Levin 2004) there is no methodology for performing this inference that naturally handles all of the issues, including finite and non-trivial observational uncertainties or noise, missing data, non-steady aspects of the mass distribution, and the (incredibly likely) possibility that the potential is not (simply) integrable. Robust inference may not even be possible if the Milky Way has significant time dependence or is strongly chaotic or is far from showing any simple symmetries (such as axisymmetry).

Certainly there is no hope for dynamical inference on the massive scale required for the Gaia data set if we cannot perform it on much simpler, much more symmetrical, much older (in a dynamical sense), and much smaller (in a data sense) systems. In what follows, we take one of the simplest possible systems—the solar system—and the smallest possible data set—the positions and velocities of the major planets at a moment in time—and perform a complete dynamical inference. For a test system we could also have chosen the black hole at the Galactic center, where similar considerations apply. However, this system has additional issues with "missing data" because not all six phase-space coordinates are directly measurable for all stars. The solar system truly is the simplest problem in this class.

Our inferential starting point is orbital roulette (Beloborodov & Levin 2004). In this previous work, it was assumed that the orbital angles (in the action–angle formalism) are uniformly distributed. Thus, dynamical model parameters that correspond to orbital angles that suspiciously lack diversity are rejected. More specifically, the method considers a large range of possible dynamical model parameters, and computes orbital angles at every value of the parameters and a distribution statistic on those orbital angles. If the distribution statistic is designed to monotonically increase with the diversity of the angles or the flatness of the angle distribution, the "best-fit" dynamical model parameters are those that optimize the distribution statistic. This kind of approach is inherently frequentist: on many applications of these procedures to independent data sets a confidence interval captures the true dynamical parameters on a guaranteed fraction of trials. However, for any given data set the confidence interval produced might not represent a credible set of parameters.

In what follows we cast the dynamical parameters estimation as a probabilistic inference problem (a "Bayesian" approach). We adopt the same core assumption as the roulette problem: we assume that the orbital angles are uniformly distributed. However, in this framework, we must also construct prior probability distribution functions for the dynamical model parameters and conditional probabilities of the data given the model to encode the assumptions of a long-lived and bound solar system. These prior and conditional probability distributions and the data create posterior probability distribution functions for all the parameters. We use this new method to infer the gravitational force law (radial dependence and amplitude). What is new here, in the context of solar system dynamics, is that we perform this inference with only a snapshot of the kinematic state, that is, with only the positions and velocities of the planets at a single instant of time.

Of course the kinematic snapshot we employ is, in fact, a set of initial conditions for a solar system integration (Giorgini et al. 1996). These initial conditions were determined not by a single measurement at a single epoch, but are in fact the result of an optimization of a solar system integration to observed planetary positions over many decades. In the context of this paper—a demonstration of a method—it is best to think of these "data" as "simulated data" useful for testing the method. They just happen to be data that have been simulated by the analog computer we know as the solar system.

What is new here, in the context of dynamical inference, is that our method is fully probabilistic or Bayesian. This is important for future problems, such as the Gaia problem, or for inferring the mass of the black hole at the center of the Milky Way, because in these real data analysis problems, the data points come with non-trivial and highly correlated observational uncertainties, and because entire dimensions of phase space are missing or unobserved. At the Galactic center, we do not know the radial distance to anything accurately, and in the Gaia data set many of the radial velocities will not be measured. The Bayesian framework handles these real-world data issues very naturally, although in fact they are not important in the test problem we solve here. Aside from these issues of principle, it is also the case, as we will show, that the Bayesian method performs extremely well.

Of course, a lot is known about the gravitational force law in the solar system, so we do not expect, at the outset, to be surprised by our results. The first force-law inference in the solar system (Newton 1687, and also work by contemporaries, particularly Hooke, who may have priority) made use of full orbit shape determination (Kepler 1609). In this sense, Newton's problem—find the force law (from among a small, discrete group of possible force laws) that leads to elliptical orbits with the Sun at one focus—was much easier than the problem we have set for ourselves. Of course, along the way, Newton had to develop for the first time the general principles of kinematics and dynamics!

2. PARAMETERIZED FORCE LAW OR DYNAMICAL MODEL

We are going to assume spherical symmetry of the solar system's force law and gravitational potential, although nothing in the general inference formalism that follows will require this. Consider a radial force law (really acceleration law) of the form

Equation (1)

where A is an amplitude, r is the distance from the Sun, r0 is a distance scale (in this case we will use r0 = 1 AU so that A can be thought of as the acceleration at Earth's orbit), α parameterizes the radial dependence, and ${\hat{r}}$ is the radial direction. In this model, the list of free parameters is

Equation (2)

where we have taken the logarithm of A because in inference problems, dimensioned parameters are usually best handled in the log (Jeffreys 1939; Sivia & Skilling 2006).

The potential u (potential energy per unit planet mass), radial effective potential ueff, and binding energy per unit mass epsilon ≡ −E/m are

Equation (3)

Equation (4)

Equation (5)

where j2 is the square of the magnitude of the planet's angular momentum per unit mass (or L2/m2), and vr is the radial component of the velocity (the component of ${\vec{\bm v}}$ parallel to ${\vec{\bm x}}$). The perihelion and aphelion distances rperi and rap are both found by setting epsilon = −ueff. With these, we can define a radial asymmetry e as

Equation (6)

where we have called this "e" because in the Kepler–Newton α = 2 case it is the orbital eccentricity. One way of thinking of this radial asymmetry is that at any point in the space made up of the dynamical parameters $\boldsymbol{{\omega }}$ and the binding energy epsilon, the radial asymmetry e is a dimensionless description of the angular momentum magnitude.

Importantly for what follows, we can define a "radial angle" ϕr that increases linearly with time from perihelion passage through next perihelion passage. Any planet at radius r on an orbit with perihelion distance rperi and aphelion distance rap can be assigned this angle ϕr by

Equation (7)

where the first numerator is the time it takes to go from rperi to r outbound, the first denominator is the time it takes to go from rperi to rap outbound, the second numerator and denominator are the times inbound, and all time differences between two radii can be computed numerically for general values of α by integrating the inverse of the radial velocity between these radii. The first-order form of this integral has an integrable singularity at the perihelion and aphelion, which can be handled by an appropriate change of variables (e.g., Press et al. 2007). A planet observed at a set of random times spanning many orbits will be observed to have radial angles ϕr drawn from a flat distribution in the range 0 < ϕr < 2 π. This radial angle is one of the angles in the action–angle formulation of the system, which is integrable for the simple reason that it is spherically symmetric.

3. KINEMATIC DATA

In what follows, we are going to use and compare several methods for inferring the force-law parameters $\boldsymbol{{\omega }}$ (the amplitude ln A and radial exponent α of the spherical force law) from an instantaneous snapshot of the positions and velocities of the eight major planets. This snapshot was taken from JPL's HORIZONS System which provides highly accurate ephemerides for solar system objects4 (Giorgini et al. 1996). It is an extrapolation (at the time of writing) to 2009 April 1.0, approximately 400 years after the important publication of Kepler (1609). This kinematic snapshot is given in Table 1.

Table 1. Planet Ephemerides for 2009-Apr-01 00:00:00.0000 (CTa)

Planet x y z vx vy vz
  (AU) (AU) (AU) (AU yr−1) (AU yr−1) (AU yr−1)
Mercury 0.324190175 0.090955208 −0.022920510 −4.627851589 10.390063716 1.273504997
Venus −0.701534590 −0.168809218 0.037947785 1.725066954 −7.205747212 −0.198268558
Earth −0.982564148 −0.191145980 −0.000014724 1.126784520 −6.187988860 0.000330572
Mars 1.104185888 −0.826097003 −0.044595990 3.260215854 4.524583075 0.014760239
Jupiter 3.266443877 −3.888055863 −0.057015321 2.076140727 1.904040630 −0.054374153
Saturn −9.218802228 1.788299816 0.335737817 −0.496457364 −2.005021061 0.054667082
Uranus 19.930781147 −2.555241579 −0.267710968 0.172224285 1.357933443 0.002836325
Neptune 24.323085642 −17.606227355 −0.197974999 0.664855006 0.935497207 −0.034716967

Notes. The xyz-coordinate system is defined as follows: the xy-plane is given by the plane of the Earth's orbit at J2000.0, the x-axis is out along the ascending node of the instantaneous plane of Earth's orbit and Earth's mean equator at J2000.0, and the z-axis is perpendicular to the xy-plane in the directional (+ or −) sense of Earth's north pole at J2000.0. The origin of the coordinate system is given by the barycenter of the solar system. One year is defined as 365.25 days. aCT is a coordinate time used in connection with ephemerides. It differs from UTC by about 66 s (see http://ssd.jpl.nasa.gov/?horizons_doc#timesys).

Download table as:  ASCIITypeset image

Since this snapshot is obtained by integrating the positions and velocities of solar system bodies, the accuracy is limited by (1) the correctness of the dynamical model used, (2) the numerical integration of the equations of motion, and (3) the accuracy to which the initial conditions are known. It is generally believed that the dynamical model used is correct and complete, and that the numerical integration is sufficiently accurate. The main uncertainty in the ephemerides is then that due to the uncertainty in the initial conditions. The current set of initial conditions (DE405; Standish 1998) is a fit to a set of optical, radar, and VLBI observations as well as to a set of spacecraft range and Doppler points from various space missions. The uncertainties are the largest for the outer planets, since the data for these are almost entirely from optical observations (with the exception of Jupiter), and because Neptune has not been observed over a full orbit since the start of precise measurements. A comparison between the DE405 ephemerides and more recent observations shows that the positions of the inner planets are known to a fractional accuracy of approximately 10−8, while those of the outer planets are known to a fractional accuracy of 10−6–10−7 (Standish 2004). Uncertainties in the velocities are at the same fractional magnitude.

This kinematic snapshot is not, of course, a fair data set with which to perform the inference below, for the main reason that the "measured" kinematic state of the solar system is in fact the output of fitting observations with a dynamical model that assumes α = 2. For this reason, the data should be thought of as "idealized" or "simulated data" and the work must be considered a test of the method rather than a definitive inference.

4. BOUND, VIRIALIZED, AND LONG-LIVED

The virial theorem relates the time averages 〈T〉 and 〈U〉 of a test particle's kinetic and potential energies through

Equation (8)

where α is the exponent in the radial force law. Given that a planet's potential energy is a function of the dynamical parameters $\boldsymbol{{\omega }}$, while its kinetic energy is not, the virial relation for each planet becomes a one-dimensional locus in the $\boldsymbol{{\omega }}$ space. Using kinetic and potential energies computed from the observations as a proxy for the time-averaged energies, these loci are shown in Figures 1 and 2. The fact that all eight lines cross near a single point in the space is encouraging that the system is virialized (as we expect) and that the inference will work. That the eight lines so nearly intersect in a single point is because, in fact, many of the planets are on circular orbits; the probabilistic method developed in this paper does not assume this, but it does retain the precision that is possible because of this situation.

Figure 1.

Figure 1. Virial relation between the kinetic energy and the potential energy (Equation (8)) for each of the eight planets in the solar system. For the combinations of dynamical parameters in the light gray region in the lower right at least one planet becomes unbound. When the dynamical parameters are in the light gray region in the upper left at least one planet has rperi < R. The light gray regions overlap in the dark gray region. In the units used in this figure, the "true" value of A lies at ln Ar20G−1 = 0.

Standard image High-resolution image
Figure 2.

Figure 2. Zoomed in version of Figure 1.

Standard image High-resolution image

Also shown in Figures 1 and 2 is the region of parameter space in which one or more of the planets is unbound because T>U or, equivalently, epsilon < 0. In what follows, we will assign vanishing probability to regions of parameter space in which one or more planets is unbound. However, as we will see, this does not affect any of our conclusions. Also shown in Figure 1 is the region of parameter space in which one or more of the planets has rperi < R, where R is the radius of the Sun. This part of parameter space ought also to be excluded, although in practice this is not necessary for any of what follows.

In preparation for what follows, we pre-compute all of the planet radial angles ϕr,i—given their positions ${\vec{\bm x}}_i$ and velocities ${\vec{\bm v}}_i$—as a function of the dynamical parameters. These angles are shown in Figure 3.

Figure 3.

Figure 3. Computed radial angles ϕr for each of the eight planets as a function of the dynamical parameters. The gray triangular region in the bottom-right corner is the region excluded by the condition that all the planets are bound. Each planet has an angle range of 0 < ϕr < π if it has radial velocity vr>0 (outgoing from perihelion) or π < ϕr < 2 π if it has vr < 0 (incoming from aphelion).

Standard image High-resolution image

5. FREQUENTIST ORBITAL ROULETTE

In orbital roulette (Beloborodov & Levin 2004), the idea is to compute, for each point $\boldsymbol{{\omega }}$ in parameter space, the N radial angles ϕr,i and analyze them statistically for being well mixed. In practice, this means applying a distribution test or multiple distribution tests to the angles and preferring parameters for which these tests are more consistent with a random or flat distribution of angles in the range 0 < ϕr < 2 π. Because each test provides one constraint, and in the case described here the parameter space is two dimensional, at least two qualitatively different tests are required to make localized constraints in $\boldsymbol{{\omega }}$. In addition to a test for a flat angle distribution we also apply a test for an angle–energy correlation.

About the simplest consistency test for the calculated angles is a test of the mean of the angles: Is the mean consistent with the expected mean for a uniformly distributed set of N planets? For this to perform well one must fold the angles of the inbound planets onto the interval [0, π], that is, disregard the information in the sign of the radial velocity; then, the expected mean of the angles is equal to π/2 (for details on how to test this assumption, see Beloborodov & Levin 2004). The fact that we have to perform this mapping indicates that the procedure is ad hoc. Indeed, for a uniform distribution on the circle there is no specific meaning to the perihelion and the aphelion, or to any two points, such that no real meaning can be attached to the mean of the angles between two arbitrary points on the circle.

Better, one can test the consistency of the full distribution function of the angles with a uniform distribution. This could again be done with the full [0, 2π] distributed angles or with the folded angles; the results will depend on this choice. Testing an observed distribution for consistency with an expected distribution often involves comparing cumulative distribution functions (Kolmogorov 1941). The Kolmogorov–Smirnov (K–S) test is the simplest in practice, since the distribution of the test statistic—the maximum difference between the cumulative distributions—can be approximated by an analytic function (Stephens 1970). The K–S test is by construction most sensitive to deviations near the median value; this rules out dynamical parameters at which the planets bunch up at perihelion or at aphelion, the situation in which about half are at perihelion and half at aphelion can easily dupe the test.

A statistic can be chosen that is sensitive to deviations at all values, such as the Anderson–Darling statistic (Anderson & Darling 1952). However, no approximate analytic description of the distribution of this statistic exists and in practice this distribution has to be obtained by Monte Carlo sampling (e.g., Beloborodov & Levin 2004). A statistic more appropriate to the problem at hand (although we are not primarily interested in a careful examination of the differences between different frequentist procedures) is Kuiper's statistic (Kuiper 1962). This statistic—the sum of the maximum distance of the observed cumulative distribution above and below the expected cumulative distribution—is invariant under periodic shifts of the angle and was specifically designed to test uniform distributions on the circle. The advantage over Anderson–Darling is that the asymptotic distribution of the Kuiper statistic is known (e.g., Press et al. 2007).

All of these tests for the uniformity of the distribution of the angles are shown in Figure 4. These tests can fail when for a certain combination of dynamical parameters some planets are near aphelion while other planets are near perihelion. This situation appears here in Figure 3, at α far from 2, where there are large regions in which the inner planets, especially Mercury and Venus, are near perihelion, while the outer planets, especially Uranus and Neptune, are near aphelion and vice versa. This prevents the mean angle and K–S tests from excluding those regions. The Kuiper test performs better.

Figure 4.

Figure 4. Various frequentist tests applied to test the uniformity of the angle distribution and the absence of angle–energy correlations. From top to bottom, left to right: test of the mean of the angles; K–S test for the uniformity of the angle distribution; Kuiper test for the uniformity of the angles; Kendall τ test for the absence of angle–energy correlations; combined confidence intervals from the Kuiper test and the Kendall test; combined confidence intervals from the K–S test and the Kendall test. In the plots with a single statistic the darkest region corresponds to the 95% confidence region, the lighter region to the 99% confidence region. The same is true for the plots with combinations of statistics, except that a Bonferroni correction has been applied to the significances of the individual statistics that make up the combination. In each plot, the lightest region is excluded because at least one planet becomes unbound for those parameter values.

Standard image High-resolution image

A second constraint (for the two-dimensional parameter space) comes from a second test. In regions of parameter space in which the inner planets are all near perihelion and the outer planets are all near aphelion, a significant correlation between the angles and the energies exists. This correlation is unphysical if the system is not being observed at any special time. A non-parametric test for the correlation is preferred here as the angle–energy correlation will not in general be linear. We perform a test of the angle–energy correlation using Kendall's τ (Kendall 1938). This is a rank test; it only considers the relative ordering of the angles and energies of different planets (for details on this test see Press et al. 2007). That this test is in a sense orthogonal to the tests of the uniformity of the angle distribution can be seen in Figure 4.

All of the frequentist tests permit acceptance or rejection of a dynamical model at a certain confidence level. Confidence intervals of 95% and 99% for all of the frequentist tests are shown in Figure 4. Also shown is the combination of the tests of the uniformity of the angle distribution with the test of the angle–energy correlation.

6. PROBABILISTIC DYNAMICAL INFERENCE

The frequentist procedures perform reasonably well in this simple problem because the number of dynamical parameters is small and the data have vanishing errors and no missing components. As the number of dynamical parameters increases the frequentist must find larger numbers of tests in order to break degeneracies. While data uncertainties can be included by sampling the error distribution of the data and combining the results (e.g., Beloborodov & Levin 2004), this will perform badly in the limit of low signal-to-noise or missing data. These difficulties are related to the fact that these procedures use only a very crude model of the data, that is, that the angles are distributed uniformly and that angle–energy correlations should be absent, which allows no room for discovery of structure in phase space. A fully Bayesian treatment of this problem can treat the phase-space distribution function as an unknown function to be inferred from the data. Modeling the full phase-space distribution function permits simultaneous inference of missing data and properly marginalized probability distributions for dynamical parameters.

Imagine that we have the three-vector positions ${\vec{\bm x}}_i$ and three-vector velocities ${\vec{\bm v}}_i$ at some time t for N planets i and a parameterized model for the gravitational acceleration law (force law per unit mass) ${\vec{a}}_{{{\omega }}}({\vec{\bm x}})$, a function of position ${\vec{\bm x}}$ and a list of parameters $\boldsymbol{{\omega }}$. We wish to obtain an estimate of the posterior probability distribution $p(\boldsymbol{{\omega }}|\lbrace {\vec{\bm x}}_i,{\vec{\bm v}}_i\rbrace)$ for the parameters, where $\lbrace {\vec{\bm x}}_i,{\vec{\bm v}}_i\rbrace$ is the set of all planet positions and velocities. We employ Bayes's theorem as follows:

Equation (9)

where, as usual, $p(\lbrace {\vec{\bm x}}_i,{\vec{\bm v}}_i\rbrace |\boldsymbol{{\omega }})$ is the likelihood or the probability distribution function for the data given the model parameters, evaluated at the observed values of the data, $p(\boldsymbol{{\omega }})$ is the prior probability distribution function for the parameters, and the denominator is (for our purposes here) a normalization constant.

For our chosen parameterization of the dynamical parameters, $\boldsymbol{{\omega }}$, a broad flat or uniform prior in the space represents a reasonable description of our (assumed) prior knowledge. The much more challenging problem is to specify the likelihood of the dynamical parameters, the conditional probability distribution function $p(\lbrace {\vec{\bm x}}_i,{\vec{\bm v}}_i\rbrace |\boldsymbol{{\omega }})$. Without detailed knowledge of how the solar system formed, this probability distribution is also a representation of our prior beliefs. Ideally our ability to learn from the observed data will not be too sensitive to these beliefs.

It is easier to express beliefs about the angle ϕr, radial asymmetry e, and binding energy epsilon of each planet than its position and velocity. This is because, given our assumption that the planets constitute an angle-mixed population, Jeans's theorem (Jeans 1915; Binney & Tremaine 2008) tells us that the distribution function, which is proportional to the probability of observing a planet at a certain locus in phase space, is only a function of the integrals of the motion. We wish to assign zero probability to dynamical parameters that lead to any of the computed binding energies $\epsilon ({\vec{\bm x}}_i,{\vec{\bm v}}_i,\boldsymbol{{\omega }})$ being negative (because we defined epsilon>0 to be bound). We would also like to express our prior information or assumption that the system is long-lived (in units of the dynamical time), that it is non-resonant, and that we are not seeing the system at any special time, or that the radial angles ϕr will be randomly distributed between 0 and 2 π. In the absence of any better information, we will try to be as agnostic as possible about the actions (conserved quantities) of the planets but extremely confident that all radial angles 0 < ϕr < 2 π are equally likely.

In the simple spherical or radial situation under consideration here, in which there are no missing data, we can rewrite the likelihood as a function of the planets' radial coordinates, as the orientation of the orbit does not depend on the dynamical parameters:

Equation (10)

where, again, we have gone to ln epsilon because dimensioned parameters are usually best handled in the log, and J(ln epsilon, e, ϕr; r, vr, j2) is the Jacobian matrix of all the partial derivatives of (ln epsilon, e, ϕr) with respect to (r, vr, j2). For spherical potentials, this Jacobian is given by

Equation (11)

where the derivative is evaluated at the current location in phase space and Tr is the radial period, which depends on the dynamical parameters and the integrals of the motion. For general α this radial period can be computed numerically (see Section 2). The derivative of the radial asymmetry with respect to the specific angular momentum squared can be written in terms of the perihelion and aphelion distance as

Equation (12)

In the special case α = 2, this Jacobian is

Equation (13)

where we have dropped any terms that do not depend on the mass of the Sun.

As well as assuming that the angles are independently and uniformly distributed, we might further assume that the energy and radial asymmetry of each planet were drawn independently:

Equation (14)

However, we do not know a priori the distribution function from which radial asymmetries and binding energies were drawn. Fixing $p(e_i,\ln {\epsilon }_i|\boldsymbol{{\omega }})$ to a broad distribution would be making a strong assumption: regardless of what the data say we would continue to dogmatically believe that the distribution function was broad, which could result in poor inferences about the model as a whole.

We can assume that the planets' properties were drawn independently and identically distributed, without making strong assumptions about what that distribution was. This is achieved by introducing auxiliary parameters $\boldsymbol{{\theta }}= \lbrace \boldsymbol{{\theta }}_e, \boldsymbol{{\theta }}_{\epsilon }\rbrace$ that if known would specify the distribution function. Since we do not know these nuisance parameters, introducing a prior $p(\boldsymbol{{\theta }}|\boldsymbol{{\omega }})$ and marginalizing over them,

Equation (15)

is then part of the inference task. With all of this in place, we apply Equation (9) to get the posterior distribution over the dynamical parameters

Equation (16)

where each planet's value of (ln epsiloni, ei, ϕr,i) is a function of phase-space position $({\vec{\bm x}}_i,{\vec{\bm v}}_i)$ and dynamical parameters $\boldsymbol{{\omega }}$, and the Jacobian is evaluated at each planet's value of (ln epsiloni, ei, ϕr,i).

For situations in which we have missing data or noisy observations, further unknown quantities will be added to the model and marginalized over. As the model becomes more complicated it is more likely that Markov chain Monte Carlo (e.g., Neal 1993) or other approximate computational methods will be required.

6.1. Basic Method

To keep things as simple as possible, at first we model the distribution function $p(\ln {\epsilon }, e|\boldsymbol{{\omega }},\boldsymbol{{\theta }})$ as a product of a top-hat function in ln epsilon from ln epsilona to ln epsilonb with a top-hat function in e from ea to eb. In this context, the phase-space parameter list is

Equation (17)

In what follows, we only consider values A < ln epsilona < ln epsilonb < B, where A and B provide very distant (uninformative) limits (below, we will take the limit), and 0 ⩽ eaeb ⩽ 1. These enforce our assumption or prior information that the system is bound.

For our initial model we also set the prior $p(\boldsymbol{{\theta }},\boldsymbol{{\omega }})$ flat or uniform in all the parameters. Now the marginalization required to compute the posterior (Equation (16)), an integral over all four of the phase-space distribution parameters in $\boldsymbol{{\theta }}$, can be performed analytically; it leaves

Equation (18)

where ln epsilonL is the lowest planetary binding energy at this point $\boldsymbol{{\omega }}$ in dynamical parameter space, ln epsilonK is the highest, eL is the lowest planetary radial asymmetry at this point $\boldsymbol{{\omega }}$, eM is the highest, and we have taken the limit in which the range of the parameters (ln epsilona, ln epsilonb) goes to infinity, or A → −, B. This posterior probability distribution for $\boldsymbol{{\omega }}$ represents our posterior beliefs marginalized over all possible values of {ln epsilona, ln epsilonb, ea, eb} of the top-hat model.

6.2. Alternative Methods

The results of any inference depend both on the data and on modeling assumptions. Thus, it is always sensible to question any assumptions and investigate how sensitive the results are to them.

The marginalization in our basic method is tractable at the expense of believability: it seems unlikely that the true distribution function over the planet properties is uniform in the chosen parameterization. If the true distribution function is far from this assumption, the inference performed here could be in trouble because the zero prior probability assigned to all other reasonable forms of the distribution function makes it impossible for the model to learn this. This family of distribution functions also has no special status: a distribution which is uniform in one parameterization will not be uniform in another. Instead of modeling the distribution in (ln epsilon, e, ϕr), we could have chosen to model the distribution function in terms of (ln epsilon, e2, ϕr). This set of parameters is, in a sense, more natural as it is closer to action–angle coordinates, and, thus, the Jacobian |J(ln epsilon, e2, ϕr; r, vr, j2)| is more close to constant. Indeed, from Equation (13) it is clear that, in the case of α = 2, this Jacobian does not depend on the eccentricity of the orbit any longer and Figure 8 shows that this Jacobian is close to constant for all values of ln A and α. However, it is not obvious a priori whether using this parameterization will give better results. Rather than guessing, we can assume that the distribution over radial asymmetries is uniform in an unknown parameterization, and infer from the data what this parameterization should be. We tried assuming that the radial asymmetry was uniform over some range of e, e2, or $\sqrt{e}$, with equal prior probability over each choice. This method corresponds to extending $\boldsymbol{{\theta }}$ to parameterize a richer class of distributions. As we did with nuisance parameters before, we can marginalize over this choice.

In general, we could marginalize over a very flexible class of distributions for both the binding energy and radial asymmetry. A generic way of representing distributions is with a histogram, where any distribution can be represented within an arbitrary tolerance for sufficiently narrow bins. Some care is required in putting a prior $p(\boldsymbol{{\theta }})$ over the heights of the bins. It is tempting to use a Dirichlet distribution, which allows analytical marginalization. However, this assumes that the heights of the bins are unrelated except through their overall normalization. An arbitrarily spiky distribution function is not only unphysical, but we would be unable to infer its shape from point observations.

We attempted to construct a sensible prior over distributions using histograms with many bins by coupling the heights of the bins so that the densities appear smooth. Since the focus on this paper is on the principles of dynamical inference and not on the specific result on the solar system's force law, we only provide a brief description of how we implemented this approach. We refer the interested reader to the references provided in the following for more details on the specifics. A multivariate Gaussian prior can be put over the log bin heights, approximating a logistic Gaussian process (Leonard 1978). We used 100 equal-width bins coupled with a Gaussian covariance function (Rasmussen & Williams 2006). We put uniform priors over the start and end points of the histogram and over physically reasonable ranges of the covariance function's parameters. The posterior over the dynamical parameters was estimated by Markov chain Monte Carlo simulation of all the unknowns; Gibbs sampling was used to update the dynamical parameters over a grid of values for which we had precomputed the Jacobian term. Slice sampling (Neal 2003) was used to update all nuisance parameters, except the bin heights which were updated with a Metropolis–Hastings method (Neal 1999, Equation (15)).

7. RESULTS AND DISCUSSION

In our basic method we evaluate the Jacobian |J(ln epsilon, e, ϕr; ri, vr,i, j2i)| for each planet at the observed radius, radial velocity, and specific angular momentum, for each value of the dynamical parameters $\boldsymbol{{\omega }}$, and multiply the magnitudes of the determinants of these Jacobians together. Following Equation (18), we multiply this product with the value of the marginalized product of the distribution function and finally multiply this with the (uniform) prior on the dynamical parameters in order to obtain the posterior probability distribution for the dynamical parameters.

It is instructive to look at the magnitude of the determinant of the Jacobian for each planet as a function of the dynamical parameters in Figure 5. These factors seem to show a clear preference for the dynamical parameters of each planet to lie on its virial locus. Why is this? For nearly circular orbits the last two factors in Equation (12) are equal and are unbounded for perfectly circular orbits; this results in a natural preference for nearly circular orbits for all of the planets, i.e., for each planet to lie somewhere along its virial locus. At the same time the measured non-zero radial velocity limits the extent to which a planet's orbit can be circularized by a suitable choice of the dynamical parameters and, thus, how close to the circular singularity an individual planet can be brought. This naturally leads to a weighting scheme in which planets that are closer to circular orbits (as measured by the observed angle between the planet's velocity and the tangent to the circular orbit going through its present location) receive a higher weight in the analysis than planets on less circular orbits. This weighting scheme is good in that planets on more circular orbits are indeed more informative about the potential than planets on less circular orbits. Therefore, it may seem that the decision to model the radial asymmetries as uniform in e was fortuitous, but in fact, as we will discuss below in Section 7.1, the results obtained using the basic method are robust in that more flexible models learn that the top hat in e leads to a good interpretation of the data.

Figure 5.

Figure 5. Density plots of the absolute value of the determinant |J(ln epsilon, e, ϕr; r, vr, j2)| of the Jacobian of the transformation from the energy epsilon, radial asymmetry e, and radial angle ϕr coordinates to the relevant positional and kinematical observables, evaluated at the observed positions and velocities of the planets, as a function of the dynamical parameters. Gray scales are linear with darker shades for larger values.

Standard image High-resolution image

The posterior probability distribution is shown in Figure 6. A strong peak is apparent near α = 2 and the Newtonian solar value for ln A. The width of the probability distribution is indicated by the 95% and 99% posterior contours. These contours are defined to enclose the smallest area that holds 95% and 99%, respectively, of the posterior probability distribution.

Figure 6.

Figure 6. Posterior probability distribution $p(\boldsymbol{{\omega }}|\lbrace {\vec{\bm x}}_i,{\vec{\bm v}}_i\rbrace)$ for the dynamical parameters on a linear scale. Contours are 95% and 99% posterior regions.

Standard image High-resolution image

In order to infer the exponent α of the force power law we perform a second marginalization of the posterior probability distribution, this time over the (for our purposes) uninteresting parameter ln A. This gives a posterior probability distribution for the parameter α,

Equation (19)

This probability distribution is shown in Figure 7. The 95% and 99% posterior intervals in this figure are defined to exclude 2.5% and 0.5%, respectively, of the distribution on either side of the central region.

Figure 7.

Figure 7. Marginalized posterior probability distribution for the parameter α with 95% and 99% posterior intervals.

Standard image High-resolution image

The result of the inference is not a value for the parameters but a posterior probability distribution. A "best-fit" value for α could be obtained according to various criteria. For example, the posterior mean minimizes the expected square difference from the true parameter. However, since the posterior distribution is multi-modal, a point estimate does not capture our uncertainty. The posterior is better summarized by a credible interval, containing a given fraction of the probability mass. The 95% posterior interval is 1.989 < α < 2.052. This compares favorably with the results obtained by Newton (1687) who inferred α = 2 from a much richer data set (though a less rich model set). Modern tests constrain the value of the exponent α to a fractional accuracy of 10−8–10−9 on solar system scales (Adelberger et al. 2003; Fischbach & Talmadge 1999) using lunar laser ranging tests (Williams et al. 2004) and Keplerian tests comparing G(r)M and the rate of precession for different planets (Talmadge et al. 1988).

Our method appears to work well: the true dynamical parameters A and α are plausible under a fairly tight posterior distribution, see Figure 6. The confidence intervals from the frequentist tests, Figure 4, are somewhat broader than the posterior intervals, but are not directly comparable. Frequentist confidence intervals do not represent beliefs about the parameters for this data set, simply a region that satisfies some sampling properties. As a result it is unsurprising that the frequentist confidence intervals appear less good if interpreted as beliefs given the data. It is difficult to use the frequentist confidence intervals to make statements about α alone: without the assumptions of the Bayesian method we cannot marginalize over the parameter A. One approach that allows frequentist guarantees is to set A adversarially, but this pessimistic view would give unreasonably large error bars for α.

That the posterior distribution $p(\alpha |\lbrace {\vec{\bm x}}_i,{\vec{\bm v}}_i\rbrace)$ turns out to be multi-modal is no surprise in the light of the virial considerations from Figures 1 and 2. The two peaks in $p(\alpha |\lbrace {\vec{\bm x}}_i,{\vec{\bm v}} _i\rbrace)$ correspond to the two main regions in parameter space in which the different virial loci cross. These virial considerations, and also the frequentist techniques, are very similar to the probabilistic approach in that they all prefer each planet to be in a non-special region of radial angle space, that is, between perihelion and aphelion; this can only happen simultaneously near the points in parameter space at which virial loci cross. The advantage of the probabilistic approach is that it explains and quantifies this reasoning, and uses it to set formal limits on the dynamical parameters.

7.1. Sensitivity Analysis

The strongly peaked Jacobian factors shown in Figure 5 depend strongly on the choice of parameterization. If the model was expressed using the radial asymmetries squared rather than the radial asymmetries, the Jacobians, some of which are shown in Figure 8, are much flatter. If the model is maintained by also transforming the prior over radial asymmetry distributions to the new parameterization, the posterior distribution over dynamical parameters will be unchanged. Setting all the radial asymmetries close to 1 is penalized elsewhere in the mathematical expression.

Figure 8.

Figure 8. Density plots of the absolute value of the determinant |J(ln epsilon, e2, ϕr; r, vr, j2)| of the Jacobian of the transformation from the energy epsilon, radial asymmetry squared e2, and radial angle ϕr coordinates to the relevant positional and kinematical observables, evaluated at the observed positions and velocities of the planets, as a function of the dynamical parameters. Gray scales are linear with darker shades for larger values. Only the Jacobians for Mercury and Venus are shown here; the corresponding Jacobians for the other planets are very similar to these.

Standard image High-resolution image

However, setting the prior over radial asymmetry distributions to consider only uniform distributions was a strong assumption. If we had only considered distributions uniform in the radial asymmetry squared, this combined with the flat Jacobians in Figure 8 would not have given as tight a posterior distribution. Fortunately we can infer a suitable parameterization, or equivalently the distribution function from a richer family of possibilities, from the data.

Figure 9 shows the posterior over α from using the two more flexible priors outlined in Section 6.2. Assuming that the radial asymmetry is uniform in one of $\sqrt{e}$, e, or e2 gave a 95% credible interval of 1.990 < α < 2.035. A variety of different flexible priors for the radial asymmetry distribution gave very similar results. Allowing the ln A constant in the force law to come from a flexible family of distributions changed the posterior very little. The 95% credible interval from the flexible non-parametric prior on both distributions gave a 95% credible interval of 1.991 < α < 2.040. The fine details of the posterior distribution, specifically the relative mass in the two modes, are sensitive to prior assumptions. This is unsurprising with only eight data points. However, the position of the bulk of the posterior mass, which gives the confidence interval, is surprisingly robust across a wide range of modeling choices.

Figure 9.

Figure 9. Alternative posterior probability distributions for the parameter α with 95% and 99% posterior intervals. Top: distribution over radial asymmetry is assumed to be uniform in one of $\sqrt{e}$, e, or e2. Bottom: results from a non-parametric prior over the distributions of both e and ln A.

Standard image High-resolution image

As a warning, it is possible to obtain poor results through bad modeling choices. The reasonable posterior from our basic method, Figure 7, was to some extent due to chance. The family of possible radial asymmetry distributions is inflexible as it forces this distribution to be flat over some range in radial asymmetry. Using the basic method it is not possible to learn the true distribution, which is not flat in radial asymmetry, from data, because the truth is assigned zero prior probability. We were fortunate to choose a parameterization in which the inflexible family was not too unreasonable. The non-parametric approach is much more flexible in that its weak smoothness constraints assign non-zero prior probabilities to all reasonable distribution functions such that the true one could be picked out with enough data. However, care must also be taken with non-parametric modeling. An unsmooth Dirichlet prior over bin heights in a non-parametric prior over distributions gave a broad and biased distribution over α.

7.2. Future Work

The necessity of a proper probabilistic approach as laid out in this paper has been recognized before in the context of inferring the mass of the Galaxy from a small set of highly informative tracers: distant satellite galaxies (Little & Tremaine 1987) and high velocity stars (Leonard & Tremaine 1990). The limitations of those works are that they choose restrictive forms of the distribution function in which the distribution of angular momentum is fixed as either radial or isotropic, or given as an inflexible parametric function in between these two extrema (Kochanek 1996). The precision of their results for the fundamental parameters of the Galaxy is limited by the systematic uncertainty arising from this assumption. In the case of estimating the local escape velocity from a sample of high velocity stars, a power-law model for the energy part of the distribution function was used; some progress has been made marginalizing over the exponent, a nuisance parameter, with a prior coming from cosmological simulations (Smith et al. 2007). In the future, using the technique introduced in this paper there is no need to make strong assumptions about the degree of anisotropy of the system at hand or the form of the distribution function beyond the assumption of complete angle mixing. The results of this paper show that by introducing flexible models for the distribution function that can learn from the data, robust constraints on the parameters of a dynamical system can be obtained.

The approach developed here can be applied to other (perhaps more pressing) dynamical inference problems in which test particles can be relied upon to be well-mixed in angle space. One such problem is the dynamics in the region surrounding the black hole at the Galactic center. Often in these problems complications arise because of large observational uncertainties (often highly correlated), the absence of some of the six-dimensional phase-space coordinates, and selection effects, all of which are absent in the simple problem considered here. It will be necessary for these problems to model the full six-dimensional phase-space distribution function. This will complicate the marginalization over the phase-space parameters, which was trivial here in the basic method, but at the same time it will permit the discovery of structure in the phase-space distribution, which can aid with the presence of missing data and large observational uncertainties. That this approach has much potential has been shown before in the case of the Galactic center; the assumption that a set of stars at the Galactic center is part of a disk-like population has been successful in reconstructing missing data (Beloborodov et al. 2006). Extension of the approach developed in this paper will permit incorporation of much more of the available data on the dynamics in the central region of the Galaxy. This, in turn, will lead to a better determination of the mass of the black hole and the surrounding density profile.

On larger scales, approaches like that developed in this paper will prove to be essential for the analysis of the large data sets of upcoming surveys such as Gaia. As the duration of the Gaia mission is vanishingly smaller than any dynamical timescale, the problem posed is essentially the same as the problem posed here: infer the dynamics from a snapshot of the kinematics. Our approach cannot simply be applied to this larger problem because the system is almost certainly not (trivially) integrable, and the assumption of mixed angles and lack of resonances is invalidated by the clear abundance of substructure in the halo (e.g., Willman et al. 2005; Belokurov et al. 2006, 2007; Koposov et al. 2008) and the disk (Dehnen 1998; Bovy, Hogg, & Roweis 2009); indeed, Jeans himself, in the paper in which he first wrote down his eponymous theorem (Jeans 1915), argued against assuming a steady state for the Galaxy. Modeling the details of the phase-space distribution function will be even more important in this context. However, we expect that a large fraction of the Galaxy, even a large fraction of the stellar halo, is well-mixed, such that mixed-angle approaches are expected to lead to valuable inferences. Combining the mixed-angle and unmixed-substructure techniques into a general inference about the dynamics of the Galaxy requires a fully probabilistic method and the approach developed in this paper is a baby step toward this ambitious goal.

First of all we thank Scott Tremaine for many inspiring discussions that helped shape this paper. It is also a pleasure to thank Kyle Cranmer, Dustin Lang, Yuri Levin, Phil Marshall, Bill Press, and Hans-Walter Rix for very helpful discussions and the anonymous referee for valuable comments. This research was partially supported by NASA (ADP grant NNX08AJ48G), the NSF (grant AST-0908357), and a Research Fellowship of the Alexander von Humboldt Foundation. This project made use of the HORIZONS System provided by the Solar System Dynamics Group of the Jet Propulsion Laboratory, the NASA Astrophysics Data System, and the open-source Python modules scipy, numpy, and matplotlib.

Footnotes

Please wait… references are loading.
10.1088/0004-637X/711/2/1157