
A Gaia DR2 Mock Stellar Catalog


Published 2018 May 21 © 2018. The Astronomical Society of the Pacific. All rights reserved.
Citation Jan Rybizki et al 2018 PASP 130 074101 DOI 10.1088/1538-3873/aabd70

1538-3873/130/989/074101

Abstract

We present a mock catalog of Milky Way stars, matching in volume and depth the content of the Gaia data release 2 (GDR2). We generated our catalog using Galaxia, a tool to sample stars from a Besançon Galactic model, together with a realistic 3D dust extinction map. The catalog mimics the complete GDR2 data model and contains most of the entries in the Gaia source catalog: five-parameter astrometry, three-band photometry, radial velocities, stellar parameters, and associated scaled nominal uncertainty estimates. In addition, we supplemented the catalog with extinctions and photometry for non-Gaia bands. This catalog can be used to prepare GDR2 queries in a realistic runtime environment, and it can serve as a Galactic model against which to compare the actual GDR2 data in the space of observables. The catalog is hosted through the virtual observatory GAVO's Heidelberg data center (http://dc.g-vo.org/tableinfo/gdr2mock.main) service, and thus can be queried using ADQL as for GDR2 data.


1. Introduction

Gaia (Gaia Collaboration et al. 2016) is an ongoing ESA astrometric space mission about to deliver positions, parallaxes, proper motions, and three photometric bands for a set of ∼1.4 billion sources across the whole sky with its second data release (GDR2; Lindegren et al. 2018). This data set will also provide effective temperatures, luminosities, extinction estimates, and radial velocity measurements for a substantial subset of those sources, along with some other data products. This vast amount of data will be a practical challenge to explore and should usher the community into a new regime in Galactic stellar astronomy, where well-designed ADQL queries become a common tool to obtain manageable data sets from hosting services like the Virtual Observatory (VO; Demleitner 2014).

To help prepare the scientific community for this phase change, we present in this paper a mock catalog that contains the prospective GDR2 stellar content. A first mock data set of Gaia data has long been available, the so-called Gaia Universe Model (GUMS; Robin et al. 2012). However, the primary goal of that catalog was to provide simulations to the data processing consortium (DPAC). Hence, its design does not offer the same capabilities as our GDR2mock catalog. In addition to an improved 3D extinction map, which results in a slightly larger star count (i.e., ∼1.1 billion stars compared to ∼1.0 billion in GUMS for stars brighter than G = 20, and a total star count of ∼1.6 billion for our complete catalog down to G = 20.7), the main difference by construction is that this catalog fully mimics the GDR2 format. This enables GDR2 users to test their ADQL queries and helps with their science analysis (e.g., selection function).

Our catalog is accessible online, most easily via topcat exploiting the VO table access protocol (TAP) service from GAVO, where the catalog is referenced under gdr2mock.main.

2. Catalog Generation

Our catalog is based on a chemo-dynamical Milky Way model, Galaxia (Sharma et al. 2011), which we associated with a 3D dust extinction model before generating photometric observables. The following subsections outline the steps in this mock data set generation.

2.1. The Galaxia Model

Galaxia is a tool that allows one to sample stars from the Besançon Galactic model (Robin et al. 2003), using a specific set of stellar isochrones to obtain their astrophysical parameters. The Galactic warp was switched on during the simulations, the solar zero-point was set to (X, Y, Z) = (−8.0, 0.0, 0.015) kpc, and the velocities to $(U, V, W) = (11.1, 239.08, 7.25)\,\mathrm{km\,s^{-1}}$. Transformations from phase-space to observable coordinates on the sky (ra, dec, pm_ra_cosdec, pm_dec and radial_velocity) were done using astropy (The Astropy Collaboration et al. 2018). We used the latest PARSEC isochrones, PARSEC v1.2S+ COLIBRI PR16 (Bressan et al. 2012; Marigo et al. 2017; Rosenfield et al. 2016; Marigo et al. 2013), which also provide photometric values for each star using the nominal Gaia DR1 photometric bands G, BP, and RP (Jordi et al. 2010). GDR2 passbands were not available during the construction of this catalog.

At this stage, we were already able to account for the magnitude limit of Gaia and only selected stars with apparent magnitude brighter than G = 20.7 mag, which preliminarily resulted in over six billion sources.

2.2. Dust-attenuated Photometry

A crucial step in transforming a Galaxia simulation into a catalog resembling actual observations is the application of a dust distribution, which will change the apparent colors and luminosities of the stars.

Because the Gaia photometric bands span a broad wavelength range (∼300 nm), the simple conversion of extinction coefficients from, e.g., Schlafly & Finkbeiner (2011, Table 6) to reddening and extinction in the Gaia bands, e.g., $A_G$, is only a poor approximation and may lead to significant inconsistency across the broad range of stellar spectra. Instead, we must account for non-linearities, in particular with respect to the stars' colors. Fortunately, the PARSEC isochrones also provide dust-attenuated photometry in various photometric systems, including the Gaia passbands (DR1, nominal passbands).

To include a realistic dust distribution in the Galaxia model, we used the combined 3D extinction map from Bovy et al. (2016), through its python package mwdust, which is capable of returning line-of-sight extinctions when provided with sky coordinates and distances. This 3D dust map combines the results of Marshall et al. (2006), Green et al. (2015), and Drimmel et al. (2003), and it provides $E(B-V)_{\mathrm{SFD}}$ values on the scale defined in Schlegel et al. (1998). As discussed in Schlafly & Finkbeiner (2011), the $E(B-V)_{\mathrm{SFD}}$ scale overestimates the extinction by 14% with respect to their own findings. Hence we corrected for this overestimation and adopted the prescription associated with the PARSEC isochrones of Cardelli et al. (1989) and O'Donnell (1994) with $R_0 = 3.1$ to derive the monochromatic extinction (in mag) at wavelength λ = 547.7 nm as

$$A_0 = 3.1 \times 0.86 \times E(B-V)_{\mathrm{SFD}} \qquad (1)$$
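As a numerical illustration of this conversion, the following minimal sketch assumes that the 14% overestimate is removed by a multiplicative factor of 0.86 before applying R0 = 3.1. The function name and the constant names are hypothetical and not part of the catalog pipeline.

```python
R0 = 3.1             # total-to-selective extinction ratio quoted in the text
SFD_RESCALE = 0.86   # assumption: factor that removes the 14% overestimate of the SFD scale


def monochromatic_extinction(ebv_sfd):
    """Return A0 in mag (at 547.7 nm) for an E(B-V) value on the SFD scale."""
    return R0 * SFD_RESCALE * ebv_sfd


# Example: one magnitude of SFD reddening corresponds to about 2.67 mag of A0.
a0_example = monochromatic_extinction(1.0)
```

With these assumptions, A0 is simply proportional to the map value, so the correction can be applied to whole arrays of line-of-sight reddenings at once.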

Matching each star from Galaxia to an isochrone and a proper amount of extinction is a challenging task for 6 billion stars. Instead, we approximated each star by its closest match from a precomputed collection of dust-attenuated stellar isochrones. The grid spans $A_0$ values ranging from 0 to 15 mag in steps of 0.025 mag (for stars with even higher extinction we linearly extrapolated the extinction values) and [Fe/H] values from −2 to 0.5 dex in steps of 0.25 dex. We further bin in $\log(T_{\mathrm{eff}})$ in 0.02 dex steps and in $\log(\mathrm{lum})$ in 0.2 dex steps on a star-by-star basis. Each star in our catalog is associated with an index_parsec number that records this matching step and maps the star onto the grid of isochrones, thus allowing us to query photometric measurements in other bands from the supplementary parsec photometry and extinction table. Figure 1 shows the color–absolute magnitude and color–apparent magnitude diagrams of the final data set (applying the Gaia selection after accounting for the dust attenuation).
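The nearest-grid-point matching described above can be sketched as follows. This is only an illustration under stated assumptions: the grid nodes are taken to lie at integer multiples of the quoted step sizes, and extinctions above 15 mag are simply clipped, whereas the catalog extrapolates linearly in that regime. The helper names snap and nearest_grid_node are hypothetical.

```python
def snap(value, step, lo=None, hi=None):
    """Round value to the nearest multiple of step, optionally clipped to [lo, hi]."""
    if lo is not None:
        value = max(value, lo)
    if hi is not None:
        value = min(value, hi)
    return round(value / step) * step


def nearest_grid_node(a0, feh, log_teff, log_lum):
    """Map stellar parameters onto the (assumed) isochrone grid spacing."""
    return (
        snap(a0, 0.025, 0.0, 15.0),   # A0: 0 to 15 mag in 0.025 mag steps (clipped here)
        snap(feh, 0.25, -2.0, 0.5),   # [Fe/H]: -2 to 0.5 dex in 0.25 dex steps
        snap(log_teff, 0.02),         # log(Teff) in 0.02 dex steps
        snap(log_lum, 0.2),           # log(lum) in 0.2 dex steps
    )


node = nearest_grid_node(3.13, -0.3, 3.761, 0.27)
```

The 0.2 dex spacing in log-luminosity is the coarsest axis of this grid, which is why it dominates the photometric precision of the matched magnitudes.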

The following ADQL query provides the data to plot the left panel of Figure 1:

SELECT COUNT(*) AS N,
ROUND(phot_bp_mean_mag - phot_rp_mean_mag, 2) AS color,
ROUND(phot_g_mean_mag + 5*LOG10(parallax/100), 1) AS mag
FROM gdr2mock.main
GROUP BY color, mag
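The magnitude expression in this query is the usual distance modulus written in terms of the parallax: with the parallax ϖ in mas, the distance is 1000/ϖ pc, so M = G − 5 log10(d/10 pc) = G + 5 log10(ϖ/100). A minimal Python sketch of the same arithmetic (the function name is hypothetical, and no extinction correction is applied, as in the figure):

```python
import math


def absolute_magnitude(g_mag, parallax_mas):
    """Absolute G magnitude from apparent magnitude and parallax in mas."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive; the mock catalog is noise-free")
    return g_mag + 5.0 * math.log10(parallax_mas / 100.0)


# A G = 10 mag star at 100 pc (parallax of 10 mas) has M_G = 5 mag.
m_abs = absolute_magnitude(10.0, 10.0)
```

Because the mock parallaxes are noise-free and strictly positive, the logarithm is always defined, which would not hold for noisy parallaxes.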


As the latest PARSEC models (v1.2S + COLIBRI) did not provide dust-attenuated photometry when this catalog was drawn up, we had to match the previous version, PARSEC1.2S (Chen et al. 2014; Tang et al. 2014; Chen et al. 2015) to Galaxia, based on PARSEC v1.2S+ isochrones. This inconsistency affects only a limited range of evolution phases that were deeply revised between the two sets of isochrones (e.g., O stars, TP-AGB).

2.3. Additional Non-Gaia Photometry

Our catalog provides apparent magnitudes in the nominal DR1 passbands G, BP, and RP. In addition, we provide a supplementary table, which can be used to obtain photometry for UBVRIJHK (Bessell & Brett 1988; Bessell 1990; Maíz Apellániz 2006), SDSS (Fukugita et al. 1996), 2MASS (Cohen et al. 2003), and WISE (Wright et al. 2010) to a precision of ≈0.1 mag. This uncertainty mainly arises from the finite resolution of the isochrone grid we used, which corresponds to 0.2 dex spacing in log-luminosity. With actual GDR2 data, such photometry would be obtained by catalog cross-matching, which of course is not possible with a mock catalog and its random realization of the actual star positions.

The following query illustrates how to obtain complementary photometry (e.g., 2MASS) to the main GDR2mock catalog:

SELECT COUNT(*) AS N, mag_2mass_j AS mag, mag_2mass_j - mag_2mass_ks AS color
FROM gdr2mock.main AS main
JOIN gdr2mock.photometry AS phot
USING (index_parsec)
WHERE main.random_index <=1606747
GROUP BY color, mag


Note that this query also subsamples the catalog to 0.1% using the random_index and that the queried photometry is in absolute magnitudes.
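The random_index threshold used in the query follows directly from the catalog size. A small sketch of that arithmetic, assuming the total of 1 606 747 035 stars quoted for the catalog (the function name is hypothetical):

```python
N_TOTAL = 1_606_747_035  # number of rows in gdr2mock.main, as quoted in the paper


def random_index_limit(fraction):
    """Threshold for `random_index <= limit` that keeps about `fraction` of the catalog."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(N_TOTAL * fraction)


limit = random_index_limit(0.001)  # 0.1% subsample, as in the query above
```

Because random_index is assigned independently of position or brightness, any such cut should yield a subsample that is representative of the full catalog.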

2.4. Uncertainty Model

All values provided in the mock catalog are noise-free. As a result, there are no negative parallaxes and the parallaxes can be directly inverted to give exact model distances. To obtain noisy mock observations, one should sample any quantity, say the parallax measurement, from a Gaussian with the true parallax as mean and the parallax_error as the standard deviation. To enable this, we provide in the catalog astrometric and photometric uncertainty estimates based on the nominal uncertainty model (de Bruijne 2005) scaled to the duration of the data segment in GDR2 (which is about 668 days or 37% of the 5 year nominal mission duration). This nominal model depends also on the ecliptic latitude, β (which enters via an averaged version of the scanning law). We assume an uncertainty scaling relation of $1/\sqrt{n}$ with the number of observations, n, for parallaxes, positions, proper motions and magnitudes, neglecting the noise floors and the slightly different scaling for the proper motions based on official communication.
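Drawing a noisy mock observation from a noise-free catalog value can be sketched as below. The sketch uses only the Python standard library, and the function name and example numbers are illustrative rather than taken from the catalog.

```python
import random


def sample_observed(true_value, sigma, rng=random):
    """Draw one noisy measurement from a Gaussian centred on the true value."""
    return rng.gauss(true_value, sigma)


# Example: 10 000 noisy realizations of a 0.5 mas parallax with a 0.1 mas uncertainty.
rng = random.Random(42)  # fixed seed for reproducibility
draws = [sample_observed(0.5, 0.1, rng) for _ in range(10000)]
```

For a catalog query result, the same draw would be made per star, using parallax as the mean and parallax_error as the standard deviation.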

More specifically, we use an approximation of the Gaia scanning law (scaled to the 22 month data segment) that gives us the number of observations, n, as a function of ecliptic latitude in 20 bins. To calculate the parallax uncertainty, we use the nominal end-of-mission (eom) parallax uncertainty, $\sigma_{\varpi,\mathrm{eom}}(G, V-I)$, multiply it by the ecliptic latitude dependent uncertainty factor $x_{\varpi}(|\sin\beta|)$ (https://www.cosmos.esa.int/web/gaia/table-6, which includes the nominal number of observations), and rescale with the shortened 37% baseline:

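$$\sigma_{\varpi} = \sigma_{\varpi,\mathrm{eom}}(G, V-I) \times x_{\varpi}(|\sin\beta|) \times \sqrt{\frac{1}{0.37}} \qquad (2)$$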

We do the same with the positions and proper motions, which are also related to $\sigma_{\varpi,\mathrm{eom}}$, but have their own ecliptic latitude dependent uncertainty factors provided by the abovementioned online Table 6.
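The rescaling of an end-of-mission astrometric uncertainty to the GDR2 baseline can be sketched numerically as follows. This assumes that the 37% baseline enters as a factor $1/\sqrt{0.37}$, in line with the $1/\sqrt{n}$ scaling described above. The function name and arguments are hypothetical, and the latitude factor would have to be looked up in the online Table 6.

```python
import math

MISSION_FRACTION = 0.37  # GDR2 data segment relative to the 5 yr nominal mission


def gdr2_astrometric_uncertainty(sigma_eom, latitude_factor):
    """Scale an end-of-mission uncertainty to the shortened GDR2 baseline.

    sigma_eom: nominal end-of-mission uncertainty (same unit as the result)
    latitude_factor: ecliptic latitude dependent factor from the online table
    """
    return sigma_eom * latitude_factor / math.sqrt(MISSION_FRACTION)


# With a latitude factor of 1, the uncertainty grows by about 64%.
scaled = gdr2_astrometric_uncertainty(1.0, 1.0)
```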

For the nominal single-transit (st) photometric uncertainties $\sigma_{G,\mathrm{st}}(G)$ and $\sigma_{\mathrm{BP/RP},\mathrm{st}}(G, V-I)$, we simply scale with the inverse square root of the number of observations,

$$\sigma_{X} = \frac{\sigma_{X,\mathrm{st}}}{\sqrt{n}} \qquad (3)$$

where X denotes the respective photometric band, i.e., BP, RP, or G.
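The photometric scaling is a plain $1/\sqrt{n}$ reduction of the single-transit uncertainty. A minimal sketch, with a hypothetical function name and illustrative numbers:

```python
import math


def mean_photometric_uncertainty(sigma_single_transit, n_obs):
    """Uncertainty of the mean magnitude after n_obs transits (1/sqrt(n) scaling)."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return sigma_single_transit / math.sqrt(n_obs)


# A 10 mmag single-transit uncertainty averaged over 100 transits gives 1 mmag.
sigma_mean = mean_photometric_uncertainty(0.010, 100)
```

As stated in the text, this neglects any noise floor, so the resulting uncertainties for bright stars should be read as optimistic.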

We do not provide uncertainty estimates for the radial velocity, but the interested reader is referred to Gaia Collaboration et al. (2018).

2.5. Astrophysical Parameters

A complete simulation of the Milky Way, such as Galaxia, offers not only exact phase-space information of the stars and predictions of their photometric properties, but also their underlying physical parameters: ages, masses, metallicities, gravities, luminosities, effective temperatures, etc. These underlying stellar parameters should prove useful in tuning cuts in observables (e.g., color, magnitude and parallax) to optimize for a specific target stellar population (e.g., OB stars, stars with high extinction, old metal-rich stars etc.), and we include them in this mock catalog. Note that GDR2 will provide observational estimates for some of these stellar parameters, which were derived for sources with G ≤ 17 mag from the Gaia photometry and parallax measurements (Andrae et al. 2018), namely, effective temperature for some 161 million sources, line-of-sight extinction and reddening for 88 million sources, and luminosity and radius for 77 million sources.

3. Catalog Content, Access, and Limitations

3.1. Data Model and Catalog Content

Our catalog contains a total of 1 606 747 035 stars when matching the approximate flux limits of Gaia. The actual data model of our catalog can be inspected at http://dc.g-vo.org/tableinfo/gdr2mock.main; by design it mimics the GDR2 data model, i.e., fields and associated names as well as their units. Note, however, that not all columns that appear in DR2 are filled in our catalog and that we provide a few additional ones. Specifically,

  • Nobs is added, reflecting the nominal ecliptic latitude dependent number of visits for GDR2.
  • Age, mass, feh, logg and a0 are added, while effective temperature, $A_G$, E(BP−RP), luminosity and radius are filled into their respective Apsis (Bailer-Jones et al. 2013) fields: teff_val, a_g_val, e_bp_min_rp_val, lum_val and radius_val. Beware that in DR2 these are only provided for a subset of stars with G ≤ 17 mag (cf. Andrae et al. 2018), whereas in our mock catalog we provide entries for all sources.
  • Index_parsec is an index for joining the main mock catalog to other photometric bands/extinctions in the gdr2mock.photometry table.

As in GDR2, we also provide the following columns:

  • Random_index is an integer ranging from 0 to 1 606 747 034, the total number of stars in the mock catalog minus one. This index is useful to create random subsamples representative of the entire catalog.
  • Source_id follows the Gaia referencing scheme. It is primarily the healpix number using NSIDE = 4096 with the nested scheme in equatorial coordinates multiplied by $2^{35}$. The remaining digits of source_id are reserved for a running number that serves as a unique identifier per healpix cell. Unlike Gaia, no bits are reserved for Data Processing Center identification. Still, the source_id can be easily turned into the healpix number for any healpix level N up to 12 (level 12 corresponding to Nside = 4096) via division:
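    $$\mathrm{healpix}_{N} = \left\lfloor \frac{\mathrm{source\_id}}{2^{35} \times 4^{12-N}} \right\rfloor \qquad (4)$$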
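The conversion from source_id to a healpix number can be sketched with integer arithmetic. This assumes the encoding described above (level-12 nested healpix index multiplied by 2^35, plus a running number below 2^35). The function name is hypothetical.

```python
def healpix_from_source_id(source_id, level):
    """Nested healpix index at `level` (0 to 12) encoded in a Gaia-style source_id."""
    if not 0 <= level <= 12:
        raise ValueError("level must be between 0 and 12")
    # Each step down in level merges four nested child pixels into one parent.
    return source_id // (2 ** 35 * 4 ** (12 - level))


# Example: pixel 5 at level 12 with running number 17.
example_id = 5 * 2 ** 35 + 17
pix12 = healpix_from_source_id(example_id, 12)
pix11 = healpix_from_source_id(example_id, 11)
```

Since the division only uses the leading part of the identifier, the same expression can be used inside an ADQL query to group sources by healpix cell without touching the coordinates.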

3.2. Catalog Access

The table is available through GAVO's TAP service and is registered in the VO registry as ivo://org.gavo.dc/gdr2mock/q/main. The full catalog will be hosted by GAVO for at least six months and potentially until GDR3. In the long term there will be a subsample hosted by GAVO, cut to the first 10% of stars according to the random_index. However, a bulk download of the complete catalog (without time limitations) is available as FITS binary tables from the reference URL.

The GDR2mock main table is instantiated using a view (resembling the GDR2 data model) of the actual FITS files. This is why the indexed columns are not marked as such in the gdr2mock.main table but instead in the gdr2mock.generated_data table. Indexed and therefore fast to query columns are: ra, dec, l, b, pmra, pmdec, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag, source_id and random_index.

It is also planned to host the complete catalog on the Gaia archive (https://gea.esac.esa.int/archive/).

3.3. Limitations

This mock catalog has obvious scientific limitations that stem both from the underlying Milky Way model and from our generation of mock observables.

Galaxia simulates neither stellar binaries nor stellar remnants, both of which will appear in the Gaia data. The phase-space distributions of the stars are assumed to be smooth, so the model does not generate phase-space or configuration-space clustering. The model does not account for extragalactic systems, including the LMC, SMC, M31 and M33, which are prominently visible in the GDR1 panel of Figure 2 but absent from our mock catalog.


Figure 1. Color-absolute magnitude (left) and color-apparent magnitude (right) diagrams both including extinction in the Gaia photometry system of the 1.6 billion stars in our mock catalog. For every star down to G = 20.7 mag in the Galaxia model, we calculated its associated dust-attenuated photometry (see Section 2.2). The color for each panel, which represents the stellar density, scales logarithmically. Units are Vega magnitudes with parallax ϖ in mas.


To produce extinction estimates, we approximated each star from Galaxia by its nearest model in astrophysical space of a grid of isochrones (see Section 2.2). In addition, observational artifacts were not simulated in our catalog, which can affect the photometry and magnitude limits of stars close to bright sources in the real GDR2 catalog. In particular, we did not attempt to simulate the scanning law and varying magnitude completeness due to crowding issues.


Figure 2. Stellar source density map of GDR1 (top) and our mock catalog (bottom) in Galactic coordinates using Aitoff projection. The Galactic center is in the middle and Galactic longitude increases toward the left. The color represents the density of star counts down to G = 20.7 mag in each healpix (NSIDE = 128, 1 healpix ≈ 0.21 deg$^2$) and saturates at both ends to enhance Galactic structures.


Finally, this model aims to reproduce the statistical properties of the Milky Way, not its actual properties at the star-by-star level. Hence, cross-matching of our catalog with any other catalog would be moot.

4. More Example Queries

This catalog offers a means to prepare and test ADQL queries for prospective GDR2 science cases in a runtime environment similar to that of the real GDR2 data. Because of the sheer number of sources, a sequential scan (i.e., processing all rows, bypassing indices) will take about an hour of wall clock time. This is true for the query on the GAVO service that yielded the data displayed in Figure 2:

SELECT COUNT(*) AS N, ivo_healpix_index(7, ra, dec) AS healpix
FROM gdr2mock.main
GROUP BY healpix
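The healpix level passed to ivo_healpix_index fixes the sky resolution of the resulting map: level 7 corresponds to NSIDE = 2^7 = 128. The pixel area follows from dividing the full sky among the 12 × NSIDE^2 equal-area pixels, as in this short sketch (the function name is hypothetical):

```python
import math

FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41 253 square degrees


def healpix_pixel_area_deg2(level):
    """Area of one healpix pixel in square degrees at the given level."""
    nside = 2 ** level
    return FULL_SKY_DEG2 / (12 * nside ** 2)


area_level7 = healpix_pixel_area_deg2(7)  # NSIDE = 128, roughly 0.21 deg^2
area_level5 = healpix_pixel_area_deg2(5)  # NSIDE = 32, roughly 3.4 deg^2
```

Dividing the counts returned by the query by this area converts them into a surface density per square degree.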


The real query time may depend on the service used and the server load at the time of query. For more information on the underlying technique, see Taylor et al. (2016).

We therefore recommend restricting one's queries to a reasonable spatial subset during the development phase. The ADQL extension Common Table Expressions facilitates this. For instance, the luminosity function toward the Galactic center restricted to a half-degree cone

SELECT COUNT(*) AS ct, ROUND(phot_g_mean_mag,1) AS bin
FROM gdr2mock.main
WHERE DISTANCE(POINT('GALACTIC', l, b), POINT('GALACTIC', 0., 0.)) <0.5
GROUP BY bin
ORDER BY bin


takes just a few seconds and can still run through "synchronous" query mode, compared to querying the complete luminosity function, which takes about an hour:
SELECT COUNT(*) AS ct, ROUND(phot_g_mean_mag,1) AS bin
FROM gdr2mock.main
GROUP BY bin


For illustration, we compare this last query against the luminosity function of Gaia DR1 and TGAS in Figure 3.


Figure 3. Luminosity function of the GDR2mock catalog in red, compared to Gaia DR1 in blue and its TGAS subset in green. Also indicated are the approximate apparent magnitude limits of the GDR2 radial velocity measurement (solid black) and stellar parameter estimates (dashed gray). The bin size is 0.1 mag in G.


For bright magnitudes, G < 11, we can compare the properties of GDR2mock directly with TGAS. For example, we can compare the proper motion in right ascension of TGAS to our catalog. The query for that data is exactly the same for both catalogs except that gdr2mock.main needs to be exchanged for tgas.main:

SELECT AVG(pmra) AS mean_pmra, IVO_HEALPIX_INDEX(5, ra, dec) AS healpix
FROM gdr2mock.main
WHERE phot_g_mean_mag < 11 AND 1/parallax > 0.5
GROUP BY healpix


Figure 4 shows that overall the two catalogs have similar distributions of motion.


Figure 4. Mean proper motion in right ascension, μα, across the sky (Nside = 32, 1 healpix = 3.4 deg$^2$) in Galactic coordinates for TGAS at the top and our mock catalog at the bottom for G < 11 and $1/\varpi > 0.5\,\mathrm{kpc}$. The color-coding indicates the mean μα per healpix in mas/yr and saturates at the displayed limits. White pixels in TGAS have no data.


We can also compare the parallaxes between TGAS and GDR2mock. The following query:

SELECT parallax, phot_g_mean_mag
FROM gdr2mock.main
WHERE phot_g_mean_mag < 11


yields Figure 5, the distribution of stars in the plane of apparent magnitude and distance, where the prominent diagonal stripe is composed of red clump stars.


Figure 5. Distance in kpc vs. apparent G magnitude for TGAS (left) and GDR2mock (right). The color-coding shows the log density.


Similarly, we compare their parallax histograms

SELECT COUNT(*) AS ct, ROUND(parallax,2) AS bin
FROM gdr2mock.main
WHERE phot_g_mean_mag < 11
GROUP BY bin


in Figure 6, which illustrates the difference between true (GDR2mock) and measured (TGAS) parallaxes (i.e., inclusion of measurement uncertainties). Beware that parallax measurements from GDR2 will be more accurate than those from TGAS, even though the nominal uncertainty model is very optimistic for those bright stars. For example, when sampling observed parallaxes for this G < 11 subsample of GDR2mock using parallax and parallax_error, the chance of measuring a non-positive parallax is below 1%.
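For a single star, the probability that a Gaussian-sampled parallax comes out non-positive is the normal cumulative distribution evaluated at −ϖ/σ. The sketch below illustrates this with the standard library only. The function name and the example values are hypothetical and not taken from the catalog.

```python
import math


def prob_nonpositive_parallax(parallax, parallax_error):
    """P(observed parallax <= 0) for a Gaussian centred on the true parallax."""
    if parallax_error <= 0:
        raise ValueError("parallax_error must be positive")
    z = -parallax / parallax_error
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A star observed at 3 sigma significance has a ~0.13% chance of a non-positive draw.
p_3sigma = prob_nonpositive_parallax(0.3, 0.1)
```

Averaging this probability over all stars returned by a query gives the expected fraction of non-positive parallaxes in the corresponding noisy sample.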


Figure 6. Parallax histogram in 0.01 mas bins for TGAS and our mock catalog with G < 11. The tail of negative parallaxes in TGAS is missing in this graphic representation.


The distribution of stars in the Galaxy which will have radial velocities in GDR2 is displayed in Figure 7, which resulted from the query

SELECT 8 - COS(RADIANS(b))*(1/parallax)*COS(RADIANS(l)) AS x, COS(RADIANS(b))*(1/parallax)*SIN(RADIANS(l)) AS y, 0.015 + (1/parallax)*SIN(RADIANS(b)) AS z
FROM gdr2mock.main
WHERE phot_g_mean_mag < 13 AND
teff_val > 3550 AND teff_val < 6900
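The coordinate expressions in this query convert Galactic longitude, latitude and parallax into Cartesian positions in kpc, with the observer placed at (8.0, 0.0, 0.015) kpc. The following sketch mirrors that arithmetic in Python, assuming the products are read as multiplications and the parallax is in mas, so that 1/parallax is the distance in kpc. The function name is hypothetical.

```python
import math


def model_xyz(l_deg, b_deg, parallax_mas):
    """Cartesian position in kpc from Galactic (l, b) in degrees and parallax in mas."""
    distance = 1.0 / parallax_mas  # kpc, valid because mock parallaxes are noise-free
    l_rad = math.radians(l_deg)
    b_rad = math.radians(b_deg)
    x = 8.0 - math.cos(b_rad) * distance * math.cos(l_rad)
    y = math.cos(b_rad) * distance * math.sin(l_rad)
    z = 0.015 + distance * math.sin(b_rad)
    return x, y, z


# A star 8 kpc away toward (l, b) = (0, 0) lands at x = 0, i.e., the Galactic center.
center = model_xyz(0.0, 0.0, 0.125)
```

Inverting the parallax in this way is only safe for the noise-free mock values; with real or noise-sampled parallaxes a proper distance estimate would be needed.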



Figure 7. Spatial distribution of stars in the GDR2mock catalog with G < 13 and 3550 < teff_val < 6900, illustrating the expected volume for which GDR2 will provide full 6D phase-space information. The color encodes logarithmic density. The Fingers of God effect is due to dust along the line of sight, the observer being centered at the solar position, i.e., (X, Y, Z) = (8.0, 0.0, 0.015) kpc.


5. Summary

We presented a simulation of the Gaia DR2 stellar content, which can be accessed via http://dc.g-vo.org/tableinfo/gdr2mock.main. Using Galaxia and realistic 3D extinction maps, we have produced a catalog, GDR2mock, that closely resembles the Gaia observations (cf. Figure 2). Together with the scaled nominal uncertainty estimates, our mock catalog will give the scientific community a convenient tool to hone queries and know what to expect from GDR2; beyond the GDR2 release, this mock catalog provides a valuable comparison for science analysis. It should serve as a test-bed for first-day GDR2 scientific projects (in runtime and ADQL syntax), as well as a comparison to real queries in order to establish field contamination or confirm unexpected features.

We thank the anonymous referee for their prompt report. The authors thank Leo Girardi, Jo Bovy, Sanjib Sharma and Alcione Mora for their useful help.

This work made use of the topcat (Taylor 2005), HEALPix (Górski et al. 2005), astropy, and ezpadova suites and packages.

We thank the German Astrophysical Virtual Observatory for the publishing platform and for fruitful discussions on the technical aspects of this endeavor.

This work was funded in part by the DLR (German space agency) via grant 50 QG 1403. J.R. and H.W.R. acknowledge funding from the European Research Council under the European Union's Seventh Framework Programme (FP7) ERC Advanced Grant Agreement No. [321035].

This project was developed in part at the 2017 Heidelberg Gaia Sprint, hosted by the Max-Planck-Institut für Astronomie, Heidelberg.

This work has made use of data from the European Space Agency (ESA) mission Gaia, processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.
