Abstract
We present a mock catalog of Milky Way stars, matching in volume and depth the content of the Gaia data release 2 (GDR2). We generated our catalog using Galaxia, a tool to sample stars from a Besançon Galactic model, together with a realistic 3D dust extinction map. The catalog mimics the complete GDR2 data model and contains most of the entries in the Gaia source catalog: five-parameter astrometry, three-band photometry, radial velocities, stellar parameters, and associated scaled nominal uncertainty estimates. In addition, we supplemented the catalog with extinctions and photometry for non-Gaia bands. This catalog can be used to prepare GDR2 queries in a realistic runtime environment, and it can serve as a Galactic model against which to compare the actual GDR2 data in the space of observables. The catalog is hosted by the Heidelberg data center of the German Astrophysical Virtual Observatory (GAVO; http://dc.g-vo.org/tableinfo/gdr2mock.main) and can thus be queried using ADQL, as for GDR2 data.
1. Introduction
Gaia (Gaia Collaboration et al. 2016) is an ongoing ESA astrometric space mission that is about to deliver, with its second data release (GDR2; Lindegren et al. 2018), positions, parallaxes, proper motions, and photometry in three bands for a set of ∼1.4 billion sources across the whole sky. GDR2 will also provide effective temperatures, luminosities, extinction estimates, and radial velocities for a substantial subset of these sources, along with some other data products. This vast amount of data will be a practical challenge to explore and should usher the community into a new regime in Galactic stellar astronomy, where well-designed ADQL3 queries become a common tool for obtaining manageable data sets from hosting services such as those of the Virtual Observatory (VO; Demleitner 2014).
To help prepare the scientific community for this phase change, we present in this paper a mock catalog that contains the prospective GDR2 stellar content. A first mock data set of Gaia data, the so-called Gaia Universe Model Snapshot (GUMS; Robin et al. 2012), has long been available. However, the primary goal of that catalog was to provide simulations to the Gaia Data Processing and Analysis Consortium (DPAC). Hence, its design does not offer the same capabilities as our GDR2mock catalog. Besides an improved 3D extinction map, which results in a slightly larger star count (∼1.1 billion stars brighter than G = 20, compared to ∼1.0 billion in GUMS, and a total of ∼1.6 billion stars in our complete catalog down to G = 20.7), the main difference is that, by construction, our catalog fully mimics the GDR2 format. This enables GDR2 users to test their ADQL queries and helps with their science analysis (e.g., with understanding the selection function).
Our catalog is accessible online, most easily with topcat through the VO Table Access Protocol (TAP) service of GAVO4, where it is published as gdr2mock.main.
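For scripted access outside topcat, the same TAP service can also be queried from Python. The following is a minimal sketch, assuming the third-party pyvo package is installed and that the endpoint is http://dc.g-vo.org/tap (the access URL given in footnote 16); the helper name run_adql and the example query are hypothetical. Long all-sky scans are better submitted asynchronously, since synchronous requests may time out.

```python
GAVO_TAP_URL = "http://dc.g-vo.org/tap"  # access URL quoted in footnote 16


def run_adql(query, url=GAVO_TAP_URL, asynchronous=False):
    """Run an ADQL query against a TAP service and return an astropy Table.

    pyvo is imported lazily so that this module (and the query strings
    below) can be loaded without the package installed.
    """
    import pyvo

    service = pyvo.dal.TAPService(url)
    # Synchronous jobs suit quick, index-backed queries; long sequential
    # scans of the full table should go through the asynchronous interface.
    runner = service.run_async if asynchronous else service.run_sync
    return runner(query).to_table()


# Cheap test query: restrict to a ~0.1% random subsample via the indexed
# random_index column before asking for a handful of rows.
EXAMPLE_QUERY = (
    "SELECT TOP 10 source_id, ra, dec, parallax, phot_g_mean_mag "
    "FROM gdr2mock.main WHERE random_index < 1606747"
)
```

A call such as `run_adql(EXAMPLE_QUERY)` then returns the rows as a table; the same function works unchanged against any other TAP service that serves GDR2 by passing a different `url`.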
2. Catalog Generation
Our catalog is based on a chemo-dynamical Milky Way model, Galaxia (Sharma et al. 2011), which we combined with a 3D dust extinction model before generating photometric observables. The following subsections outline the steps of this mock data set generation.5
2.1. The Galaxia Model
Galaxia is a tool that allows one to sample stars from the Besançon Galactic model (Robin et al. 2003), using a specific set of stellar isochrones to obtain their astrophysical parameters. The Galactic warp was switched on during the simulations, the solar position was set to (X, Y, Z) = (−8.0, 0.0, 0.015) kpc, and the velocities to . Transformations from phase-space coordinates to observable coordinates on the sky (ra, dec, pm_ra_cosdec, pm_dec, and radial_velocity) were done using astropy6 (The Astropy Collaboration et al. 2018). We used the latest PARSEC isochrones7 (PARSEC v1.2S + COLIBRI PR16; Bressan et al. 2012; Marigo et al. 2017; Rosenfield et al. 2016; Marigo et al. 2013), which also provide photometry for each star in the nominal Gaia DR1 photometric bands G, BP, and RP (Jordi et al. 2010). GDR2 passbands were not available during the construction of this catalog.
At this stage, we could already account for the magnitude limit of Gaia and selected only stars with an apparent magnitude brighter than G = 20.7 mag, which, before dust extinction was applied, resulted in over six billion sources.
2.2. Dust-attenuated Photometry
A crucial step in transforming a Galaxia simulation into a catalog resembling actual observations is the application of a dust distribution, which will change the apparent colors and luminosities of the stars.
Because the Gaia photometric bands span a broad wavelength range (∼300 nm), simply converting extinction coefficients (e.g., from Schlafly & Finkbeiner 2011, Table 6) into reddening and extinction in the Gaia bands (e.g., AG) is only a poor approximation and can lead to significant inconsistencies across the broad range of stellar spectra. Instead, we must account for non-linearities, in particular with respect to the stars' colors. Fortunately, the PARSEC isochrones also provide dust-attenuated photometry in various photometric systems, including the Gaia passbands (DR1, nominal passbands).
To include a realistic dust distribution in the Galaxia model, we used the combined 3D extinction map of Bovy et al. (2016) through its Python package mwdust8, which returns line-of-sight extinctions for given sky coordinates and distances. This 3D dust map combines the results of Marshall et al. (2006), Green et al. (2015), and Drimmel et al. (2003), and it provides E(B−V)SFD values on the scale defined by Schlegel et al. (1998).9 As discussed by Schlafly & Finkbeiner (2011), the E(B−V)SFD scale overestimates the extinction by 14% with respect to their own findings. Hence, we corrected for this overestimation and adopted the extinction law associated with the PARSEC isochrones (Cardelli et al. 1989; O'Donnell 1994) with R0 = 3.1 to derive the monochromatic extinction (in mag) at wavelength λ = 547.7 nm as

A0 = 3.1 × 0.86 × E(B−V)SFD.
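The correction step can be sketched as follows, assuming the 14% overestimate is removed by a multiplicative factor of 0.86 and that R0 = 3.1, as in the text. In practice the E(B−V)SFD values would come from mwdust's combined map; that call is omitted to keep the sketch self-contained, and the function name is hypothetical.

```python
SFD_CORRECTION = 0.86  # SFD scale overestimates E(B-V) by 14% (Schlafly & Finkbeiner 2011)
R0 = 3.1               # ratio of total to selective extinction used in the text


def a0_from_ebv_sfd(ebv_sfd):
    """Convert an E(B-V) value on the SFD scale to A0 (mag at 547.7 nm).

    Negative values returned by the map are truncated to zero first,
    as described in footnote 9.
    """
    ebv = max(ebv_sfd, 0.0)
    return R0 * SFD_CORRECTION * ebv
```

For example, E(B−V)SFD = 1 mag gives A0 ≈ 2.67 mag, whereas the uncorrected scale would give 3.1 mag.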
Matching each of the 6 billion Galaxia stars to an isochrone with the proper amount of extinction is a challenging task. Instead, we approximated each star by its closest match in a precomputed collection of dust-attenuated stellar isochrones. The grid spans A0 values from 0 to 15 mag in steps of 0.025 mag (for stars with even higher extinction, we linearly extrapolated the extinction values) and [Fe/H] values from −2 to 0.5 dex in steps of 0.25 dex. We further bin in in 0.02 dex steps and in 0.2 dex steps on a star-by-star basis. Each star in our catalog is associated with an index_parsec number that records this matching step: it maps the star onto the grid of isochrones and thus allows photometry in other bands to be queried from the supplementary PARSEC photometry and extinction table. Figure 1 shows the color–magnitude and absolute-magnitude diagrams of the final data set (with the Gaia selection applied after accounting for dust attenuation).
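The nearest-node matching on the A0 and [Fe/H] axes can be sketched as below, using the grid limits and steps quoted above. This is an illustration only: the function and constant names are hypothetical, the other binned quantities are left out, and, unlike the paper (which extrapolated linearly above A0 = 15 mag), the sketch simply clips values to the grid edges.

```python
A0_GRID = (0.0, 15.0, 0.025)   # (min, max, step) in mag, from the text
FEH_GRID = (-2.0, 0.5, 0.25)   # (min, max, step) in dex, from the text


def nearest_node(value, grid):
    """Return (index, node) of the grid point nearest to value.

    Values outside the grid are clipped to its first or last node.
    """
    lo, hi, step = grid
    n_max = round((hi - lo) / step)      # index of the last node
    idx = round((value - lo) / step)     # nearest node, possibly out of range
    idx = min(max(idx, 0), n_max)        # clip to the grid
    return idx, lo + idx * step
```

For instance, a star with A0 = 0.013 mag and [Fe/H] = −0.3 dex is assigned to the nodes A0 = 0.025 mag and [Fe/H] = −0.25 dex.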
The following ADQL query provides the data to plot the left panel of Figure 1:
SELECT count(*) AS N,
ROUND(phot_bp_mean_mag - phot_rp_mean_mag, 2) AS color,
ROUND(phot_g_mean_mag + 5 * log10(parallax/100), 1) AS mag
FROM gdr2mock.main
GROUP BY color, mag
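The magnitude expression in this query is the distance modulus written in terms of parallax. With d = 1000/ϖ pc for a parallax ϖ in mas, M = m − 5 log10(d/10 pc) = m + 5 log10(ϖ/100). A minimal sketch of the same conversion (the function name is hypothetical), valid only for positive parallaxes such as the noise-free ones in this catalog:

```python
import math


def absolute_magnitude(apparent_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax (mas).

    Equivalent to M = m - 5 log10(d / 10 pc) with d = 1000 / parallax pc.
    """
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive to be inverted")
    return apparent_mag + 5.0 * math.log10(parallax_mas / 100.0)
```

A star at 10 pc (ϖ = 100 mas) has M = m, and one at 100 pc (ϖ = 10 mas) is 5 mag brighter in absolute terms than it appears.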
As the latest PARSEC models (v1.2S + COLIBRI) did not provide dust-attenuated photometry when this catalog was drawn up, we had to match the previous version, PARSEC v1.2S (Chen et al. 2014; Tang et al. 2014; Chen et al. 2015), to Galaxia, which is based on PARSEC v1.2S+ isochrones. This inconsistency affects only a limited range of evolutionary phases that were substantially revised between the two sets of isochrones (e.g., O stars, TP-AGB).
2.3. Additional Non-Gaia Photometry
Our catalog provides apparent magnitudes in the nominal DR1 passbands10 G, BP, and RP. In addition, we provide a supplementary table from which photometry in UBVRIJHK (Bessell & Brett 1988; Bessell 1990; Maíz Apellániz 2006), SDSS (Fukugita et al. 1996), 2MASS (Cohen et al. 2003), and WISE11 (Wright et al. 2010) can be obtained to a precision of ≈0.1 mag. This uncertainty arises mainly from the finite resolution of the isochrone grid we used, which corresponds to a 0.2 dex spacing in log-luminosity. For actual GDR2 data, such photometry would be obtained by catalog cross-matching, which is of course not possible for a mock catalog whose star positions are a random realization.
The following query illustrates how to obtain complementary photometry (e.g., 2MASS) to the main GDR2mock catalog:
SELECT COUNT(*) AS N, mag_2mass_j AS mag, mag_2mass_j - mag_2mass_ks AS color
FROM gdr2mock.main AS main
JOIN gdr2mock.photometry AS phot
USING (index_parsec)
WHERE main.random_index <= 1606747
GROUP BY color, mag
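The cut on random_index in this query selects roughly 0.1% of the catalog, because random_index runs from 0 to N − 1 with N = 1 606 747 035. A small helper (hypothetical, not part of any service API) turns a desired sampling fraction into such a cutoff. It uses a strict `<` so that exactly the cutoff number of rows is selected, whereas `<= 1606747` selects one row more.

```python
N_TOTAL = 1_606_747_035  # total number of stars in gdr2mock.main


def random_index_cutoff(fraction, n_total=N_TOTAL):
    """Number of rows k such that `random_index < k` selects ~fraction of the table."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(fraction * n_total)


def subsample_clause(fraction, n_total=N_TOTAL):
    """ADQL WHERE clause for a random subsample of the given fraction."""
    return f"WHERE random_index < {random_index_cutoff(fraction, n_total)}"
```

Because random_index is indexed, such a clause keeps test queries fast while still drawing stars from the whole sky.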
2.4. Uncertainty Model
All values provided in the mock catalog are noise-free. As a result, there are no negative parallaxes, and the parallaxes can be directly inverted to give exact model distances. To obtain noisy mock observations, one should sample each quantity, say the parallax, from a Gaussian with the true value as mean and the corresponding uncertainty (here parallax_error) as standard deviation. To enable this, the catalog provides astrometric and photometric uncertainty estimates based on the nominal uncertainty model12 (de Bruijne 2005), scaled to the duration of the GDR2 data segment (about 668 days, or 37% of the 5 yr nominal mission duration). This nominal model also depends on the ecliptic latitude, β, which enters via an averaged version of the scanning law. We assume that the uncertainties of parallaxes, positions, proper motions, and magnitudes scale as σ ∝ n^(−1/2) with the number of observations, n, neglecting the noise floors and the slightly different scaling for the proper motions based on official communication.
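Drawing one noisy realization from the noise-free columns can be sketched with NumPy as below; the function name is hypothetical, and the inputs are assumed to be the parallax and parallax_error columns of a query result (both in mas).

```python
import numpy as np


def noisy_parallax(parallax, parallax_error, rng=None):
    """Draw one noisy realization per star from N(parallax, parallax_error**2).

    parallax and parallax_error are array-likes in mas, e.g. the
    noise-free columns returned by a query on gdr2mock.main.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(parallax, dtype=float)
    sigma = np.asarray(parallax_error, dtype=float)
    # Generator.normal broadcasts loc and scale, so each star gets its own sigma.
    return rng.normal(loc=mean, scale=sigma)
```

Unlike the catalog values, the noisy parallaxes can be negative for distant or faint stars, so distances should no longer be obtained by naive inversion.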
More specifically, we use an approximation of the Gaia scanning law (scaled to the 22 month data segment) that gives the number of observations, n, as a function of ecliptic latitude in 20 bins.13 To calculate the parallax uncertainty, we take the nominal end-of-mission (eom) parallax uncertainty, σϖ,eom, multiply it by the ecliptic-latitude-dependent uncertainty factor fϖ(β) (https://www.cosmos.esa.int/web/gaia/table-6, which includes the nominal number of observations), and rescale it to the shortened 37% baseline:

σϖ = σϖ,eom × fϖ(β) / √0.37.

We do the same for the positions and proper motions, which are also related to σϖ,eom but have their own ecliptic-latitude-dependent uncertainty factors given in the above-mentioned online Table 6.
For the nominal single-transit (st) photometric uncertainties, σX,st, we simply scale with one over the square root of the number of observations,

σX = σX,st / √n,

where X denotes the respective photometric band, i.e., BP, RP, or G.
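These scaling relations can be sketched as follows, assuming (as stated above) that uncertainties scale as n^(−1/2), so that shortening the baseline to 37% of the nominal mission inflates end-of-mission uncertainties by 1/√0.37 ≈ 1.64. The β-dependent factor has to be looked up in ESA's Table 6 and is passed in by hand; all function names are hypothetical. The ecliptic latitude is computed with the standard rotation by the J2000 obliquity (≈23.4393°), which is sufficient for choosing a β bin.

```python
import math

BASELINE_FRACTION = 0.37          # ~668 d of GDR2 data / 5 yr nominal mission
OBLIQUITY_J2000_DEG = 23.4392911  # mean obliquity of the ecliptic at J2000


def scaled_astrometric_error(sigma_eom, beta_factor, baseline_fraction=BASELINE_FRACTION):
    """End-of-mission uncertainty times the beta factor, rescaled to a shorter baseline."""
    return sigma_eom * beta_factor / math.sqrt(baseline_fraction)


def scaled_photometric_error(sigma_single_transit, n_obs):
    """Single-transit photometric uncertainty averaged over n_obs transits."""
    if n_obs < 1:
        raise ValueError("need at least one observation")
    return sigma_single_transit / math.sqrt(n_obs)


def ecliptic_latitude_deg(ra_deg, dec_deg, obliquity_deg=OBLIQUITY_J2000_DEG):
    """Ecliptic latitude beta (deg) from equatorial ra, dec (deg)."""
    ra, dec, eps = (math.radians(v) for v in (ra_deg, dec_deg, obliquity_deg))
    sin_beta = math.sin(dec) * math.cos(eps) - math.cos(dec) * math.sin(eps) * math.sin(ra)
    # Guard against tiny floating-point excursions outside [-1, 1].
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_beta))))
```

For example, a single-transit G uncertainty of 0.01 mag averaged over four transits becomes 0.005 mag.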
We do not provide uncertainty estimates for the radial velocity, but the interested reader is referred to Gaia Collaboration et al. (2018).
2.5. Astrophysical Parameters
A complete simulation of the Milky Way, such as Galaxia, offers not only exact phase-space information for the stars and predictions of their photometric properties, but also their underlying physical parameters: ages, masses, metallicities, surface gravities, luminosities, effective temperatures, and so on. These stellar parameters should prove useful for tuning cuts in observables (e.g., color, magnitude, and parallax) to optimize for a specific target stellar population (e.g., OB stars, stars with high extinction, or old metal-rich stars), and we include them in this mock catalog. Note that GDR2 will provide observational estimates of some of these stellar parameters, derived for sources with G ≤ 17 mag from the Gaia photometry and parallaxes (Andrae et al. 2018): effective temperatures for some 161 million sources, line-of-sight extinctions and reddenings for 88 million sources, and luminosities and radii for 77 million sources.
3. Catalog Content, Access, and Limitations
3.1. Data Model and Catalog Content
Our catalog contains a total of 1 606 747 035 stars when matching the approximate flux limits of Gaia. Its data model, which can be inspected at http://dc.g-vo.org/tableinfo/gdr2mock.main, mimics by design the GDR2 data model14, including field names and units. Note, however, that not all columns that appear in DR2 are filled in our catalog and that we provide a few additional ones. Specifically,
- Nobs is added, reflecting the nominal ecliptic latitude dependent number of visits for GDR2.
- Age, mass, feh, logg, and a0 are added, while effective temperature, AG, E(BP−RP), luminosity, and radius are filled into their respective Apsis (Bailer-Jones et al. 2013) fields: teff_val, a_g_val, e_bp_min_rp_val, lum_val, and radius_val. Beware that in DR2 these are provided only for a subset of stars with G ≤ 17 mag (cf. Andrae et al. 2018), whereas our mock catalog provides entries for all sources.
- index_parsec is an index for joining the main mock catalog to the other photometric bands and extinctions in the gdr2mock.photometry table.
Similarly to GDR2, we also provide
- random_index is an integer ranging from 0 to 1 606 747 034 (the total number of stars in the mock catalog minus one). This index is useful for creating random subsamples representative of the entire catalog.
- source_id follows the Gaia referencing scheme. Its leading part is the HEALPix15 number at NSIDE = 4096 (nested scheme, equatorial coordinates) multiplied by 2^35. The remaining digits of source_id are reserved for a running number that serves as a unique identifier within each HEALPix cell. Unlike in Gaia, no bits are reserved for Data Processing Center identification. Still, the source_id can easily be turned into a HEALPix number at any HEALPix level k up to 12 (level 12 corresponding to NSIDE = 4096) by integer division: healpix_k = floor(source_id / (2^35 × 4^(12−k))).
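The integer division can be sketched in Python as below, assuming source_id = healpix_12 × 2^35 + running number. In the nested scheme each coarser level merges four child pixels, so dividing by a further factor of 4 per level drops one level of resolution. The function names are hypothetical.

```python
def healpix_from_source_id(source_id, level=12):
    """HEALPix (nested, equatorial) pixel number at the given level.

    Dividing by 2**35 removes the running number; each extra factor of 4
    moves one level up in the nested hierarchy.
    """
    if not 0 <= level <= 12:
        raise ValueError("level must be between 0 and 12")
    return source_id // (2**35 * 4 ** (12 - level))


def nside_for_level(level):
    """NSIDE corresponding to a HEALPix level (order)."""
    return 2**level
```

With Python's arbitrary-precision integers, this is exact for the full 64-bit source_id range.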
3.2. Catalog Access
The table is available through GAVO's TAP service16 and is registered in the VO registry as ivo://org.gavo.dc/gdr2mock/q/main. GAVO will host the full catalog for at least six months and potentially until GDR3. In the long term, GAVO will host a subsample consisting of the first 10% of the stars according to random_index. In addition, a bulk download of the complete catalog (without time limitations) is available as FITS binary tables from the reference URL.17
The gdr2mock.main table is instantiated as a view (resembling the GDR2 data model) of the actual FITS files. This is why the indexed columns are marked as such not in the gdr2mock.main table but in the gdr2mock.generated_data table. The indexed, and therefore fast-to-query, columns are: ra, dec, l, b, pmra, pmdec, phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag, source_id, and random_index.
It is also planned to host the complete catalog on the Gaia archive (https://gea.esac.esa.int/archive/).
3.3. Limitations
This mock catalog has obvious scientific limitations that stem both from the underlying Milky Way model and from our generation of mock observables.
Galaxia simulates neither stellar binaries nor stellar remnants, both of which will appear in the Gaia data. The phase-space distributions of the stars are assumed to be smooth, so the model does not generate clustering in phase space or configuration space. The model does not account for extragalactic systems such as the LMC, SMC, M31, and M33, which are prominently visible in the GDR1 panel of Figure 2 but absent from our mock catalog.
To produce extinction estimates, we approximated each star from Galaxia by its nearest model in a grid of isochrones in astrophysical-parameter space (see Section 2.2). In addition, observational artifacts, which can affect the photometry and magnitude limits of stars close to bright sources in the real GDR2 catalog, were not simulated in our catalog. In particular, we did not attempt to simulate the scanning-law pattern or the varying magnitude completeness caused by crowding.
Finally, this model aims to reproduce the statistical properties of the Milky Way, not its actual properties at the star-by-star level. Hence, cross-matching our catalog with any other catalog would be meaningless.
4. More Example Queries
This catalog offers a means to prepare and test ADQL queries for prospective GDR2 science cases in a runtime environment similar to that of the real GDR2 data.18 Because of the sheer number of sources, a sequential scan (i.e., processing all rows, bypassing indices) takes about an hour of wall-clock time. This is the case for the query on the GAVO service that yielded the data displayed in Figure 2:
SELECT count(*) AS N, ivo_healpix_index(7, ra, dec) AS healpix
FROM gdr2mock.main
GROUP BY healpix
We therefore recommend restricting queries to a reasonable spatial subset during the development phase. The ADQL extension Common Table Expressions facilitates this. For instance, the luminosity function toward the Galactic center, restricted to a half-degree cone, is obtained with
SELECT COUNT(*) AS ct, ROUND(phot_g_mean_mag, 1) AS bin
FROM gdr2mock.main
WHERE DISTANCE(POINT('GALACTIC', l, b), POINT('GALACTIC', 0., 0.)) < 0.5
GROUP BY bin
ORDER BY bin
Once such a query works on the subset, the spatial constraint can be dropped to obtain the all-sky luminosity function:
SELECT COUNT(*) AS ct, ROUND(phot_g_mean_mag, 1) AS bin
FROM gdr2mock.main
GROUP BY bin
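One way to keep the development and production versions of such a query in sync is to generate both from the same template. The helper below is a hypothetical sketch, not part of any service: with a cone radius it reproduces the restricted query above, and without one it yields the all-sky version (here with an ORDER BY added).

```python
def luminosity_function_query(cone_radius_deg=None, l_deg=0.0, b_deg=0.0, decimals=1):
    """Build the G-band luminosity-function query, optionally restricted to a cone.

    With cone_radius_deg=None the query scans the whole table (slow);
    with a radius it only considers stars within that distance of (l_deg, b_deg).
    """
    lines = [
        f"SELECT COUNT(*) AS ct, ROUND(phot_g_mean_mag, {decimals}) AS bin",
        "FROM gdr2mock.main",
    ]
    if cone_radius_deg is not None:
        lines.append(
            "WHERE DISTANCE(POINT('GALACTIC', l, b), "
            f"POINT('GALACTIC', {l_deg}, {b_deg})) < {cone_radius_deg}"
        )
    lines += ["GROUP BY bin", "ORDER BY bin"]
    return "\n".join(lines)
```

The restricted string can be tested interactively first; switching to the full catalog is then a change of a single argument.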
For bright magnitudes, G < 11, we can compare the properties of GDR2mock directly with those of TGAS. For example, we can compare the mean proper motion in right ascension of TGAS stars with that of our catalog. The query for these data is identical for both catalogs, except that gdr2mock.main must be replaced by tgas.main:
SELECT AVG(pmra) AS mean_pmra, IVO_HEALPIX_INDEX(5, ra, dec) AS healpix
FROM gdr2mock.main
WHERE phot_g_mean_mag < 11 AND 1/parallax > 0.5
GROUP BY healpix
We can also compare the parallaxes of TGAS and GDR2mock, using the following query:
SELECT parallax, phot_g_mean_mag
FROM gdr2mock.main
WHERE phot_g_mean_mag < 11
Similarly, we compare their parallax histograms:
SELECT COUNT(*) AS ct, ROUND(parallax, 2) AS bin
FROM gdr2mock.main
WHERE phot_g_mean_mag < 11
GROUP BY bin
The distribution of the stars in the Galaxy that will have radial velocities in GDR2 (approximated here by G < 13 and 3550 K < teff_val < 6900 K) is displayed in Figure 7, which results from the query
SELECT 8 - COS(RADIANS(b)) * (1/parallax) * COS(RADIANS(l)) AS x,
COS(RADIANS(b)) * (1/parallax) * SIN(RADIANS(l)) AS y,
0.015 + (1/parallax) * SIN(RADIANS(b)) AS z
FROM gdr2mock.main
WHERE phot_g_mean_mag < 13 AND
teff_val > 3550 AND teff_val < 6900
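The SELECT clause above converts each star's (l, b, parallax) into Cartesian Galactic coordinates in kpc, with the distance taken as 1/parallax (exact here because the catalog is noise-free). It places the Sun at x = +8 kpc, y = 0, z = 0.015 kpc, so x = 0 at the Galactic center; note that this flips the sign of the X = −8.0 kpc convention quoted in Section 2.1. A plain-Python mirror of the same arithmetic, with a hypothetical function name:

```python
import math


def galactocentric_xyz(l_deg, b_deg, parallax_mas, r_sun_kpc=8.0, z_sun_kpc=0.015):
    """Cartesian position (kpc) as computed in the ADQL query above.

    Distance is 1/parallax (parallax in mas gives kpc); the Sun sits at
    (r_sun_kpc, 0, z_sun_kpc) and x decreases toward the Galactic center.
    """
    d = 1.0 / parallax_mas
    l, b = math.radians(l_deg), math.radians(b_deg)
    x = r_sun_kpc - d * math.cos(b) * math.cos(l)
    y = d * math.cos(b) * math.sin(l)
    z = z_sun_kpc + d * math.sin(b)
    return x, y, z
```

A star at l = b = 0 with ϖ = 0.125 mas (d = 8 kpc) lands at the Galactic center, (0, 0, 0.015) kpc.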
5. Summary
We presented a simulation of the Gaia DR2 stellar content, which can be accessed via http://dc.g-vo.org/tableinfo/gdr2mock.main. Using Galaxia and realistic 3D extinction maps, we have produced a catalog, GDR2mock, that closely resembles the Gaia observations (cf. Figure 2). Together with the scaled nominal uncertainty estimates, our mock catalog will give the scientific community a convenient tool to hone queries and to know what to expect from GDR2; beyond the GDR2 release, it provides a valuable comparison for science analysis. It should serve as a test bed for first-day GDR2 science projects (in runtime and ADQL syntax), as well as a comparison for real queries in order to establish field contamination or confirm unexpected features.
We thank the anonymous referee for their prompt report. The authors thank Leo Girardi, Jo Bovy, Sanjib Sharma and Alcione Mora for their useful help.
This work made use of topcat (Taylor 2005), HEALPix (Górski et al. 2005), astropy, and ezpadova19 suites and packages.
We thank the German Astrophysical Virtual Observatory20 for the publishing platform and for fruitful discussions on the technical aspects of this endeavor.
This work was funded in part by the DLR (German space agency) via grant 50 QG 1403. J.R. and H.W.R. acknowledge funding from the European Research Council under the European Union's Seventh Framework Programme (FP7) ERC Advanced Grant Agreement No. [321035].
This project was developed in part at the 2017 Heidelberg Gaia Sprint, hosted by the Max-Planck-Institut für Astronomie, Heidelberg.
This work has made use of data from the European Space Agency (ESA) mission Gaia, processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.
Footnotes
- 3
ADQL = astronomical data query language.
- 4
- 5
Part of the routines we used can be retrieved from https://github.com/jan-rybizki/Galaxia_wrap.
- 6
- 7
PARSEC = Padova Trieste evolution code (including the pre-main sequence phase); http://stev.oapd.inaf.it/cgi-bin/cmd.
- 8
- 9
For a few 3D positions the map returns negative extinctions, but we truncated these to zero.
- 10
An update of the catalog that uses the GDR2 passbands may follow; it would be called gdr2mock_v2.
- 11
- 12
End-of-mission astrometric and single-transit photometric uncertainty relations from https://www.cosmos.esa.int/web/gaia/science-performance, which require a V−I color that we calculated internally.
- 13
- 14
- 15
- 16
Access URL http://dc.g-vo.org/tap, which is also what the runtime estimates refer to.
- 17
- 18
ADQL syntax check on a GDR2 VO service can be run here: http://gaia.ari.uni-heidelberg.de/adql-validator.html.
- 19
- 20