AN OPTIMIZED METHOD TO IDENTIFY RR Lyrae STARS IN THE SDSS×Pan-STARRS1 OVERLAPPING AREA USING A BAYESIAN GENERATIVE TECHNIQUE

Published 2014 May 28 © 2014. The American Astronomical Society. All rights reserved.
Citation: Mohamad Abbas et al 2014 AJ 148 8. DOI 10.1088/0004-6256/148/1/8

1538-3881/148/1/8

ABSTRACT

We present a method for selecting RR Lyrae (RRL) stars (or other types of variable stars) in the absence of extensive multi-epoch data and light curve analyses. Our method uses color and variability selection cuts that are defined by applying a Gaussian Mixture Bayesian Generative Method (GMM) on 636 pre-identified RRL stars instead of applying the commonly used rectangular cuts. Specifically, our method selects 8115 RRL candidates (heliocentric distances < 70 kpc) using GMM color cuts from the Sloan Digital Sky Survey (SDSS) and GMM variability cuts from the Panoramic Survey Telescope and Rapid Response System 1 3π survey (PS1). Comparing our method with the Stripe 82 catalog of RRL stars shows that the efficiency and completeness levels of our method are ∼77% and ∼52%, respectively. Most contaminants are either non-variable main-sequence stars or stars in eclipsing systems. The method described here efficiently recovers known stellar halo substructures. It is expected that the current completeness and efficiency levels will further improve with the additional PS1 epochs (∼3 epochs per filter) that will be observed before the conclusion of the survey. Compared with the commonly used rectangular cuts, the GMM method raises the efficiency level significantly, from ∼13% to ∼77%, with an insignificant change in the completeness level. Hence, we favor using the GMM technique in future studies. Although we develop it over the SDSS×PS1 footprint, the technique presented here would work well on any multi-band, multi-epoch survey for which the number of epochs is limited.

1. INTRODUCTION

Understanding the process of galaxy formation has always been an important goal in astrophysics. In particular, the formation and evolution of disk galaxies still pose many unsolved questions. Many observational studies have focused on the Milky Way as the one disk galaxy that can be studied in the greatest detail (e.g., see the reviews of Freeman & Bland-Hawthorn 2002; Ivezić et al. 2012). Special emphasis in these studies has been placed on the Galactic stellar halo (e.g., Johnston et al. 2008; Schlaufman et al. 2009), the old roughly spherical and extended component of our Galaxy, which is believed to hold important information about the process of galaxy formation.

While accretion of massive systems and in situ star formation processes (e.g., Yanny et al. 2003; Jurić et al. 2008; De Lucia & Helmi 2008; Zolotov et al. 2010; Font et al. 2011; Schlaufman et al. 2012) presumably resulted in the formation of the inner halo (Galactocentric radius less than 15 kpc), it is believed that the outer halo formed as a result of accretions and mergers of smaller systems (e.g., Ibata et al. 1995; Bullock & Johnston 2005; Newberg et al. 2003; Duffau et al. 2006; Carollo et al. 2007; McCarthy et al. 2012; Beers et al. 2012). This scenario implies that many of the halo stars were formed in dwarf galaxies outside the Milky Way (e.g., Bullock & Johnston 2005; Abadi et al. 2006). As witnesses of the early phase of the formation of our Galaxy, these halo stars can be used as fossils to trace back the history of our Galaxy (e.g., Johnston et al. 2008; Schlaufman et al. 2009; Zolotov et al. 2010). A complete and deep map of the halo is vital to find the remnants of the accretion processes (e.g., Keller et al. 2008; Bell et al. 2008; Zolotov et al. 2010). Over the past decade, various halo overdensities and stellar streams have been discovered using different methods and different types of stars. For a summary, see Ivezić et al. (2012). The accreted substructures identified so far mainly seem to consist of old stars. Thus, it is expected that such populations are revealed by maps of RR Lyrae (RRL) stars, which are found only in old stellar populations.

Hence, finding RRL stars and their distances is one way to map the Galactic halo and find its stellar streams. These stars can also be used as objects to study the intrinsic halo population, the distribution, and the gradients in halo metallicity. For instance, the domination of the inner and outer halo by slightly more metal-rich and metal-poor stars, respectively, and their different global kinematics support the different scenarios of the formation processes of the inner (in situ formation) and outer (accretion processes) halo (Carollo et al. 2007, 2010). This evolutionary picture is also supported by studying RRL stars in both parts of the halo (e.g., Kinman et al. 2012). However, the number of predicted substructures varies substantially (e.g., Bell et al. 2008; Deason et al. 2011; Zinn et al. 2014) and thus more observations are needed.

Another advantage of using RRL stars to map the Galactic halo is their well-defined mean absolute V-band magnitude (〈MV〉 = 0.6; Layden et al. 1996), which can be used to infer their distances in addition to their well-studied colors and light curve properties. They are variable horizontal branch stars with periods less than ∼1 day (Smith 1995), so the detection of RRL stars requires repeated observations.
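As a rough sketch of how a fixed absolute magnitude translates into a distance, the standard distance modulus can be inverted as shown below. The function name is hypothetical, and the input is assumed to be an extinction-corrected mean V-band magnitude; the conversion from survey magnitudes to V is not covered by this example.

```python
M_V_RRL = 0.6  # mean absolute V magnitude of RRL stars (Layden et al. 1996)


def heliocentric_distance_kpc(v_mag, abs_mag=M_V_RRL):
    """Invert the distance modulus m - M = 5 log10(d / 10 pc).

    v_mag is assumed to be an extinction-corrected mean V-band magnitude.
    """
    distance_pc = 10.0 ** ((v_mag - abs_mag + 5.0) / 5.0)
    return distance_pc / 1000.0


# A star with V = 19.6 has a distance modulus of 19.0 mag.
print(round(heliocentric_distance_kpc(19.6), 1))
```

Because the absolute magnitude is treated as a constant, any error in the mean apparent magnitude (for example, from sampling only a few phases) propagates directly into the distance.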

For instance, Watkins et al. (2009) and Sesar et al. (2010) used data from the Sloan Digital Sky Survey (SDSS; Fukugita et al. 1996; York et al. 2000; Abazajian et al. 2009) to look for RRL stars in Stripe 82 (−50° < R.A. < 59°, −1.25° < decl. < 1.25°), which was observed around 80 times. The Watkins et al. (2009) and Sesar et al. (2010) catalogs contain 407 and 483 RRL stars in Stripe 82, respectively, with heliocentric distances (dh) in the ∼4–120 kpc range. According to Sesar et al. (2010), their catalog has efficiency (fraction of the true RRL stars in the sample) and completeness (fraction of the RRL stars recovered in the sample) levels of ≳ 99%. Hence, we use the latter catalog as a comparison catalog to compute the efficiency and completeness levels in our study.

Using the SDSS and the Lincoln Near Earth Asteroid Research survey (LINEAR; Harris 1998; Sesar et al. 2011), Sesar et al. (2013) announced the discovery of ∼5000 RRL stars with dh in the 5–30 kpc range that cover ∼8000 deg² of the sky. LINEAR has no spectral filters and has a mean number of 250 observations per object. These RRL stars were selected using SDSS color cuts, LINEAR variability cuts, and light curve analysis.

The Catalina Real-Time Transient Survey (CRTS; Drake et al. 2009, 2013) was used to discover ∼14,000 RRab stars with dh up to 100 kpc using variability statistics, period finding and Fourier fitting techniques (Drake et al. 2009, 2013). Just like LINEAR, CRTS observes the sky repeatedly (∼250 times per object) using no spectral filters.

Using data from the SDSS, the Panoramic Survey Telescope and Rapid Response System 1 3π survey (hereafter PS1; Kaiser et al. 2002, 2010), and the CRTS, Abbas et al. (2014, hereafter Paper I) were able to detect ∼6371 RRL stars with an efficiency of ∼99% and ∼87% for RRab and RRc stars, respectively. The high efficiency level obtained was due to the accurate variability statistics and light curve analyses obtained from the CRTS multi-epoch data. The template fitting method (Layden 1998; Layden et al. 1999) and visual inspection were performed on all light curves for a more reliable classification (Paper I). When light curve analyses are available, the techniques used in Paper I can be adopted to detect RRL stars easily. However, light curve analysis is not always possible, as not all surveys provide enough multi-epoch data. The technique developed and used in the current paper can be adopted in such surveys with few epochs.

In the current paper, we look for RRL candidates by cross-matching the SDSS data with data from PS1. We show that using a Gaussian Mixture Bayesian Generative Method (GMM; VanderPlas et al. 2012) to set selection boundary cuts on the SDSS colors and PS1 variability allows one to find RRL stars (or other types of variable stars) even when only a small number of repeated observations are available and light curve analysis is not possible. Our method's efficiency and completeness levels also allow us to detect halo stellar streams and substructures.

A more detailed description of the surveys we used is given in Section 2. In Section 3, we study the properties of RRL stars in the SDSS and PS1 photometric systems using more than 600 pre-identified RRL stars. In Section 4, we describe our method for selecting RRL candidates using the GMM selection boundary cuts for the SDSS colors and the PS1 variability. In the same section, we compute the efficiency and completeness levels of our method by comparing our results with the catalog of RRL stars from Sesar et al. (2010). Additionally, we compare the efficiency and completeness levels of our GMM method to the efficiency and completeness levels obtained using the rectangular cuts technique. In the same section, we study the properties of the contaminant stars. In Section 5, we apply our color and variability cuts to the whole overlapping footprint between the SDSS and PS1 to find the RRL candidates. In Section 6, we derive the distances for our RRL candidates and we use these distances to recover two known halo substructures. The content of the paper is summarized and discussed in Section 7.

2. SURVEY DATA

Our method for searching for RRL stars works by using color and variability information from the SDSS and PS1, respectively.

2.1. SDSS And PS1

The SDSS (Stoughton et al. 2002; Abazajian et al. 2009) is a deep spectroscopic and photometric survey (g < 23.3) that uses five filters (u, g, r, i, and z) to survey ∼12,000 deg² of the sky. Although most of the SDSS data are based on single-epoch observations, ∼270 deg² of the Southern Galactic hemisphere, the so-called Stripe 82, have been observed around 80 times.

The PS1 3π survey (Kaiser et al. 2002, 2010) is a ∼3.5 yr (2010 May–2014 March) multi-epoch photometric and astrometric survey that is being conducted in Hawaii. The PS1 telescope repeatedly observes the entire sky north of declination −30° (3π survey). It uses a 1.8 m telescope with a 7 deg² field of view. It is equipped with the largest digital camera in the world (1.4 gigapixels). One of its goals is to carry out a photometric and astrometric survey of stars in the Milky Way and the Local Group in five bandpasses (gP1, rP1, iP1, zP1, and yP1) covering the spectral range of 4000 Å < λ < 10500 Å. More information about these filters can be found in Tonry et al. (2012). The PS1 obtains multiple images of three quarters of the celestial sphere in the optical and near-infrared (Kaiser et al. 2002) to ∼22 mag in gP1 in individual exposures (Morganson et al. 2012). Specifically, it is designed to take four exposures per year and area with each of its filters (Morganson et al. 2012). By the end of the survey there should be ∼12 exposures per field and filter. Currently, the average number of observations in each of the five filters is 8 (Magnier et al. 2013).

The PS1 was mainly designed to detect potentially hazardous asteroids and near Earth objects (Kaiser et al. 2002). Because it is a deep survey that is repeatedly observing three quarters of the sky, its data are of interest for a wide range of different scientific topics. These topics cover different science areas, from solar system objects to cosmology. The PS1 data are of particular interest also for structural studies of the Milky Way affording a deeper and wider area coverage than previous surveys. When more than ∼4 epochs are available in at least two filters (∼4 epochs in each filter), the repeat observations of the PS1 allow one to identify variable stars such as RRL stars.

3. RRL STARS

RRL stars are best identified using color cuts, variability cuts, and light curve analysis. Although the colors of RRL stars in the SDSS photometric system have been studied and identified, the lack of variability information and light curve analysis poses difficulties in identifying these stars using the SDSS data alone. The SDSS data are based on single epoch observations with the exception of the overlapping regions and Stripe 82 (Sesar et al. 2007, 2010; Bramich et al. 2008). The PS1 is a multi-epoch survey that can be used to study the variability of stars but finding RRL stars using the PS1 data alone is a challenge since the number of repeat observations used in PS1 is small (at most 10 epochs per filter) and the cadence is somewhat irregular.

Most previous searches for RRL stars used a large number of epochs for each star, which allowed their light curves to be analyzed. We, on the other hand, use the small number of PS1 repeated observations, which makes finding these stars a challenge. Nonetheless, we will demonstrate that using GMM (VanderPlas et al. 2012) to set selection boundary cuts on the SDSS colors and PS1 variability allows us to find RRL stars and to detect halo stellar streams and substructures.

3.1. The Colors of RRL Stars

The SDSS colors of RRL stars have been studied and characterized using the 483 RRL stars detected in Stripe 82 (Sesar et al. 2010, and other studies). Since the SDSS (u − g) color serves as a surface gravity indicator for these stars, the range (∼0.3 mag) and the root-mean-square (rms) scatter (∼0.06 mag) are the smallest in this color (Ivezić et al. 2005).

The g, r, i, and z bands from the SDSS are similar to the gP1, rP1, iP1, and zP1 bands from the PS1, respectively. However, the u band is used only in the SDSS and not in the PS1, and the yP1 band is found only in the PS1 and not in the SDSS. This is due to the differences in the surveys' major scientific goals and in the sensitivities of the cameras used. The lack of the u filter in the PS1 is a disadvantage when it comes to finding RRL stars.

Additionally, the SDSS operates in a drift-scanning mode where the sky objects pass through its five different filters almost simultaneously. The correct colors of the observed sky objects can then be obtained unless they are variable on very short time scales (i.e., few minutes). Consequently, the SDSS drift-scanning technique gives the correct colors of RRL stars as these stars have periods in the ∼0.2–1 day range.

However, the correct colors of RRL stars are not provided with the PS1 photometric system because of the PS1 imaging technique. The PS1 images a selected patch of the sky with different filters at different times. Magnitudes in different filters correspond to different phases for short period variable objects like RRL stars.
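The effect of non-simultaneous imaging on the measured color can be illustrated with a toy simulation. This is only a sketch: the mean magnitudes, the peak-to-peak amplitudes, and the sinusoidal light curve are assumptions chosen for illustration, not values from the surveys.

```python
import math
import random

# Illustrative values only: mean magnitudes and peak-to-peak amplitudes are
# assumptions, and a sinusoid is a crude stand-in for a real RRL light curve.
G_MEAN, R_MEAN = 17.0, 16.8
G_AMP, R_AMP = 1.0, 0.7


def magnitude(mean, amplitude, phase):
    """Magnitude of a sinusoidal pulsator at a phase in [0, 1)."""
    return mean + 0.5 * amplitude * math.sin(2.0 * math.pi * phase)


def color_spread(simultaneous, n_stars=2000, seed=1):
    """Return max - min of (g - r) over randomly phased single observations."""
    rng = random.Random(seed)
    colors = []
    for _ in range(n_stars):
        phase_g = rng.random()
        # Near-simultaneous imaging: both bands share one phase.
        # Non-simultaneous imaging: the r band is observed at its own phase.
        phase_r = phase_g if simultaneous else rng.random()
        g = magnitude(G_MEAN, G_AMP, phase_g)
        r = magnitude(R_MEAN, R_AMP, phase_r)
        colors.append(g - r)
    return max(colors) - min(colors)


print("simultaneous spread:", round(color_spread(True), 2))
print("non-simultaneous spread:", round(color_spread(False), 2))
```

In this toy model the simultaneous color can vary by at most the difference of the two amplitudes, whereas the non-simultaneous color can vary by up to their sum, which is the qualitative behavior described above.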

Figure 1(a) illustrates the (g − r) versus (r − i) color–color diagram of stars in Stripe 82 from the seventh data release of the SDSS (SDSS DR7; Abazajian et al. 2009), and Figure 1(b) illustrates the PS1 (gP1 − rP1) versus (rP1 − iP1) color–color diagram for the same stars. Red dots represent a subsample of non-RRL stars while blue filled circles represent a subsample of the RRL stars detected in Stripe 82 (Sesar et al. 2010). While the RRL stars occupy a small and well-defined region in the SDSS color–color diagram (see Figure 1(a)), they are spread out over a large and wide region in the PS1 color–color diagram (see Figure 1(b)). This is a result of the different observing techniques used by the SDSS (near-simultaneous imaging using different filters) and PS1 (non-simultaneous imaging).

Figure 1. Illustration of the difference of the colors of RRL stars in the SDSS and PS1 photometric systems. Red dots show a subsample of non-RRL stars in Stripe 82 while blue filled circles show a subsample of the RRL stars detected in the same Stripe (Sesar et al. 2010). The scatter of RRL stars in the PS1 plot is due to non-simultaneous gP1 and rP1 observations by PS1 while the well-defined color region occupied by the RRL stars in the SDSS plot is due to the near-simultaneous imaging observations by the SDSS.

We base our color cuts for selecting RRL candidates on colors from the SDSS DR7 photometric system and not on the colors from the PS1 photometric system due to the lack of the u band and of the true colors of RRL stars in the latter photometric system.

3.2. Pre-identified Sample of RRL Stars

We use 636 pre-identified RRL stars selected from the catalogs of RRL stars in the CRTS (Drake et al. 2013) and LINEAR (Sesar et al. 2013) surveys for a better characterization of the SDSS colors and PS1 variability properties of RRL stars.

These 636 RRL stars are chosen based on their clean photometry in the SDSS DR7 and PS1 photometric systems. They have photometric errors of less than 0.2 mag in u and less than 0.1 mag in g, r, i, z, gP1, and rP1. They are primary objects that are not blended or saturated in either survey and that have been observed more than twice by PS1 in both gP1 ($N_{g_{P1}} \ge 3$) and rP1 ($N_{r_{P1}} \ge 3$), where $N_{g_{P1}}$ and $N_{r_{P1}}$ are the numbers of PS1 observations in the gP1 and rP1 filters, respectively. The last two cuts were applied in order to study the variability of RRL stars in the PS1 multi-epoch data.
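The quality criteria above amount to a simple per-star filter. The sketch below shows one way to express them; the `Photometry` record and its field names are hypothetical and do not correspond to actual SDSS or PS1 database columns.

```python
from dataclasses import dataclass


@dataclass
class Photometry:
    """Hypothetical per-star record; field names are illustrative only."""
    errors: dict          # magnitude errors keyed by band, e.g. {"u": 0.05}
    n_g_ps1: int          # number of clean PS1 detections in gP1
    n_r_ps1: int          # number of clean PS1 detections in rP1
    blended: bool = False
    saturated: bool = False


# Error limits quoted in the text: 0.2 mag in u, 0.1 mag in the other bands.
MAX_ERR = {"u": 0.2, "g": 0.1, "r": 0.1, "i": 0.1, "z": 0.1,
           "gP1": 0.1, "rP1": 0.1}
MIN_EPOCHS = 3


def has_clean_photometry(star):
    """Apply the error, blending, saturation, and epoch-count criteria."""
    # A band with no recorded error is treated as failing the cut.
    errors_ok = all(star.errors.get(band, float("inf")) < limit
                    for band, limit in MAX_ERR.items())
    epochs_ok = star.n_g_ps1 >= MIN_EPOCHS and star.n_r_ps1 >= MIN_EPOCHS
    return errors_ok and epochs_ok and not (star.blended or star.saturated)


good = Photometry({band: 0.03 for band in MAX_ERR}, n_g_ps1=4, n_r_ps1=3)
too_few = Photometry({band: 0.03 for band in MAX_ERR}, n_g_ps1=2, n_r_ps1=5)
print(has_clean_photometry(good), has_clean_photometry(too_few))
```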

We corrected the magnitudes for extinction using the recalibration of Schlegel et al.'s (1998) dust map by Schlafly & Finkbeiner (2011). Since the RRL stars used here are located in areas where the extinction is small (i.e., at high Galactic latitudes), such color corrections can be used. The color densities of the 636 RRL stars in the SDSS photometric system are shown in Figure 2, where red and blue regions reflect large and small numbers of RRL stars, respectively. A sample of non-RRL stars is also plotted as small white dots to demonstrate the colors of these contaminant stars (i.e., main-sequence stars and stars in eclipsing systems). RRL stars occupy small areas in the color–color diagrams in Figure 2 and are concentrated in well-defined regions, especially in the (u − g) color, an advantage that helps in finding these stars.

Figure 2. Different color–color diagrams of the 636 RRL stars in the SDSS photometric system, where the red and blue regions reflect large and small numbers of RRL stars, respectively, as indicated by the color bars to the right of each panel. A sample of non-RRL stars is shown as small white dots to demonstrate the colors of these contaminant stars. These colors are corrected for extinction using the recalibration of Schlegel et al.'s (1998) dust map by Schlafly & Finkbeiner (2011). RRL stars occupy small and well-defined regions in these plots.

4. APPLYING AND TESTING OUR METHOD

It is important to test and maximize the completeness and efficiency levels of our method in selecting RRL stars before we apply our color and variability cuts to the whole area where the SDSS and PS1 data overlap.

For that reason, we define and apply our GMM color and variability boundary cuts to the stars found in Stripe 82. We then compare the Stripe 82 catalog of RRL stars, which has efficiency and completeness levels of ≳ 99% (Sesar et al. 2010) to the RRL stars that our method detects in the same region. RRL stars in Sesar et al.'s (2010) catalog span dh between ∼4 and ∼120 kpc and g magnitudes between ∼12.8 and ∼21.1 mag. There are 374 RRL stars in Sesar et al.'s (2010) catalog that are found in the overlapping area covered by PS1 and that are within our magnitude range (14.0 < gP1 < 20.0).

We base our comparison on these 374 RRL stars that are ≳ 99% efficient and complete in our magnitude range and sky coverage. We apply all our cuts and selection criteria step by step to stars found in Stripe 82. We then compute the efficiency and completeness levels for each step.

4.1. Stripe 82

  • 1.  
    We start by adopting initial rectangular color cuts from Sesar et al. (2010) to avoid downloading all the SDSS DR7 data in Stripe 82 (and later for the whole SDSS×PS1 footprint). Our RRL candidates must first pass the first four initial rectangular color cuts (Equations (6)–(9) in Sesar et al. 2010):
    Equation (1)
    Equation (2)
    Equation (3)
    Equation (4)
    These are single-epoch color ranges (Sesar et al. 2010) for RRab and RRc stars corrected for extinction using the Schlegel et al. (1998) dust map. The SDSS colors for RRL stars correspond to a random instant in their phase and depend on the time when the near-simultaneous SDSS photometry was obtained. It is thus safe to apply these color criteria to SDSS data, but they are not suitable for PS1 data, where the color range needs to be larger in order to account for the non-simultaneous observations in the PS1 filters. In order to avoid galaxies, these objects must be flagged as stars (${\tt type_{SDSS} = 6}$) in the SDSS. They should also be flagged as primary objects (${\tt mode_{SDSS} = 1}$) with clean photometry in the SDSS DR7 database (${\tt clean_{SDSS} = 1}$).
    Due to the noise and photometric errors resulting from the small number of PS1 epochs that we use in our method, some non-variable sources might appear as variables, especially faint sources with large photometric errors and bright sources that might saturate the CCD camera (see Section 4.3). To avoid this, we choose sources that are fainter than 14th and brighter than 20th magnitude in the PS1 gP1 filter (Equation (5)). Although PS1 will eventually observe each object around 12 times in each filter, the survey is not finished yet and the average number of detections per star is ∼8 epochs in each of the PS1 filters. Some of these detections were not taken under good photometric conditions and therefore were flagged as bad sources by the PS1 pipeline. To ensure the reliability of our variability cuts, only clean PS1 detections that are not saturated or blended and are not flagged as cosmic rays are used in our study (Morganson et al. 2012).
    Thus, we only choose stars that have more than two clean detections in both the gP1 and rP1 filters (Equations (6) and (7)) in order to reliably distinguish variable from non-variable stars:
    $14.0 < g_{P1} < 20.0$  (5)
    $N_{g_{P1}} \ge 3$  (6)
    $N_{r_{P1}} \ge 3$  (7)
    Variability cuts in the iP1, zP1, and yP1 filters are applied later.
    In the studied area of Stripe 82, ∼74,000 stars passed the first four initial SDSS color cuts (Equations (1)–(4)), the PS1 magnitude cut (Equation (5)), and the PS1 threshold limit of the number of detections in both the gP1 and rP1 filters (Equations (6) and (7)).
    Although there are 374 RRL stars in the same area, we missed 85 of them. Around 92% of these 85 stars did not have more than two clean gP1 or rP1 PS1 detections (Equations (6) and (7)), while the remaining 8% did not pass all four SDSS color cuts (Equations (1)–(4)). This leaves us with 289 true RRL stars that we recovered in Stripe 82 (among the ∼74,000 stars that passed all the conditions in this step). The efficiency and completeness levels are then ∼0.39% (289/74,000) and 77.3% (289/374), respectively.
  • 2.  
    In order to optimize our color selection of RRL candidates, we define color selection boundaries using the 636 RRL stars (see Section 3.2) in the SDSS (u − g) versus (g − r) and (g − r) versus (r − i) color–color diagrams with GMM (VanderPlas et al. 2012). GMM is a Bayesian generative classification method that fits different classes with simple non-correlated Gaussians. These Gaussians are then used to compute the likelihood of a point to belong to each class. The class with the highest likelihood is the predicted result. In our case, GMM uses the colors of the 636 pre-identified RRL stars and compares them to the colors of non-RRL stars to find the GMM color selection boundaries. We choose this method instead of adopting sharp rectangular cuts (e.g., Vivas et al. 2001; Sesar et al. 2007, 2010) in order to optimize our efficiency and completeness levels when light curve analyses are not possible due to the small number of PS1 observations.
    In Figures 3 and 4, the GMM color selection boundaries are applied and plotted in green in the (u − g) versus (g − r) and (g − r) versus (r − i) color–color diagrams, respectively, for a subsample of stars in Stripe 82. The colors of the 636 pre-identified RRL stars used to find the GMM color selection boundaries are shown with black open circles. Stars that fall inside our GMM selection boundaries are shown as blue dots while stars that fall outside are plotted as red dots. Only stars that fall inside the GMM color selection boundaries in both color–color diagrams ((u − g) versus (g − r) and (g − r) versus (r − i)) are retained for further analysis.
    This step significantly reduces the number of stars in our sample from ∼74,000 to 1820 stars, out of which 260 are true RRL stars.
    Although the GMM color boundaries are computed using more than 600 well-identified RRL stars distributed around the sky, 29 true RRL stars from Step (1) did not pass one or both of these GMM color boundary cuts. These stars either have relatively large SDSS magnitude uncertainties that are reflected in their colors or they fall close to, but outside of, our GMM color boundaries.
    Because 1820 stars passed all the cuts in this step (and the cuts in the previous step), and because we were able to recover 260 out of the 374 RRL stars found in Stripe 82, our efficiency level is ∼14.3% (260/1820) while the completeness level is ∼70% (260/374).
  • 3.  
    After defining and applying the GMM selection boundaries for the SDSS colors in the previous step, we use the gP1, rP1, iP1, zP1, and yP1 multi-epoch data from PS1 to distinguish a variable from a non-variable star.
    Since we cannot rely on our small number of PS1 detections to phase the light curves and find their periods, we calculate low-order statistics (e.g., standard deviation) and use them to define a GMM selection boundary cut for the gP1 magnitudes as a function of the standard deviation in gP1 ($\sigma _{g_{P1}}$) plus the standard deviation in rP1 ($\sigma _{r_{P1}}$). In Figure 5, the GMM variability boundary plotted in green is computed by the GMM method (VanderPlas et al. 2012), which uses the ($\sigma _{g_{P1}}$ + $\sigma _{r_{P1}}$) values of the 636 pre-identified RRL stars compared to the ($\sigma _{g_{P1}}$ + $\sigma _{r_{P1}}$) values of non-variable stars to find the boundary of the variability cutoff. Although all of these 636 RRL stars are variable sources, only ∼90% of them fall above our variability boundary, while ∼10% show small or no variability due to the small number of epochs available from PS1. Only stars that fall above our GMM variability boundary are retained for further analysis. These stars have already passed the GMM selection boundaries for the SDSS colors discussed in the previous steps.
    In order to be considered as RRL candidates, stars that have more than two clean detections in the iP1, zP1, and yP1 filters must pass the following additional variability criterion:
    $\sigma _{i_{P1}}+\sigma _{z_{P1}}+\sigma _{y_{P1}}\,{\ge}\, 0.1$  (8)
    This threshold limit was adopted as more than 90% of the 636 pre-identified RRL stars (see Section 3.2) with more than two clean detections in the iP1, zP1, and yP1 filters have $\sigma _{i_{P1}}+\sigma _{z_{P1}}+\sigma _{y_{P1}}\,{\ge}\, 0.1$. This criterion is applied to stars with $N_{i_{P1}}\,{\ge}\, 3$, $N_{z_{P1}}\,{\ge}\, 3$, and $N_{y_{P1}}\,{\ge}\, 3$ that have already passed all of our GMM color and variability selection boundaries. Stars that passed our GMM color and variability selection boundaries and that do not have more than two good detections in the iP1, zP1, and yP1 filters are still considered RRL candidates.
    Applying the GMM variability selection cut (see Figure 5) for ($\sigma _{g_{P1}}$ + $\sigma _{r_{P1}}$) and the variability cut in the iP1, zP1, and yP1 filters (see Equation (8)) to the 1820 RRL candidates from Step (2) reduces the number of RRL candidates to 255 stars, out of which 195 are true RRL stars and 60 are contaminant stars. We discuss the nature of the 60 contaminant stars in Section 4.3.
    At the same time, 65 RRL stars were lost when moving from Step (2) to Step (3). These stars did not show a significant amount of variability compared to other variable stars because their number of PS1 epochs is small (∼3) and their magnitudes in different detections are not significantly different, as they have likely been observed multiple times at nearly the same phase.
    In this final step, the efficiency significantly increases to ∼77% (195/255) and the completeness drops to ∼52% (195/374). This step greatly increases our efficiency level as it removes a large fraction of non-variable stars with colors close to the colors of RRL stars (i.e., main-sequence stars with RRL-like colors).
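The generative classification idea described in Step (2) can be sketched in a few lines. This is a deliberately simplified illustration, not the implementation used in the paper: it fits a single uncorrelated Gaussian per class rather than a mixture, the training colors are made-up numbers, and equal class priors are assumed.

```python
import math


def fit_diagonal_gaussian(points):
    """Per-dimension mean and variance of a list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    variances = [sum((p[d] - means[d]) ** 2 for p in points) / n
                 for d in range(dims)]
    return means, variances


def log_likelihood(point, model):
    """Log density of an uncorrelated (diagonal-covariance) Gaussian."""
    means, variances = model
    return sum(-0.5 * math.log(2.0 * math.pi * var)
               - 0.5 * (x - mu) ** 2 / var
               for x, mu, var in zip(point, means, variances))


def classify(point, models, priors):
    """Return the class whose prior-weighted likelihood is highest."""
    return max(models, key=lambda name: math.log(priors[name])
               + log_likelihood(point, models[name]))


# Toy (u - g, g - r) training colors; these numbers are made up for illustration.
rrl_colors = [(1.10, 0.15), (1.15, 0.20), (1.20, 0.10), (1.12, 0.25)]
other_colors = [(0.90, 0.35), (1.40, 0.45), (1.00, 0.50), (1.35, 0.30)]

models = {"rrl": fit_diagonal_gaussian(rrl_colors),
          "other": fit_diagonal_gaussian(other_colors)}
priors = {"rrl": 0.5, "other": 0.5}  # equal priors assumed for the toy example

print(classify((1.14, 0.18), models, priors))
print(classify((1.38, 0.44), models, priors))
```

The selection boundary is the locus where the two prior-weighted likelihoods are equal, so it follows the shape of the training distributions rather than the axis-aligned edges of a rectangular cut.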

Figure 3. (u − g) vs. (g − r) colors of the 636 pre-identified RRL stars used to find the GMM color selection boundary (plotted in green) are shown with black open circles. Stars that fall inside this boundary (blue dots) have RRL-like colors and are retained for further analysis. Stars falling outside the GMM boundary are plotted as red dots and are considered contaminant stars.

Figure 4. Same as Figure 3, but showing a (g − r) vs. (r − i) SDSS color–color diagram.

Figure 5. gP1 vs. ($\sigma _{g_{P1}}$ + $\sigma _{r_{P1}}$) of a small sample of the stars that passed the SDSS GMM color selection cuts. The green line shows the variability boundary computed by GMM using the 636 pre-identified RRL stars. Stars falling above the variability boundary are retained for further analysis.

4.2. Applying Regular Rectangular Cuts

We apply the regularly used color and variability rectangular cuts (Sesar et al. 2010) to the stars in Stripe 82 and compare the results using this technique with the results we achieved using the GMM technique to test whether the latter technique improves the recovery of RRL stars.

The first step in this technique is the same as Step (1) from the previous section, where the number of RRL candidates is ∼74,000 stars, of which 289 are known RRL stars. This step requires the SDSS rectangular color cuts, the magnitude cut, and the PS1 threshold limit of the number of detections in both the gP1 and rP1 filters. The efficiency and completeness levels are then ∼0.39% (289/74,000) and 77.3% (289/374), respectively.

Since we are not using the GMM technique in this section, we directly apply straight-line variability cuts in the PS1 filters. Stars with ($\sigma _{g_{P1}}+\sigma _{r_{P1}}\,{\ge}\,0.22$) that have passed the previous cuts in this section are retained for further analysis. The 636 pre-identified RRL stars were once again used to set the latter cut as more than 90% of these stars have $\sigma _{g_{P1}}+\sigma _{r_{P1}}\,{\ge}\,0.22$. Just like in Step (3), an additional cut ($\sigma _{i_{P1}}+\sigma _{z_{P1}}+\sigma _{y_{P1}}\,{\ge}\,0.1$) is applied for the retained stars with $N_{i_{P1}}\,{\ge}\,3$, $N_{z_{P1}}\,{\ge}\,3$, and $N_{y_{P1}}\,{\ge}\,3$. Retained stars that do not have more than two good detections in the iP1, zP1, and yP1 filters are still considered RRL candidates.
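The straight-line variability cuts in this section can be sketched as follows. The function and dictionary layout are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

SIGMA_GR_MIN = 0.22   # threshold on sigma_g + sigma_r quoted in the text
SIGMA_IZY_MIN = 0.1   # threshold on sigma_i + sigma_z + sigma_y
MIN_EPOCHS = 3


def scatter(mags):
    """Sample standard deviation of the clean detections in one filter."""
    return statistics.stdev(mags)


def passes_rectangular_variability(epochs):
    """epochs maps a PS1 filter ('g', 'r', 'i', 'z', 'y') to clean magnitudes."""
    if min(len(epochs.get(band, [])) for band in ("g", "r")) < MIN_EPOCHS:
        return False
    if scatter(epochs["g"]) + scatter(epochs["r"]) < SIGMA_GR_MIN:
        return False
    # The i, z, y cut only applies when all three filters have enough epochs;
    # otherwise the star is still kept as a candidate.
    izy = [epochs.get(band, []) for band in ("i", "z", "y")]
    if all(len(mags) >= MIN_EPOCHS for mags in izy):
        return sum(scatter(mags) for mags in izy) >= SIGMA_IZY_MIN
    return True


variable = {"g": [17.2, 17.9, 17.5], "r": [17.1, 17.6, 17.3]}
constant = {"g": [17.50, 17.51, 17.49], "r": [17.30, 17.31, 17.30]}
print(passes_rectangular_variability(variable))
print(passes_rectangular_variability(constant))
```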

There are ∼1600 stars that passed all of our cuts in this section of which 205 are known RRL stars. This yields efficiency and completeness levels of ∼13% (205/1600) and ∼54% (205/374), respectively.

The dependencies of the efficiency (dashed lines) and completeness (solid lines) levels in each step resulting from the GMM and rectangular cut techniques are plotted with red and blue lines in Figure 6, respectively. Although the completeness level did not change significantly between the two techniques, the efficiency level increased from ∼13% (205/1600, using rectangular cuts) to ∼77% (195/255, using the GMM technique). Hence, we favor using the GMM technique in future studies.
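The efficiency and completeness values quoted in Sections 4.1 and 4.2 follow directly from the star counts at each step. The short sketch below recomputes them; the step labels and variable names are illustrative, and the counts of ∼74,000 and ∼1600 are the approximate values given in the text.

```python
N_REFERENCE = 374  # Sesar et al. (2010) RRL stars in the PS1 overlap and magnitude range

# (step label, stars retained, true RRL stars retained)
# The counts 74,000 and 1600 are approximate in the text.
GMM_STEPS = [("Step 1", 74000, 289), ("Step 2", 1820, 260), ("Step 3", 255, 195)]
RECT_STEPS = [("initial cuts", 74000, 289), ("variability cuts", 1600, 205)]


def efficiency(n_true, n_selected):
    """Fraction of the selected stars that are true RRL stars."""
    return n_true / n_selected


def completeness(n_true, n_reference=N_REFERENCE):
    """Fraction of the reference RRL stars that were recovered."""
    return n_true / n_reference


for name, steps in (("GMM", GMM_STEPS), ("rectangular", RECT_STEPS)):
    for label, n_selected, n_true in steps:
        print(f"{name:12s} {label:17s} "
              f"efficiency={100 * efficiency(n_true, n_selected):6.2f}% "
              f"completeness={100 * completeness(n_true):5.1f}%")
```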

Figure 6. Comparison between the efficiency (dashed lines) and completeness (solid lines) levels on each step resulting from the GMM (in red) and rectangular (in blue) cuts techniques. The dependence of our GMM completeness and efficiency levels on Step (1): the magnitude, initial color, and number of detection cuts (Equations (1)–(7)), Step (2): the SDSS GMM color boundary cuts, and Step (3): the GMM variability boundary cut are also shown.

4.3. Contaminant Stars

To understand the nature of the contaminant stars, we look for multi-epoch data in the CRTS database for the 60 contaminant stars we found in Stripe 82. Fifty-six of the 60 contaminant stars are found in the CRTS database and have been observed between ∼40 and 500 times.

Almost 40% of these stars showed no variability using the multi-epoch data from CRTS, which makes them non-variable stars that have passed our variability cuts. These stars were observed only three to four times with PS1 and have magnitudes close to our bright (gP1 ∼ 14.0 mag) and faint (gP1 ∼ 20.0 mag) magnitude cuts. Hence, it is not surprising that some non-variable sources passed our variability cuts as their variability statistics are based on a small number of observations where a single noisy epoch can bias the statistics and make a non-variable source appear as a variable one, and vice versa.
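The effect of a small number of epochs on the variability statistics can be illustrated with a toy Monte Carlo experiment. The sketch below rests on assumptions that are not taken from this work: the variability statistic is modeled as the sample standard deviation of the magnitudes in each filter, the photometric noise is Gaussian with an arbitrary scatter of 0.08 mag per epoch, and the threshold is the straight-line value of 0.22 rather than the GMM boundary.

```python
import random
import statistics


def false_variable_fraction(n_epochs, noise=0.08, threshold=0.22,
                            n_stars=5000, seed=1):
    """Fraction of simulated non-variable stars that pass the variability cut.

    Each star has constant brightness; only Gaussian noise (assumed value,
    in mag) perturbs its n_epochs measurements in the g and r filters.
    """
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_stars):
        scatter = 0.0
        for _band in ("g", "r"):
            mags = [rng.gauss(0.0, noise) for _ in range(n_epochs)]
            scatter += statistics.stdev(mags)  # sample standard deviation
        if scatter >= threshold:
            flagged += 1
    return flagged / n_stars


for n_epochs in (4, 15):
    fraction = false_variable_fraction(n_epochs)
    print(f"{n_epochs} epochs: {100.0 * fraction:.2f}% of constant stars flagged")
```

Under these assumptions, constant stars are flagged more often with four epochs than with 15, because the scatter estimated from a few measurements is itself very uncertain.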

The remaining 60% of the contaminant stars in Stripe 82 appeared as non-RRL variable stars in the CRTS database. Their variability statistics reflect a change in brightness over time, but the shapes of their light curves show that most of them are W Ursae Majoris (W UMa) stars, Algol binaries, δ Scuti stars, and SX Phe stars (Palaversa et al. 2013). Sample phased light curves of an Algol binary (P = 0.6684 day) and a δ Scuti candidate (P = 0.11367021 day) that contaminate our RRL sample are shown in panels (a) and (b) of Figure 7, respectively. We were able to recover the correct types and periods of these stars using the CRTS multi-epoch data. With the PS1 data currently available, it is not possible to remove all of the contaminants.

Figure 7. Phased light curves of (a) a star in an Algol binary system ($\tt ID_{CRTS}$ = 18720940) and (b) a δ Scuti candidate ($\tt ID_{CRTS}$ = 1109081021295).

With the ∼77% (195/255) efficiency level computed in the previous section, we know that ∼23% of our RRL candidates are non-RRL stars (mainly non-variable stars and stars in eclipsing systems). However, we show in Section 6.1 that we are still able to detect halo substructures with such a contamination level. Additionally, the efficiency and completeness levels will improve when more PS1 epochs become available in the near future. Our method can be useful in detecting RRL stars in surveys other than PS1 where the number of detections per star is also small. Our efficiency and completeness levels as a function of gP1 magnitude are plotted with blue and red lines in Figure 8, respectively. The decrease in the efficiency and completeness levels as a function of magnitude reflects the increase in contamination for fainter stars.

Figure 8. Decrease in the efficiency (blue dashed line) and completeness (red solid line) levels as a function of magnitude reflects the increase in contamination for fainter stars.

5. RRL CANDIDATES

Knowing that our efficiency and completeness levels are 77% (195/255) and ∼52% (195/374), respectively, we apply our method to the whole SDSS×PS1 overlapping footprint.

In the mentioned area, around 130,000 stars passed the first four initial SDSS color cuts (Equations (1)–(4)), the PS1 magnitude cut (Equation (5)), and the minimum number of PS1 epoch cuts (Equations (6) and (7)). These stars have also passed the two GMM selection boundaries in the SDSS colors defined and applied in Step (2) of Section 4.1.

Finally, we apply the GMM variability selection cut from Step (3) of Section 4.1 to these 130,000 stars. To illustrate this, we plot the gP1 versus ($\sigma _{g_{P1}}$ + $\sigma _{r_{P1}}$) distribution for the sample of stars (spanning ∼100 deg2 of the sky) that passed our GMM color boundaries in the upper panel of Figure 9. Stars falling below our GMM variability boundary (green line) are plotted as blue dots and are considered non-variable stars. Stars passing the boundary are plotted as magenta dots and are considered RRL candidates. The lower panel of Figure 9 illustrates the distribution of the same stars, but showing a $\sigma _{g_{P1}}$ versus $\sigma _{r_{P1}}$ plot.

Figure 9. Upper panel illustrates how we apply our GMM variability selection boundary (green line) cut to distinguish variable (magenta dots) from non-variable (blue dots) stars in a gP1 vs. ($\sigma _{g_{P1}}$ + $\sigma _{r_{P1}}$) plot. Stars falling above our GMM variability boundary are considered to be RRL candidates. The lower panel shows the distribution of $\sigma _{g_{P1}}$ vs. $ \sigma _{r_{P1}}$ of the same stars plotted in the upper panel.

An additional variability cut was applied to all of our RRL candidates with $N_{i_{P1}}\,{\ge}\, 3$, $N_{z_{P1}}\,{\ge}\, 3$, and $N_{y_{P1}}\,{\ge}\, 3$. These stars must pass the iP1, zP1, and yP1 variability cut defined in Equation (8) ($\sigma _{i_{P1}}+\sigma _{z_{P1}}+\sigma _{y_{P1}}\,{\ge}\, 0.1$). Stars that passed all of our previous cuts and that do not have more than two good detections in the iP1, zP1, and yP1 filters are still considered RRL candidates.

Only ∼6% of the 130,000 stars passed these variability cuts, which leaves us with 8115 RRL candidates. Based on the analysis in Section 4.3, we believe that ∼23% of these RRL candidates are non-RRL stars (mainly non-variable stars and stars in eclipsing systems).
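As a rough consistency check, the numbers above can be combined as follows. The sketch assumes that the efficiency level measured in Stripe 82 (195/255) applies uniformly over the whole SDSS×PS1 footprint, which is an approximation. The variable names are illustrative.

```python
N_COLOR_SELECTED = 130_000  # stars that passed the color, magnitude, and epoch cuts
N_CANDIDATES = 8115         # stars that also passed the variability cuts
EFFICIENCY = 195 / 255      # Stripe 82 efficiency, assumed to hold everywhere

# Fraction of the color-selected stars that survive the variability cuts.
passing_fraction = N_CANDIDATES / N_COLOR_SELECTED

# Expected split of the candidates into true RRL stars and contaminants.
expected_rrl = EFFICIENCY * N_CANDIDATES
expected_contaminants = N_CANDIDATES - expected_rrl

print(f"passing fraction: {100.0 * passing_fraction:.1f}%")
print(f"expected true RRL stars: {expected_rrl:.0f}")
print(f"expected contaminants: {expected_contaminants:.0f}")
```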

6. DISTANCES OF RRL STARS

One of the advantages of RRL stars is their well-defined mean absolute 〈V〉 magnitude, which makes it straightforward to estimate their distances.

Ivezić et al. (2008) calculated the mean halo metallicity and obtained [Fe/H] = −1.5 dex with rms[Fe/H] ∼ 0.32 dex. The mean halo metallicity of [Fe/H] ∼ −1.5 dex has also been used and confirmed in different studies (e.g., Vivas & Zinn 2006; Sesar et al. 2010; Zinn et al. 2014), including Kollmeier et al.'s (2013) recent study of RRc stars by statistical parallax.

Thus, we adopt a mean halo metallicity of [Fe/H] = −1.5 dex for the RRL stars and use Equation (9) (Cacciari & Clementini 2003) to calculate their mean absolute magnitude:

Equation (9)

Adopting [Fe/H] = −1.5 ± 0.32 dex introduces ${\rm rms}_{M_v}$ of ∼0.1 mag. The 〈V〉 magnitudes are calculated using Equation (10), which was adopted from Ivezić et al. (2005):

Equation (10)

where the g and r SDSS measurements have been corrected for interstellar reddening (Schlegel et al. 1998; Schlafly & Finkbeiner 2011). Equation (10) corrects biases that come from the single SDSS epochs for RRL stars that were taken at unknown phases and computes 〈V〉 with rmsV ∼ 0.12 mag (Ivezić et al. 2008).

Finally, using Equation (11), the heliocentric distance (dh, in parsecs) is determined with a ∼7% fractional error after taking all of the mentioned sources of uncertainty into account:

Equation (11): $d_{h} = 10^{(\langle V\rangle - M_{V} + 5)/5}$

Our 8115 RRL candidates have dh in the ∼3–70 kpc distance range.
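The distance calculation described in this section can be sketched as below. The numerical coefficients are assumptions: they are the values commonly quoted for the Cacciari & Clementini (2003) absolute magnitude relation and the Ivezić et al. (2005) 〈V〉 transformation, and they should be checked against Equations (9) and (10). The distance itself follows from the standard distance modulus, and the function names are illustrative.

```python
import math


def absolute_magnitude(feh=-1.5, slope=0.23, zero_point=0.93):
    """M_V from [Fe/H]; slope and zero point are assumed literature values."""
    return slope * feh + zero_point


def mean_v_magnitude(g, r):
    """<V> from dereddened SDSS g and r; coefficients are assumed values."""
    return r - 2.06 * (g - r) + 0.355


def heliocentric_distance_pc(mean_v, abs_mag):
    """Distance in parsecs from the distance modulus <V> - M_V."""
    return 10 ** ((mean_v - abs_mag + 5.0) / 5.0)


def fractional_distance_error(rms_abs_mag=0.1, rms_mean_v=0.12):
    """Fractional distance error from the rms values quoted in the text.

    The two uncertainties are added in quadrature to give the error on the
    distance modulus, which is converted with d(ln d) = (ln 10 / 5) d(mu).
    """
    rms_modulus = math.hypot(rms_abs_mag, rms_mean_v)
    return math.log(10) / 5.0 * rms_modulus


m_v = absolute_magnitude()
d_pc = heliocentric_distance_pc(mean_v_magnitude(g=17.2, r=17.0), m_v)
print(f"M_V = {m_v:.3f} mag, d_h = {d_pc / 1000.0:.1f} kpc")
print(f"fractional error = {100.0 * fractional_distance_error():.1f}%")
```

Adding the ∼0.1 mag and ∼0.12 mag rms values in quadrature gives a distance modulus error of ∼0.16 mag, which corresponds to the ∼7% fractional distance error quoted above. The g and r values in the usage lines are arbitrary.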

6.1. Halo Structure

Using the 255 RRL candidates we detected in Stripe 82, we look for halo substructures in our covered distance range. We plot the number density distribution of these 255 RRL candidates in Figure 10. This plot includes our 60 contaminant stars in Stripe 82 (if we assume that Sesar et al.'s (2010) catalog of RRL stars is ≳ 99% complete).

Figure 10. Number density distribution of the RRL stars in Stripe 82 is plotted with scaled density levels that are accentuated by the white contours. The Hercules-Aquila cloud appears at R.A. ∼ −40° and dh between ∼8 and ∼24 kpc while the Sagittarius dSph tidal stream is detected at R.A. ∼ 30° and dh ∼ 23 kpc. Negative values of R.A. were used for better visualization only (R.A. = R.A. + 360° when R.A. < 0°).

The density of the points is shown in scaled density levels that are accentuated by the white contours. Smoothed surface regions with a high number of stars are indicated in red, while regions with a low number of stars are indicated in dark blue. We recover the Hercules-Aquila cloud (Belokurov et al. 2007) at R.A. ∼ −40° (see the footnote on negative R.A. values) and dh between ∼8 and ∼24 kpc. The trailing arm of the Sagittarius dwarf spheroidal's (dSph) tidal stream (Majewski et al. 2003; Law & Majewski 2010) is also recovered at R.A. ∼ 30° and dh ∼ 23 kpc.

Both of our recovered substructures were seen using the ∼99% complete and efficient catalog of RRL stars in Stripe 82 (Sesar et al. 2010). Although our method is not as efficient and complete as the mentioned catalog, Figure 10 shows that the efficiency and completeness levels we achieved are good enough to select RRL candidates that trace stellar streams and substructures in spite of the inclusion of contaminant stars. Stripe 82 was visited ∼80 times by the SDSS, which made it relatively easy to find its RRL stars using light curve analysis (Sesar et al. 2010). In contrast, it was more difficult to find RRL stars in our study using only the SDSS colors and PS1 variability because of the small number of epochs available from PS1.

Nevertheless, we recovered ∼52% of the RRL stars (dh < 70 kpc) not only in the Stripe 82 region, but in the whole SDSS×PS1 overlapping footprint. A detailed analysis of the distribution of the identified RRL candidates will be presented in a future paper.

Having additional PS1 epochs will improve the quality of our variability statistics, which will improve the separation between variable and non-variable stars. Using the CRTS data, we showed in Section 4.3 that 40% of our contaminant stars are non-variable sources that have a small number of PS1 epochs. We expect to remove at least 60% of these non-variable contaminant stars when more PS1 epochs (∼15 epochs in all filters) are available. However, it will not be possible to remove all of the contaminant stars, as the number of PS1 epochs will not be sufficient to distinguish RRL from non-RRL variable stars using light curve and period analysis. Furthermore, having additional PS1 epochs will improve our completeness level, as we missed many RRL stars due to the PS1 threshold on the number of detections in both the gP1 and rP1 filters (Equations (6) and (7)). We expect the efficiency and completeness levels to increase to at least ∼83% and ∼65%, respectively, when all of the PS1 epochs are available. Having additional epochs will also allow us to study halo stars that are farther than dh = 70 kpc.

7. SUMMARY

In this study, we combine data from two different sky surveys (SDSS and PS1) to look for RRL candidates in the halo. We select the RRL candidates using SDSS color cuts and PS1 variability cuts. We show that using a GMM method to define the boundary cuts optimizes the efficiency and completeness levels for selecting RRL stars (or other types of variable stars) when light curve analyses are not available.

We start by adopting initial color cuts for RRL stars from Sesar et al. (2010). In order to optimize the selection of our RRL candidates, we use 636 pre-identified RRL stars from CRTS and LINEAR to define GMM color selection boundaries in the SDSS (u − g) versus (g − r) and (g − r) versus (r − i) color–color diagrams, in addition to a GMM variability boundary cut in the gP1 versus ($\sigma _{g_{P1}}$ + $\sigma _{r_{P1}}$) diagram. We applied another variability cut in the iP1, zP1, and yP1 filters from PS1. A comparison of the efficiency and completeness levels obtained with the GMM method with those obtained with the commonly used rectangular cuts showed a significant increase in the efficiency level, from ∼13% (205/1600) to ∼77% (195/255), and an insignificant change in the completeness level. Hence, we favor using the GMM technique in future studies.

We used the multi-epoch data from the CRTS database to study the properties of the contaminant stars found in Stripe 82. Around 40% of the contaminant stars showed no sign of variability in the CRTS data. Because these stars have between ∼40 and 500 CRTS epochs compared to ∼8 epochs in gP1 and rP1, we favor the CRTS variability statistics and consider these stars to be contaminating our sample of RRL candidates. Noisy detections, poor seeing, and non-photometric conditions in the PS1 filters are the reasons why these stars appeared to be variable in the latter photometric system. Although the remaining 60% of the contaminant stars in Stripe 82 showed variability in the CRTS data, their variability properties indicate that most of them are W UMa stars, Algol binaries, δ Scuti stars, and SX Phe stars.

Having achieved our best efficiency (77%) and completeness (52%) levels, we apply our selection criteria and cuts to the whole SDSS×PS1 overlapping footprint. Our technique yielded 8115 RRL candidates. From the analysis in Section 4.3, we believe that ∼23% of our RRL candidates are non-RRL stars (mainly non-variable stars and stars in eclipsing systems). Since light curve analysis is not possible in our study, we believe that achieving such a high efficiency and small contamination level reflects the success of our method. With the PS1 data currently available, it is not possible to remove all of the contaminants, but it is plausible to assume that the remaining PS1 epochs yet to be observed (∼3 epochs per filter) would eliminate more contaminant stars and recover more RRL stars. Our method can be applied to data from any multi-band survey in which the number of epochs is small.

We obtain distance estimates for our RRL stars to test whether we are still able to detect halo stellar streams and substructures with our efficiency and completeness levels. Although ∼23% of the 255 RRL candidates in Stripe 82 are not true RRL stars, and although we missed ∼50% of the known RRL stars within the magnitude range considered here, we were still able to recover the Hercules-Aquila cloud and the trailing arm of the Sagittarius dSph tidal stream (see Figure 10). This shows that our method is good enough to detect some of the substructures and stellar streams in the halo.

The technique developed in this paper can be adopted to optimize the selection of a specific type of variable star when light curve analyses are not possible, while the technique developed in Paper I can be adopted when a large number of repeated observations are available. We used both techniques to find RRL stars in the halo, which we will use in a forthcoming paper to present a more detailed map of halo substructure.

We thank the referee for comments and constructive suggestions that helped to improve the manuscript. We thank B. Sesar for helpful discussions that improved the quality of this paper. M.A., E.K.G., and N.F.M. acknowledge support by the Collaborative Research Center "The Milky Way System" (SFB 881, subproject A3) of the German Research Foundation (DFG). N.F.M. gratefully acknowledges the CNRS for support through PICS project PICS06183.

The Pan-STARRS1 Surveys (PS1) have been made possible through contributions of the Institute for Astronomy, the University of Hawaii, the Pan-STARRS Project Office, the Max-Planck Society and its participating institutes, the Max Planck Institute for Astronomy, Heidelberg and the Max Planck Institute for Extraterrestrial Physics, Garching, The Johns Hopkins University, Durham University, the University of Edinburgh, Queen's University Belfast, the Harvard-Smithsonian Center for Astrophysics, the Las Cumbres Observatory Global Telescope Network Incorporated, the National Central University of Taiwan, the Space Telescope Science Institute, the National Aeronautics and Space Administration under grant No. NNX08AR22G issued through the Planetary Science Division of the NASA Science Mission Directorate, the National Science Foundation under grant No. AST-1238877, the University of Maryland, and Eotvos Lorand University (ELTE).

Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web site is http://www.sdss.org/. The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

The CRTS is supported by the U.S. National Science Foundation under grants AST-0909182 and CNS-0540369. The work at Caltech was supported in part by the NASA Fermi grant 08-FERMI08-0025 and by the Ajax Foundation. The CSS survey is funded by the National Aeronautics and Space Administration under grant No. NNG05GF22G issued through the Science Mission Directorate Near-Earth Objects Observations Program.

Footnotes

  • Add 360° to obtain the correct values of R.A. when R.A. < 0°. Negative values of R.A. were used for better visualization only.
