ABSTRACT
Using a nonparametric function estimation methodology, we present a comparative analysis of the Wilkinson Microwave Anisotropy Probe (WMAP) 1-, 3-, 5-, and 7-year data releases for the cosmic microwave background (CMB) angular power spectrum with respect to the following key questions. (1) How well is the power spectrum determined by the data alone? (2) How well is the ΛCDM model supported by a model-independent, data-driven analysis? (3) What are the realistic uncertainties on peak/dip locations and heights? Our results show that the height of the power spectrum is well determined by data alone for multipole l approximately less than 546 (1-year), 667 (3-year), 804 (5-year), and 842 (7-year data). We show that parametric fits based on the ΛCDM model are remarkably close to our nonparametric fits in l-regions where data are sufficiently precise. In contrast, the power spectrum for an HΛCDM model is progressively pushed away from our nonparametric fit as data quality improves with successive data realizations, suggesting incompatibility of this particular cosmological model with respect to the WMAP data sets. We present uncertainties on peak/dip locations and heights at the 95% (2σ) level of confidence and show how these uncertainties translate into hyperbolic "bands" on the acoustic scale (lA) and peak shift (ϕm) parameters. Based on the confidence set for the 7-year data, we argue that the low-l upturn in the CMB power spectrum cannot be ruled out at any confidence level in excess of about 10% (≈0.12σ). Additional outcomes of this work are a numerical formulation for minimization of a noise-weighted risk function subject to monotonicity constraints, a prescription for obtaining nonparametric fits that are closer to cosmological expectations on smoothness, and a method for sampling cosmologically meaningful power spectrum variations from the confidence set of a nonparametric fit.
1. INTRODUCTION
The angular power spectrum of cosmic microwave background (CMB) temperature fluctuations is a measurable physical quantity that depends sensitively on the physics of the early universe. In particular, the shape of the angular power spectrum and the locations and heights of its peaks relate directly to parameters in the underlying cosmological models. As such, it has been used extensively as an acid test of the relative merit of competing cosmological models and as a rich source of information about cosmological parameters themselves.
Traditionally, and almost exclusively, cosmologists have resorted to model-based parametric statistical methods for estimating the CMB angular power spectrum from data. Parametric regression methods require the functional form of the unknown regression function f to be pre-specified.
The adjustable parameters in f, finite in number that is independent of the data size, are usually estimated by maximizing an appropriate likelihood function or a posterior distribution. In the cosmological context, it is conventional to employ parametric models that attempt to capture the essential physics of the problem via the pre-specified functional form, and any pre-existing knowledge about parameters is incorporated in the estimation process via appropriate prior distributions.
Nonparametric function estimation methods, on the other hand, assume no specific functional form for f, except for mild regularity conditions such as smoothness assumptions, membership to a function space, etc. In this approach, the number of parameters that define the unknown regression function f is either infinite or grows proportionally with the data size, and the estimate of f is obtained by balancing bias and variance of via optimal smoothing. Nonparametric methods are therefore model-independent and are based on fewer and less restrictive assumptions about f. This, in turn, implies that any inferences about f made from nonparametric regression analyses tend to be more data-driven as opposed to being primarily model-driven. In other words, to a greater extent nonparametric function estimation methods tend to infer what is rather than what should be. As such, nonparametric regression methods can be meaningfully employed as sanity-enforcing mechanisms on parametric analyses, thereby making the conclusions drawn more conservative. For example, a feature seen in a parametric fit that survives in a nonparametric analysis is likely to be a real and robust feature of the data itself, and not merely an artifact that is seen because a parametric model expects it to be there.
Alternative methodologies, such as the nonparametric methodology (Genovese et al. 2004; Bryan et al. 2007) used in this work, may allow posing inferential questions that are difficult to address using conventional methods. For example, this particular nonparametric methodology allows validating model-based, parametric fits against the confidence set for the nonparametric fit to the same data, possibly to rule them out as candidates for the true but unknown regression function. This methodology also has the formal advantage of being able to provide realistic uncertainties on any number of features of the angular power spectrum that are simultaneously valid at the same level of confidence. Such desirable features are arguably lacking in most mainstream methodologies, including Bayesian ones, that are commonly used in cosmology. (An incisive, insightful, and discerning discussion about the relative merits of this methodology over statistical methods conventionally employed in cosmology can be found in the two references cited above and is best read in the original.)
The four CMB angular power spectrum data sets (Hinshaw et al. 2003b, 2007; Nolta et al. 2009; Larson et al. 2011) released by the Wilkinson Microwave Anisotropy Probe (WMAP; Bennett et al. 2003) mission, representing cumulative observations at the end of 1, 3, 5, and 7 years of operation, present a unique opportunity for statistical analysis. For example, till date these are claimed to represent the most precise and extensive full-sky CMB measurements ever made (only to be superseded by the Planck mission (Tauber et al. 2010)). The four data sets represent the evolution of data over a period of about a decade, thereby making it possible to assess progressive and possibly systematic resolution of features of the spectrum. From a statistical perspective, each of these moderately large data sets (minimum of 899 data points for the WMAP 1-year release) is not only heteroskedastic, but also has substantial correlations that arise in typical data pipelines (Tegmark 1997; Hinshaw et al. 2003a, 2009; Jarosik et al. 2007; Jarosik et al. 2011).
In this paper, we present a comparative nonparametric analysis of the WMAP 1-, 3-, 5-, and 7-year data sets for the angular power spectrum. Specifically, for each data realization, we address the following three key questions. (1) How well is the angular power spectrum determined by the data alone? (2) How well is the ΛCDM model supported by a model-independent, nonparametric, data-driven analysis? (3) What are the realistic uncertainties on peak/dip locations and heights? Our analysis is based on a nonparametric function estimation methodology (Genovese et al. 2004; Bryan et al. 2007), which is discussed in Section 2 together with our extensions; i.e., a numerical formulation for minimization of a inverse-noise-weighted risk function subject to monotonicity constraints, a prescription for obtaining nonparametric fits that are closer to cosmological expectations on smoothness, and a method for sampling cosmologically meaningful power spectrum variations from the confidence set of a nonparametric fit. Results are presented in Section 3, and conclusion in Section 4.
2. METHODOLOGY
The nonparametric function estimation methodology (Genovese et al. 2004; Bryan et al. 2007) used in this work is an extension of the REACT methodology (Beran 2000a, 2000b), which is in turn founded on rigorous formal results in Beran & Dümbgen (1998) and Beran (1996).
Two early papers (The Pittsburgh Institute for Computational Astrostatistics 2003; Miller et al. 2002) used this methodology to analyze, under the assumption of homoskedasticity, a pre-WMAP data set that combined BOOMERanG, MAXIMA, and DASI data sets. A generalization of this formalism for dealing with heteroskedasticity via an inverse-noise-weighted loss function was developed in Genovese et al. (2004). More recently, using the WMAP 1-year data, Bryan et al. (2007) illustrated how confidence intervals on cosmological parameters and boundaries in the cosmological parameter space can be inferred from the confidence set for a nonparametric power spectrum fit.
In this section, we first present an operational outline of this methodology (Sections 2.1 and 2.2). This outline is entirely based on Genovese et al. (2004) and Bryan et al. (2007) and is included here for completeness. A pedagogic treatment of the central ideas and a simpler variant of the problem can be found in Wasserman (2006). Specific citations to other sources are provided wherever appropriate.
Our own numerical formulation of the monotone risk minimization problem, where the risk function is derived from an inverse-noise-weighted loss function, is presented in Section 2.3. In Section 2.4, we describe a systematic way of obtaining a monotone fit with smoothness that meets cosmological expectations. In Section 2.5, we describe our method for probing the confidence set; this is the basis for the results presented in Sections 3.1 and 3.3.
2.1. The Nonparametric Fit
We are given CMB angular power spectrum data of the form
consisting of N data points observed over multipole index range lmin ⩽ l ⩽ lmax. Here, Cl stands for the value of the true but unknown power spectrum at multipole index l. The noise variables l are assumed to have a mean-0 normal distribution with known covariance matrix Σ. In practice, any reasonable estimate/approximation of this covariance matrix, such as an inverse Fisher matrix for a pilot parametric fit, can be used in place of Σ.
This nonparametric regression method is based on expanding the unknown regression function f, assumed to belong to an appropriate L2 function space, in a complete orthonormal basis {ϕj(x)}, as
A basis that has proven useful in the CMB angular power spectrum context is the cosine basis defined over 0 ⩽ x ⩽ 1:
Assuming that f is sufficiently smooth, we take
and estimate it as
While the method is asymptotically (i.e., as the data size N → ∞) basis-independent, choice of the basis may matter in any finite-N application; see Beran (2000a, 2000b) for a detailed discussion. This basis satisfies a discrete orthogonality property when the data Yi are available over an equispaced grid {xi = (2i + 1)/2N, 0 ⩽ i ⩽ N − 1} consisting of zeros of ϕN(x). In the CMB context, any contiguous range of N integer multipole indices lmin ⩽ l ⩽ lmax can be formally mapped onto this equispaced grid, hence we will not make any categorical distinction between data index i and the corresponding multipole index l.
The true angular power spectrum Cl ≡ f(xl) is estimated as via coefficient estimates , which are estimated as
where
Here, U is the orthonormal matrix with elements and Y ≡ (Y0, ..., YN − 1)T.
The task of obtaining coefficient estimates , and thereby the fit , is now relegated to determining the shrinkage parameters λj. Assuming smoothness for f (which implies a rapid decay of the true coefficients βj with j), the shrinkage parameters λj are constrained to be monotonically decreasing
A special, discrete subset of the monotone shrinkage defined above is the nested subset selection (NSS) shrinkage, defined as
Either shrinkage type results in selective damping of high-frequency harmonics in the data Yi, which results in smoothing of the fit . A useful interpretation of shrinkage parameter λj is that it represents the effective degree of freedom for the jth coefficient estimate . One can thus define the effective degrees of freedom (EDoF) for the entire fit as
This definition follows from the fact that for a linear smoother, EDoF of a fit is formally defined as , where is the matrix that connects the fitted values to the data Y as . For the present nonparametric regression method, , where U is the orthonormal basis matrix (Equation (5)), and D ≡ diag(λ0, ..., λN − 1). This implies .
In the present formalism, the discrepancy between the (unknown) regression function f and its estimator is measured by the inverse-noise-weighted squared loss function L(λ), defined as
Here, σ2(x) is the (known) variance of the data Y at x, which accounts for the heteroskedasticity of the data Y. The loss L is considered a function of the vector of shrinkage parameters λ ≡ (λ0, ..., λN − 1)T that determine the regression estimator via Equation (4). Risk R(λ), which is the expected value of L(λ), can be written as a sum of two non-negative terms; namely,
These two terms represent, respectively, the integrated squared bias and the integrated variance of , both weighted by 1/σ2(x). Optimal smoothing is achieved, in principle, by minimizing R(λ) with respect to λ. Generally speaking, too little smoothing leads to a fit with low bias and high variance, and too much smoothing yields a fit with high bias and low variance. Minimal risk or optimal smoothing therefore can be thought of as a balance between the bias of and its variance.
The risk function R(λ), unfortunately, depends on the unknown regression function f, and therefore needs to be estimated. A particular estimator of this risk, which is of the SURE (Stein's unbiased risk estimator; Stein 1981) kind, takes the following form:
where Z ≡ (Z0, ..., ZN − 1)T, D ≡ diag(λ0, ..., λN − 1), , B = UTΣU/N is the covariance of Z, and IN is the N × N identity matrix. The positive (semi)definite weight matrix W is defined as
where wl is the lth coefficient in the expansion (1/σ2(x)) ≈ ∑N − 1j = 0wjϕj(x) and, for the cosine basis (Equation (2)),
We denote the optimal shrinkage obtained by minimizing risk by . The best NSS shrinkage is obtained simply by evaluating risk for each of the (N + 1) NSS shrinkage vectors and choosing the one with the least risk. Monotone shrinkage usually results in a lower risk than the NSS shrinkage because of the greater freedom available in the monotone set of shrinkages. We will discuss risk minimization subject to monotonicity constraints (Equation (6)) in Section 2.3.
Figure 1 illustrates the contrasting behavior of the nonparametric risk (red curve) and the WMAP 1-year likelihood function (green curve; Verde et al. 2003) for the WMAP 1-year data (Hinshaw et al. 2003b), as a function of the EDoF of all NSS fits. This figure is motivated by the fact that cosmologists, by and large, are better-acquainted with parametric likelihood-based methods. Each integer value on the horizontal axis represents one NSS fit, from the zero function at EDoF = 0 to the fit that simply interpolates through the data (EDoF = N). Optimal smoothing for NSS shrinkage occurs at EDoF = 12 where the nonparametric risk function attains its minimum over the NSS set of fits. The likelihood function, on the other hand, keeps on improving with the EDoF indefinitely.
Figure 1. Nonparametric risk (red curve) and −2log (likelihood) (blue curve) as functions of the EDoF (Equation (8)) for the NSS fits to the WMAP 1-year data. This illustrates the contrasting behavior of the two quantities. Optimal smoothing occurs at EDoF = 12 where the nonparametric risk attains its minimum over the NSS set of fits. The likelihood function, on the other hand, keeps on improving with the EDoF indefinitely. log (likelihood) values were computed using the WMAP 1-year likelihood code (Verde et al. 2003). The blue (left) and red (right) vertical scales on the plot are associated with the nonparametric risk and the −2log (likelihood), respectively.
Download figure:
Standard image High-resolution image2.2. Confidence Set Around the Fit
Conventional regression methods provide a confidence band around the fit that quantifies the uncertainty in the fit. In contrast, this nonparametric methodology quantifies the uncertainty surrounding the nonparametric fit in the form of an elegant construct, namely, a (1 − α) confidence set at a pre-specified confidence level 0 ⩽ (1 − α) ⩽ 1. The (1 − α) confidence set for the coefficient vector β is defined as
which is centered at the vector of estimated coefficients , and the confidence radius rα is given by
Here, zα is the upper α quantile of the standard normal distribution, and
with Q = 4(ABA + WDBDW − 2ABDW) and A = DW + WD − W. In practice, the risk estimator (Equation (9)) and/or (Equation (13)) may turn out to be negative for particular data/covariance matrix realizations. In such cases, the squared confidence radius (Equation (12)) may be negative (or may not be 0 for α = 0). For minimization purposes, the risk estimator (Equation (9)) is adequate and appropriate (Beran & Dümbgen 1998). For confidence radius purposes, we suggest the following modifications to avoid the negativity problem:
At worst, this adjustment will make the confidence radius bigger, resulting in, e.g., wider confidence intervals, but more conservative inferences. Similar modifications have been suggested in Beran & Dümbgen (1998), Beran (2000a), and Genovese et al. (2004).
The corresponding confidence set on the true regression function f is given by
The quadratic form of the inverse noise-weighted loss function and the fact that the weight matrix W is positive (semi)definite implies that both confidence sets and are ellipsoidal in shape. For any functional T of the spectrum f, such as location or height of a peak or a dip, the (1 − α) confidence interval is defined as
Moreover, prior information that the true regression function f belongs to a subset of the confidence set (e.g., f has k peaks over the range of x-values represented in the data) can be incorporated in the analysis by replacing with . This methodology further provides the formal assurance that, asymptotically,
- 1.() will contain the true spectrum f (true coefficient vector β) with probability ⩾(1 − α), and
- 2.confidence intervals on any number of functionals T(f), computed from the confidence set , will be simultaneously valid at the same confidence level (1 − α), and that these will trap their corresponding true but unknown values with probability ⩾(1 − α).
2.3. Risk Minimization Subject to Monotonicity Constraints
In this section, we show how the risk function (Equation (9)) can be minimized subject to the monotonicity constraints 1 ⩾ λ0 ⩾ λ1 ⩾ ... ⩾ λN − 1 ⩾ 0. The risk function corresponding to the unweighted loss function (W = IN) has a simple weighted-sum-of-squares form, and can be minimized exactly and efficiently using the pooled adjacent violators (PAV) algorithm (Robertson et al. 1988). While the risk function corresponding to the inverse-noise-weighted loss function (W ≠ IN) is still quadratic in λ, it can no longer be expressed as a weighted sum-of-squares and the PAV algorithm cannot be used to minimize it.
It can be shown that, disregarding terms that do not depend on λ, the risk function (Equation (9)) can be written as
where Hjk = 2zjzkWjk, h = (H − V)(1, 1, ..., 1)T, and Vjk = 2WjkBkj. H and V are both manifestly symmetric. Positive (semi)definiteness of W implies that H is a positive (semi)definite matrix, implying that is a convex function. The system of linear inequality constraints 1 ⩾ λ0 ⩾ λ1 ⩾ ... ⩾ λN − 1 ⩾ 0 implies that the constrained region (Figure 2) has a convex trianguloidal shape determined by flat surfaces. The original risk minimization problem can therefore be formulated as the following equivalent convex quadratic minimization problem:
where C is the (N − 1) × N matrix
An additional linear inequality or equality constraint of the form
where q constrains the EDoF of the fit (0 < q ⩽ N), can easily be accommodated in this formulation. This reformulation of the monotone risk minimization problem makes it possible to use standard minimization methods (Pshenichny & Danilin 1978; Powell 1985; Goldfarb & Idnani 1983) and computational tools (Schittkowski 2007; Vanderbei 1999) for convex quadratic minimization problems subject to linear constraints and simple bounds.
Figure 2. Trianguloid-shaped admissible regions, marked by red lines, for the monotonicity constraint 1 ⩾ λ0 ⩾ λ1 ⩾ ... ⩾ λN − 1 ⩾ 0 for N = 2 (left) and 3 (right). The (N + 1) vertices of the trianguloid correspond to the (N + 1) NSS fits, with the origin corresponding to the zero function, and the vertex (1, 1, ..., 1) corresponding to the function that exactly interpolates through the data. Surfaces with constant value p of the EDoF (Equation (8)) are hyperplanes of the form ∑N − 1i = 0λi = p.
Download figure:
Standard image High-resolution image2.4. Obtaining a Smoother Monotone Fit That is Closer to Cosmological Expectations
To motivate the discussion in this section, consider the full-freedom monotone fit to the WMAP data sets obtained as above (green curves in Figures 3 and 4). By full-freedom monotone fit, we mean the fit that minimizes risk over the entire monotone-admissible region (Figure 2) without any restriction on the EDoF of the fit. This fit turns out to be quite wiggly especially at the high-l end because of the high noise variance here. Such wiggliness implies the presence of high-frequency components in the fit, which in turn implies a large number of EDoF in the fit.
Figure 3. Nonparametric fits for WMAP 1- (top) and 3-year (bottom) data sets. x- and y-ranges are identical across plots. Green: full-freedom monotone fit (EDoF ≈ 80.2, 76.5, respectively); blue: NSS fit (EDoF = 12, 10, respectively); red: restricted-freedom monotone fit (EDoF ≈ 9.4, 9.5, respectively); solid gray: best ΛCDM-based parametric fit; dashed gray: power spectrum for an HΛCDM model. See Table 1 for details of model-based power spectra.
Download figure:
Standard image High-resolution imageFigure 4. Nonparametric fits for WMAP 5- (top) and 7-year (bottom) data sets. x- and y-ranges are identical across plots. Green: full-freedom monotone fit (EDoF ≈ 60.4, 102.9, respectively); blue: NSS fit (EDoF = 13, 20, respectively); red: restricted-freedom monotone fit (EDoF ≈ 14.4, 14.1, respectively); solid gray: best ΛCDM-based parametric fit; dashed gray: power spectrum for an HΛCDM model. See Table 1 for details of model-based power spectra.
Download figure:
Standard image High-resolution imageWithout cosmological pre-conditioning (i.e., from a completely agnostic and data-driven perspective) and when viewed in relation to the data, it is clear that this fit is not at all unreasonable, given the high noise levels at the high-l end. However, all cosmological models suggest far smoother shapes for the angular power spectrum. In the context of the present methodology, one candidate for a smoother fit is the NSS fit (i.e., one that has the minimal risk over the set of (N + 1) NSS fits). Indeed, this possibility has been exploited, e.g., in Genovese et al. (2004). However, the NSS fit (see the blue curve in Figures 3 and 4) may also turn out to be somewhat unsatisfactory with respect to cosmological expectations (and, some times, also with respect to trends reflected in the full-freedom monotone fit). This is primarily because of the limited freedom available in the NSS class. Note again that the NSS fit is not entirely unreasonable from an agnostic viewpoint.
The monotone set, on the other hand, offers the possibility of harnessing local minima in the risk function that are constrained to lie in appropriate "smoother" subsets of the full monotone set. This may be achieved in two distinct ways that may be combined for greater effect.
- 1.
- 2.By truncating the expansion (Equation (3)) to p number of coefficients (p < N) and then performing monotone risk minimization over this subset of the full monotone set.
In practice, such a smoother restricted-freedom monotone fit can be obtained by gradually reducing the value of q (Equation (18)) starting from the EDoF of the NSS fit until all low-amplitude, high-frequency wiggles in the fit disappear. Generally, the resulting fit has a lower risk than the NSS fit with EDoF = q and is manifestly consistent with trends captured by the full-freedom monotone fit. We find it useful to present (or consider) all three fits (NSS, full-freedom monotone, and restricted-freedom smoother monotone) together; this helps build a realistic picture about estimated trends in the data and thereby about the shape of the underlying true spectrum. Like the NSS fit, the smoother restricted-freedom monotone fit will generally be more biased than the full-freedom monotone fit. This greater bias, however, is partially compensated for by a larger risk which results in a larger confidence radius value (Equation (12)), a larger confidence set, and therefore more conservative inferences.
2.5. Probing the Confidence Set for Uncertainties on Features of the Fit
In this paper, we need to probe the confidence set for a fit for two purposes: (1) for validating cosmological models against a nonparametric fit (see Section 3.2) and (2) for finding the uncertainties on specific features of the fit such as peak heights and locations, and below we describe our method to scan and sample the confidence set for the latter. Our particular method for probing the confidence set for determining uncertainties on features of the fit is based on the following observations.
- 1.The confidence set on the vector of coefficients (β0, ..., βN − 1), by construction, is centered at the vector of estimates .
- 2.The confidence interval defined in Equation (16) requires locating extreme variations in any functional T of the power spectrum f; e.g., location or height of a peak. The largest possible variations in T will be located as far away from the center of the confidence set as possible, i.e., on its surface.
- 3.Cosmologically meaningful and sufficiently smooth variations in the fitted spectrum are most likely to be located in the projection of the full confidence set onto the lowest M dimensions.
We therefore generate a uniform sample from the projection of the full confidence set surface onto the lowest d dimensions, where 2 ⩽ d ⩽ M, with M ≲ 23. For convenience, we use the smoother restricted-freedom monotone fit for this purpose, with the justification that the confidence set corresponding to the full-freedom monotone fit happens to be nested inside that for this fit, for all four data realizations. The Cholesky factorization W = uTu is used to transform the original confidence ellipsoid (whose principal axes may not be aligned with coordinate directions in the β-space) into a sphere of the form , where ψ = uβ and . The surface of this ψ-sphere can be efficiently sampled with uniform density using a standard algorithm (Section 3.4.1.E.6 on p.130 of Knuth 1981), and then transformed back into β-space, preserving uniformity of density because of the linearity of the transformation. From a sufficiently large sample of such variations of the power spectrum, we further selected those functions for which successive peaks and dips are separated by at least 50 multipole moments l. This cosmologically motivated selection criterion ensures that (1) the sampled functions are sufficiently smooth, and (2) high-frequency wiggles are not counted as peaks or dips when estimating uncertainties on locations and heights of peaks and dips (see Section 3.3). Based on cosmological considerations, we restricted the search to functions with three peaks (WMAP 1-, 3-, and 5-year data) or four peaks (7-year data). The set of functions thus sampled is used to estimate uncertainties on specific features of the fit. As an aside, we note that the confidence set construct and the formal guarantees related to confidence intervals (Equation (14)) do not necessarily imply a uniform density over the confidence set; uniform sampling is used here as a convenient computational device for scanning the confidence set surface in an unbiased fashion.
3. RESULTS AND DISCUSSION
The four WMAP angular power spectrum data sets used in this work, ΛCDM parametric fits for the CMB angular power spectrum, and likelihood codes that produce their respective Fisher (inverse covariance) matrices are obtained from the WMAP data archive http://lambda.gsfc.nasa.gov/. For all four data realizations, it turned out that (1) the weight matrix W (Equation (10)) is numerically positive definite, and (2) the confidence set for the full-freedom monotone fit is completely nested inside that for the smoother restricted-freedom monotone fit. It is worth noting that our nonparametric fits and confidence sets are not too sensitive to the details of the covariance matrix (this was also pointed out in Bryan et al. 2007). Most computations were done using the R statistical computing environment (R Development Core Team 2010). We used the QL codes (Schittkowski 2007) for the monotone risk minimization problem (Equation (17)). Our nonparametric fits, obtained using the method outlined in Section 2.3, are shown in Figures 3 and 4 for the WMAP 1-, 3-, 5-, and 7-year data sets.
3.1. How Well is the Power Spectrum Determined by Data Alone?
Considering that the noise in all four data sets is highly heteroskedastic and noise levels are especially high for large l, it would be useful to make an assessment of how such noise in the data affects local uncertainties in the fitted spectrum and to quantify how well the angular power spectrum value at each l is determined by the data.
To this end, we compute, for each data set, the approximate 95% confidence interval on each using 5000 function variations from the confidence set as outlined in Section 2.5. The length of this vertical confidence interval at given l, divided by the absolute value of the fit, , provides an approximate indication of how well each is determined via the following interpretation: a value ≪ 1 indicates that the fit is well determined by the data, whereas values ≳ 1 imply that the data contain little or no information about the height of the power spectrum. This approach, which is inspired by the boxcar probe approach of Genovese et al. (2004), has the practical advantage of not having adjustable parameters (i.e., the boxcar width) in the procedure.
In Figure 5, we plot this height, scaled by the value of the fit, as a function of the multipole index l, for all four data realizations. We see that the range of l-values over which the fit is well determined expands consistently between 1-, 3-, and 5-year data realizations, from l ≈ 546 (1-year), to l ≈ 667 (3-year), to l ≈ 804 (5-year). On the other hand, while the l-range of the data expanded substantially between WMAP 5 and 7, the information contained in the data does not appear to have grown proportionately beyond l ≈ 842 for the 7-year data.
Figure 5. Results of a probe of the confidence sets for the WMAP 1- (black), 3- (blue), 5- (red), and 7-year (green) nonparametric restricted-freedom monotone fits (Figures 3 and 4) to determine how well the angular power spectrum is determined by the data alone. The quantity plotted for each data realization is the total vertical variation at each l within the respective 95% (2σ) confidence set, divided by the absolute value of the fit. This quantity is an approximate measure of how well the angular power spectrum is determined by the data: values ≪1 indicate that the fit is tightly determined by the data, whereas values ≳ 1 indicate that the data contain little or no information about the height of the angular power spectrum for that l. Disregarding the low-l region, the color-coded vertical lines indicate the approximate l-value at which each curve rises above 1.
Download figure:
Standard image High-resolution image3.2. How Well is the ΛCDM Model Supported by a Model-independent, Nonparametric, Data-driven Analysis?
In each of the four plots in Figures 3 and 4, we have also included parametric fits based on the ΛCDM (Hinshaw et al. 2003b, 2007; Nolta et al. 2009; Larson et al. 2011) and HΛCDM (Primack et al. 1995; Primack & Gross 2001) models (see the figure caption for details). The parametric ΛCDM-based fits turn out to be quite close to the respective nonparametric fits wherever the data are precise. This is remarkable considering that our nonparametric method does not rely on any cosmologically motivated prior information whatsoever. Moreover, the parametric ΛCDM-based fit appears to get closer to the respective nonparametric fit across the four WMAP data releases. The closeness of a parametric fit to the corresponding nonparametric fit can be measured through the distance
Using Equation (12), this distance can be further expressed as the smallest confidence level α beyond which the parametric fit is rejected as a candidate for the true spectrum.
Table 1 lists distances of parametric fits based on the two cosmological models (ΛCDM and HΛCDM) from the nonparametric full-freedom monotone fit for the corresponding data realization (see the caption for specific details and description). The progression of distance values between a parametric ΛCDM fit and the corresponding nonparametric fit clearly shows that the two are getting closer as the data become precise. In contrast to this, the angular power spectrum generated by the particular HΛCDM model considered, which is almost as strong a contender for the true power spectrum as the ΛCDM fit with respect to the 1-year confidence set, is progressively pushed away to the boundary of the 95% confidence set for the 7-year confidence set. Visually, this trend can be understood on the basis of the differences between the HΛCDM power spectrum and the nonparametric fit (e.g., differences in the heights of the first peak; see Figures 3 and 4) that result in pushing this particular HΛCDM model out of the confidence set. Given the formal guarantees of this methodology (Section 2.2), the WMAP 7-year data thus rules out, at ≈95% confidence level, the particular HΛCDM model considered.
Table 1. Distances of Two Model-based Power Spectra from Our Nonparametric Fits
Data | ΛCDM | HΛCDM |
---|---|---|
1-year | 0.1540 | 0.1493 |
rα = 0.3679 | 16.70% | 15.87% |
3-year | 0.1462 | 0.2057 |
rα = 0.3653 | 14.52% | 29.03% |
5-year | 0.1419 | 0.3552 |
rα = 0.3563 | 19.66% | 94.82% |
7-year | 0.1238 | 0.3550 |
rα = 0.3551 | 9.08% | 94.98% |
Notes. These are the distances of two model-based (ΛCDM and HΛCDM) power spectra from our nonparametric (full-freedom monotone) fits. The HΛCDM model (Primack et al. 1995; Primack & Gross 2001) considered here for illustrative purposes is defined by a small neutrino fraction (Ωνh2 = 0.00275) with corresponding adjustment to the dark energy content (), and the rest of the parameters (including zero curvature) being identical to that of the best ΛCDM model (Larson et al. 2011) for the 7-year data (power spectrum generated using the CAMB software; Lewis et al. 2000). ΛCDM-based fits used are the best parametric fits for the corresponding data realization (Hinshaw et al. 2003b, 2007; Nolta et al. 2009; Larson et al. 2011). rα is the confidence radius at α = 0.05 (i.e., 95% confidence level ≡ 2σ). Percentages reported are the confidence levels corresponding to these distances; these can be interpreted as asymptotic probabilities with which the corresponding parametric fit is ruled out as a candidate for the true but unknown spectrum. Note the dramatic progression, as the data become precise, of (1) how this HΛCDM model is pushed to the boundary of the confidence set, and (2) how the ΛCDM model gets closer to the nonparametric fit.
Download table as: ASCIITypeset image
3.3. Uncertainties on Locations and Heights of Peaks and Dips
We now consider the problem of determining uncertainties on the locations and heights of peaks and dips in the nonparametrically fitted spectrum. The motivation for this exercise comes from the fact that peak locations and heights contain valuable information about cosmological models and parameters (Doran & Lilley 2002; Durrera et al. 2003).
Following the prescription outlined in Section 2.5, we sampled a set of 5000 function variations from the confidence set for each data realization. Peak and dip locations and heights were recorded for each peak and dip over this set of functions. This results in an empirical scatterplot that is indicative of the joint distribution of location and height for each peak or dip, under the assumption of uniform surface density on the confidence set .
Figure 6 shows the results of probing the 95% confidence sets for uncertainties on peaks and dips, as outlined above, with 5000 acceptable function variations for each data realization. The box around a peak or a dip represents the largest horizontal and vertical variations in the scatter. In accordance with the confidence interval defined in Equation (16), these form the 95% confidence intervals on the location and height of a peak/dip. Table 2 lists these confidence intervals together with 95% confidence intervals on peak height ratios.
Figure 6. Nonparametric uncertainties on peak and dip locations and heights for the WMAP 1- (top left), 3- (top right), 5- (bottom left), and 7-year (bottom right) data sets. The nonparametric fit displayed for reference is the restricted-freedom monotone fit in Figures 3 and 4. The number of acceptable function variations sampled from the confidence set for each data realization is 5000.
Download figure:
Standard image High-resolution imageTable 2. 95% Confidence Intervals on Several Features of the Angular Power Spectrum
Data | Peak Location | Peak Height | Dip Location | Dip Height | Peak Location Ratio | Peak Height Ratio |
---|---|---|---|---|---|---|
1-year | ⋅⋅⋅ | ⋅⋅⋅ | ||||
3-year | ⋅⋅⋅ | ⋅⋅⋅ | ||||
5-year | ⋅⋅⋅ | ⋅⋅⋅ | ||||
7-year | ⋅⋅⋅ | ⋅⋅⋅ | ||||
Notes. Negative values for heights and height ratios are a reflection of the high noise at the high-l end. Here, lm (hm) stands for the location (height) of the mth peak, and () denotes the location (depth) of the mth dip.
Download table as: ASCIITypeset image
The following features of these results are worth pointing out. As is well known, the first peak was very clearly resolved in the 1-year data itself. Our results are manifestly consistent with this observation, in the sense that its box does not overlap with any other box. However, the uncertainty on the first peak does not shrink appreciably across the four data realizations. Further, our results clearly indicate that the second peak is resolved cleanly only in the 5-year data, whereas the third and fourth peaks are not resolved completely even in the 7-year data.
3.4. Uncertainties on the Acoustic Scale (lA) and Peak Shift (ϕm) Parameters
Consider the following relationship (Hu et al. 2001; Doran & Lilley 2002) between the location lm of the mth peak, the acoustic scale lA, and the shift parameter ϕm:
Substituting the end-points of the 95% confidence interval for the mth peak location, this relationship results in a hyperbolic band of allowed values in the lA–ϕm plane. Such bands, derived from 95% confidence intervals on the first three peaks (Table 2), are shown in Figure 7. Additional information from other sources is required to constrain these bands to physically meaningful regions in the lA–ϕm plane. For example, if we assume lA = 300 (Page et al. 2003) then, based on the 7-year data, the 95% confidence intervals for ϕm will be ϕ1: (0.1600, 0.3767), ϕ2: (0.0367, 0.3600), ϕ3: (− 0.2167, 0.7300). Conversely, additional constraints on ϕm could be used to generate a confidence interval on lA. From a model-independent point of view, we note that the (lA, ϕm) bands for different peaks m appear to overlap around ϕm ≈ 0 and 200 ≲ lA ≲ 400. We interpret this as a nonparametric realization of the nearly harmonic structure of peaks in the CMB power spectrum.
Figure 7. Confidence "bands" for the acoustic scale lA and the shift ϕm for the mth peak, as derived from the 95% confidence intervals on the first three peak locations (Table 2) and Equation (19). Blue: ϕ1, red: ϕ2, and green: ϕ3. Top left: WMAP 1-year, top right: 3-year, bottom left: 5-year, and bottom right: 7-year data sets. Note that these (lA, ϕm) bands for different peaks m appear to overlap around ϕm ≈ 0 and 200 ≲ lA ≲ 400: We interpret this as a nonparametric realization of the nearly harmonic structure of peaks in the CMB power spectrum.
Download figure:
Standard image High-resolution image3.5. The Low-l Upturn from a Nonparametric Viewpoint
Another interesting feature in Figure 6 is the tiny but clearly observable scatter for the very first dip at the low-l end. This scatter corresponds to extreme power spectrum variations that reside on the surface of the 95% confidence set and have an upturn at low l values. In the ΛCDM cosmology, such upturn at the low-l end is primarily the result of the integrated Sachs–Wolfe (ISW) effect and is seen in all parametric ΛCDM fits in Figures 3 and 4. It would therefore be interesting to see what could be said about the low-l upturn (and thereby about the ISW effect) based on the nonparametric confidence set.
Note that our nonparametric fits, which are at the center of their respective confidence sets, do not show a low-l upturn. However, the 7-year parametric ΛCDM fit, e.g., does show a clear upturn at the low-l end. This parametric fit is at a distance corresponding to a confidence level of about 10% (≈0.12σ; see Table 1) from our 7-year nonparametric full-freedom monotone fit. This means that the confidence set for the 7-year nonparametric fit contains spectra with a low-l upturn at most as far away as the 7-year parametric fit. We therefore conclude conservatively that the low-l upturn as a feature of the CMB angular power spectrum cannot be ruled out at any confidence level in excess of about 10% Actually, there are indications in our results (not shown) that such upturned variations of the spectrum may be much closer to the center of the confidence set for the 7-year full-freedom monotone fit; this needs further investigation.
4. CONCLUSION
In this paper, we have presented a comparative nonparametric analysis of the WMAP 1-, 3-, 5-, and 7-year data releases for the CMB angular power spectrum, using a nonparametric function estimation methodology (Genovese et al. 2004; Bryan et al. 2007). In the context of this methodology, we have also presented our own numerical formulation for minimization of the inverse-noise-weighted risk function subject to monotonicity constraints, and a prescription for obtaining monotone nonparametric fits that are closer to cosmological expectations on smoothness. For all data realizations, we have presented results pertaining to the following questions. (1) How well is the angular power spectrum determined by the data alone? (2) How well is the ΛCDM model supported by a model-independent, nonparametric, data-driven analysis? (3) What are the realistic uncertainties on peak/dip locations and heights?
The motivation for the analysis presented here was to explore what could be inferred about the CMB angular power spectrum in a model-independent, data-driven manner. On the other hand, the basic physics of the CMB is quite well established. It would therefore be useful to connect a nonparametric/model-independent analysis such as ours with the known physics of the CMB angular power spectrum. This is reserved for the future.
To conclude, we have demonstrated in this paper the threefold utility of the nonparametric methodology used here for cosmological function estimation problems: as a method with sound formal guarantees, as a sanity-enforcing mechanism on parametric model-based analyses, and as a method that allows interesting inferential questions to be addressed and answered in a data-driven manner.
M.A. is deeply indebted to Christopher R. Genovese and Larry Wasserman for many enlightening discussions covering all of statistics. Our R codes for computing the nonparametric fit are based on original codes by Christopher R. Genovese. T.S. acknowledges support from the DST Swarnajayanti Fellowship. Insightful questions and suggestions from an anonymous referee helped improve the overall presentation aspects of the paper and, specifically, the method used in Section 3.1.