THE APPLICATION OF CONTINUOUS WAVELET TRANSFORM BASED FOREGROUND SUBTRACTION METHOD IN 21 cm SKY SURVEYS

Junhua Gu; Haiguang Xu; Jingying Wang; Tao An; Wen Chen

doi:10.1088/0004-637X/773/1/38

1. INTRODUCTION

In the end of the dark ages, with the formation of the first generation of stars and/or quasars, the neutral hydrogen in the universe began to reionize. The redshifted 21 cm signal is one of the most important signatures for detecting the epoch of reionization (EoR), which has been a new frontier of astrophysics in recent years. The 21 cm line emission/absorption from the EoR has been redshifted to meter wave band. Several facilities that cover the meter wave band have been or will be built to explore EoR through the redshifted 21 cm signal, such as the 21 Centimeter Array (21CMA⁵), the Low Frequency Array (LOFAR⁶), the Murchison Wide-field Array,⁷ and the Square Kilometre Array (SKA⁸). The predicted brightness temperature of the EoR 21 cm signal is a few ×10⁻² K (e.g., Bowman et al. 2008; Pritchard & Loeb 2008; Furlanetto et al. 2004), while that of the foreground emission from the Milky Way and extragalactic radio sources reaches 10² K and even higher (e.g., Gleser et al. 2008; Wang et al. 2006), i.e., brighter than the EoR 21 cm signal by four orders of magnitude. In order to detect the EoR 21 cm signal, the biggest challenge that one must overcome is how to subtract the strong foreground emission.

A dozen foreground subtraction methods have been proposed. A typical one is the polynomial fitting method (e.g., Wang et al. 2006), the principle of which is rather straightforward. The spectrum of the total signal is fitted with a low-order (e.g., second or third) polynomial in logarithmic space, and the residual is regarded to be the sum of the 21 cm signal and the instrumental noise. This kind of method can be applied either in uv space or real space. Some derivative methods have been released, for example, Wang et al. (2013). Another important sort of foreground subtraction method is the non-parametric method (e.g., Harker et al. 2009; Chapman et al. 2013). Unlike the fitting-based method, the non-parametric method does not assume the detailed form (usually a polynomial-like form in logarithmic space) of the foreground spectrum.

Both kinds of methods assume that the foreground spectrum is smooth in frequency space or, more generally, the emissions of different frequencies are strongly correlated, while the EoR 21 cm signal is full of saw-tooth-like structures. In other words, the characteristic scales of the foreground and EoR signals are different in their radio spectrum space. The power spectral density of the foreground signal is mainly contributed by large-scale components in radio frequency space, while that of the EoR signal is mainly contributed by small-scale components (e.g., ≲ 1 MHz at z = 8, corresponding to ≲ 8 comoving Mpc; Mesinger & Furlanetto 2007). This difference can be used to distinguish between them. For general one-dimensional signals like time series and spectrum, mature mathematical tools have been invented, and the wavelet transform is one among them.

In this work, we study the feasibility of using continuous wavelet transform (CWT; e.g., Daubechies 1992) to subtract the strong foreground emission in radio spectrum space. We describe the simulation of the foreground, 21 cm, and thermal noise signals in Section 2, give a brief introduction to the CWT that we use in Section 3, test the foreground subtraction with simulated signals in Section 4, discuss our results in Section 5, and conclude our work in Section 6. Throughout the paper, we adopt H₀ = 100 h km s⁻¹ Mpc⁻¹, where h = 0.71, Ω_M = 0.27, Ω_Λ = 0.73, and Ω_b = 0.044 (e.g., Spergel et al. 2003).

2. SIMULATION OF THE LOW-FREQUENCY RADIO SPECTRA

2.1. Redshifted 21 cm Signal from the Epoch of Reionization

According to Wang et al. (2006) and references therein, we simulate the redshifted 21 cm signal from the EoR based on the theoretical three-dimensional power spectrum, which is represented as

$\begin{eqnarray} &&P_{\rm 3D, 21 \,cm}(k,z)=(16\,{\rm mK})^2\frac{1}{h^2}\left(\frac{\Omega _{\rm b}h^2}{0.02}\right)^2\frac{1+z}{10}\frac{0.3}{\Omega _{\rm M}}\nonumber \\ &&\quad\times \left\lbrace \left[1-x_{\rm e}^2(z)\right]^2+b^2(z)e^{-k^2R^2(z)}x_{\rm e}^2(z)\right\rbrace P_{\rm 3D, matter}(k,z),\nonumber\\ \end{eqnarray} \tag{ 1 }$

where P_{3D, matter} is the three-dimensional matter power spectrum at redshift z, x_e(z) is the average ionization fraction, b(z) is the mean halo bias, and R(z) = 100[1 − x_e(z)]^−1/3 kpc is the mean radius of the ionized patches in H ii regions. We use the b(z) and x_e(z) values calculated from the fiducial reionization model of Santos et al. (2003, 2005). In this work, we consider a redshift interval Δz ≪ 1, so that the power spectrum can be regarded as uniform. In the narrow redshift range, we are able to calculate the one-dimensional power spectrum by the formula⁹

$\begin{equation} P_{\rm 1D, 21 \,cm}(k,z)=\frac{1}{2\pi }\int _{k}^{\infty }P_{\rm 3D, 21 \,cm}(k^\prime ,z)k^\prime d k^\prime , \end{equation} \tag{ 2 }$

according to Peacock (1999). The line-of-sight distribution of the redshifted 21 cm line brightness temperature, i.e., the radio spectrum in an interval (ν₀ − Δν/2, ν₀ + Δν/2) (corresponding to a redshift interval (z₀ − Δz/2, z₀ + Δz/2), where z = 1420.4 MHz/ν − 1) can be represented as the sum of a series of Fourier bases as

$\begin{equation} T_{\rm 21 \,cm}(x_n)=\sum _{q=0}^{N-1}\left[ A_q\cos \left(\frac{2\pi q}{L}x_n\right)+B_q\sin \left(\frac{2\pi q}{L}x_n\right)\right], \end{equation} \tag{ 3 }$

where L is the comoving space scale corresponding to Δz, A_q and B_q are random variables that independently follow an identical normal distribution $\mathcal {N}(0,\sqrt{2P_{\rm 1D, 21 \,cm}(k,z_0)/L})$ , k = 2πq/L is the wave number, x_n (n = 1, 2, ..., N) is the line-of-sight location related to frequency ν_n under linear approximation, and N is the number of frequency channels involved. N is chosen to be large enough so that most power of the power spectrum P_{1D, 21 cm}(k, z) is enclosed within the wave number interval 2π/L ⩽ k ⩽ 2πN/L. In this work, we take z₀ = 8 as an example, and let Δν = 20 MHz, N = 500, so that the frequency resolution is dν = Δν/N = 40 kHz, which is a realistic value that can be achieved by most existing and upcoming EoR probing facilities.

2.2. Foreground Emission

Compared with relatively high frequency radio observations (⩾1.4 GHz), low-frequency radio observations are far from mature. Detailed properties of radio emission from radio sources are still not clear. It is not appropriate to place too strong assumptions over the foreground emission. Nevertheless, the smoothness of the foreground radio spectra is widely accepted (e.g., Shaver et al. 1999; Petrovic & Oh 2011). We also accept this assumption in the present work. To study the effect of our proposed method, we decide to test two kinds of foregrounds (Figure 1). One is based on high-frequency and relatively limited low-frequency observations of radio sources, which has a set of analytic formulae deduced from clear physical mechanisms. Another foreground model that we use is composed purely mathematically, which is used to test the tolerance of the foreground subtraction method. The two kinds of foregrounds are described in detail as follows.

**Figure 1.** Spectra of the FG Is and FG IIs with different conjunction bandwidth ν_w. The solid box in (a) is enlarged to be shown in (b).
Download figure:
Standard image High-resolution image

For the first kind of foreground, we simulate spectra following our previous work (Wang et al. 2010, and references therein), which includes the radio emission from the Milky Way, galaxy clusters, and discrete extragalactic radio sources (i.e., star-forming galaxies, radio-quiet AGNs, and radio-loud AGNs). We call it "FG I" in the following sections.

For the second kind of foreground, we assume the spectrum to possess two different power-law indices (α₁ = 1 and α₂ = −2.7) in lower and higher frequency bands, respectively. The spectrum turnover feature in this foreground model originates from the absorption in many sources (Godambe et al. 2008; Edwards & Tingay 2004; Tschager et al. 2003; Dallacasa et al. 2000; Di Matteo et al. 1999; An & Baan 2012). Several absorption mechanisms have been proposed, and their detailed spectra are different. Nevertheless, we just compose a pure mathematical model to represent the general feature when absorption happens. Note that in the simulation of FG I, absorption mechanisms have been considered for different kinds of sources; we are only making the feature of absorption more obvious to make the foreground more complex for the test purpose. The slope changes smoothly around the turnover frequency ν_t = 157 MHz (ν_t is randomly chosen here only for the test purpose), and the sharpness of the turnover point is described by a width parameter ν_w. The spectrum is described by following equations as

$\begin{equation} T_{\rm b}(\nu)=T_{\rm b}(\nu _t)\left(\frac{\nu }{\nu _t}\right)^{(1-w)\alpha _1+w\alpha _2}, \end{equation} \tag{ 4 }$

where

$\begin{eqnarray} &&w=\frac{2\arctan [(\nu -k\nu _t)/\nu _w]/\pi +1}{2}, \end{eqnarray} \tag{ 5 }$

$\begin{eqnarray} &&k=1-\frac{\nu _w}{\nu _t}\tan \frac{\pi (\alpha _1+\alpha _2)}{2(\alpha _1-\alpha _2)}, \end{eqnarray} \tag{ 6 }$

and T_b(ν_t) is the brightness temperature at the turnover frequency. The above equation ensures that the spectrum reaches a peak at the turnover frequency. We call it "FG II" in the following sections.

2.3. Instrumental Noise

Instrumental noise is crucial for the detection of the EoR 21 cm signal. When the instrumental noise is significantly lower than the 21 cm signal, the detailed form of the 21 cm signal can be derived, and only statistical information can be obtained otherwise. We will test the method with two different noise levels; one is higher than the 21 cm signal, and another one is lower. We use 60 mK as the higher noise level like we have used in our previous work (Wang et al. 2013), which is based on the parameters of the 21CMA. Other working facilities such as LOFAR can also reach this noise level. For example, according to the parameters used in Harker et al. (2010) and Wang et al. (2006), when the core area of LOFAR is configured with a channel bandwidth of several 10¹ kHz, and the observing time reaches month level, its noise level will reach several 10¹ mK. The lower noise level is taken to be 6 mK, which is calculated according to the future SKA core region parameters as follows.

According to Thompson et al. (2001), the brightness temperature measurement error of an interferometer is calculated as

$\begin{eqnarray} \Delta T_{\rm b}&=&\frac{\lambda ^2 T_{\rm sys}}{A_{\rm e}}\frac{1}{\Omega _{\rm beam}\sqrt{d\nu \tau n(n-1)}}\nonumber \\ &\approx& \frac{\lambda ^2T_{\rm sys}}{nA_{\rm e}}\frac{1}{\Omega _{\rm beam}\sqrt{d\nu \tau }}, \end{eqnarray} \tag{ 7 }$

where λ is the wavelength, T_sys is the system temperature, A_e is the effective area of one antenna, Ω_beam is the solid angle of the synthesized beam, dν is the channel bandwidth, τ is the observing time, and n is the number of antennas. According to the design of SKA, the fraction of the total collecting area (n_totalA_e) within the central 1–2 km is η = 30%. Here, we assume the baseline length to be L = 1.5 km, so that Ω_beam ≈ (π/4)(λ²/L²) = 1.40 × 10⁻⁶ sr. We rewrite Equation (7) as

$\begin{equation} \Delta T_{\rm b}\approx \frac{\lambda ^2T_{\rm sys}}{\eta n_{\rm total}A_{\rm e}}\frac{1}{\Omega _{\rm beam}\sqrt{d\nu \tau }}. \end{equation} \tag{ 8 }$

For the SKA, the parameter (n_totalA_e)/T_sys = 5000 m² K⁻¹. Finally, we have

$\begin{eqnarray} &&\Delta T_{\rm b}\approx \frac{4L^2}{\pi }\frac{T_{\rm sys}}{n_{\rm total}A_{\rm e}}\frac{1}{\eta \sqrt{d\nu \tau }}\qquad\qquad\ \ \end{eqnarray} \tag{ 9 }$

$\begin{eqnarray} &&\quad\ \ \ =6 \,{\rm mK}\left(\frac{40\,{\rm kHz}}{d\nu }\right)^{1/2}\left(\frac{30\,{\rm days}}{\tau }\right)^{1/2}. \end{eqnarray} \tag{ 10 }$

Given a channel bandwidth of 40 kHz and a total observing time of 1 month, the brightness temperature noise level can reach a few mK. In the following analysis, we use 6 mK as the lower noise level.

3. CONTINUOUS WAVELET TRANSFORM

We make a brief introduction in the context of our problems, according to the detailed review by Daubechies (1992). The spectra from different sorts of sources (i.e., continuous spectra mainly from synchrotron emission and line emission/absorption from the neutral hydrogen in different epochs of the universe) possess different characters. Continuous spectra are smooth in radio frequency space, while line emission/absorption spectra are full of saw-tooth-like structures. One can quantify the above difference and separate them in frequency space.

3.1. From Short-time Fourier Transform to Continuous Wavelet Transform

For stationary signal, one possible method to quantify the above difference is Fourier transform. Continuous spectra are composed of more low-frequency components, while saw-tooth-like line emission/absorption spectra are composed of more high-frequency components. So in principle, the smooth spectra and the saw-tooth-like spectra can be separated with a pair of low-pass and high-pass filters. However, the assumption that the spectra are stationary signals is not safe enough.

To handle non-stationary signal, the real-space resolution should be kept. The short-time Fourier transform (STFT) is a modification of traditional Fourier transform, which partly keeps the real-space resolution. The STFT of a real-space signal h(t) can be defined as

$\begin{equation} (\mathcal {F}h)(\tau ,\omega)=\int _{-\infty }^{\infty }h(t)w(t-\tau)e^{-i\omega t}dt, \end{equation} \tag{ 11 }$

where w(t) is the window function, which should meet the normalization requirement

$\begin{equation} \int _{-\infty }^{\infty }w(\tau)d\tau =1. \end{equation} \tag{ 12 }$

As a commonly used window function, the Gaussian window function is usually defined as

$\begin{equation} w_{g}(x)=\frac{1}{\sqrt{2\pi }s}e^{-\frac{x^2}{2s^2}}, \end{equation} \tag{ 13 }$

where the parameter s determines the real-space resolution. By using the Gaussian window, the STFT becomes

$\begin{equation} (\mathcal {F}h)(\tau ,\omega)=\int _{-\infty }^{\infty }h(t) \frac{1}{\sqrt{2\pi }s}e^{-\frac{(t-\tau)^2}{2s^2}} e^{-i\omega t}dt. \end{equation} \tag{ 14 }$

Although the STFT partly keeps the real-space resolution, it is not adaptively determined by the frequency ω, i.e., a global and fixed parameter s, which represents that the width of the window function is used for all frequencies.

The CWT can overcome the above difficulty. According to, e.g., Torrence & Compo (1998), the one-dimensional CWT is defined as

$\begin{equation} W_{x,\psi}(\tau ,s)=\int _{-\infty }^{\infty }h(t)\frac{1}{\sqrt{|s|}}\psi ^*\left(\frac{t-\tau }{s}\right)dt, \end{equation} \tag{ 15 }$

where h(t) is the real-space signal to be transformed, ψ is called the mother wavelet function, and τ and s represent the real space and scale indices of the wavelet coefficient W_{x, ψ}(τ, s), respectively. The mother wavelet function $\psi \in L^2(\mathbb {R})$ (quadratically integrable function) should meet the requirement

$\begin{equation} 0<C_{\psi }\equiv \int _{-\infty }^{\infty }\frac{|\Psi (\omega)|}{|\omega |}d\omega <\infty , \end{equation} \tag{ 16 }$

where Ψ is the Fourier transform of ψ, and C_ψ is called the admissibility constant. According to Equation (15), given a certain scale s, the wavelet transform is actually the cross-correlation between $\psi _{s}(t)=\psi (t/s)/\sqrt{|s|}$ and the real-space signal h(t), so that according to the cross-correlation theorem, it can be calculated efficiently in Fourier space as

$\begin{equation} W_{x,\psi }(\tau ,s)=\mathcal {F}^{-1}\lbrace \Psi _s^*\cdot H\rbrace , \end{equation} \tag{ 17 }$

where Ψ_s and H are the functions ψ_s and h in Fourier space. In practice, the Fourier transform can be calculated with any fast Fourier transform package discretely. In this work, we implement the transform with the FFTW3 package (Frigo & Johnson 2005).

The most commonly used mother wavelet functions include Morlet (Morlet et al. 1982), Paul (e.g., Torrence & Compo 1998), DOG (e.g., Torrence & Compo 1998), etc. We prefer the Morlet mother wavelet function, which is defined as

$\begin{equation} \psi (x)=\pi ^{-1/4}e^{2\pi if_0 x}e^{-x^2/2}, \end{equation} \tag{ 18 }$

where f₀ is the frequency parameter, for the following reasons. First, according to the comparison among a variety of wavelet functions in Torrence & Compo (1998), the Morlet mother wavelet function has the advantage that it can obtain better frequency resolution (not to be confused with the radio frequency, here the term "frequency" represents the scale of the signal component), but poorer real-space resolution (in our context, it is radio frequency resolution). In this work, we are more interested in the separation between smooth continuum emission and saw-tooth-like line emission according to their different characteristic scales; in other words, the frequency resolution (i.e., the scale resolution) is relatively more important to us. Second, the form of the Morlet mother wavelet function can be obviously regarded as the product of a Gaussian window function and the Fourier base, so that the Morlet wavelet transform can be treated as an STFT that has a frequency-dependent window function. In other words, the Morlet wavelet transform can be smoothly introduced by slightly modifying the Gaussian window STFT.

The f₀ parameter is chosen to be 1 channel⁻¹ (i.e., 25 MHz⁻¹) in this work to match the radio frequency resolution. We plot the Morlet mother wavelet function that we use here in Figure 2. According to Daubechies (1992), the inverse CWT is defined as

$\begin{equation} h(t)=\frac{2}{C_{\psi }}\int _0^\infty \left[\int _{-\infty }^{\infty }W_{x,\psi }(\tau ,s)\frac{1}{\sqrt{|s|}}\psi \left(\frac{t-\tau }{s}\right)d\tau \right]\frac{ds}{s^2}. \end{equation} \tag{ 19 }$

Given a certain scale s, the inner part of the double integration is actually a convolution between W_{x, ψ} and ψ_s, so that can also be calculated efficiently in Fourier space, just like what we have done in Equation (17), as

$\begin{equation} h(t)=\frac{2}{C_{\psi }}\int _0^{\infty }\mathcal {F}^{-1}\lbrace \mathcal {F}\lbrace W_{x,\psi }\rbrace \cdot \Psi _{s}\rbrace (t,s)\frac{ds}{s^2}. \end{equation} \tag{ 20 }$

After the signal is filtered in the wavelet coefficient space, we will use this equation to transform it back to real space.

**Figure 2.** Morlet mother wavelet function with f₀ = 1 channel⁻¹.
Download figure:
Standard image High-resolution image

3.2. Boundary Effects

According to the definition of CWT (Equation (15)), the input signal is assumed to be infinite in real space, which however does not hold in practice. In other words, when trying to use CWT to subtract the foreground in radio frequency space, the radio bandwidth is not infinitely broad. There are several methods to extend a finite signal to an infinite one to meet the definition of CWT. Filling zeros, period extension, and symmetric extension are the most common ones. We have tested all three extension methods above and find no significant difference among them. We provide a further discussion about handling the boundary effect in Section 5.6. Although general extension methods will introduce discontinuity and may contaminate the transformed signal, it will not significantly affect the final results for the following reason. Almost all working and upcoming facilities that aim to detect the 21 cm signals from the EoR have a much broader bandwidth than is required in most conditions. For example, when calculating the one-dimensional H i power spectrum of a certain redshift from the radio spectra, the adopted bandwidth is usually limited to several MHz to ensure the uniformity of the power spectrum, and we can perform the subtraction over a larger bandwidth than needed and only use the bandwidth section that is less affected by the boundary effects. So, we simply use the period extension method to handle the finite signal bandwidth.

4. SUBTRACTION OF FOREGROUND SIGNAL

The difference between the distribution of significant wavelet coefficients of the foreground and 21 cm signals can be used to distinguish between them. In the following sections, we first study the characters of the wavelet coefficients of the foreground and 21 cm signals, respectively. After that, we test the wavelet-based method of subtracting a strong foreground.

4.1. Wavelet Coefficients of Different Kinds of Sources

We first study the characters of the wavelet coefficients of the foreground that have been simulated above (Section 2.2). We show the absolute value of the wavelet coefficients of the FG Is and FG IIs with ν_w = 5, 10, and 20 MHz in Figure 3.¹⁰ The most impressive character of the coefficients of the four foregrounds is that the most significant coefficients are contributed by the boundary effect of the data. In our tests, it is hard to disentangle the boundary effect and the contribution from the foreground signal itself because both of them are more prominent on large scales. However, this is not a serious problem, since what we actually want to obtain is not the foreground signal itself.

**Figure 3.** (a)–(d) The absolute values of the wavelet coefficients of FG Is and FG IIs with ν_w = 5, 10, and 20 MHz, respectively.
Download figure:
Standard image High-resolution image

Then, we study the behavior of the 21 cm signal (Section 2.1) with the CWT. We randomly choose one realization of the simulated 21 cm signal and calculate the wavelet coefficients. The result is shown in Figure 4. Different from those of the foregrounds, the coefficients of the 21 cm signal appear to be much more prominent in small-scale regions and much less affected by the boundary effects.

**Figure 4.** Absolute values of the wavelet coefficients of one realization of the simulated EoR 21 cm signal.
Download figure:
Standard image High-resolution image

4.2. Filtering Out the Foreground Signal

As we have noted that the distributions of the significant coefficients of the foreground signal and the 21 cm signals are different (Section 4.1), we can utilize this character to filter out the foreground signals. Because the significant coefficients of the smooth foregrounds are mainly contributed by the discontinuity of the data boundary, the simplest way is to exclude the regions affected by the boundary effect. To determine the regions to be excluded, we calculate the wavelet coefficients of function

$\begin{equation} T_{d}(\nu)=\delta _c(\nu -\nu _{\min }), \end{equation} \tag{ 21 }$

as shown in Figure 5(a), where δ_c is the Dirac delta function and ν_min = 147.8 MHz is the lower limits of our test band. Strictly speaking, the step function is more suitable for representing the discontinuity near the boundary; however, in our practical condition, we choose the Dirac delta function due to its localization property. In detail, because we use the period signal extension method, it is impossible to compose such a step function with the jumps at ν_min and ν_max only. On the other hand, the Dirac delta function can be regarded as the derivative of the step function, and according to the property of Fourier transform, the difference between the Morlet wavelet transforms of Dirac delta and step functions is only a slowly varying factor, which only has a minor impact on our results. Note that since we use a period signal extension method, the discontinuity of the upper limit of the signal will also be reflected by Equation (21). For any given scale s, the absolute value of the wavelet coefficient peaks at ν_min and ν_max, i.e., the boundary of the data series. The wavelet coefficients of the T_d(ν) can be used to recognize the region that is significantly affected by the boundary effect. We can empirically define a threshold for each scale to be 10⁻² of the peak value. The regions, where the absolute value of the wavelet coefficient is above the threshold, should be marked to be excluded. Because the absolute value of the coefficient decreases exponentially as the distance from the boundary increases, the masked region is relatively insensitive to the value of the threshold. With this standard, we generate the mask for filtering out the foreground (Figure 5(b)).

4.3. Results

Multiplying the wavelet coefficients of the total signal (Figure 6) by the mask (Figure 5(b)), we derive the filtered coefficients as shown in Figure 7. Then by using Equation (19), we reconstruct the filtered 21 cm signal. To further avoid the boundary effect, we exclude the signal with ν < ν_min + 5 MHz and ν > ν_max − 5 MHz. Note that the bandwidth that is cut here is chosen empirically, considering the tradeoff between the available bandwidth and the boundary effect. Although it seems that we have wasted half of the total band, actually in real observations we can move the subtraction band, i.e., (ν_min, ν_max) continuously in the range of the total instrument band, so that most of the frequency range can be used. Our result shows that a total observation bandwidth of 20 MHz enables us to detect the H i 21 cm line emission distribution in one redshift period. Broader bandwidth should enable us to study wider redshift range.

Given the noise level calculated for the SKA core region (i.e., ΔT_b = 6 mK in Section 2.3), we test our method on the simulated foregrounds and 21 cm signals. Supposing that the noise level is significantly lower than the EoR 21 cm signal, we can obtain the spectrum or the distribution of H i along the line of sight of a certain sky region covered by a single beam. Typical reconstructed results with the above four foregrounds are shown in Figure 8.

**Figure 8.** Foreground subtraction results of the wavelet-based method with ΔT_b = 6 mK. (a)–(d) The results of using FG Is and FG IIs with ν_w = 5, 10, and 20 MHz, respectively. The black, red, and green lines show the input 21 cm signal, the output 21 cm signal, and the residual, respectively.
Download figure:
Standard image High-resolution image

If the brightness temperature noise is comparable with the EoR 21 cm signal, we can only obtain its statistical properties such as the power spectrum. It is obvious that the information we can extract from the one-dimensional power spectrum is relatively limited when compared with the three-dimensional power spectrum, but we still test the one-dimensional power spectrum here. There are two reasons. The first reason is that, in this work, we only simulate the 21 cm spectrum on each pixel, rather than a three-dimensional data cube, so that with our simulated data, we cannot calculate the three-dimensional power spectrum. Future simulations by using codes such as 21CMFAST (Mesinger et al. 2011) may enable us to perform more complete tests, which will be a part of our future work. The second reason is that the calculation of the one-dimensional power spectrum is relatively less dependent on instrument parameters and is relatively simple. To calculate the three-dimensional power spectrum, one must compose the data cube in real space and transform it into Fourier space, during which the survey strategy, especially the shape and area of the sky coverage, must be considered, while the one-dimensional power spectrum only requires measuring the radio spectrum on each interesting pixel, and does not need to consider how these pixels are distributed, so that the results should be more general. Assuming a noise level of 60 mK (Section 2.3), we calculate the one-dimensional 21 cm power spectrum by averaging the line-of-sight power spectra from 1000 beams and subtract the predicted instrumental noise power spectrum. We show the results in Figure 9. We find that there is some power leakage, which is especially severe in the small wave number end (k < 0.4 h Mpc⁻¹). This is mainly caused by the filtering strategy and can be corrected as described in the following.

**Figure 9.** Reconstructed power spectra of the EoR 21 cm signals by using the wavelet-based method, with ΔT_b = 60 mK. (a)–(d) The results of using FG Is and FG IIs with ν_w = 5, 10, and 20 MHz, respectively. The solids line and the data points are the theoretical and estimated power spectra, respectively.
Download figure:
Standard image High-resolution image

There are at least two methods to correct the power leakage, which is especially severe at the small wave number end. Both of the two methods work by multiplying a correction factor with the produced power spectrum, which is the function of wave number k. In the first method, we can feed a standard signal, the power spectrum of which is known in advance, into the foreground subtraction program, and calculate the power spectrum of the output signal, which is then compared with that of the input signal, and calculate the correction factor. This method can be named as the closed loop method. In the second method, the correction factor is calculated as the ratio of the total bandwidth of the input signal to that of the bandwidth after masked for a certain scale, i.e., the ratio of the total bandwidth to the width of the white region in Figure 5(b) at different scales. Then according to Kirby (2005), the wavelet scale s of the Morlet wavelet transform has a Fourier wave number counterpart 2πf₀/s, so that the correction factor can be converted to a function of the wave number and can be applied to correct the power spectrum. This method can be named as the open loop method. In principle, the closed loop method should be more precise since it avoids the issue of converting the wavelet scale s to the Fourier wave number k, which according to Kirby (2005) has more than one conversion standard. Nevertheless, we have tested both methods and find no significant difference between them. We show the result that is corrected with the closed loop method in Figure 10. We find that, after the correction, the power leakage has been significantly eliminated.

**Figure 10.** Same as Figure 9, but corrected for the power leakage in the small wave number end.
Download figure:
Standard image High-resolution image

5. DISCUSSION

5.1. Comparison with the Polynomial Fitting Based Method

Wang et al. (2006) proposed a polynomial fitting based method for the foreground subtraction, which can be regarded as a representative example of parametric methods. The basic idea of this method is to fit the total spectrum with a logarithmic space nth-order polynomial

$\begin{equation} \log T(\nu)=\sum _{m=0}^{n} a_m (\log \nu)^m, \end{equation} \tag{ 22 }$

where n is often chosen to be 2 or 3 and the residual is regarded as the reconstructed EoR 21 cm signal. Although in our previous work (Wang et al. 2013) we have tested both n = 2 and 3 and find that n = 2 is sufficient for FG I, we use n = 3 here to subtract more complex foregrounds. We present the reconstructed EoR 21 cm signal with ΔT_b = 6 mK in Figure 11 and the estimated power spectrum of the signal with ΔT_b = 60 mK in Figure 12.

**Figure 11.** Same as Figure 8, but the results are obtained by using the polynomial fitting based method.
Download figure:
Standard image High-resolution image

**Figure 12.** Same as Figure 9, but the results are obtained by using the polynomial fitting based method.
Download figure:
Standard image High-resolution image

For FG I, we find that both methods work and can derive consistent results. For FG II, when the aim is to reconstruct the EoR 21 cm signal in a single beam, we find that the subtraction effect of the polynomial fitting based method in Wang et al. (2006) is strongly related to ν_w, while our method appears more stable. When we only aim to estimate the one-dimensional power spectrum, the method of Wang et al. (2006) works poorly for ν_w = 5 MHz, and for ν_w with larger values, the estimated power spectrum is less affected by the complexity of the foreground. For FG II with ν_w = 5 MHz, the power spectral density at small k is significantly overestimated, which is obviously caused by the contamination from the foreground. On the other hand, as has been pointed out in our previous work (Wang et al. 2013), a simple polynomial fitting method over a narrow band (e.g., 2 MHz in Wang et al. 2006) will also lead to the leakage of power spectral density in the small wave number end.

To make a quantitative comparison, we estimate the rms deviation between the input and reconstructed 21 cm signals, which is defined as

$\begin{equation} Q\equiv \sqrt{\frac{1}{N}\sum _{i=1}^{N}[T_{\rm 21\,cm}^\prime (\nu _i)-T_{\rm 21\,cm}(\nu _i)]^2}, \end{equation} \tag{ 23 }$

where T_21 cm and $T^{\prime }_{\rm 21 \,cm}$ are the input and reconstructed 21 cm signals, respectively, and N is the number of frequency channels. A smaller Q means a better subtraction effect. Given the noise ΔT_b = 6 mK, for different ν_w, we compare the rms deviation of the results obtained with the method of Wang et al. (2006) and ours as shown in Table 1 and Figure 13. The errors of the estimation of Q are calculated by using the standard deviation of 1000 times the Monte Carlo simulation. The comparison is performed with the noise considered (Figure 13(a)) and ignored (Figure 13(b)), respectively, and the result is insensitive to the existence of noise. We find that when ν_w > 1 MHz, the Q of our method is rather stable, and almost independent of ν_w, while the effect of the method of Wang et al. (2006) seems rather sensitive to ν_w. When ν_w < 20 MHz, our method works significantly better than that of Wang et al. (2006), and when ν_w > 30 MHz, the polynomial fitting based method becomes better.

**Figure 13.** Comparison between the subtraction effects of the wavelet-based method and the polynomial fitting based method, with noise considered (a) and ignored (b).
Download figure:
Standard image High-resolution image

Table 1. The rms Deviation Q of Our Method and That Based on Polynomial Fitting of Wang et al. (2006) and Wp Smoothing with ΔT_b = 6 mK under Different Test Conditions

Test Condition		Our Method	Polynomial Fitting	Wp Smoothing^a
FG I		9.9 ± 1.5 mK	6.8 ± 0.6 mK	≃ 7 mK

	ν_w = 5 MHz	9.7 ± 1.4 mK	398 ± 1.8 mK	≃ 10 mK

FG II	ν_w = 10 MHz	9.6 ± 1.4 mK	39.1 ± 2.2 mK	⋅⋅⋅

	ν_w = 20 MHz	9.7 ± 1.5 mK	12.5 ± 1.4 mK	⋅⋅⋅

Note. ^aBecause the Wp smoothing based method is rather time consuming, we did not perform the error estimation through the Monte Carlo method.

Download table as: ASCII Typeset image

5.2. Comparison with the Wp Smoothing Method

Harker et al. (2009) suggested a method based on the Wp smoothing algorithm (originally described by Mächler 1993; Mächler 1995), which is also a non-parametric method. We implement their method according to an implementation note written by the author, i.e., Implementation of the "Wp" smoothing for EoR foreground fitting (the Implementation note hereafter). This method requires solving a boundary value problem (BVP) with nonlinear terms, and when implementing the solver numerically, it actually solves a multivariate nonlinear system of equations. Most algorithms for solving this kind of system of equations are based on iteration, so they have the risk of instability and may not finally reach the optimal solution. We have tested the Hybrid and Broyden algorithms and find that the solution is sensitive to the initial guess.

To analyze the behavior of the iteration for solving the above BVP, we list Equations (8)–(15) in the Implementation note. The BVP is described as follows:

$\begin{eqnarray} &&h^\prime (x)=g(x) \end{eqnarray} \tag{ 24 }$

$\begin{eqnarray} &&g^\prime (x)=p_{\mathbf {w}}(x)e^{h(x)}\left[-\frac{1}{2\lambda }\sum _{i=1}^n(x-x_i)_+\psi _i(y_i-f(x_i))\right]\nonumber\\ \end{eqnarray} \tag{ 25 }$

$\begin{eqnarray} &&f^{\prime }(x)=k(x) \end{eqnarray} \tag{ 26 }$

$\begin{eqnarray} &&k^{\prime }(x)=p_{\mathbf {w}}(x) e^{h(x)}, \end{eqnarray} \tag{ 27 }$

and the boundary conditions

$\begin{eqnarray} &&g(x_1)=0 \end{eqnarray} \tag{ 28 }$

$\begin{eqnarray} &&g(x_n)=0 \end{eqnarray} \tag{ 29 }$

$\begin{eqnarray} &&\sum _{i}\psi _i(y_i-f(x_i))=0 \end{eqnarray} \tag{ 30 }$

$\begin{eqnarray} &&\sum _{i}x_i\psi (y_i-f(x_i))=0, \end{eqnarray} \tag{ 31 }$

where λ is the Lagrange multiplier, y_i is the measured brightness temperature at frequency x_i, f(x_i) is the solution, which represents the smooth foreground component, and function ψ(x) := x → x. As described in the Implementation note, there is no "natural" value for λ, and in practice, the authors simply choose a reasonable-looking value for λ; we set λ = 1 in our implementation. Then the recovered 21 cm signal is obtained by y_i − f(x_i). We find that if during the iteration the function h(x) becomes negative numbers with a relatively large absolute value, then the right-hand sides of Equations (25) and (27) vanish. Then, g'(x) and k'(x) become zero, and finally the solution of f(x) will degenerate to a first-order polynomial and g(x) becomes zero. The f(x) has two free parameters, i.e., the slope and the intercept, which can be solved by Equations (30) and (31). Obviously the following functions,

$\begin{eqnarray} &&h(x)=-C_1 \end{eqnarray} \tag{ 32 }$

$\begin{eqnarray} &&k(x)=C_2 \end{eqnarray} \tag{ 33 }$

$\begin{eqnarray} &&f(x)=C_2x+b \end{eqnarray} \tag{ 34 }$

$\begin{eqnarray} &&g(x)=0, \end{eqnarray} \tag{ 35 }$

where C₁ is a large positive number, and C₂ and b are the two constants that can be solved with the linear equation set (Equations (30) and (31)), can be an approximate solution to the above BVP, which however takes no information from the observed spectrum y_i.

In order to test the above method, we make a small modification to prevent the iteration from reaching an obviously wrong solution, such as Equations (32)–(35). We set a lower limit of the function h(x). We test this method by using both FG I and FG II with ν_w = 5 MHz. For FG I, this method can obtain a result as good as that of the polynomial fitting based method, and for FG II, the Q is about 0.01 K (Figure 14), which is close to our method. The comparison results are summarized in Table 1. However, we should point out that because the solution of the BVP relies on iteration based methods and that the size of the system of nonlinear equations is not less than the number of channels, the process of solution is rather time consuming. As a rough comparison, we implement this method by using the GNU Scientific Library (Galassi & Gough 2005) and run the program on a workstation with an Intel Xeon 1.87 GHz CPU. It takes about 30 minutes to obtain the result, while with our wavelet-based method, it takes about 60 s to run 1000 rounds of subtractions. For the above reason, we were not able to calculate the errors of the estimation of Q with the Monte Carlo method.

**Figure 14.** Simulated (black), reconstructed (red) EoR 21 cm signals, and the residual (green) obtained with the Wp smoothing method by using FG I (a) and FG II with ν_w = 5 MHz (b).
Download figure:
Standard image High-resolution image

5.3. Risk of Instrumental Calibration Uncertainty

All above tests are based on the assumption that the instrument is perfectly calibrated. However, calibration uncertainty, more or less, always exists, so it is valuable to consider this effect when we test the foreground subtraction method. As a simple test, we consider a relative calibration error of 10⁻³ between different frequency channels. We assume the uncorrected relative gain at frequency ν to be described as

$\begin{equation} g(\nu)=1+10^{-3}\cos \left(2\pi \frac{\nu -\nu _{\min }}{10\,{\rm MHz}} \right), \end{equation} \tag{ 36 }$

as shown in Figure 15(a). This is a rough model composed only for the test purpose; nevertheless, according to Jelić et al. (2010), a polarization-sensitive instrument that is improperly calibrated may possess calibration uncertainty. This calibration uncertainty appears to be an oscillation structure along the radio frequency axis, which is significant at several MHz scales, just like our simple model above. In this test, we only use the above FG I.

We apply the uncorrected instrumental calibration uncertainty to the total signal (the sum of FG I, the 21 cm signal, and the instrumental noise) and use this signal to test the subtraction method. We show the subtraction result of the wavelet-based method in Figure 16(a). As a comparison, we test the polynomial fitting based method and show the result in Figure 16(b). It is obvious that the result of the wavelet-based method is affected a little by the instrumental calibration uncertainty, while the polynomial fitting based method becomes much worse. This phenomenon can be explained by the fact that when the instrumental calibration uncertainty is involved, the foreground spectrum is no longer low-order polynomial shaped. Although we can use a higher-order polynomial to approximate the calibration-involved foreground, it will overfit the background 21 cm signal. On the other hand, the wavelet-based method does not place such a strong assumption over the foreground spectrum, i.e., polynomial-like, so the deviation of the spectrum from a polynomial shape does not have a significant influence on the subtraction effect.

**Figure 16.** Foreground subtraction result with the condition that a relative system calibration error of 10⁻³ is introduced. (a) The result of our wavelet-based method. (b) The result of the polynomial fitting based method in Wang et al. (2006). The black, red, and green lines represent the input, reconstructed signal, and the residual, respectively.
Download figure:
Standard image High-resolution image

5.4. Application to an Extremely Sharp Turnover Condition

From above discussions we have found that the wavelet-based method appears to be more tolerant to complex conditions for the foregrounds, and its numerical stability and computing efficiency are much higher than in the Wp smoothing based method. In this section, we will test our method with an extreme condition, i.e., FG II with ν_w = 1 MHz.

By using the filtering method described in Section 4.2, we obtain the result, which is shown in Figure 17(a). From the wavelet coefficients of the total signal, which is shown in Figure 18(a), we find that the sharp turnover of the foreground spectrum has significant contributions to small scales, which are not filtered out by the above filtering method. We manually draw a mask (Figure 18(b)) to check whether the sharp turnover feature can be filtered. The filtered wavelet coefficients are shown in Figure 18(c), which is transformed to real space. The recovered EoR 21 cm signal is shown in Figure 17(b). The result has been significantly improved by using the manual filtering method.

**Figure 18.** (a) The absolute values of the wavelet coefficients of the total signal with FG II with ν_w = 1 MHz. (b) Manually generated filtering mask for FG II with ν_w = 1 MHz. (c) Corresponding filtered total signal.
Download figure:
Standard image High-resolution image

Although the above method is based on a subjective standard, it shows that the wavelet-based method can be further improved to handle more complex conditions. In our future work, we will try to find out a more objective method to subtract the foreground in such extreme conditions.

5.5. Handling Higher and More Complex Noise

We have tested two noise levels above: 6 mK for the future SKA core region and 60 mK as a representation of current working facilities just like what we have done in our previous work (Wang et al. 2013). For the 6 mK noise level, we are able to reconstruct the actual 21 cm signal from each image pixel, while for the 60 mK noise level, we are only able to obtain the power spectrum as statistical information. As has been pointed out in our previous work (Wang et al. 2013), the 60 mK noise level is calculated based on the 21CMA instrument, whose field of view is fixed to the 5° zone around the north celestial pole. This may not hold for other instruments, so that in this section we test a higher noise level of 120 mK. This noise level is equivalent to reducing the total observation time to 25%, which may be "more" realistic. Still with 1000 times simulation, we obtain the one-dimensional power spectrum of the reconstructed 21 cm signal, as shown in Figure 19. We find that the results are similar to those obtained in Section 4.3, but the fluctuation is larger. The effect can be improved by increasing the number of pixels used. For most working and upcoming facilities that are able to produce images, the total number of pixels should be much more than 1000, so that should be able to handle higher noise levels.

**Figure 19.** Same as Figure 10, but using a noise level of ΔT_b = 120 mK.
Download figure:
Standard image High-resolution image

The properties of noise may appear more complex in the aspect of stationarity. In the above tests, we assume the noise to be stationary along both the frequency and time axes, which may be broken during practical observations. However, as the data are accumulated before the subtraction of foreground, the non-stationarity in time domain will not affect our method. But what about the non-stationarity in the radio frequency domain? For a noise level significantly lower than the 21 cm signal (e.g., around several mK), this will not be a serious problem, despite that the noise will be mixed with the 21 cm signal after the subtraction. For a noise level significantly higher than the 21 cm signal, the subtraction algorithm itself can still work, but more corrections are required before producing the final result of the power spectrum. In this condition, the noise is not a white noise, so that for excluding the power spectral density contributed by the noise, one must subtract a more complex noise power spectrum from the total power spectrum to produce the final result.

5.6. Testing Other Signal Extension Methods

As described in Section 3.2, in the above tests, we simply use the period signal extension method. From Figure 3, we note that the significant wavelet coefficients are mainly contributed by the boundary effect. We have also tested other extension methods including filling zeros and symmetric extension. We find that these extension methods differ little from the period extension that we have used above. Nevertheless, we find that if we extend the originally measured signal I(ν) as

$\begin{eqnarray} &&I_{\rm ext}(\nu)=\nonumber\\ &&\left\lbrace \begin{array}{@{}ll}I(\nu _{\rm min})+(\nu -\nu _{\rm min})\frac{dI(\nu)}{d\nu }|_{\nu =\nu ^{+}_{\rm min}} & \nu \in [2\nu _{\rm min}-\nu _{\rm max},\nu _{\rm min})\nonumber\\ I(\nu) &\nu \in [\nu _{\rm min},\nu _{\rm max}]\nonumber\\ I(\nu _{\rm max})+(\nu -\nu _{\rm max})\frac{dI(\nu)}{d\nu }|_{\nu =\nu ^{-}_{\rm max}} &\nu \in (\nu _{\rm max},2\nu _{\rm max}-\nu _{\rm min}] \nonumber\\ \end{array} \right.,\\ \end{eqnarray} \tag{ 37 }$

which can be named as the linear extension method, the boundary effect can be significantly suppressed. We present the wavelet coefficients of the total signal and the filtered signal in band ν_min < ν < ν_max in Figure 20. Note that the filtering procedure is exactly the same as described in Section 4.2, but applied to the total band after the extension.

**Figure 20.** (a) The absolute values of the wavelet coefficients of the total signal with FG I calculated with the linear extension method (Section 5.6). (b) The absolute values of the corresponding filtered wavelet coefficients.
Download figure:
Standard image High-resolution image

We roughly test the effect of foreground subtraction with this extension method using FG I. We find that for the ΔT_b = 6 mK condition, the change of Q is not significant compared with the period extension method, while for the ΔT_b = 60 mK condition, the power leakage in the small wave number end almost disappears, as shown in Figure 21. This apparently can be explained as that the boundary effect mainly affects the large-scale components of the signal.

**Figure 21.** Recovered one-dimensional 21 cm power spectrum with the linear extension method (Section 5.6). Unlike Figure 10, the power spectrum presented in this figure is not corrected by any method.
Download figure:
Standard image High-resolution image

Although the simple test above shows that the linear extension method is a promising method to handle the boundary effect, unlike the period extension method that we use throughout this work, it is not commonly used yet, and more systematic tests are required, which will be performed in our future work. Because of the above reason, in this work, we still use the period extension method to handle the boundary effect.

5.7. What Kind of Conditions Are Different Methods Suitable To?

From the discussion above, we can conclude that if the foreground spectrum can be well approximated by a low-order polynomial, the traditional polynomial fitting based method can work well and obtain an acceptable estimation of the 21 cm spectrum. When the foreground is no longer simple, for example, it appears to possess a turnover with ν_w < 20 MHz; the fitting-based method will not produce an acceptable result, but the wavelet-based method can still work well. Furthermore, actual foreground may be more complex and can deviate from the power-law-shaped spectrum significantly. The fitting-based method can be seriously affected. If the instrument has an uncorrected calibration error, the wavelet-based method will also have significant advantages over the polynomial fitting based method. And for the Wp smoothing based method, in all the conditions that we have tested, it works at least as well as the traditional polynomial fitting based method. When the foreground is no longer as simple as FG I, it can obtain about the same effect as our method. However, solving a nonlinear BVP is a rather time-consuming work, so it may be a problem when a large amount of subtraction is required, for example, when estimating power spectra.

6. CONCLUSION

We propose a CWT-based foreground subtraction method for the detection of redshifted 21 cm signal from the EoR. This method works based on the assumption that the foreground spectra are smooth, while the 21 cm signal spectrum is full of saw-tooth-like structures; thus, their characteristic scales are significantly different. We can distinguish them in the wavelet coefficient space easily and perform the foreground subtraction. By testing the wavelet transform based method with a set of foreground spectra with different complexities, we find that compared with the traditional spectral fitting based method, our method is more tolerant to complex foregrounds. Furthermore, we also find that when the instrument has uncorrected response errors, our method can also work significantly better than the spectral fitting based method. Our method can obtain similar results with the Wp smoothing method, which is also a non-parametric method, but our method consumes much less computing time.

We thank the referee for his/her constructive and valuable comments, which help improve the manuscript. This work was supported by the Ministry of Science and Technology of China (grant Nos. 2009CB824900 and 2013CB837900), the National Science Foundation of China (grant Nos. 11203041, 11261140641, and 11125313), the Chinese Academy of Sciences (grant No. KJZD-EW-T01), and Science and Technology Commission of Shanghai Municipality (grant No. 12XD1406200).

THE APPLICATION OF CONTINUOUS WAVELET TRANSFORM BASED FOREGROUND SUBTRACTION METHOD IN 21 cm SKY SURVEYS

Article metrics

Permissions

Author e-mails

Author affiliations

Dates

ABSTRACT

1. INTRODUCTION

2. SIMULATION OF THE LOW-FREQUENCY RADIO SPECTRA