
A kernel-based method for markerless tumor tracking in kV fluoroscopic images


Published 7 August 2014 © 2014 Institute of Physics and Engineering in Medicine
Citation: Xiaoyong Zhang et al 2014 Phys. Med. Biol. 59 4897. DOI: 10.1088/0031-9155/59/17/4897


Abstract

Markerless tracking of respiration-induced tumor motion in kilovoltage (kV) fluoroscopic image sequences remains a challenging task in real-time image-guided radiation therapy (IGRT). Most existing markerless tracking methods are based on a template matching technique or its extensions, which are often sensitive to non-rigid tumor deformation and computationally expensive. This paper presents a kernel-based method that tracks tumor motion in kV fluoroscopic image sequences with robust performance and low computational cost. The proposed tracking system consists of three steps. First, to enhance the contrast of the kV fluoroscopic images, we use histogram equalization to transform the intensities of the original images to a wider dynamic range. The tumor target in the first frame is then represented by a histogram-based feature vector. Finally, tracking is formulated as maximizing a Bhattacharyya coefficient that measures the similarity between the tumor target and its candidates in the subsequent frames; the maximization is performed numerically by a mean-shift algorithm. The proposed method was evaluated on four clinical kV fluoroscopic image sequences. For comparison, we also implemented four conventional template matching-based methods and compared their performance with that of the proposed method in terms of tracking accuracy and computational cost. Experimental results demonstrate that the proposed method is superior to the conventional template matching-based methods.


1. Introduction

Respiration-induced lung tumor motion has a direct impact on the accuracy of treatment beam delivery in radiation therapy. It has been reported that respiration may cause lung tumor displacements of up to 3 cm (Barnes et al 2001, Stevens et al 2001, Keall et al 2006). To account for respiratory motion during treatment beam delivery, a conventional approach is to increase the internal margin of the clinical target volume (CTV) when defining the planning target volume (PTV) (ICRU 1993, ICRU 1999). However, this approach is inaccurate and may lead to unwanted radiation delivery to healthy organs or tissues adjacent to the tumor. To improve the accuracy of treatment beam delivery, various motion-management techniques, such as gating the beam according to the respiratory phase and repositioning the beam to follow the tumor's changing position, have been proposed, and some of them have already been applied clinically. For these techniques, tumor motion tracking plays an important role.

In past years, extensive efforts have been devoted to the development of tumor tracking systems for radiation therapy. These methods can be classified into three categories: external surrogate-based, fiducial marker-based and markerless tracking methods (Mao et al 2008, Wiersma et al 2008). Typical external surrogate-based methods use external signals whose variations are correlated with the tumor motion, such as skin markers or lung volume, to estimate the tumor's position or respiratory phase (Kubo and Hill 1996, Mah et al 2000, Remouchamps et al 2003, Bert et al 2005, Berbeco et al 2006, Chi et al 2006, Li et al 2012). However, because the correlation between the external signals and the tumor's motion is uncertain, these methods may lead to inaccurate treatment beam delivery (Vedam et al 2003, Hoisak et al 2004). Fiducial marker-based and markerless methods, on the other hand, use visual tracking techniques to follow either fiducial markers implanted inside or near the tumor, or the tumor itself, in a kV fluoroscopic image sequence acquired by a fluoroscopic imaging system during treatment. Currently, tracking fiducial markers has been applied in clinical treatment because of its relatively high accuracy (Harada et al 2002, Seppenwoolde et al 2002, Shirato et al 2003). However, since the implantation of fiducial markers carries an associated risk of pneumothorax, this procedure may not be widely accepted for clinical use (Laurent et al 2000, Arslan et al 2002, Geraghty et al 2003). Therefore, techniques for directly tracking lung tumors without implanted markers are needed.

Several methods for markerless tumor tracking in kV fluoroscopic image sequences have been reported. For example, Berbeco et al (2005) used the correlation coefficients between multiple templates corresponding to different respiratory phases and a region of interest (ROI) covering the tumor area in the observed image as a score to determine whether the ROI is in the treatment position. However, this method mainly focuses on determining the respiratory phase of a tumor target rather than tracking the tumor's position. This multiple-template method has since been extended by optimizing the template generation and the correlation scores with different similarity metrics (Cui et al 2007a, 2007b, Li et al 2009, Lewis et al 2010). In addition, a volumetric image reconstruction approach based on three-dimensional (3D) image registration has been proposed that uses a single projection image to track 3D tumor motion (Li et al 2010a, Li et al 2011). This method is effective but requires 4D CT or CBCT data as a reference and incurs an expensive computational cost for the 3D image registration. More recently, Lin et al (2009a, 2009b) proposed a machine learning-based method that uses the intensity variations of surrogate ROIs in fluoroscopic images to predict or estimate the tumor motion. A drawback of the machine learning-based method is that it requires a fluoroscopic image sequence and the ground truth of tumor motion for a training procedure prior to tracking.

In this paper, we present a new markerless tracking method based on a well-known visual tracking technique, the mean-shift algorithm (Comaniciu et al 2003), which formulates tracking as a statistical optimization process. Compared with previous methods, the proposed method does not require a fluoroscopic image sequence for a training procedure. In addition, it is robust against non-rigid tumor deformation and is faster than conventional template matching-based methods. We evaluated the performance of the proposed method on four fluoroscopic image sequences. Experimental results demonstrate that the proposed method is superior to conventional template matching-based methods in terms of accuracy and computational cost.

The rest of this paper is organized as follows. In section 2, we present the details of the proposed method. Experimental results and discussions will be given in section 3. Finally, we conclude our paper in section 4.

2. Methods and materials

Figure 1 shows the flow chart of the proposed method, which consists of three procedures: (i) histogram equalization for image contrast enhancement, (ii) tumor target modeling and (iii) kernel-based target tracking. Given a fluoroscopic image sequence, we first apply histogram equalization to enhance the image contrast and increase the visibility of the tumor target. In the first frame, the tumor target is delineated manually and then represented by a histogram-based feature vector. In the subsequent frames, the tumor's position is tracked automatically by a kernel-based tracking algorithm. The details of the three procedures are described in the following subsections.

Figure 1. Flow chart of the proposed method for tumor tracking in fluoroscopic images.

2.1. Image enhancement

Generally, in a kV fluoroscopic image, soft tissues such as blood vessels and tumors have lower contrast than hard tissues such as bones, and their intensity range may vary with the x-ray energy and irradiation position. Figure 2(a) shows a 16-bit fluoroscopic image in which a tumor target is located at the center of the image. Due to the low contrast, the tumor is relatively indistinguishable and can hardly be tracked by conventional visual tracking approaches. The normalized histogram of the image is shown in figure 2(b). Note that the dynamic range of the image intensity is extremely narrow for a 16-bit image, whose intensity can range from 0 to 65535.

Figure 2. A fluoroscopic image and its normalized histogram. (a) A 16-bit fluoroscopic image. (b) The normalized histogram of image (a).

A desirable high-contrast image should cover a wide range of the intensity scale and the distribution of the image intensities should not be too far from a uniform distribution. Histogram equalization is one of the intensity transformations that can automatically improve the image contrast based only on information available in the histogram of the input image (Gonzalez and Woods 2008).

Given an input image with intensity levels in the range [0, L − 1] (e.g. L = 216 for a 16-bit image), the normalized histogram is a discrete function given by p(rk) = nk/N, where rk is the kth intensity value, nk is the number of pixels in the image with intensity rk and N is the total number of pixels in the image. Histogram equalization maps each pixel in the input image with intensity rk to a corresponding pixel with intensity sk in a new range $\left[0,{{L}_{\text{new}}}-1\right]$ using the transformation T(rk), given by

$${{s}_{k}}=T\left({{r}_{k}}\right)=\left({{L}_{\text{new}}}-1\right)\sum\limits_{j=0}^{k}p\left({{r}_{j}}\right),\qquad k=0,1,\ldots ,L-1.\tag{1}$$

To reduce the computational cost in the subsequent steps, we convert the input image into an 8-bit image by setting ${{L}_{\text{new}}}=256$ . In practice, the values sk are generally fractional because they are generated by summing probability values, so we round them to the nearest integer. Figure 3 shows the intensity transformation function T(rk) for the image in figure 2(a).

Figure 3. Intensity transformation of histogram equalization obtained for the image in figure 2(a).

Figure 4(a) shows the output image obtained from the histogram equalization. Compared with the original image shown in figure 2(a), the output image has a higher dynamic range and the tumor target becomes more distinguishable. Figure 4(b) shows the normalized histogram of the output image, which exhibits a large variety of gray tones.

Figure 4. Result of histogram equalization. (a) The 8-bit histogram equalized image corresponding to the image in figure 2. (b) The normalized histogram of image (a).

In practice, given a kV fluoroscopic image sequence, we can use the first frame to generate a lookup table according to equation (1) and then apply this lookup table to the subsequent frames to improve their contrast.
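As a concrete illustration, the lookup-table construction of equation (1) can be sketched in a few lines of NumPy (the paper's implementation is in MATLAB; the function names here are hypothetical):

```python
import numpy as np

def make_equalization_lut(frame, l_in=65536, l_new=256):
    # Equation (1): s_k = round((L_new - 1) * sum_{j<=k} p(r_j)).
    # `frame` is a 16-bit image; the LUT maps [0, l_in) to [0, l_new).
    hist = np.bincount(frame.ravel(), minlength=l_in)
    p = hist / frame.size            # normalized histogram p(r_k)
    cdf = np.cumsum(p)               # cumulative sum of the p(r_j)
    return np.round((l_new - 1) * cdf).astype(np.uint8)

def equalize(frame, lut):
    # Apply T(r_k) to every pixel by table lookup.
    return lut[frame]
```

In keeping with section 2.1, the table would be built once from the first frame and reused for all subsequent frames, so per-frame enhancement reduces to a single array lookup.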

2.2. Target modeling

In order to track a tumor target in a fluoroscopic image sequence, the tumor target must be specified in advance. Since the contrast of the observed image was enhanced in the previous step, the tumor target can be delineated more easily in a rectangle by clinicians. In figure 4, a red rectangle covering the tumor area was delineated by a clinician. The tumor target is then represented by a feature vector $\mathbf{\hat{q}}$ consisting of m elements ${{\left\{{{\hat{q}}_{u}}\right\}}_{u=1,\ldots ,m}}$ , where ${{\hat{q}}_{u}}$ denotes the value of the uth element. The feature vector is generated from a weighted histogram as described below.

Suppose that the tumor target is delineated within a rectangle $T\left(\mathbf{x}_{i}^{*}\right)$ of n pixels, where T is the intensity of the pixel located at ${{\left\{\mathbf{x}_{i}^{*}=\left(x_{i}^{*},y_{i}^{*}\right)\right\}}_{i=1,2,\ldots ,n}}$ , and that the rectangle is centered at the origin. An m-bin histogram index function is defined by

$$b\left(\mathbf{x}_{i}^{*}\right)=\left\lceil \frac{m\,T\left(\mathbf{x}_{i}^{*}\right)}{255}\right\rceil \tag{2}$$

where ⌈·⌉ is the ceiling function, which returns the smallest integer greater than or equal to $mT\left(\mathbf{x}_{i}^{*}\right)/255$ . The function $b\left(\mathbf{x}_{i}^{*}\right)$ gives the index of the histogram bin corresponding to the intensity of the pixel at location $\mathbf{x}_{i}^{*}$ . We also define the profile function of a Gaussian kernel, denoted by k(x), as

$$k\left(x\right)=\exp \left(-\frac{x}{2}\right),\qquad x\geqslant 0.\tag{3}$$

The details of the profile function of a kernel can be found in Comaniciu et al (2003).

The feature vector representing the tumor target is defined as

$${{\hat{q}}_{u}}=C\sum\limits_{i=1}^{n}k\left({{\left\| \mathbf{x}_{i}^{*}/\mathbf{h} \right\|}^{2}}\right)\delta \left[b\left(\mathbf{x}_{i}^{*}\right)-u\right]\tag{4}$$

where $\left\| {{\bf{x}}_i^*/{\bf{h}}} \right\|$ is the normalized distance between $\mathbf{x}_{i}^{*}$ and the center of $T\left(\mathbf{x}_{i}^{*}\right)$ , $\mathbf{h}=\left({{h}_{x}},{{h}_{y}}\right)$ is the kernel bandwidth that normalizes the coordinate $\mathbf{x}_{i}^{*}$ within a unit circle, δ is the Kronecker delta function defined by

$$\delta \left[a\right]=\begin{cases}1, & a=0\\ 0, & \text{otherwise}\end{cases}\tag{5}$$

and C is the normalization constant given by

$$C=\frac{1}{\sum\limits_{i=1}^{n}k\left({{\left\| \mathbf{x}_{i}^{*}/\mathbf{h} \right\|}^{2}}\right)}.\tag{6}$$

Figure 5(a) shows the tumor target of size 80 × 90 pixels delineated by the red rectangle shown in figure 4(a). Figure 5(b) is the corresponding Gaussian kernel obtained from equation (3) where hx = 40 and hy = 45. Figure 5(c) shows the feature vector ${{\left\{{{\hat{q}}_{u}}\right\}}_{u=1,\ldots ,m}}$ representing the tumor target where m is 25.

Figure 5. Histogram-based target modeling. (a) A tumor target delineated in figure 4. (b) The Gaussian kernel function defined by equation (3). (c) The feature vector ${{\left\{{{\hat{q}}_{u}}\right\}}_{u=1,\ldots ,m}}$ representing the tumor target.

Here, the feature vector ${{\left\{{{\hat{q}}_{u}}\right\}}_{u=1,\ldots ,m}}$ can be considered a weighted intensity probability density function (PDF) of the tumor target. The Gaussian kernel assigns a larger weight to pixels close to the center of the target and a smaller weight to pixels far from the center. Therefore, the feature vector contains not only the intensity information of the tumor target but also spatial information.
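Under the definitions above, the target model of equations (2)–(6) can be sketched in NumPy. This is a simplified illustration rather than the authors' code; the helper name and the exact centring convention are assumptions, and the Gaussian profile is taken as k(x) = exp(−x/2) from equation (3).

```python
import numpy as np

def target_model(patch, m=25):
    # `patch` is the 8-bit target rectangle; returns the m-bin
    # feature vector {q_u} of equation (4).
    ny, nx = patch.shape
    hy, hx = ny / 2.0, nx / 2.0                   # kernel bandwidth h
    ys, xs = np.mgrid[0:ny, 0:nx]
    # squared normalized distance ||x_i*/h||^2 from the patch centre
    d2 = ((xs - hx) / hx) ** 2 + ((ys - hy) / hy) ** 2
    k = np.exp(-d2 / 2.0)                         # equation (3)
    bins = np.ceil(m * patch.astype(float) / 255.0).astype(int)  # eq. (2)
    bins = np.clip(bins, 1, m)
    q = np.zeros(m)
    for u in range(1, m + 1):
        q[u - 1] = k[bins == u].sum()             # eq. (4) before scaling
    return q / k.sum()                            # C of eq. (6): sum(q) = 1
```

Because of the normalization constant C, the resulting vector sums to one, which is what lets it be read as a weighted intensity PDF.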

2.3. Target tracking

Tracking the tumor target in a subsequent frame amounts to finding the location of the target candidate whose feature vector, denoted by $\mathbf{\hat{p}}$ , is closest to the feature vector $\mathbf{\hat{q}}$ of the tumor target.

Let $T\left({{\mathbf{x}}_{i}}\right)$ be a target candidate, centered at $\mathbf{y}$ in the current frame, where i = 1, 2, ..., n. Using the Gaussian kernel defined by equation (3), the feature vector of this target candidate, denoted by $\mathbf{\hat{p}}\left(\mathbf{y}\right)={{\left\{{{\hat{p}}_{u}}\left(\mathbf{y}\right)\right\}}_{u=1,\ldots ,m}}$ , is given by

$${{\hat{p}}_{u}}\left(\mathbf{y}\right)={{C}_{h}}\sum\limits_{i=1}^{n}k\left({{\left\| \frac{\mathbf{y}-{{\mathbf{x}}_{i}}}{\mathbf{h}} \right\|}^{2}}\right)\delta \left[b\left({{\mathbf{x}}_{i}}\right)-u\right]\tag{7}$$

where ${{C}_{h}}$ is the normalization constant analogous to C in equation (6).

Tracking the target can be formulated by finding the coordinate $\mathbf{y}$ where the similarity between two feature vectors $\mathbf{\hat{q}}$ and $\mathbf{\hat{p}}\left(\mathbf{y}\right)$ reaches the maximum. In this paper, we employ the Bhattacharyya coefficient as the similarity metric to measure the similarity between the target model $\mathbf{\hat{q}}$ and the target candidate $\mathbf{\hat{p}}\left(\mathbf{y}\right)$ (Comaniciu et al 2003). The Bhattacharyya coefficient, denoted by $\rho \left[\mathbf{\hat{p}}\left(\mathbf{y}\right),\mathbf{\hat{q}}\right]$ , is defined by

$$\rho \left[\mathbf{\hat{p}}\left(\mathbf{y}\right),\mathbf{\hat{q}}\right]=\sum\limits_{u=1}^{m}\sqrt{{{{\hat{p}}}_{u}}\left(\mathbf{y}\right){{{\hat{q}}}_{u}}}.\tag{8}$$

The Bhattacharyya coefficient can be interpreted as the correlation between the vectors $\left(\sqrt{{{{\hat{p}}}_{1}}},\ldots ,\sqrt{{{{\hat{p}}}_{m}}}\right)$ and $\left(\sqrt{{{{\hat{q}}}_{1}}},\ldots ,\sqrt{{{{\hat{q}}}_{m}}}\right)$ . Since the feature vector of the target candidate $\mathbf{\hat{p}}\left(\mathbf{y}\right)$ carries continuous spatial information, the Bhattacharyya coefficient between $\mathbf{\hat{p}}\left(\mathbf{y}\right)$ and $\mathbf{\hat{q}}$ is also a smooth function of $\mathbf{y}$ . Suppose that the center of the tumor target in the previous frame is ${{\mathbf{\hat{y}}}_{0}}$ . The Bhattacharyya coefficient in equation (8) can be approximated by the following Taylor expansion around the location ${{\mathbf{\hat{y}}}_{0}}$ :

$$\rho \left[\mathbf{\hat{p}}\left(\mathbf{y}\right),\mathbf{\hat{q}}\right]\approx \frac{1}{2}\sum\limits_{u=1}^{m}\sqrt{{{{\hat{p}}}_{u}}\left({{\mathbf{\hat{y}}}_{0}}\right){{{\hat{q}}}_{u}}}+\frac{{{C}_{h}}}{2}\sum\limits_{i=1}^{n}{{w}_{i}}\,k\left({{\left\| \frac{\mathbf{y}-{{\mathbf{x}}_{i}}}{\mathbf{h}} \right\|}^{2}}\right)\tag{9}$$

where

$${{w}_{i}}=\sum\limits_{u=1}^{m}\sqrt{\frac{{{{\hat{q}}}_{u}}}{{{{\hat{p}}}_{u}}\left({{\mathbf{\hat{y}}}_{0}}\right)}}\,\delta \left[b\left({{\mathbf{x}}_{i}}\right)-u\right].\tag{10}$$

In equation (9), the first term is independent of $\mathbf{y}$ and the second term represents a density estimation computed with the kernel profile k (·) at $\mathbf{y}$ in the current image.

According to the theorem of density gradient estimation (Comaniciu et al 2003), the maximization of the Bhattacharyya coefficient $\rho \left[\mathbf{\hat{p}}\left(\mathbf{y}\right),\mathbf{\hat{q}}\right]$ can be solved by the following mean-shift algorithm.

Given a tumor target whose center is located at ${{\mathbf{\hat{y}}}_{0}}$ in the previous frame, its feature vector $\mathbf{\hat{q}}={{\left\{{{\hat{q}}_{u}}\right\}}_{u=1,\ldots ,m}}$ is computed according to equation (4). In the current frame, the tracking process consists of the following four steps:

Step 1: Compute the feature vector $\mathbf{\hat{p}}\left({{\mathbf{\hat{y}}}_{0}}\right)={{\left\{{{\hat{p}}_{u}}\left({{\mathbf{\hat{y}}}_{0}}\right)\right\}}_{u=1,\ldots ,m}}$ of a target candidate located at ${{\mathbf{\hat{y}}}_{\mathbf{0}}}$ in the current frame, and the Bhattacharyya coefficient between $\mathbf{\hat{p}}\left({{\mathbf{\hat{y}}}_{0}}\right)$ and $\mathbf{\hat{q}}$ :

$$\rho \left[\mathbf{\hat{p}}\left({{\mathbf{\hat{y}}}_{0}}\right),\mathbf{\hat{q}}\right]=\sum\limits_{u=1}^{m}\sqrt{{{{\hat{p}}}_{u}}\left({{\mathbf{\hat{y}}}_{0}}\right){{{\hat{q}}}_{u}}}.\tag{11}$$

Step 2: Compute the weights {wi}i = 1,...,n according to equation (10).

Step 3: Compute the new location of the target

$${{\mathbf{\hat{y}}}_{1}}=\frac{\sum\limits_{i=1}^{n}{{\mathbf{x}}_{i}}\,{{w}_{i}}\,g\left({{\left\| \frac{{{\mathbf{\hat{y}}}_{0}}-{{\mathbf{x}}_{i}}}{\mathbf{h}} \right\|}^{2}}\right)}{\sum\limits_{i=1}^{n}{{w}_{i}}\,g\left({{\left\| \frac{{{\mathbf{\hat{y}}}_{0}}-{{\mathbf{x}}_{i}}}{\mathbf{h}} \right\|}^{2}}\right)}\tag{12}$$

where g(x) = −k′(x). In this paper, since k(x) is the profile function of a Gaussian kernel, g(x) is given by $g\left(x\right)=0.5{{\text{e}}^{-x/2}}$ .

Step 4: If $\left\| {{{{\bf{\hat y}}}_1} - {{{\bf{\hat y}}}_0}} \right\| < \varepsilon $ , where $\varepsilon $ is a small convergence threshold, stop the iterations; otherwise set ${{\mathbf{\hat{y}}}_{0}}~\leftarrow ~{{\mathbf{\hat{y}}}_{1}}$ and return to Step 1.

Once the iteration stops in the current frame, we replace the tumor target position with the tracking result and continue tracking in the next frame. Therefore, the tumor target only needs to be specified in the first frame of the whole tracking process. In practice, it is difficult to delineate a tumor target during treatment, so we consider that this step can be conducted during patient positioning.
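The four steps above can be sketched as a single NumPy routine. This is an illustrative simplification, not the authors' MATLAB implementation: it quantizes the window centre to whole pixels between iterations, and the function name, window convention and threshold `eps` are assumptions.

```python
import numpy as np

def mean_shift_track(frame, q, y0, half, m=25, eps=0.5, max_iter=20):
    # `frame`: 8-bit image; `q`: target model (length m, sums to 1);
    # `y0`: (row, col) centre from the previous frame;
    # `half`: (hy, hx) half-sizes of the target window.
    hy, hx = half
    for _ in range(max_iter):
        r0, c0 = int(round(y0[0])), int(round(y0[1]))
        patch = frame[r0 - hy:r0 + hy, c0 - hx:c0 + hx].astype(float)
        ys, xs = np.mgrid[-hy:hy, -hx:hx].astype(float)
        d2 = (xs / hx) ** 2 + (ys / hy) ** 2          # ||(y - x_i)/h||^2
        bins = np.clip(np.ceil(m * patch / 255.0).astype(int), 1, m) - 1
        k = np.exp(-d2 / 2.0)                         # equation (3)
        p = np.bincount(bins.ravel(), weights=k.ravel(), minlength=m)
        p = p / k.sum()                               # candidate model, eq. (7)
        w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12))   # weights, eq. (10)
        wg = w * 0.5 * np.exp(-d2 / 2.0)              # w_i * g(x), g = -k'
        if wg.sum() == 0:                             # no overlap with model
            return y0
        dy = float((wg * ys).sum() / wg.sum())        # eq. (12) as a shift
        dx = float((wg * xs).sum() / wg.sum())
        y1 = (y0[0] + dy, y0[1] + dx)
        if np.hypot(dy, dx) < eps:                    # Step 4
            return y1
        y0 = y1
    return y0
```

Because equation (12) is a weighted average of pixel coordinates, each iteration moves the window centre toward the region whose histogram best matches the model, which is why convergence typically takes only a handful of iterations.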

To demonstrate the efficiency of the mean-shift algorithm, figure 6 shows an example of the tracking process. In figure 6(a), the red rectangle, whose center is ${{\mathbf{\hat{y}}}_{0}}$ , indicates the tumor's location in the previous frame, and the green rectangle shows the estimated location of the tumor target obtained by the mean-shift iterations. Figure 6(b) shows the Bhattacharyya similarity surface and the mean-shift iteration, which converges to the maximum of the similarity surface within about 20 iterations. In this figure, the similarity surface reaches its maximum at the coordinates (−43, −21), which indicate the tumor displacement along the vertical and horizontal directions.

Figure 6. An example of tumor tracking using the mean-shift algorithm. (a) The current frame in which the red rectangle shows the tumor's position in the previous frame and the green rectangle is the estimated position. (b) The mean-shift iteration that converges at the maximum of the Bhattacharyya similarity surface.

3. Results and discussion

To evaluate the performance of the proposed method, computer simulations were performed on four clinical kV image sequences obtained by the On-Board kV imaging system (Varian Medical Systems, Palo Alto, CA). Each image sequence consists of 200 frames acquired continuously with exposure conditions of 70 kV, 20 mA and 15 fps. Each image is 300 × 300 pixels with a resolution of 0.26 mm/pixel. Figures 7(a)–(d) show the first frames of the image sequences.

Figure 7. The first frames in kV image sequences for experiments. (a) Patient 1. (b) Patient 2. (c) Patient 3. (d) Patient 4.

Table 1 summarizes the gantry and couch angles of each image sequence. The first (Patient 1) and second (Patient 2) image sequences were acquired from the anteroposterior (AP) view. The third (Patient 3) and fourth (Patient 4) image sequences were acquired approximately from the left-right (LR) view. As shown in figure 7, the tumors in the first and second image sequences are distinguishable, while the tumors in the third and fourth image sequences are relatively indistinguishable.

Table 1. Gantry and couch angles (degree) of kV image sequences.

  Patient 1 Patient 2 Patient 3 Patient 4
Gantry angle 0 0 85 285
Couch angle 0 0 0 10
View AP AP ≈ LR ≈ LR

In addition, we conducted a deformation analysis to observe the tumor's deformation in each image sequence. The tumor boundaries at end-inhale and end-exhale were segmented using a level-set-based segmentation technique (Li et al 2010b). The centres of the tumor boundaries were then aligned to the origin of the coordinates. Finally, we used the volume overlap index (OI) and the root-mean-squared distance (RMSD) to measure the degree of tumor deformation (Wu et al 2009).

Figure 8 shows an example of the deformation analysis on the first (Patient 1) image sequence. Figures 8(a) and (b) show the tumor boundaries (red curves) and their centres (circle markers) in the end-exhale and end-inhale images. Figure 8(c) shows the alignment of the centres of the tumor boundaries to the origin of the coordinates. Table 2 summarizes the OI and RMSD (mm) between the end-exhale and end-inhale images of each image sequence. An OI close to 1 indicates low-degree deformation, whereas an OI close to 0 indicates large-degree deformation. Conversely, a larger RMSD indicates larger deformation and a smaller RMSD indicates smaller deformation. The table shows that the tumor deformations in the first and second image sequences are small, while those in the third and fourth image sequences are relatively large.
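For reference, the two deformation measures can be sketched as follows. These are hedged approximations (a Dice-style overlap for binary tumor masks and a nearest-neighbour RMSD between boundary point sets); the exact formulas used in the paper follow Wu et al (2009).

```python
import numpy as np

def overlap_index(mask_a, mask_b):
    # Dice-style overlap between two binary tumor masks: 1 = identical,
    # 0 = disjoint (an assumed form of the OI).
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())

def boundary_rmsd(pts_a, pts_b):
    # Root-mean-squared nearest-neighbour distance from boundary points
    # pts_a (n x 2) to boundary points pts_b (m x 2).
    d = np.sqrt(((pts_a[:, None, :] - pts_b[None, :, :]) ** 2).sum(-1))
    return float(np.sqrt((d.min(axis=1) ** 2).mean()))
```

As in the text, boundaries would be centred at the origin before either measure is computed, so that pure translation does not register as deformation.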

Figure 8. An example of deformation analysis on the first (Patient 1) image sequence. (a) The tumor's boundary and its centre on the end-exhale image. (b) The tumor's boundary and its centre on the end-inhale image. (c) Alignment of the tumor's boundaries in (a) and (b).

Table 2. Volume overlap index (OI) and root-mean-squared distance (RMSD) (mm) between end-exhale and end-inhale images.

  Patient 1 Patient 2 Patient 3 Patient 4
OI (%) 82% 84% 70% 71%
RMSD (mm) 1.58 1.17 1.99 2.84

The computer simulations, including the image enhancement and the tracking algorithm, were implemented in MATLAB R2012b (The MathWorks, Inc., Natick, MA) on an Intel Core i7 3.3 GHz computer running Windows 7.

3.1. Results of image enhancement

Figure 9 shows the contrast-enhanced images obtained by the histogram equalization described in section 2.1. Compared with the original images in figure 7, the tumor targets delineated by the red rectangles are more distinguishable. In particular, for the third (Patient 3) and fourth (Patient 4) images, the visibility of the tumor targets has been significantly improved.

Figure 9. The contrast-enhanced images and the tumor target for tracking. (a) Tumor target of Patient 1 is of size 80 × 100 pixels. (b) Tumor target of Patient 2 is of size 80 × 120 pixels. (c) Tumor target of Patient 3 is of size 80 × 80 pixels. (d) Tumor target of Patient 4 is of size 100 × 120 pixels.

To assess the contrast enhancement numerically, we use the image entropy to measure the degree of randomness of an image. The image entropy is defined by

$$H=-\sum\limits_{k=0}^{L-1}p\left({{s}_{k}}\right){{\log }_{2}}p\left({{s}_{k}}\right)\tag{13}$$

where L and p(sk) are defined in section 2.1. A high entropy implies better image contrast, while a low entropy implies low contrast. Figure 10 compares the entropies of the original and contrast-enhanced images. For the first and second patients, the image entropies increased by about 20%; for the third and fourth patients, by about 100%. These results demonstrate that histogram equalization effectively increases the image entropy.
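Equation (13) amounts to the Shannon entropy of the 8-bit histogram, which can be computed directly (a sketch; empty bins are skipped, since p log p tends to 0):

```python
import numpy as np

def image_entropy(img, l_new=256):
    # Equation (13): H = -sum_k p(s_k) * log2 p(s_k).
    p = np.bincount(img.ravel(), minlength=l_new) / img.size
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```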

Figure 10. Comparison of image entropies between the original images and contrast-enhanced images.

3.2. Results of target tracking

In figure 9, the four tumor targets within the red rectangles were delineated manually in the first frame before tracking. These tumor targets are represented by 25-element feature vectors. We then apply the tracking algorithm to localize the tumor targets in the subsequent frames.

Figures 11(a)–(d) show the tracking results for the four image sequences. For comparison, we also plot the ground truth of the tumor motion, defined as the mean of the positions localized manually by three observers. The tracking results are in good agreement with the ground truth.

Figure 11. Tracking results in comparison with ground truth where SI and LR stand for superior-inferior and left-right directions, respectively. (a) Patient 1. (b) Patient 2. (c) Patient 3. (d) Patient 4.

We also implemented four basic template matching (TM)-based tracking methods using different similarity metrics for comparison: mean squared error (MSE), correlation ratio (CR), correlation coefficient (CC) and mutual information (MI). All the template matching methods were performed on the same contrast-enhanced image sequences. In addition, for the template matching-based methods, we limited the search to a pre-defined region of size 100 × 100 pixels that includes the tumor positions at all breathing phases. The template matching methods used for comparison do not reproduce previously published methods but are basic implementations of template matching. Table 3 summarizes the absolute means of the errors (mm) and the standard deviations of the errors (mm), respectively denoted by $\bar{e}$ and σ, between the tracking results and the ground truth. For each image sequence, we calculate the errors along the two directions corresponding to the image acquisition angle. We also summarize the averages of the errors along the three directions. The table demonstrates that the accuracy of our tracking method is higher than those of the template matching methods.
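For concreteness, the MSE-based baseline (Method 1) reduces to an exhaustive search over the pre-defined region. A minimal sketch, with illustrative names and search parameterization, assuming the window stays inside the image:

```python
import numpy as np

def mse_template_match(frame, template, center, half_search):
    # Exhaustively slide `template` over a (2*half_search+1)^2 grid of
    # top-left corners around `center`, returning the position with the
    # lowest mean squared error. Bounds are assumed valid.
    th, tw = template.shape
    best, best_pos = np.inf, center
    r0, c0 = center
    for dr in range(-half_search, half_search + 1):
        for dc in range(-half_search, half_search + 1):
            r, c = r0 + dr, c0 + dc
            patch = frame[r:r + th, c:c + tw].astype(float)
            mse = ((patch - template) ** 2).mean()
            if mse < best:
                best, best_pos = mse, (r, c)
    return best_pos
```

The quadratic cost of this search over the 100 × 100 pixel region is what makes the TM baselines several times slower per frame than the mean-shift iteration, as section 3.3 reports.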

As mentioned in section 2.2, the proposed method uses a histogram-based feature vector to represent a tumor target. If the intensities of a tumor target have a narrow dynamic range, for example when the tumor overlaps with the heart or spine, the accuracy of our method will inevitably be degraded. In future work, we will focus on tracking the tumor when it overlaps with the heart or spine. We plan to use adaptive histogram equalization or a motion enhancement technique (Berbeco et al 2006) as preprocessing to enhance the tumor target and extend the validity of the proposed method.

Table 3. Absolute mean of error $\left(\bar{e}\right)$ and standard deviation of errors (σ) (mm) of tumor tracking.

Patient No.   Patient 1 Patient 2 Patient 3 Patient 4 Average
Direction   SI LR SI LR SI AP SI AP SI LR AP
Moving range 16.9 10.5 2.9 1.3 12.7 3.8 22.4 9.6 1.5 1.0 1.4
Method 1 $\bar{e}$ 1.4 1.3 1.0 0.7 2.1 1.3 1.4 1.5 2.6 1.3 3.6
  σ 2.1 1.8 1.0 0.7 2.6 2.1 3.1 3.2 2.2 1.3 2.7
Method 2 $\bar{e}$ 1.9 1.4 0.6 0.4 2.0 1.1 2.8 2.1 1.6 0.9 1.6
  σ 2.3 1.9 1.2 0.9 3.4 2.1 2.9 4.5 2.5 1.3 3.3
Method 3 $\bar{e}$ 1.0 0.7 0.3 0.3 2.1 1.2 2.9 2.0 1.6 0.5 1.6
  σ 2.0 1.2 0.5 0.9 3.6 2.8 3.0 3.2 2.3 1.1 3.0
Method 4 $\bar{e}$ 1.7 1.1 0.7 0.4 2.8 2.1 2.5 2.1 1.9 0.8 2.1
  σ 2.5 1.3 1.0 0.9 3.0 2.9 2.9 2.3 2.3 1.1 2.6
Proposed $\bar{e}$ 1.0 0.6 0.1 0.2 0.4 0.1 1.6 0.7 0.8 0.4 0.4
  σ 1.9 1.1 0.3 0.5 2.2 0.6 2.4 2.0 1.7 0.8 1.3

Note: Method 1: MSE-based TM, Method 2: CR-based TM, Method 3: CC-based TM, Method 4: MI-based TM.

3.3. Analysis of computational cost

As described in section 3.1, the contrast enhancement is performed with an intensity look-up table obtained from the first frame of an image sequence. The contrast enhancement for each subsequent frame is therefore computationally inexpensive; in our experiments it took about 3 ms per frame.

We therefore focus on the computational time of the tracking process. Figure 12 shows the average computational time (s/frame) of the tracking process for the proposed method in comparison with the four template matching methods. The average tracking time of the proposed method is about 45 ms per frame, while the lowest computational time among the template matching methods is about 200 ms per frame, roughly 4 times that of the proposed method. Considering that the frame rate of the kV imaging system in our experiments is about 15 fps (about 67 ms/frame), the proposed method is capable of real-time performance in practice.

Figure 12. Comparison of computational time between the TM-based tracking methods and the proposed method.

4. Conclusions

In this paper, we proposed a real-time markerless tracking framework to track lung tumor motion in fluoroscopic image sequences. The proposed system consists of histogram equalization for improving the visual quality and an optimization algorithm for tumor tracking in the fluoroscopic image sequence. The experimental results demonstrate that the proposed system is more accurate and faster than template matching-based tracking methods. Moreover, the proposed system is capable of tracking tumor motion with non-rigid deformation without incurring expensive computational cost.

Acknowledgments

This work was supported by the Varian Medical Systems (Palo Alto, CA) and JSPS KAKENHI Grant Numbers 25293258 and 24659554.
