A pilot study of a new spectrophotometry device to measure tissue oxygen saturation

Published 13 August 2014 © 2014 Institute of Physics and Engineering in Medicine
Citation: Gemma Abel et al 2014 Physiol. Meas. 35 1769, DOI 10.1088/0967-3334/35/9/1769

Abstract

Tissue oxygen saturation (SO2) measurements have the potential for far wider use than at present but are limited by device availability and portability for many potential applications. A device based on a small, low-cost general-purpose spectrophotometer (the Harrison device) might facilitate wider use. The aim of this study was to compare the Harrison device with a commercial instrument, the LEA O2C.

Measurements were carried out on the forearm and finger of 20 healthy volunteers, using a blood pressure cuff on the upper arm to induce different levels of oxygenation. Repeatability of both devices was assessed, and the Bland–Altman method was used to assess agreement between them.

The devices showed agreement in overall tracking of changes in SO2. Test–retest agreement for the Harrison device was worse than for the O2C, with repeatability SDs of 10.6% (forearm) and 18.6% (finger). There was no overall bias between devices, but the mean (SD) differences of 1.2% (11.8%) (forearm) and 4.4% (11.5%) (finger) were outside a clinically acceptable range.

Disagreements were attributed to the instability of the Harrison probe and to natural variations in SO2 across the skin surface, both of which increase the random error. Therefore, although the Harrison device is not equivalent to the LEA O2C, a probe redesign and averaged measurements may help establish it as a low-cost alternative.

1. Introduction

Oxygen saturation (SO2) is the percentage of the total oxygen-carrying capacity of haemoglobin that is occupied by bound oxygen. A wide variety of techniques are used to measure oxygen saturation, depending on the application. The most widespread application is the monitoring of SO2 in critical care using pulse oximetry. However, pulse oximetry works only on pulsatile tissue, whereas assessing SO2 is important in all tissues. Furthermore, pulse oximetry has reliability issues for SO2 below 70%, and SO2 can drop to much lower levels (Tungjitkusolmun 1997), e.g. in ischaemic limbs.
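The definition above can be written as a formula. Here [HbO2] and [Hb] are the concentrations of oxygenated and deoxygenated haemoglobin. This is conventional notation that we adopt for illustration; the paper itself does not use these symbols.

```latex
\mathrm{SO_2} = \frac{[\mathrm{HbO_2}]}{[\mathrm{HbO_2}] + [\mathrm{Hb}]} \times 100\%
```

For example, if three quarters of the haemoglobin in the sampled volume carries oxygen, SO2 = 75%.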

Reflectance spectrophotometry provides information about the colour of white light reflected from human tissue. The characteristics of the remitted spectrum can be compared with those of oxygenated and deoxygenated blood in order to measure tissue oxygen saturation. Because the technique works on non-pulsatile tissue and in low tissue perfusion, it is open to a more diverse range of clinical applications. In particular, reduced SO2 has been reported in peripheral arterial disease, where SO2 measurement might improve early detection and monitoring. The technique is used in some specialized vascular centres to assess amputation level and wound healing in these patients (Ibrahim et al 1999).

Laser Doppler, thermal imaging or clearance methods can demonstrate that blood is reaching the extremities, but they do not assess whether blood flow is sufficient for tissue nutrition. SO2 measurement demonstrates whether oxygen is reaching the tissue (Harrison et al 1994). Instruments are commercially available, but their size and cost arguably restrict their use to these specialist centres. A recent system (Moor VMS-Oxy, Moor Instruments Ltd, UK) is portable but still does not fully address the cost issue.

In an earlier manuscript, Harrison et al (1992) described a device based on a general-purpose spectrophotometer module that is smaller and less costly than those currently available. It has been used for amputation level assessment (Harrison et al 1994), and to quantify tissue oxygen saturation in venous hypertension (Hanna et al 1995) and the tuberculin reaction (Harrison et al 1992). However, it remains unclear how well the Harrison device compares in SO2 measurements to its more expensive counterparts. In our centre we use the LEA O2C, which is CE marked as a medical device for tissue oxygen measurement (LEA Medizintechnik, Germany). Therefore, the aims of the study were to:

  • Assess the within-subject repeatability of the Harrison device and the O2C device.
  • Assess agreement of the Harrison device with the O2C device as the 'gold standard'.

2. Methods

2.1. Spectrophotometry devices

Each device consists of a light source whose light is transmitted to the skin through a fibre optic probe. An adjacent receiving fibre transmits the remitted light to a spectrometer. More detail on the principles of spectrophotometry, and the theory behind its application to SO2 measurements, can be found elsewhere (Dawson et al 1980, Feather et al 1989, Frank et al 1989, Harrison 2002).

The O2C measures oxygen saturation and is targeted towards the assessment of peripheral vascular disease. It is effective in assessing the healing of ulcers that develop as a result of this disease (Beckert et al 2004, Forst et al 2008), and it is now used for amputation level assessment in our centre. The O2C probe used here was the flat LF-2 design, which can be attached to the skin with purpose-made double-sided tape; its design is shown in figure 1. An automatic ambient light correction subtracts the signal due to ambient light from the measurements. The algorithm used by the O2C to calculate SO2 is proprietary and has not been published.

Figure 1. Showing the design of measurement probes for the Harrison and O2C devices. The diagrams on the left-hand side show the side of the probe that was attached to the skin. The right-hand side shows the side view. Lengths are in mm and are approximate.

The device developed by Harrison et al (1992) has been used both in research and in a clinical setting. Details of calibration and of the algorithm used to calculate SO2 can be found in Harrison et al (1992, 1994). In brief, the device uses a reflectance-type probe, as illustrated in figure 1, and an OEM spectrometer module, currently the Avantes Avaspec ULS2048, configured for visible/NIR spectra. After smoothing and correction for white balance, the reflected spectrum is converted to absorption units. A three-factor linear regression then calculates the best fit to the reference spectra of oxyhaemoglobin (oxyHb), deoxyhaemoglobin (deoxyHb) and melanin. SO2 is calculated from the ratio of the first two components.
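The processing chain just described can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the Harrison device's actual software. The assumptions are:

- the smoothing step is omitted;
- absorbance is taken as log10(1/R);
- the regression has no intercept term;
- SO2 is taken as the oxyHb coefficient divided by the sum of the oxyHb and deoxyHb coefficients;
- the reference spectra are made-up band shapes, not real extinction coefficients.

The function name `so2_from_reflectance` is hypothetical.

```python
import numpy as np

def so2_from_reflectance(reflectance, ref_oxy, ref_deoxy, ref_melanin):
    """Estimate SO2 (%) by fitting absorbance to three reference spectra.

    Hypothetical sketch: all inputs are 1-D arrays sampled at the same
    wavelengths, and reflectance is already white-balanced (0 < R <= 1).
    """
    # Convert reflectance to absorbance units, A = log10(1/R) (assumed convention).
    absorbance = np.log10(1.0 / np.asarray(reflectance, dtype=float))
    # Design matrix with one column per absorber: oxyHb, deoxyHb, melanin.
    design = np.column_stack([ref_oxy, ref_deoxy, ref_melanin])
    # Three-factor linear regression (ordinary least squares, no intercept).
    coeffs, *_ = np.linalg.lstsq(design, absorbance, rcond=None)
    c_oxy, c_deoxy = coeffs[0], coeffs[1]
    # Oxygenated fraction of total haemoglobin, as a percentage.
    return 100.0 * c_oxy / (c_oxy + c_deoxy), coeffs

# Made-up reference spectra for illustration only (NOT real extinction data).
wavelengths = np.linspace(500.0, 600.0, 101)
ref_oxy = np.exp(-((wavelengths - 542) / 8) ** 2) + np.exp(-((wavelengths - 577) / 8) ** 2)
ref_deoxy = np.exp(-((wavelengths - 556) / 12) ** 2)
ref_melanin = np.exp(-(wavelengths - 500) / 100)

# Synthetic "measured" spectrum: 0.3 oxyHb + 0.1 deoxyHb + 0.5 melanin.
true_absorbance = 0.3 * ref_oxy + 0.1 * ref_deoxy + 0.5 * ref_melanin
reflectance = 10.0 ** (-true_absorbance)
so2, coeffs = so2_from_reflectance(reflectance, ref_oxy, ref_deoxy, ref_melanin)
print(round(so2, 1))
```

Because the synthetic spectrum is an exact mixture, the fit recovers the coefficients 0.3, 0.1 and 0.5, giving SO2 = 0.3/(0.3 + 0.1) = 75%. Real spectra would also contain noise and scattering effects that this sketch ignores.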

2.2. Participants

The study formed an MSc final-year project and was therefore approved by both the Newcastle NHS R&D Office and the University of Leeds Ethics Committee.

Twenty healthy volunteers aged 22–50 years (12 male, 8 female) were recruited from the Regional Medical Physics Department at the Freeman Hospital, Newcastle upon Tyne. Potential volunteers were excluded if they had hypertension, any known vascular disease, any known neurological or muscle disease, diabetes mellitus, any irritative skin disease, or Raynaud's phenomenon. Age, height, weight, blood pressure and resting heart rate were recorded for each volunteer. Skin type was also recorded, since it reflects the level of melanin in the skin, which affects the transmission of light through the skin. Skin type was graded on the Fitzpatrick scale, from 1 (palest) to 6 (darkest).

All volunteers were invited back for a repeat test, within 2 weeks, to assess repeatability.

2.3. Measurements

Devices were powered up for at least 40 min prior to the start of measurements each day. Both devices use a ceramic white standard to obtain a reference spectrum, and this calibration was carried out before each subject's measurements.
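A common way to use a white-standard reference spectrum is to subtract a dark (ambient) spectrum recorded with the source off, then normalise each raw skin spectrum to the white reference. This gives R = (S − D)/(W − D). The sketch below assumes exactly this. The function name, and the use of a single dark spectrum for both the sample and the white standard, are illustrative assumptions; the devices' internal processing may differ.

```python
import numpy as np

def reflectance_from_counts(sample, white, dark):
    """White-referenced reflectance, R = (S - D) / (W - D).

    sample: raw spectrum from the skin; white: raw spectrum from the ceramic
    white standard; dark: spectrum with the light source off (ambient offset).
    Hypothetical helper for illustration.
    """
    sample, white, dark = (np.asarray(x, dtype=float) for x in (sample, white, dark))
    # Remove the ambient/dark contribution, then scale by the white reference.
    return (sample - dark) / (white - dark)

# Toy two-wavelength example with made-up detector counts.
r = reflectance_from_counts(sample=[60, 110], white=[110, 210], dark=[10, 10])
print(r)
```

Repeating the white-standard measurement before each subject, as described above, keeps W up to date if the lamp or detector drifts during the day.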

Volunteers were acclimatised in the clinical microvascular measurement room, in the supine position, for 10 min. Measurements were taken under dim ambient lighting at a normothermic room temperature of 23.6 °C ± 0.5 °C and a humidity of 40% ± 5%.

A blood pressure cuff was placed around the upper arm; it was used later in the experiment to occlude the flow of blood in the arm and induce different levels of oxygenation. The probes from the two instruments were attached to the forearm of the subject with medical double-sided tape, approximately 2 cm apart. Following acclimatisation, resting SO2 measurements were taken for 3 min. The cuff was then inflated in steps to progressively restrict blood flow to the arm, with each pressure level held for 1 min. Pressures of 50, 100 and 200 mmHg were used in order to occlude first the venous and then the arterial circulation. The cuff was then rapidly deflated, precipitating a hyperaemic response phase, and resting measurements were continued for a further 5 min.

The entire pressure cuff protocol was then repeated with the probes placed on the first and third fingers. Finally, where the volunteer was willing, the protocol was repeated on a second occasion, at least 2 days but less than 2 weeks after the first.

2.4. Sampling

Each instrument recorded oxygen saturation every 2 s, and the internal clocks of the two instruments were synchronised to within 1 s. For each measurement period (initial rest, 50, 100 and 200 mmHg, end rest), we calculated the mean of the samples taken during the final 10 s. By this point oxygen saturation had adjusted to the cuff pressure and was relatively stable. This single value of SO2 was used to represent the measurement period.
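The reduction of each measurement period to a single value can be sketched as follows. The phase boundaries are hypothetical. They are built only from the protocol durations in section 2.3 (3 min rest, 1 min at each cuff pressure, 5 min after deflation) and ignore the time taken to inflate the cuff. With a 2 s sampling interval, the final 10 s window picks up five samples.

```python
from statistics import mean

# Hypothetical phase timings in seconds, derived from the protocol durations.
PHASES = [
    ("initial rest", 0, 180),
    ("50 mmHg", 180, 240),
    ("100 mmHg", 240, 300),
    ("200 mmHg", 300, 360),
    ("end rest", 360, 660),
]

def phase_values(times, so2, phases=PHASES, window=10.0):
    """Mean SO2 over the final `window` seconds of each phase.

    times and so2 are parallel sequences (seconds, %SO2). Raises
    statistics.StatisticsError if a window contains no samples.
    """
    result = {}
    for name, _start, end in phases:
        # Keep samples with end - window < t <= end.
        picked = [s for t, s in zip(times, so2) if end - window < t <= end]
        result[name] = mean(picked)
    return result

times = list(range(0, 661, 2))     # one sample every 2 s for 11 min
so2 = [60.0] * len(times)          # placeholder trace; replace with real data
print(phase_values(times, so2))
```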

A typical oxygen saturation plot on the forearm is shown with timings for each phase of measurements in figure 2. The same form of response was found in the finger. The superficial SO2 from the O2C device is reported.

Figure 2. Showing the typical SO2 plot obtained. The labels under the timeline are the periods referred to in the results. The graph below shows the typical changes seen from all devices. Measurements for comparison were taken in the final 10 s of each period, at the rightmost end of each arrow. The large difference in response at the superficial and deep levels shows the difference between superficial skin oxygenation levels and the deeper muscle oxygenation, which is due to the very different muscle oxygen transport physiology.

Note that the O2C can also measure oxygen saturation at a deeper level, several millimetres below the skin surface. This measurement was recorded, and an illustrative example is shown in figure 2. As in this example, the deep saturation measurements bore no relation to the superficial measurements from either device, and indeed showed little or no response to the cuff challenge. These results are therefore not reported in this manuscript.

2.5. Statistical analysis

The analysis described below was repeated for the forearm and finger sites.

2.5.1. Repeatability.

We calculated the mean and SD of the differences between the first and repeat SO2 measurements for each instrument. This was done for each pressure level separately, and also for the mean across all pressure levels.
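The repeatability statistics can be sketched with the Python standard library. The 95% confidence interval uses Student's t with n − 1 degrees of freedom. This is our reading of how the intervals in tables 1 and 2 were obtained. It reproduces, for example, the Harrison forearm 50 mmHg entry: −2.0 ± 7.5 with N = 16 gives −6.0 to +2.0. The critical values are hard-coded from standard t tables, and only for the two sample sizes used here.

```python
import math
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t (t_0.975) from standard tables,
# only for the degrees of freedom needed here: N = 16 -> 15, N = 20 -> 19.
T_975 = {15: 2.131, 19: 2.093}

def ci_95(d_mean, d_sd, n):
    """95% confidence interval for a mean difference from summary statistics."""
    half_width = T_975[n - 1] * d_sd / math.sqrt(n)
    return d_mean - half_width, d_mean + half_width

def repeatability(first, repeat):
    """Mean and SD of (repeat - first) differences, plus their 95% CI."""
    diffs = [b - a for a, b in zip(first, repeat)]
    d_mean, d_sd = mean(diffs), stdev(diffs)
    return d_mean, d_sd, ci_95(d_mean, d_sd, len(diffs))

# Check against table 1, Harrison forearm, 50 mmHg: -2.0 +/- 7.5 with N = 16.
low, high = ci_95(-2.0, 7.5, 16)
print(f"{low:.1f} to {high:+.1f}")
```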

Student's paired t-test was used to assess the differences for statistical significance. To allow for multiple testing, we applied a Bonferroni correction and considered a p value of less than 0.01 statistically significant. A significant result would indicate a systematic bias between first and repeat measurements.
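A minimal sketch of the significance test follows. It assumes the Bonferroni correction divided α = 0.05 by the five measurement periods (0.05/5 = 0.01), which matches the threshold stated above. An exact p value would need the t distribution, for example `scipy.stats.ttest_rel` applied to the raw paired data. Instead, this sketch compares |t| with hard-coded two-sided critical values at α = 0.01 taken from standard t tables.

```python
import math

ALPHA_FAMILY = 0.05
N_COMPARISONS = 5                       # assumed: the five measurement periods
ALPHA = ALPHA_FAMILY / N_COMPARISONS    # Bonferroni-adjusted threshold (0.01)

# Two-sided critical values t_0.995 for df = 15 (N = 16) and df = 19 (N = 20).
T_995 = {15: 2.947, 19: 2.861}

def paired_t(d_mean, d_sd, n):
    """t statistic for the mean of n paired differences."""
    return d_mean / (d_sd / math.sqrt(n))

def is_significant(d_mean, d_sd, n):
    """True if |t| exceeds the critical value at the adjusted alpha (0.01)."""
    return abs(paired_t(d_mean, d_sd, n)) > T_995[n - 1]

# Table 1 examples: Harrison end-rest repeatability (reported p = 0.01)
# and Harrison versus O2C at 100 mmHg (reported p = 0.009).
print(is_significant(-8.7, 12.1, 16))
print(is_significant(-5.4, 8.3, 20))
```

With the table's summary values, the first case gives |t| ≈ 2.88, just below the critical value, and the second gives |t| ≈ 2.91, just above it. This is consistent with the reported p values either side of 0.01.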

2.5.2. Agreement between devices.

Agreement between instruments was assessed using the method of Bland and Altman (1986). Mean difference, standard deviation and standard error were calculated for each cuff pressure level, and for the mean across all pressure levels. The mean difference between devices was assessed using Student's paired t-test; a statistically significant difference (p < 0.01) would indicate a systematic bias between devices.
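The Bland–Altman quantities can be sketched as follows (the function name is hypothetical). The multiplier `k` for the limits of agreement defaults to the conventional 1.96. The figure captions in section 3.3 describe the outer lines as twice the SD, which corresponds to `k=2`.

```python
from statistics import mean, stdev

def bland_altman(a, b, k=1.96):
    """Bland-Altman summary for paired readings a and b (difference = a - b).

    Returns the per-pair means (x-axis), the per-pair differences (y-axis),
    the mean difference (bias) and the limits of agreement bias +/- k * SD.
    """
    means = [(x + y) / 2 for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]
    bias, sd = mean(diffs), stdev(diffs)
    return means, diffs, bias, (bias - k * sd, bias + k * sd)

# Illustrative readings only (not study data): Harrison (a) and O2C (b), %SO2.
harrison = [70, 60, 50, 40]
o2c = [68, 63, 49, 42]
_, _, bias, (lower, upper) = bland_altman(harrison, o2c, k=2)
print(bias, round(lower, 2), round(upper, 2))
```

Subtracting O2C from Harrison matches the sign convention of tables 1 and 2, so a negative bias means the O2C reads higher on average.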

3. Results

3.1. Summary of subjects

Twenty subjects were recruited and consented, of whom 16 returned for a repeat test. Age was 35 ± 2 years (mean ± standard deviation), BMI was 25 ± 4 kg m⁻², systolic blood pressure was 118 ± 9 mmHg, diastolic blood pressure was 70 ± 8 mmHg, and mean heart rate was 61 ± 10 bpm. Skin types were: type 1, 5 subjects; type 2, 6 subjects; type 3, 7 subjects; type 4, 2 subjects. Thus 18 of the 20 subjects had skin types in the lower half of the Fitzpatrick scale (3 or less out of 6). In these subjects there should be negligible interference from melanin in the skin (Hajizadeh-Saffar et al 1990).

3.2. Repeatability

There was no significant difference between the first and repeat measurements at any cuff pressure for either device (tables 1 and 2). However, the Harrison device generally showed larger standard deviations, indicating greater test–retest variability.

Table 1. Forearm measurement site: statistics for the repeatability of each instrument, and for the comparison of the two. In each case the statistics are: the mean ± SD of the differences; the 95% confidence interval for the difference; and the p value of the difference according to Student's t-test. All values are in %SO2. Statistically significant results at the p < 0.01 level are marked with an asterisk (*).

| Phase | Harrison repeatability (2nd minus 1st), N = 16: mean diff ± SD | 95% CI; p | O2C repeatability (2nd minus 1st), N = 16: mean diff ± SD | 95% CI; p | Harrison versus O2C (Harrison minus O2C), N = 20: mean diff ± SD | 95% CI; p |
|---|---|---|---|---|---|---|
| Initial rest | −7.5 ± 14.0 | −14.9 to 0.0; p = 0.05 | 1.6 ± 9.4 | −3.4 to +6.6; p = 0.5 | +6.5 ± 14.8 | −0.4 to +13.4; p = 0.07 |
| 50 mmHg | −2.0 ± 7.5 | −6.0 to +2.0; p = 0.3 | 2.1 ± 6.3 | −1.2 to +5.5; p = 0.2 | −5.8 ± 7.7 | −9.3 to −2.2; p = 0.003* |
| 100 mmHg | −2.2 ± 9.5 | −7.3 to +2.8; p = 0.4 | 1.7 ± 4.9 | −0.9 to +4.3; p = 0.2 | −5.4 ± 8.3 | −9.3 to −1.6; p = 0.009* |
| 200 mmHg | −3.1 ± 8.2 | −7.5 to +1.3; p = 0.2 | 1.0 ± 5.6 | −2.0 to +4.0; p = 0.5 | −5.0 ± 9.2 | −9.3 to −0.7; p = 0.02 |
| End rest | −8.7 ± 12.1 | −15.2 to −2.3; p = 0.01 | 3.3 ± 7.7 | −0.8 to +7.5; p = 0.1 | 3.8 ± 12.4 | −2.0 to +9.6; p = 0.2 |
| SUBJECT MEAN | −4.7 ± 8.2 | −9.1 to −0.3; p = 0.04 | 1.9 ± 5.6 | −1.1 to +4.9; p = 0.2 | −1.2 ± 8.7 | −5.3 to +2.9; p = 0.6 |
| OVERALL | −4.7 ± 10.6 | | 1.9 ± 6.8 | | −1.2 ± 11.8 | |

Note. The SUBJECT MEAN row compares the per-subject mean measured oxygen saturation across all five cuff conditions; averaging multiple measurements reduces the SD. Note. The OVERALL row compares every individual pair of measurements (five pairs per subject) and is the best estimate of overall variability. A t-test is not appropriate for this row because the measurements within each individual are not independent.

Table 2. Finger measurement site: statistics for the repeatability of each instrument, and for the comparison of the two. In each case the statistics are: the mean ± SD of the differences; the 95% confidence interval for the difference; and the p value of the difference according to Student's t-test. All values are in %SO2. Statistically significant results at the p < 0.01 level are marked with an asterisk (*).

| Phase | Harrison repeatability (2nd minus 1st), N = 16: mean diff ± SD | 95% CI; p | O2C repeatability (2nd minus 1st), N = 16: mean diff ± SD | 95% CI; p | Harrison versus O2C (Harrison minus O2C), N = 20: mean diff ± SD | 95% CI; p |
|---|---|---|---|---|---|---|
| Initial rest | 7.3 ± 20.2 | −3.5 to +18.0; p = 0.2 | 2.0 ± 18.2 | −7.7 to +11.7; p = 0.7 | +0.2 ± 11.1 | −5.0 to +5.4; p = 0.9 |
| 50 mmHg | 5.6 ± 15.3 | −2.6 to +13.8; p = 0.2 | 3.6 ± 15.2 | −4.5 to +11.7; p = 0.4 | −8.5 ± 14.4 | −15.2 to −1.8; p = 0.02 |
| 100 mmHg | 4.7 ± 10.9 | −1.1 to +10.5; p = 0.1 | 5.5 ± 12.9 | −1.4 to +12.3; p = 0.1 | −6.1 ± 7.4 | −9.5 to −2.6; p = 0.001* |
| 200 mmHg | 2.9 ± 15.2 | −5.2 to +11.0; p = 0.5 | 1.8 ± 12.8 | −5.0 to +8.7; p = 0.6 | −5.0 ± 8.9 | −9.2 to −0.9; p = 0.02 |
| End rest | 9.6 ± 28.2 | −5.4 to +24.7; p = 0.2 | 5.4 ± 21.2 | −5.9 to +16.7; p = 0.3 | −2.8 ± 13.5 | −9.2 to +3.5; p = 0.4 |
| SUBJECT MEAN | 6.0 ± 14.4 | −1.7 to +13.7; p = 0.1 | 3.7 ± 12.8 | −3.1 to +10.5; p = 0.3 | −4.4 ± 7.9 | −8.1 to −0.7; p = 0.02 |
| OVERALL | 6.0 ± 18.6 | | 3.7 ± 16.0 | | −4.4 ± 11.5 | |

Note. The SUBJECT MEAN row compares the per-subject mean oxygen saturation across all five cuff conditions; averaging multiple measurements reduces the SD. Note. The OVERALL row compares every individual pair of measurements (five pairs per subject) and is the best estimate of overall variability. A t-test is not appropriate for this row because the measurements within each individual are not independent.

3.3. Agreement between devices

The mean SO2 across all subjects at each pressure level and each site from the Harrison device and the LEA O2C is shown in figure 3. Bland–Altman plots, showing the differences between the two instruments, are shown in figures 4 and 5. Note that these figures treat each measurement pair separately, and so there are 20 × 5 = 100 points in each graph.

Figure 3. Showing the overall means and standard deviations for both forearm and finger measurements sites and for both Harrison and O2C tissue oxygen instruments.

Figure 4. Forearm measurement site: Bland–Altman plot showing the difference between the two devices for the first set of measurements. The O2C measurements were subtracted from the Harrison device measurements. The solid line is the mean difference between the two measurements. The inner dashed lines are twice the standard error of the mean difference. The outer dashed lines are twice the standard deviation, i.e. the limits of agreement.

Figure 5. Finger measurement site: Bland–Altman plot showing the difference between the two devices for the first set of measurements. The O2C measurements were subtracted from the Harrison measurements. The solid line is the mean difference between the two measurements. The inner dashed lines are 1.96 × the standard error of the mean difference. The outer dashed lines are twice the standard deviation, i.e. the limits of agreement.

Descriptive statistics for the differences between the instruments are given in tables 1 and 2. Each row uses one measurement per subject, except the OVERALL row, which uses five measurements per subject. There were statistically significant differences in three cases, indicating a systematic bias between devices. There was no clear pattern to these differences by site (two forearm, one finger), but in each case the O2C device measured higher.

For both sites there was a significant correlation between the mean of the O2C and Harrison measurements, and their difference (between the X and Y variables in figures 4 and 5):

Forearm (figure 4): difference (Y) = 0.44 × mean SO2 (X) − 15.3 (r = 0.58, p ≪ 0.001)

Finger (figure 5): difference (Y) = 0.15 × mean SO2 (X) − 12.0 (r = 0.30, p = 0.002)

In other words, the difference between the two instruments is related to the value of SO2 being measured. The Harrison device measures relatively higher for high SO2, and lower for low SO2.
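The trend lines above are ordinary least-squares fits of the difference on the mean. A minimal sketch of that fit and of Pearson's r follows (helper names are hypothetical). Solving each reported equation for zero difference gives the SO2 at which the two instruments agree on average. This is about 35% at the forearm (15.3/0.44) and 80% at the finger (12.0/0.15). Above these values the Harrison device reads higher than the O2C, and below them it reads lower.

```python
import math
from statistics import mean

def fit_difference_vs_mean(means, diffs):
    """OLS fit diffs = slope * means + intercept, and Pearson's r."""
    mx, my = mean(means), mean(diffs)
    sxx = sum((x - mx) ** 2 for x in means)
    syy = sum((y - my) ** 2 for y in diffs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, diffs))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

def agreement_point(slope, intercept):
    """Mean SO2 at which the fitted difference between devices is zero."""
    return -intercept / slope

print(round(agreement_point(0.44, -15.3), 1))  # forearm equation
print(round(agreement_point(0.15, -12.0), 1))  # finger equation
```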

4. Discussion

In this manuscript we describe a systematic comparison between a general-purpose spectrophotometer with custom software for assessing peripheral oxygenation (the Harrison device) and a commercially available CE marked reference standard for tissue oxygen saturation measurements (the O2C device). We assessed repeatability of the two devices individually, and agreement between them.

4.1. First impressions: tracking of SO2 changes

Subjectively, the two devices agree well in tracking changes in tissue oxygen saturation (figure 3). First, there is a progressive decrease from baseline as the cuff is inflated. Then both devices track the hyperaemic flush response expected after blood flow is restored to the limb. Finally, there is a return to a resting level close to baseline. This implies that both devices measure the same underlying physiological parameter. SO2 recorded at the finger was consistently higher than at the forearm; this is due to the different vasculature of the two sites.

4.2. Repeatability

Repeatability, or test–retest agreement, is measured by the standard deviation of the test–retest differences; summary data are in tables 1 and 2. Test–retest agreement for the O2C device was 6.8% (forearm) or 16.0% (finger). Limited validation data are available for the O2C device from the manufacturer. However, resting coefficients of variation (CoVs) for the device have been reported in the range 15–20% for non-diabetic adult subjects (Forst et al 2008).

Test–retest agreement for the Harrison device was worse than for the O2C, with an overall repeatability SD of 10.6% (forearm) or 18.6% (finger). Oxygenation will of course vary between visits up to 2 weeks apart. However, the measurements were made on the same individuals, whose vascular health had presumably not changed dramatically, so this variability reflects the fundamental uncertainty in the clinical measurement of SO2. There was no significant bias between first and repeat measurements for either device, as would be expected and as has been previously reported (Beckert et al 2004).

As in clinical practice, the finger probes were always placed on the same fingers, whereas the forearm site was chosen somewhat arbitrarily. Nevertheless, repeatability was better on the forearm for both devices, even though the forearm site was less precisely reproduced. This suggests that the variations are due to some property of the measurement site rather than to spatial variations in tissue oxygenation. Circulation to the extremities is inherently variable, and movements of the subject's fingers may induce changes or artefacts in the SO2 measurement.

4.3. Agreement between devices

Taking the mean across all cuff pressures, there was no overall bias between devices at either site (tables 1 and 2). However, we recorded significant differences between instruments under some cuff conditions. On closer inspection, these effects were part of an overall correlation between the measured saturation and the difference between the two instruments. At both sites, the Harrison device measured relatively high at higher saturations and relatively low at lower saturations. In other words, the Harrison readings spanned a wider range, as is also apparent in figure 3. This effect was most marked at the forearm site, but it was also statistically significant at the finger.

We suspect this is a function of the probe design: with reflectance probes, changing the spacing between emitter and sensor is known to affect the effective depth of the measurement. Figure 2 also shows clearly that deeper measurements respond very differently to the cuff challenge.

4.4. Causes of the measurement inaccuracies

The agreement of the devices in assessing oxygen saturation needs to be considered further. First, oxygen saturation will vary from site to site. Therefore agreement will always be affected by the natural spatial variations of oxygenation whether on the forearm or between fingers.

Probe stability was noticeably worse with the pencil-design Harrison probe than with the flat O2C probe (shown in figure 1), and this could explain the slightly poorer repeatability of the Harrison device. Notably, repeatability was worse on the finger for both devices. There, probe stability was visibly worse because the finger pulp does not present a flat surface for reliable attachment. The Harrison device might be improved by a flatter, more stable probe design, and we plan to develop the technology in this direction.

4.5. Clinical implications of the measurement inaccuracies

The 95% limits of agreement between the two devices are ±30% for the forearm and ±22% for the finger. In several cases there are large discrepancies between the two instruments (figures 4 and 5), and many of the measurement pairs differ by more than 10%.

We note that systematic differences between the two instruments are not necessarily problematic, even if the difference is correlated with the measured saturation, because any systematic effect can be corrected once it is understood. Statistical significance indicates only that the systematic difference is large relative to the random error, and is therefore unlikely to be due to chance. In our opinion, the key implication of this finding is that nominally equivalent instruments cannot be used interchangeably in research or in clinical practice. Workers should understand what their own instrument is measuring, and report their results or adapt clinical threshold values accordingly.

Random effects are of more concern, because they cannot be corrected in the same way. First, we should consider the random variability that remains after the correctable systematic effects have been removed. The residual SDs are similar at both sites, 9.7% (arm) and 11.0% (finger). They are also broadly comparable with the test–retest agreement for the two systems, which ranged from 6.8% to 18.6%.
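One way to see where residual SDs of this size come from uses the large-sample identity SD_resid ≈ SD × √(1 − r²). This assumes (our assumption, as the method is not stated) that the residual SD is what remains of the overall Harrison–O2C SD after removing the linear trends in section 3.3. Taking the OVERALL SDs from tables 1 and 2 and the reported r values gives 9.6% for the forearm and 11.0% for the finger. These are close to the values quoted above; small differences are expected from rounding and from degrees-of-freedom conventions.

```python
import math

def residual_sd(sd_diff, r):
    """Approximate SD left after removing a linear trend with correlation r.

    Large-sample identity: SD_resid ~= SD * sqrt(1 - r**2).
    """
    return sd_diff * math.sqrt(1.0 - r ** 2)

print(round(residual_sd(11.8, 0.58), 1))  # forearm
print(round(residual_sd(11.5, 0.30), 1))  # finger
```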

Each SD summarises the random variability between two SO2 measurements, made either at different times with the same system or at the same time with different systems. The error in a single measurement would be approximately √2 times smaller, but the 95% confidence limit would be about 2 times larger. In summary, although the Harrison device performed less well, in every case of test–retest agreement for both devices the 95% confidence interval would be greater than 10%. Such uncertainty could affect interpretation of the result, since the critical hypoxia level in venous oxygen saturation is typically considered to be in the range 10–20% (LEA 2007). Therefore, as used here, neither device is accurate enough for making confident clinical judgements.

Note, however, that in this experiment we made single spot measurements, since we wished to assess the effects of differing oxygen saturation in an explanatory study. Averaging measurements taken over an area would almost certainly be required to reduce the random error component. This is what we do for leg amputation level assessments in routine clinical practice, where we make N = 18 flap-area measurements to reduce the error by a factor of √N. We note that the Harrison probe is a pencil probe intended for this purpose, whereas the O2C probe is better suited to being taped to the skin for time-series studies.
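The √N argument assumes that the errors at the N sites are independent and of equal size; spatially correlated errors would shrink more slowly. The Monte Carlo sketch below (names are hypothetical) checks the idea. It takes a single-reading SD of 11.8%, the forearm Harrison–O2C value from table 1. Averaging N = 18 independent readings should then give an SD of about 11.8/√18 ≈ 2.8%.

```python
import math
import random
from statistics import stdev

def sd_of_average(sd_single, n_sites, trials=20000, seed=1):
    """Monte Carlo SD of the mean of n_sites independent Gaussian errors."""
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    averages = [
        sum(rng.gauss(0.0, sd_single) for _ in range(n_sites)) / n_sites
        for _ in range(trials)
    ]
    return stdev(averages)

simulated = sd_of_average(11.8, 18)
predicted = 11.8 / math.sqrt(18)
print(f"simulated {simulated:.2f}, predicted {predicted:.2f}")
```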

Spectrophotometric measurements of SO2 have the potential to be much more widely implemented than they are now. The viability of transplanted organs could be monitored using endoscopic spectrophotometry (Beckert et al 2004, Harrison et al 1992), and this could also be applied to other internal organs (Frank et al 1989, Hoper and Funk 1994). Oxygenation of the brain could be measured non-invasively during neurosurgery to minimise damage to the brain (Harrison 2002, Hoper and Gaab 1994). The low-cost core technology of this device would mean that spectrophotometric measurements could be made available to a larger healthcare market, allowing more applications to come to the fore. It would also enable this technology to be extended to other parts of the world. Examples include the Caribbean, where there is a high prevalence of diabetes (Barcelo et al 2003), and certain economically developing countries such as Mexico, India and China, currently the main growth areas for type II diabetes.

5. Summary

Tissue oxygen saturation (SO2) measurements have the potential for far wider use than at present but are limited by device availability and portability for many potential applications. We compared the performance of a small, low-cost general-purpose spectrophotometer (the Harrison device) with a commercial instrument, the LEA O2C.

As reported in previous studies, we found better agreement in relative changes to SO2 than for absolute measurements. For example both instruments tracked the hyperaemic flush response (Forst et al 2008). However while there was no overall bias between devices, the differences were outside of a clinically acceptable range. In addition, repeatability was poorer in the Harrison device. Errors were attributed to the stability of the Harrison probe, as well as the natural SO2 variations across the skin surface.

We cannot at present conclude that a general-purpose device is a direct replacement for the LEA O2C. However this may be improved by redesign of the probe and/or making repeat measurements as part of the clinical protocol.
