Optical coherence tomography of basal cell carcinoma: influence of location, subtype, observer variability and image quality on diagnostic performance

Summary
Background We previously described the principal results from an observational, prospective, multicentre, clinical trial of the diagnostic value of optical coherence tomography (OCT) for basal cell carcinoma (BCC) in a clinical setting. In this trial, much additional useful information was gathered that warranted further analysis, presented here.
Objectives To investigate the influence of candidate diagnostic criteria, OCT image quality, lesion location, observer confidence and interobserver variability on the diagnostic performance of OCT, and to assess its potential use for diagnosis of BCC subtypes.
Methods A total of 234 clinically unclear 'pink lesions' were evaluated in three steps: after clinical examination, after adding dermoscopy and after adding OCT. In addition to the diagnoses (including lesion subtype), observers recorded which of 15 diagnostic criteria the OCT image contained, their confidence in the diagnoses, the OCT image quality and the anatomical location of the lesion.
Results Diagnostic performance of OCT did not depend on the lesion's anatomical location. Good OCT image quality was correlated with improved diagnostic performance, but diagnostic performance for lesions with mediocre image quality was still better than by clinical and dermoscopic examination. The main reason for reduced image quality was superficial scales and crusting. Observer confidence in diagnosis was correlated with diagnostic performance. Interobserver diagnostic performance was consistently higher than clinical examination and dermoscopy across all sites. BCC subtype could be determined with moderate accuracy, but further independent image markers are required.
Conclusion OCT is useful in the diagnosis of BCC.
What's already known about this topic?
• Optical coherence tomography (OCT) is an emerging imaging modality that has been shown to have utility in the noninvasive diagnosis of basal cell carcinoma (BCC), and is more sensitive and more specific than clinical or dermoscopic examination alone.
What does this study add?
• Lesion location does not affect diagnostic performance with OCT.
• Poor OCT image quality is associated with superficial scales and crusting, reducing diagnostic performance, but in these cases diagnosis with OCT is better than by clinical or dermoscopy examination alone.
• Observers' diagnostic confidence increases when using OCT and their performance reflects this.
• Diagnostic performance is consistent between trained observers.
• BCC subtype can be diagnosed from OCT images with moderate accuracy.
… Allergan; and has received speaker's honoraria from Almirall Hermal, Biofrontera, Galderma, Meda Pharma and Janssen-Cilag. C.B. has received speaker's and advisory board member's honoraria from Almirall, Biofrontera, Galderma and LEO Pharma; and has been involved in clinical trials sponsored by Biofrontera and LEO Pharma.
*Plain language summary available online. DOI 10.1111/bjd.16154
Optical coherence tomography (OCT) is gaining acceptance as a useful device to aid in the diagnosis of basal cell carcinoma (BCC). Several studies have demonstrated that OCT improves diagnostic accuracy compared with clinical and dermoscopic methods alone. [1][2][3][4][5] For example, the baseline results from a previous report of our study are reproduced in Table 1. [4] However, further work is required to determine the key factors that affect diagnostic performance with OCT, the most useful OCT image markers, interobserver variability, and the potential to identify BCC subtypes reliably and accurately. To answer these questions, further analysis of the dataset collected in our study was performed and is reported herein.

Materials and methods
This was an investigator-initiated, phase IV, observational, prospective, multicentre trial conducted in six institutions in Germany from April 2013 to March 2014. Michelson Diagnostics Ltd (previously based in Orpington, U.K. and now in Maidstone, U.K.) provided the OCT equipment. The details of the multibeam 'VivoSight' OCT equipment are described elsewhere. [6] Inclusion criteria for the study were the presence of a clinically unclear erythematous papule or plaque ('pink lesion') with clinical suspicion of BCC. Lesions that were clinically obvious BCCs and pigmented lesions were excluded from the study. Patients had to be 18 years of age or older and give written informed consent; patients with unstable or uncontrolled clinically significant medical conditions were excluded.
Initially, observers recorded their clinical diagnosis, including whether or not the lesion was a BCC, the BCC subtype or other lesion type, and their confidence in this diagnosis expressed as a percentage. Subsequently, dermoscopy was performed and the revised conclusions and diagnostic confidence were recorded. The lesion was then scanned with OCT and the evaluation was performed once more. Additionally, OCT image quality for the lesion was graded on a 4-point scale (excellent/good/mediocre/poor). Observers also recorded which of 15 'image biomarkers' that might relate to BCC diagnosis were present in the OCT images, in their judgement. Finally, the lesion was either biopsied (37% by shave biopsy, 29% by punch biopsy) or excised (34%). All assessments were documented before histological results were available.
A total of 256 lesions were examined. Histology was missing for 21 lesions, and one lesion lacked OCT and dermoscopy, leaving 234 lesions with data from clinical, dermoscopic and OCT examination. For two of these lesions, the anatomical location was not recorded and for 13 lesions the OCT image quality was not recorded. Of the 234 lesions, 141 (60.2%) were identified as BCC by histological analysis.
We analysed the data from these 234 lesions to assess the dependence of diagnostic performance upon (i) anatomical location (trunk/head/limb); (ii) OCT image quality; (iii) observer confidence level in the diagnosis; (iv) interobserver variability; and (v) lesion type and subtype. We also performed univariate logistic regression analysis of the image biomarkers to assess which were statistically significant for each subtype, and then calculated the accuracy of identification of the BCC subtype for the 141 histologically identified BCCs. We used the XLSTAT statistical analysis Excel add-in (Addinsoft, New York, NY, U.S.A.) to perform univariate logistic regression analysis of the dataset.
Local ethics committees approved the research protocol and all research was conducted according to the principles of the Declaration of Helsinki.

Results
Lesion location
Table 2 shows the diagnostic performance of each technique, grouped into the main anatomical areas for lesion location. For two lesions, the location was not recorded. No significant variation in OCT diagnostic performance was observed between lesions located on the head and limbs. The calculated specificity for lesions on the trunk was a little lower than for head and limbs but still just within the limits of statistical variability for this small sample size (P < 0.05). However, it is noticeable that specificity for trunk lesions was also lower for dermoscopy owing to high false-positive diagnoses of actinic keratoses as BCCs.

Optical coherence tomography image quality
The observers graded OCT image quality as poor, mediocre, good or excellent, based on their experience. Table 3 shows OCT diagnostic performance vs. OCT image quality.
Only two lesions were classified as having poor image quality; a further 13 were not graded. The results show that lower OCT image quality adversely affects diagnostic performance. However, specificity and negative predictive value (NPV) remain higher than calculated for clinical examination or dermoscopy, even for lesions with mediocre image quality.
We examined the data for reasons why the OCT image quality might be reduced. Mediocre or poor OCT image quality was associated with lesions exhibiting superficial scales or crusting (78%) or thickened epidermis (76%), whereas the proportion of lesions with these properties fell to 54% and 53%, respectively, for good OCT image quality, and to 35% and 26%, respectively, for excellent OCT image quality. Thus, reduced image quality was associated with superficial scales or crusting and a thickened epidermis.
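The performance measures used throughout (sensitivity, specificity, positive and negative predictive value, accuracy) all derive from a 2×2 table of diagnosis against histology. The sketch below is illustrative only: it assumes that the six false-negative and 22 false-positive OCT diagnoses reported later in the Results refer to the full set of 234 lesions (141 histologically confirmed BCCs), so the values it produces need not reproduce the tabulated figures exactly. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # BCC on histology, diagnosed as BCC
    fp: int  # not BCC on histology, diagnosed as BCC
    tn: int  # not BCC on histology, diagnosed as not BCC
    fn: int  # BCC on histology, diagnosed as not BCC

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


def confusion_from_totals(n_total: int, n_bcc: int, fn: int, fp: int) -> Confusion:
    """Derive the full 2x2 table from cohort totals and the error counts."""
    tp = n_bcc - fn
    tn = (n_total - n_bcc) - fp
    return Confusion(tp=tp, fp=fp, tn=tn, fn=fn)


# Assumption: FN = 6 and FP = 22 apply to all 234 lesions.
oct_all = confusion_from_totals(n_total=234, n_bcc=141, fn=6, fp=22)
for name in ("sensitivity", "specificity", "ppv", "npv", "accuracy"):
    print(f"{name}: {100 * getattr(oct_all, name):.1f}%")
```

Under this assumption the derived accuracy rounds to the 88% quoted for OCT in the text, which suggests the counts are at least consistent with the reported overall result.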

Diagnostic confidence
Each observer recorded an estimate of their own confidence in the diagnosis, for each lesion, in a three-step model: for clinical examination alone, after adding dermoscopy and then after using OCT in addition to clinical and dermoscopy examinations. The average diagnostic confidence for the clinical diagnosis of BCC was 59%; additional dermoscopic evaluation improved this slightly to 64% and additional OCT increased it to 83%. These results mirror the improvements in actual diagnostic accuracy achieved (67% for clinical; 77% for additional dermoscopy; 88% for additional OCT), suggesting that the observers were able to assess their own diagnostic ability quite accurately. Table 4 shows the diagnostic performance for lesions in each of three bands of 'OCT diagnostic confidence'.
The table shows that diagnostic accuracy, NPV, positive predictive value and sensitivity all improve with diagnostic confidence. The anomalously low specificity for the 'very high' confidence band is related to the small number of true-negative and false-positive lesions (3 and 10, respectively). In the highest confidence category, there were no false negatives, resulting in 100% sensitivity and NPV for these lesions.

Interobserver variability
Table 5 shows the variation in diagnostic performance using OCT between five observers. The sixth observer did not examine enough lesions to provide a useful sample. The 95% confidence interval (CI) is also given for the global result. Because of the prospective nature of the study and to reflect the usual workflow of diagnosis, the observers assessed only their own OCT images. Therefore, the number of images varied between the observers, depending on the number of cases at each centre.
Statistically, the results are consistent with the calculated 95% CI. Overall accuracy was high for all observers, with little variation. These results suggest that the OCT imaging criteria for BCC diagnosis were effectively learned and then applied consistently by each observer.
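To illustrate how strongly sample size drives the width of a 95% CI for a proportion, the sketch below computes a Wilson score interval. The source does not state which interval method was used, so the choice of Wilson is an assumption. The first example uses the true-negative and false-positive counts reported for the 'very high' confidence band (3 and 10, i.e. specificity based on 13 lesions); the observer counts in the second example are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Specificity in the 'very high' confidence band: 3 true negatives of 13 non-BCC lesions.
low, high = wilson_interval(3, 13)
print(f"specificity 3/13: {100 * low:.0f}% to {100 * high:.0f}%")

# Hypothetical observer: 44 correct diagnoses out of 50 lesions.
low, high = wilson_interval(44, 50)
print(f"accuracy 44/50 (hypothetical): {100 * low:.0f}% to {100 * high:.0f}%")
```

With only 13 lesions the interval spans roughly 40 percentage points, which is why a low specificity in that band carries little weight on its own.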

Lesion type and subtype
We analysed the diagnostic performance by lesion type and subtype. For each histologically confirmed lesion class, we calculated the diagnostic performance for each method (proportion of total number of confirmed lesions of that subtype that were correctly identified as BCC, or, for non-BCC types, as not BCC). The results are shown in Table 6, which also shows the number of histologically confirmed lesion types and subtypes in the sample set. The most common BCC subtype was superficial BCC (44% of all BCCs), followed by nodular BCC (22% of all BCCs) and then infiltrative. 'Other BCC' included two pigmented BCCs, five cystic BCCs, five exulcerated BCCs and seven multicentric BCCs.
In Table 6, a low result means that a BCC subtype is challenging to diagnose as BCC, or that a non-BCC type is difficult to diagnose as not BCC, and a high result means that it is easy to diagnose correctly as BCC or not BCC. For example, 100% of the 12 infiltrative BCCs were identified as BCC by all three methods, but clinical diagnosis correctly identified only 39% of actinic keratoses as not BCC vs. 81% with OCT. For all BCC subtypes and other lesion types, diagnostic performance was higher with OCT.
These results suggest that OCT aids diagnosis of BCC for all BCC subtypes and that OCT also aids correct diagnosis of other lesion types as not BCC.
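The per-class figure described above (the proportion of histologically confirmed lesions of a class that were correctly called BCC or not BCC) can be sketched as follows. The record layout, the class labels and the function name are hypothetical; of the example data, only the 12 infiltrative BCCs all diagnosed as BCC come from the text, and the actinic keratosis counts are invented to give a value near the reported 81%.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Histological classes that count as BCC (labels are illustrative).
BCC_CLASSES = {"superficial BCC", "nodular BCC", "infiltrative BCC", "other BCC"}


def correct_rate_by_class(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """records: (histological class, observer diagnosed BCC?) per lesion.

    Returns the percentage of lesions in each class whose BCC / not-BCC
    call agrees with histology.
    """
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for lesion_class, called_bcc in records:
        totals[lesion_class] += 1
        if called_bcc == (lesion_class in BCC_CLASSES):
            correct[lesion_class] += 1
    return {cls: 100 * correct[cls] / totals[cls] for cls in totals}


records = (
    [("infiltrative BCC", True)] * 12      # from the text: all 12 called BCC
    + [("actinic keratosis", False)] * 13  # hypothetical counts
    + [("actinic keratosis", True)] * 3
)
for cls, rate in correct_rate_by_class(records).items():
    print(f"{cls}: {rate:.0f}% correctly classified")
```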
The observers were asked to indicate which of 15 image 'biomarkers' were present in the OCT images. The candidate biomarkers were selected prospectively based upon the observers' experience and from previously published research. [7][8][9][10][11][12][13] Our objective was to determine which OCT image biomarkers are of most diagnostic value, and also to find out if there are clear differences between BCC subtypes. A perfect biomarker would be present in all BCC OCT images and in no non-BCC images, or vice versa. A biomarker with little value appears equally often in both BCC and non-BCC lesions. The biomarkers were assessed by comparison with the healthy adjacent skin of the same participant. Table 7 shows, for each biomarker and each classification, whether it was positively or negatively correlated, and if so whether strongly correlated with statistical significance (P < 0.05) or weakly correlated (P < 0.15). A blank entry indicates that the biomarker was only very weakly correlated or not correlated.
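As an illustration of how a single present/absent biomarker can be screened, the sketch below fits the univariate logistic regression in closed form: for one binary predictor, the fitted slope equals the log odds ratio of the 2×2 table and its standard error is the square root of the summed reciprocal cell counts. The use of a Wald test is an assumption (the test used by the statistical package is not stated in the text), the function names are hypothetical, and the counts are invented. The 0.05 and 0.15 thresholds are those described for Table 7.

```python
import math
from typing import Tuple


def marker_association(
    bcc_with: int, bcc_without: int, other_with: int, other_without: int
) -> Tuple[float, float]:
    """Return (log odds ratio, two-sided Wald P value) for one binary marker."""
    cells = (bcc_with, bcc_without, other_with, other_without)
    if min(cells) <= 0:
        # A zero cell makes the estimate infinite; a continuity correction
        # would be needed, which this sketch does not attempt.
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((bcc_with * other_without) / (bcc_without * other_with))
    std_err = math.sqrt(sum(1 / c for c in cells))
    z = log_or / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return log_or, p_value


def table_entry(log_or: float, p_value: float) -> str:
    """Map a result to the strong / weak / blank scheme described for Table 7."""
    if p_value < 0.05:
        strength = "strong"
    elif p_value < 0.15:
        strength = "weak"
    else:
        return ""
    direction = "positive" if log_or > 0 else "negative"
    return f"{direction}, {strength}"


# Hypothetical counts: marker seen in 90 of 141 BCCs and 30 of 93 non-BCC lesions.
print(table_entry(*marker_association(90, 51, 30, 63)))
```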
For identifying BCCs, the most useful positively correlated image features are ovoid structures (with or without bright centres), dark borders to the dermis, black areas/cysts and bulges/cones intruding into the dermis. A thickened epidermis was negatively correlated.
Concerning subtypes of BCC, the most useful markers for the nodular subtype were the presence of ovoid structures and, especially, the presence of black areas/cysts. The latter feature was a powerful discriminator vs. the superficial BCC subtype, for which this feature was negatively correlated. Other useful features for the superficial BCC subtype were the presence of a dark border to the dermis and the presence of bulges/cones extending from the epidermis into the dermis. Negatively correlated features included thinned epidermis and surface scales. The infiltrative BCC subtype was distinguishable only by the presence of 'shoal of fish'-like narrow, elongated structures in the dermis. Figures 1-3 show typical OCT images of nodular, superficial and infiltrative BCC subtypes exhibiting these characteristic features.
We also examined the false-negative and false-positive OCT results for insight into why the misdiagnoses occurred. There were six false-negative OCT diagnoses. All six had the superficial scales/crust marker; as a result, four of them had only 60% observer confidence and the other two 80%. Four of them had 'broken/poorly defined dermoepidermal junction (DEJ)', which is a marker for BCC, but none had dark or bright-centred ovoid structures or dark dermis borders. Just one of the false negatives had black areas/cysts, a strong marker for nodular BCC. All six were histologically identified as superficial BCC. In summary, the observers did not see the significant markers for BCC, probably because of the superficial crusting, and could not make a positive identification of BCC on a broken DEJ alone.
There were 22 OCT false positives, of which the majority (n = 12) were misdiagnosed as superficial BCC, three as nodular, four as infiltrative, one as other and two not specified. The histological identification of these lesions included actinic keratosis, lentigo solaris, scars, granuloma, lichen planus, and sebaceous and adnexal dysplasia. The most common OCT image markers of these false-positive lesions were thickened epidermis (70%), broken/poorly defined DEJ (60%), dark ovoid structures (50%), dark borders (50%) and superficial scales/crusting (50%). We found that 17 of the 22 (77%) OCT false positives had at least one of dark ovoid structures, dark borders and ovoid structures with bright centres. It is not surprising, and perhaps inevitable, that these were identified as BCCs.
The univariate logistic regression produced a predictive model for identifying the BCC subtype from the image markers. From the statistics of markers in each subtype, it calculates weights for each marker and then, for a given lesion, the resulting probability that it is that subtype. The most probable subtype is then the model prediction.
The model prediction was compared with the performance of the observers in identifying BCC subtype based on their clinical assessment, dermoscopy and, finally, OCT. The number of correct predictions was calculated as a percentage of the total number of histologically confirmed lesions of that subtype (see Table 8).
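The prediction step and the per-subtype scoring described above can be sketched as follows. This is not the fitted model: all weights and intercepts are invented for illustration, the marker names are shorthand, and summing per-marker weights into a single logistic score per subtype is an assumption about how the univariate results were combined.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical weights per subtype; signs loosely follow the correlations
# described in the text, magnitudes are made up.
MODELS: Dict[str, Dict[str, float]] = {
    "nodular": {"intercept": -1.0, "ovoid_structures": 1.2, "black_areas_cysts": 1.8},
    "superficial": {"intercept": -0.5, "dark_border": 1.0, "bulges_cones": 1.1,
                    "black_areas_cysts": -1.5},
    "infiltrative": {"intercept": -2.0, "shoal_of_fish": 2.5},
}


def subtype_probability(weights: Dict[str, float], markers: Set[str]) -> float:
    """Logistic probability from the intercept plus weights of markers present."""
    score = weights["intercept"] + sum(
        w for name, w in weights.items() if name != "intercept" and name in markers
    )
    return 1 / (1 + math.exp(-score))


def predict_subtype(markers: Set[str]) -> str:
    """The most probable subtype is taken as the model prediction."""
    return max(MODELS, key=lambda s: subtype_probability(MODELS[s], markers))


def percent_correct(cases: Iterable[Tuple[str, Set[str]]]) -> Dict[str, float]:
    """cases: (histological subtype, markers recorded); % correct per subtype."""
    total: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for subtype, markers in cases:
        total[subtype] += 1
        hits[subtype] += predict_subtype(markers) == subtype
    return {s: 100 * hits[s] / total[s] for s in total}


cases = [
    ("nodular", {"ovoid_structures", "black_areas_cysts"}),
    ("nodular", set()),  # no markers recorded: predicted as another subtype
    ("superficial", {"dark_border", "bulges_cones"}),
    ("infiltrative", {"shoal_of_fish"}),
]
print(percent_correct(cases))
```

The second nodular case shows one way a marker-based model can fail when the relevant markers are not recorded on the case report form.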
The best results were obtained by the observers using OCT, not by the regression model. This could be for a number of reasons: (i) the observers might be using other additional image markers, based on their experience; (ii) the observers may not have recorded the presence of the markers on the case report forms with consistent accuracy, leading to a biased model; (iii) there may be too few examples in the dataset to enable an accurate model (especially for the infiltrative BCC subtype).
Nevertheless, the accuracy of the observers' subtype estimation was only 62-72%, i.e. it was incorrect in over a quarter of all cases. Further work will need to be done to find additional independent biomarkers to improve on these data.

Discussion
This detailed analysis has revealed some potentially useful criteria that may be suitable to apply when using OCT for assisting in the diagnosis of BCC.
Reduced OCT image quality is associated with superficial scales or crusting, although observer diagnostic performance with mediocre or better OCT image quality was still better than by clinical or dermoscopic assessment alone.
Nevertheless, it seems clear that extra care should be taken when superficial scales or crusting are present before deciding on a therapy.
All observers were able to judge their own diagnostic accuracy for a given lesion, downgrading their confidence in their decision when the OCT image quality was poor or the lesion characteristics less typical. Clearly, diagnostic confidence for each lesion depends not only on the quality of the OCT image and the difficulty of the lesion, but also on the experience of the observer. In this study, only experienced observers participated. Other workers have shown that diagnostic performance with OCT improves with observer experience. [7] Based on this, we suggest that it might be useful practice, upon application of OCT, to assign a 'confidence level' to the diagnostic decision, e.g. very high, high, medium, low. This could then be used to determine further actions such as an additional biopsy. If this information is documented systematically, it could also be used to track the increase in the user's expertise in OCT image interpretation over time.
The analysis of image markers and subtype identification shows that there are statistically significant correlations between image features and subtypes, although it is acknowledged that the sample size is rather small and these findings should be treated with caution. Table 7 summarizes these correlations. There are clear differences between characteristics of nodular, superficial and infiltrative BCCs, and observers were more accurate in identifying the specific subtype when using OCT than by clinical examination and dermoscopy alone.
For nodular BCCs, ovoid structures with cysts in the dermis are the most prominent marker, whereas for superficial BCCs, epidermal bulges intruding into the upper dermis, surrounded by a dark rim, are typical findings. These image markers for BCC have been previously reported by other workers. [10][11][12][13][14] Infiltrative BCCs are characterized by ill-defined, narrow, dark, longish structures in the dermis, surrounded by brighter tissue, resembling a shoal of fish. These findings demonstrate that OCT visualizes the histopathology of architectural features of BCCs, i.e. the nodules and cysts of nodular BCCs, as well as the small tumours deriving from the basal epidermis in superficial BCCs and the irregular, ill-defined dermal tumour cords of infiltrative BCC, surrounded by a fibrotic stroma. This is in contrast to clinical examination and dermoscopy, where only the surface and very superficial findings can be assessed, lacking resolution and penetration depth.
Nevertheless, the diagnostic accuracy for each subtype was limited to 60-70%. However, it should be noted that the study excluded clinically obvious BCCs, suggesting that if OCT had been used on these types of BCC the diagnostic accuracy of defining the subtype would have been better.
Further image markers would be helpful; one possibility could be features observed in the en face images, which were not assessed in this study but have been examined elsewhere; [14,15] another possibility could be vessel morphology in the lesions, which can be identified by the new dynamic OCT technology. [16][17][18][19]
With regard to limitations, firstly, the sample size was limited to 234 lesions. For the analysis of further subcategories the sample size of certain subgroups was too small to enable definitive conclusions to be drawn and only indications and trends can be described. Furthermore, there are many types of non-BCC skin lesions, some of which could be 'confounders' for BCC, that were not seen in this trial. Therefore, it may be worthwhile repeating this trial with a larger sample size.
Secondly, the study was performed by dermatologists experienced in noninvasive imaging. It is therefore not necessarily representative of the performance that would be achieved in routine use by nonspecialized dermatologists.
Thirdly, many of the observers' results, such as assessments of OCT image quality, their own diagnostic performance and the presence of specific image biomarkers, were somewhat subjective. Other observers might, using the same dataset, obtain different results by their own assessment, or with more quantitative image analysis methods and tools, although the results shown in Table 5 show a good degree of consistency between the observers in this study.
A study by Olsen et al. compared the diagnostic performance for actinic keratosis and BCC between a group of five dermatologists with experience in OCT image interpretation and five with no experience. [7] They reported that skilled observers performed better than unskilled, and that there were no significant differences in interobserver agreement within each group. They also reported that high diagnostic accuracy was associated with higher observer confidence.
Fourthly, the histological results that provided the diagnostic standard might not be 100% accurate. Sixty-six per cent of the lesions were biopsied rather than excised, and the biopsy might in some cases have missed a tumour, leading to potential over-reporting of false positives by all three diagnostic methods, or, in the case of a 'mixed' BCC, an error in the accuracy of subtype identification.
Despite these limitations, we believe that the results of this study provide helpful implications for the standard of care and management of BCC using OCT.
For successful use of OCT, it is essential that physicians are trained in OCT scanning and image interpretation. It is helpful to note from Table 4 that diagnostic performance increases with observer confidence in their diagnosis. We expect, therefore, that OCT trainees will initially be able to diagnose BCCs with very obvious image features, and will steadily progress to less clear examples as their experience and expertise grow.
Further analysis of the data from this multicentre, prospective, observational, diagnostic trial of diagnosis of BCC with OCT has shown that: (i) OCT improves the differential diagnosis of BCC vs. other lesion types in clinically suspicious lesions, compared with clinical and dermoscopic diagnosis alone as reported previously; (ii) there are a number of useful 'image biomarkers' that aid the OCT user in diagnosing BCCs vs. other lesion types, but further research is needed to find additional new independent markers; (iii) poor OCT image quality is associated with superficial scales and crusting, and this affects the diagnostic performance of OCT, but diagnosis aided by OCT in these cases is still better than by clinical or dermoscopy examination alone; (iv) observers' own confidence in their diagnosis of BCC increased when using OCT vs. clinical and dermoscopy alone, and their actual diagnostic performance reflected this (i.e. they were more likely to be right when they had high confidence); (v) observer diagnostic performance was consistently better with OCT than with clinical examination or dermoscopy alone across all test sites. These conclusions support the targeted use of OCT to aid the diagnosis of BCC, potentially improving the standard of care by enabling more early-stage BCCs to be detected and by supporting the use of noninvasive treatment options.