Intra- and inter-observer agreement of brain MRI lesion volume measurements in multiple sclerosis. A comparison of techniques.
Academic Article
Abstract
Measurement of MRI lesion load in multiple sclerosis is increasingly used to evaluate the natural history of the disease and to monitor the efficacy of treatments. If, as may occur in multicentre studies, lesion load is measured by several observers in different patients or by the same observer in serial scans, the technique used must provide high inter- and intra-observer agreement. This study evaluated the intra- and inter-observer agreement of semi-automated lesion volume measurement using thresholding and compared it with the agreement obtained using an arbitrary scoring system (ASS) and a quantitative manual tracing method (MTM). Brain MRIs were obtained for 20 patients with clinically definite multiple sclerosis and were evaluated independently by three observers. The median intra- and inter-observer agreements were, respectively, 88.5% (range 69.0-96.8%) and 79.0% (range 73.3-98.3%) for the ASS, 95.0% (range 85.1-99.4%) and 93.4% (range 77.3-98.3%) for the MTM, and 96.3% (range 94.2-98.9%) and 93.7% (range 83.8-98.3%) for the semi-automated technique. The intra- and inter-observer agreements for the semi-automated technique increased to 98.5% (range 96.3-99.8%) and 96.1% (range 90.5-98.6%) when a consensus on the choice of threshold for lesion segmentation was reached. The intra- and inter-observer agreements were significantly greater for the semi-automated method than for either the ASS or the MTM. For the semi-automated technique, the intra-observer variability was significantly lower (P < 0.0001) than the inter-observer variability. These data indicate that high intra- and inter-observer agreement can be obtained using a semi-automated thresholding technique to quantify lesion volumes in multiple sclerosis. The technique may prove useful in multicentre studies, for which a single observer is still preferable.
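The abstract does not state the formula behind the agreement percentages or the details of the thresholding software, so the following Python sketch is illustrative only. It assumes a simple global intensity threshold applied to a region of interest, and an agreement index defined as 100 × (1 − |a − b| / mean(a, b)) for two volume measurements a and b. The function names, the threshold rule, and the agreement formula are all assumptions and should not be read as the authors' method.

```python
from itertools import combinations
from statistics import median


def lesion_volume_ml(roi_pixels, threshold, voxel_volume_mm3):
    """Hypothetical thresholding step: count pixels at or above the
    threshold inside a region of interest and convert to millilitres."""
    n_lesion = sum(1 for row in roi_pixels for value in row if value >= threshold)
    return n_lesion * voxel_volume_mm3 / 1000.0


def percent_agreement(a, b):
    """Assumed agreement index (not taken from the source):
    100 * (1 - |a - b| / mean(a, b))."""
    if a == 0 and b == 0:
        return 100.0  # two empty measurements agree trivially
    return 100.0 * (1.0 - abs(a - b) / ((a + b) / 2.0))


def median_pairwise_agreement(volumes_by_observer):
    """Median agreement over every pair of observers and every patient.

    volumes_by_observer holds one list of volumes per observer,
    with patients in the same order in each list."""
    scores = [
        percent_agreement(x, y)
        for obs_a, obs_b in combinations(volumes_by_observer, 2)
        for x, y in zip(obs_a, obs_b)
    ]
    return median(scores)
```

Under these assumptions, passing one list of per-patient volumes per observer to `median_pairwise_agreement` gives an inter-observer figure, while passing repeated measurements by the same observer gives an intra-observer figure. Changing `threshold` between calls to `lesion_volume_ml` shows why a consensus on the threshold would be expected to raise agreement.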