Empirical assessment of the assumptions of ComBat with diffusion tensor imaging

Michael E. Kim, Chenyu Gao, Leon Y. Cai, Qi Yang, Nancy R. Newlin, Karthik Ramadass, Angela Jefferson, Derek Archer, Niranjana Shashikumar, Kimberly R. Pechman, Katherine A. Gifford, Timothy J. Hohman, Lori L. Beason-Held, Susan M. Resnick, Stefan Winzeck, Kurt G. Schilling, Panpan Zhang, Daniel Moyer, and Bennett A. Landman. “Empirical Assessment of the Assumptions of ComBat with Diffusion Tensor Imaging.” Journal of Medical Imaging (Bellingham), vol. 11, no. 2, 024011, March 2024. doi:10.1117/1.JMI.11.2.024011.

Diffusion tensor imaging (DTI) is a magnetic resonance imaging technique that provides unique insights into white matter microstructure in the brain. However, it is susceptible to confounding effects introduced by scanner or acquisition differences. ComBat is a leading approach for addressing these site biases. Despite its frequent use for harmonization, ComBat’s robustness towards site dissimilarities and overall cohort size has not yet been evaluated in the context of DTI.

To address this, we matched 358 participants from two sites to create a “silver standard” cohort for multi-site harmonization. We harmonized mean fractional anisotropy (FA) and mean diffusivity (MD) calculated from participant DTI data for regions of interest defined by the JHU EVE-Type III atlas. To quantify the reliability of ComBat, we performed bootstrapping over 10 iterations at 19 levels of total sample size, 10 levels of sample size imbalance between sites, and 6 levels of mean age difference between sites. We measured three key metrics: (i) β_AGE, the linear regression coefficient of the relationship between FA and age; (ii) γ_sf, the ComBat-estimated site-shift; and (iii) δ_sf, the ComBat-estimated site-scaling. We evaluated the reliability of ComBat by calculating the root mean squared error (RMSE) in these metrics and examined the correlation between the reliability of ComBat and the violation of model assumptions.
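As an illustration of how reliability might be scored, the following minimal sketch computes an age slope by ordinary least squares and the RMSE of bootstrap estimates against a reference value. The function names `age_slope` and `rmse` are hypothetical. The single-covariate regression is an assumption made for brevity, so the model used in the study may include additional covariates. The same `rmse` helper would apply equally to γ_sf or δ_sf estimates.

```python
import numpy as np


def age_slope(age, fa):
    """Least-squares slope of FA regressed on age, with an intercept term.

    Assumption: age is the only covariate; this is a simplification.
    """
    age = np.asarray(age, dtype=float)
    fa = np.asarray(fa, dtype=float)
    # Design matrix with a column of ones (intercept) and the age column.
    design = np.column_stack([np.ones_like(age), age])
    coef, *_ = np.linalg.lstsq(design, fa, rcond=None)
    return float(coef[1])


def rmse(estimates, reference):
    """Root mean squared error of bootstrap estimates against one reference value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - reference) ** 2)))
```

Here `reference` would be the value obtained from the full silver standard cohort, and `estimates` would hold one value per bootstrap iteration at a given point in the experimental space.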

Our results indicate that ComBat performs reliably for β_AGE when the total sample size is greater than 162 and the mean age difference between sites is less than 4 years. Notably, the ComBat model's assumption of normally distributed residuals is not violated even as the model becomes unstable.

In conclusion, before harmonizing DTI data with ComBat, it is crucial to examine the input cohort for size and covariate distributions at each site. Direct assessment of residual distributions is less informative on stability than bootstrap analysis. We advise caution when using ComBat in situations that do not conform to the identified thresholds.

After registration of the JHU EVE-III Atlas, mean FA values were calculated in all the regions for each participant in the silver standard cohort. A point in the experimental space is “feasible” if the sample size for each site is at least N = 6, the imbalance level does not result in N for either site exceeding the available number of participants for that site, and the sampling of participants yields a covariate shift within 1 year of the target age difference between sites. For each feasible point in the experimental space, 10 bootstraps were subsampled from the silver standard cohort, and the FA values for the subsamples were harmonized by ComBat. The resulting parameters were then compared to those from the silver standard to determine the reliability of ComBat at that location in the experimental space.
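The feasibility criteria above can be sketched as a simple predicate. This is an illustrative reading of the three conditions, not the study's code: the function name `is_feasible` and all parameter names are hypothetical, the minimum size is assumed to apply to both sites, and "within 1 year" is assumed to be inclusive.

```python
def is_feasible(n_site_a, n_site_b, avail_a, avail_b,
                achieved_age_diff, target_age_diff,
                min_n=6, tolerance_years=1.0):
    """Return True if a point in the experimental space can be sampled.

    Assumptions: the minimum size applies to each site, and the
    age-difference tolerance is inclusive.
    """
    # Each site needs at least min_n sampled participants.
    enough = n_site_a >= min_n and n_site_b >= min_n
    # The requested size cannot exceed what each site has available.
    available = n_site_a <= avail_a and n_site_b <= avail_b
    # The sampled mean age difference must land close to the target.
    shift_ok = abs(achieved_age_diff - target_age_diff) <= tolerance_years
    return enough and available and shift_ok
```

For example, with hypothetical availability of 100 participants per site, `is_feasible(10, 10, 100, 100, 2.5, 2.0)` passes all three checks, while reducing one site to 5 participants fails the minimum-size check.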