Yuqi Tian dissertation defense – March 16
PhD candidate Yuqi Tian will be defending her dissertation on Wednesday, March 16, at 9 a.m. Central Time. All are invited. To attend the defense, which will take place online, contact our department at biostatistics[at]vumc[dot]org for the link.
Semiparametric Cumulative Probability Models for Skewed, Censored, and Clustered Continuous Response Data
Continuous response variables are often skewed and need to be transformed to meet modeling assumptions, but the choice of the transformation can be challenging. Censored and clustered continuous response data are also common in practice. We propose methods to address skewed, censored, and clustered continuous response data based on a semiparametric linear transformation model for ordinal responses, the cumulative probability model (CPM). First, via extensive simulations, we compare CPMs with a parametric transformation model, the most likely transformation model (MLT), for fitting continuous response data. Both approaches model cumulative distribution functions (CDFs) and require specifying a link function. With CPMs, the transformation is estimated nonparametrically by treating each continuous response as a distinct level. With MLTs, the transformation is parameterized using basis functions. Both methods have good performance. We apply them to an HIV biomarker study. Second, we propose new approaches based on CPMs to handle detection limits (DLs), where a variable cannot be measured outside a certain range. Most approaches for DLs in response variables implicitly make parametric assumptions on distributions outside DLs. CPMs are rank based and can handle mixed distributions of continuous and discrete response variables. While observations inside DLs are continuous, those outside DLs are generally put into discrete categories. With a single DL, CPMs assign values outside the DL the lowest/highest rank. With multiple DLs, CPM likelihoods can be modified to distribute probability mass. We demonstrate our approaches with simulations and two HIV data examples.
Finally, we extend CPMs to fit clustered continuous response variables based on generalized estimating equation methods for ordinal responses. With our approach, estimates of marginal parameters, CDFs, expectations, and quantiles conditional on covariates can be obtained without pre-transformation of the potentially skewed continuous response data. Computational challenges arise with large numbers of unique values of the continuous response variable. We propose two computationally efficient approaches to fit CPMs to clustered continuous response variables with different working correlation structures. We illustrate our approaches with simulations and two studies modeling longitudinal measurements of CD4:CD8 ratio and lung function, respectively.