Title: Scalable distributional regression for wearable devices, adjusting for informative non-wear bias (plus hybrid statistical-AI generative models for complex structured data)
Abstract: Many modern instruments (wearables, imaging, geospatial sensors) generate subject-level data streams with thousands to millions of measurements or more. Collapsing these streams to simple summaries (e.g., means) can obscure important structure. We present a general distributional regression framework for distribution-on-scalar and distribution-on-function settings. Each subject's distribution is modeled through its empirical quantile function, represented with quantlet basis functions that provide a compact, near-lossless representation. This enables joint inference on entire distributions and flexible post hoc computation of any distributional summary to characterize how distributions differ across predictor values.
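To make the representation step concrete, here is a minimal sketch in Python with NumPy. It computes one subject's empirical quantile function on a common probability grid and compresses it into a fixed number of basis coefficients. The basis is an assumption: quantlets are a specific data-driven basis, so this sketch uses a stand-in (polynomials in p orthonormalized on the grid). The function names, grid, and simulated gamma "activity counts" are hypothetical and for illustration only.

```python
import numpy as np

def empirical_quantiles(x, probs):
    """Empirical quantile function of one subject's measurements on a common grid."""
    return np.quantile(np.asarray(x, dtype=float), probs)

def orthonormal_basis(probs, n_basis):
    """Stand-in basis (NOT quantlets): polynomials in p, orthonormalized on the grid."""
    vander = np.vander(2 * probs - 1, n_basis, increasing=True)  # columns 1, t, t^2, ...
    q, _ = np.linalg.qr(vander)  # orthonormal columns spanning the same polynomial space
    return q  # shape (len(probs), n_basis)

def project(qf, basis):
    """Basis coefficients of a quantile function (least squares, since columns are orthonormal)."""
    return basis.T @ qf

def reconstruct(coefs, basis):
    """Map coefficients back to the quantile scale on the grid."""
    return basis @ coefs

rng = np.random.default_rng(0)
probs = np.linspace(0.01, 0.99, 99)
basis = orthonormal_basis(probs, n_basis=10)

# Hypothetical subject: 10,000 skewed "activity counts" (gamma-distributed, illustrative only).
x = rng.gamma(shape=2.0, scale=50.0, size=10_000)
qf = empirical_quantiles(x, probs)
coefs = project(qf, basis)
qf_hat = reconstruct(coefs, basis)
rel_err = np.linalg.norm(qf - qf_hat) / np.linalg.norm(qf)
print(f"{len(x)} measurements -> {len(coefs)} coefficients, relative L2 error {rel_err:.4f}")
```

The number of coefficients stays fixed however many raw measurements a subject contributes, and any summary (a quantile, an interquartile range) can be recomputed from the reconstructed quantile function after fitting.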
Our approach is built on the Bayesian Functional Mixed Model (BayesFMM) framework, accommodating arbitrary mixes of discrete and continuous predictors with smooth (nonlinear) effects, multiple levels of random effects for multilevel designs, nonstationary spatial or temporal dependence, and Gaussian or heavier-tailed errors for robustness. A basis-projection strategy makes the method computationally scalable in both the number of subjects and the number of repeated measurements per subject. Simulation studies show that this joint approach is more efficient than fitting separate models at each point of a grid of quantiles.
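The basis-projection strategy can be illustrated with a simplified sketch, assuming the same stand-in polynomial basis as a substitute for quantlets: project every subject's quantile function onto K coefficients, fit a regression to each coefficient, and map the estimated effects back to the quantile scale to obtain a functional effect β(p). Ordinary least squares on one simulated covariate is a deliberate simplification standing in for the Bayesian functional mixed model; it has no random effects, smoothing, or robust errors. The covariate `z` and the data-generating model are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_obs = 200, 5_000
probs = np.linspace(0.01, 0.99, 99)

# Stand-in orthonormal basis (polynomials in p via QR), not quantlets.
basis, _ = np.linalg.qr(np.vander(2 * probs - 1, 10, increasing=True))

# Hypothetical covariate (e.g. a standardized age) whose increase stretches the activity scale.
z = rng.normal(size=n_subj)
Q = np.empty((n_subj, probs.size))
for i in range(n_subj):
    x = rng.gamma(shape=2.0, scale=50.0 * np.exp(0.2 * z[i]), size=n_obs)
    Q[i] = np.quantile(x, probs)

# 1) Project: each subject's quantile function -> K coefficients (n_obs no longer matters).
C = Q @ basis  # shape (n_subj, K)

# 2) Fit in basis space: OLS of every coefficient on [1, z] at once (stand-in for BayesFMM).
X = np.column_stack([np.ones(n_subj), z])
B, *_ = np.linalg.lstsq(X, C, rcond=None)  # shape (2, K)

# 3) Map back: functional effect of z on the quantile function, beta(p), on the grid.
beta_p = basis @ B[1]
for p in (0.1, 0.5, 0.9):
    j = int(np.argmin(np.abs(probs - p)))
    print(f"effect of z at p={p}: {beta_p[j]:.1f}")
```

Once the quantile functions are computed, the fit operates only on the n × K coefficient matrix, so its cost no longer depends on the number of measurements per subject, and the effect at every quantile level comes from a single fit rather than from separate per-quantile models.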
To address informative missingness, we incorporate functional predictors that encode each subject’s time-of-day non-wear pattern, effectively calibrating each subject’s distribution to a common non-wear profile. Simulations demonstrate that non-wear biases both full-distribution inference and scalar summaries, and that our proposed regression calibration approach mitigates this bias more effectively than imputation, at a far lower computational cost.
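A toy simulation shows why time-of-day non-wear biases even a simple summary, and how calibrating to a common non-wear profile can help. This is a minimal scalar sketch under stated assumptions: a hypothetical hourly activity profile that is high during the day, subjects who remove the device for a random number of daytime hours, and a linear regression of each subject's observed mean on their daytime non-wear share, evaluated at a common profile (share = 0, full wear). This scalar regression is only a stand-in for the functional-predictor distributional model described above.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj = 500
hours = np.arange(24)
day = (hours >= 8) & (hours < 22)    # 14 hypothetical "daytime" hours
profile = np.where(day, 100.0, 5.0)  # hypothetical mean activity by hour of day

level = rng.lognormal(mean=0.0, sigma=0.3, size=n_subj)  # subject-specific activity level
target = (level * profile.mean()).mean()  # average full-day mean if the device were always worn

# Informative non-wear: each subject removes the device for 0 to 8 randomly chosen daytime hours.
n_off = rng.integers(0, 9, size=n_subj)
share = n_off / day.sum()  # daytime non-wear share (a one-number non-wear profile)
obs_mean = np.empty(n_subj)
for i in range(n_subj):
    off = rng.choice(hours[day], size=n_off[i], replace=False)
    worn = ~np.isin(hours, off)
    hourly = level[i] * profile * rng.gamma(shape=5.0, scale=0.2, size=24)  # noise with mean 1
    obs_mean[i] = hourly[worn].mean()  # naive summary: mean over worn hours only

naive = obs_mean.mean()

# Scalar stand-in for regression calibration: regress the summary on the non-wear share,
# then evaluate at a common non-wear profile (share = 0, i.e. full wear).
X = np.column_stack([np.ones(n_subj), share])
coef, *_ = np.linalg.lstsq(X, obs_mean, rcond=None)
calibrated = coef[0]

print(f"target {target:.1f}  naive {naive:.1f}  calibrated {calibrated:.1f}")
```

Because the removed hours are high-activity daytime hours, the naive mean over worn time is pulled toward the low nighttime values. In this toy setting the true relationship between the summary and the non-wear share is nonlinear, so a linear adjustment may leave some residual bias.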
Applying the method to the TEAN adolescent accelerometer study, we confirm and refine associations of adolescent activity levels with age, BMI, and walkability, characterize nuanced effects of these continuous predictors, and identify which parts of the activity distribution shift, all without predefining summaries.
Time permitting, I will briefly preview ongoing work that uses statistical generative AI to build inferential frameworks for complex object data while preserving key internal structure, with an application to handwritten digit data. I will also show how the distributional regression for wearable devices and the generative AI model for digit data are both encompassed by one general methodological framework.