Comparative effectiveness studies of cancer therapeutics in observational data face confounding by patterns of clinical treatment over time. The validity of survival analysis in longitudinal health records depends on study design choices including index date definition and model specification for covariate adjustment.
Overall survival in cancer is a multi-state transition process with mortality and treatment switching as competing risks. Parametric Weibull regression quantifies proportionality of hazards across lines of therapy in real-world cohorts of 12 solid tumor types. Study design assessments compare alternative analytic models in simulations with realistic disproportionality. The multi-state simulation framework is adaptable to alternative treatment effect profiles and exposure patterns.
Event-specific hazards of treatment-switching and death are not proportional across lines of therapy in 12 solid tumor types. Study designs that include all eligible lines of therapy per subject showed lower bias and variance than designs that select one line per subject. Confounding by line number was effectively mitigated across a range of simulation scenarios by Cox proportional hazards models with stratified baseline hazards and inverse probability of treatment weighting.
Quantitative study design assessment can inform the planning of observational research in clinical oncology by demonstrating the potential impact of model misspecification. Use of empirical parameter estimates in simulation designs adapts analytic recommendations to the clinical population of interest.
- Mortality and treatment switching survival times are competing risks in a multi-state model.
- Hazards are not proportional across lines of therapy in real-world data.
- We compare retrospective study designs for treatment effects and effect modifiers.
- Allowing repeated observations per subject with robust standard errors is recommended.
Secondary data studies of clinical exposures and outcomes from electronic health records (EHR) provide novel opportunities for large-scale comparative effectiveness research to augment clinical trials and improve medical decision making. When combined with comprehensive genomic profiling, EHR studies can investigate treatment effects in molecularly defined subgroups where biological mechanisms suggest possible effect modification. Rare biomarker subgroups can suffer from severe sample size constraints, prompting questions about whether and how researchers can best pool information from patients at different moments in their cancer journey (e.g., immediately after metastatic diagnosis or many lines of therapy later).
For example, a targeted therapy initially approved in one type of cancer may be occasionally prescribed to patients with the same biomarker and a different primary disease site. Observational EHR data can provide evidence about therapeutic utility in these novel settings where there is not sufficient commercial incentive for a clinical trial. Drugs approved under accelerated regulatory pathways in rare indications are of particular interest for efficacy validation in real-world clinical data. Oncologists are more likely to try experimental treatment options when standard treatments have been exhausted. This introduces substantial risk of confounding bias in retrospective studies of novel therapeutics, because standard-of-care treatments tend to be prescribed earlier in the patient journey than the novel therapy of interest.
Causally informed retrospective studies in observational data use statistical techniques designed to reduce or eliminate bias, but their performance depends on the unknown data-generating process. Simulation studies with data generation parameters informed by real data offer the opportunity to evaluate study designs and estimator operating characteristics in realistic settings. Using parameters estimated in EHR-derived clinicogenomic data from several tumor types, we present simulation studies mimicking real-world treatment switching and mortality patterns to develop practical analytic guidance in two key areas: (1) definition of the set of survival observations to be included in the analytic data sample from longitudinal records where a patient may have multiple eligible treatment initiation dates, and (2) specification of an adjusted proportional hazards regression model for estimation of treatment effects.
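To make the first design question concrete, the following minimal Python sketch simulates a multi-state process in which treatment switching and death are competing Weibull-distributed events within each line of therapy, and then builds two candidate analytic samples: one containing every eligible line per subject and one containing a single line per subject. It is an illustration under stated assumptions, not the simulation framework used in this study. The parameter values, the three-line cap, the 60-month administrative censoring time, the choice of the first line as the single selected line, and the names `PARAMS`, `simulate_patient`, and `simulate_cohort` are all hypothetical.

```python
import random

# Hypothetical (shape, scale-in-months) Weibull parameters for each competing
# event, by line of therapy. Shapes that differ across lines imply hazards
# that are not proportional across lines. Values are illustrative only.
PARAMS = {
    1: {"switch": (1.2, 10.0), "death": (1.0, 30.0)},
    2: {"switch": (1.0, 6.0), "death": (1.3, 18.0)},
    3: {"switch": (0.8, 4.0), "death": (1.5, 10.0)},
}
MAX_LINE = max(PARAMS)


def simulate_patient(patient_id, rng, admin_censor=60.0):
    """Simulate one patient; return one record per line of therapy started."""
    records = []
    clock = 0.0  # months since the start of the first line
    for line in range(1, MAX_LINE + 1):
        sw_shape, sw_scale = PARAMS[line]["switch"]
        de_shape, de_scale = PARAMS[line]["death"]
        # random.weibullvariate takes (scale, shape) in that order.
        t_switch = rng.weibullvariate(sw_scale, sw_shape)
        t_death = rng.weibullvariate(de_scale, de_shape)
        if line == MAX_LINE:
            t_switch = float("inf")  # assumption: no line beyond MAX_LINE
        # Competing risks: the earlier latent time determines the transition.
        duration = min(t_switch, t_death)
        event = "death" if t_death <= t_switch else "switch"
        remaining = admin_censor - clock
        if duration >= remaining:
            duration, event = remaining, "censored"
        records.append({"id": patient_id, "line": line, "start": clock,
                        "duration": duration, "event": event})
        clock += duration
        if event != "switch":
            break
    # Overall survival measured from each line's start (candidate index date).
    died = event == "death"
    for rec in records:
        rec["os_time"] = clock - rec["start"]
        rec["os_event"] = died
    return records


def simulate_cohort(n_patients, seed=0):
    """Simulate a cohort and return a flat list of line-level records."""
    rng = random.Random(seed)
    rows = []
    for pid in range(n_patients):
        rows.extend(simulate_patient(pid, rng))
    return rows


# Design A: every eligible line per subject (repeated observations).
all_lines = simulate_cohort(500)
# Design B: one line per subject (here, hypothetically, the first line).
first_line_only = [r for r in all_lines if r["line"] == 1]
```

Under the all-lines design a subject contributes several correlated observations, so any model fitted to `all_lines` would need standard errors that account for clustering within `id`. Fitting the regression models themselves is outside the scope of this sketch.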