Estimating Lifetime or Episode-of-Illness Costs Under Censoring

August 23, 2017

How can you estimate an individual’s total lifetime cost of medical care?  For people who die in your sample, this is simple.  In most data sets, however, not all individuals will die during the period of observation.  Thus, the data set is censored for those who do not die.

In addition, many standard hazard models do not allow researchers to disaggregate the effects of covariates on survival from their effects on the intensity of utilization.  Both factors affect cost.

Assuming that censoring is random, Basu and Manning (2010) describe a method to calculate expected lifetime costs for each individual as follows:

  1. Estimate survival probabilities. Use a flexible survival model, such as an accelerated failure time model based on the generalized gamma distribution for time, to estimate each individual’s survival function while accounting for censoring. Let Sj(X) and hj(X) be the estimated survivor function and hazard function for the interval indexed by j. Predictions are obtained for all time periods for all patients.
  2. Estimate cost among patients who died. Among those subject intervals, (aj-1, aj], where we observe the subject to die, estimate a generalized linear model (or models if a two-part specification is necessary) for the observed costs after conditioning on covariates X.  One can also condition on the time of death within the interval, denoted U. Use parameter estimates from this model to predict costs, μ1j(X), for every subject-interval in the data. To account for the stochastic nature of U within that interval (i.e., to account for what the costs would be if the patient died inside that interval but at different times), one simply averages the predictions that are conditional on each value of U, weighting by the observed distribution of U among intervals where patients are observed to die. Therefore, μ1j(X)=∫μ1j(X,U) dF(U | death observed in (aj-1, aj]).
  3. Estimate the cost among subjects not observed to die.  Next, among those subject intervals, (aj-1, aj], where patients are not observed to die, estimate a generalized linear model (or models if a two-part specification is necessary) for the observed costs after conditioning on covariates X. Subject intervals where censoring occurs, and where costs are therefore observed over only part of the interval, are excluded from this estimation. Use parameter estimates from this model to predict costs, μ2j(X), for every subject-interval in the data. Excluding the partially observed intervals effectively accommodates continuous censoring times.
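The averaging over the time of death U in step 2 can be sketched in a few lines. This is only an illustration, not Basu and Manning’s code: the function name average_over_death_time, the toy cost model, and all numbers are hypothetical. It assumes the "observed distribution of U" is the empirical distribution, so the integral reduces to an equally weighted mean of predictions over the observed values of U.

```python
import numpy as np

def average_over_death_time(predict_cost, x, observed_u):
    """Average predict_cost(x, u) over the empirical distribution of u.

    predict_cost : callable (x, u) -> predicted end-of-life cost (hypothetical)
    x            : covariate value(s) for one subject-interval
    observed_u   : times of death within the interval, taken from the
                   subject-intervals where a death was actually observed
    """
    observed_u = np.asarray(observed_u, dtype=float)
    # One prediction per observed death time, each weighted equally,
    # which is integration against the empirical CDF of U.
    preds = np.array([predict_cost(x, u) for u in observed_u])
    return preds.mean()

# Hypothetical fitted cost model: cost rises with time survived in the interval.
def toy_cost(x, u):
    return 100.0 * u + x

# Hypothetical observed death times, as fractions of the interval length.
observed_u = [0.2, 0.4, 0.9]
mu1 = average_over_death_time(toy_cost, 10.0, observed_u)
```

In practice predict_cost would wrap the fitted generalized linear model from step 2 rather than a hand-written function.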


Thus, the resulting cost function for interval j for any individual is given by:

  • μj(X)=Sj(X)*[hj(X)*μ1j(X) + (1-hj(X))*μ2j(X)]
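As a minimal numerical sketch of this formula, the snippet below combines the three sets of predictions for one hypothetical person over three intervals. All numbers are made up. Two assumptions are not stated in the text above: that Sj(X) is the probability of surviving to the start of interval j, built here from the interval hazards, and that expected cumulative cost over the horizon is the sum of μj(X) across intervals.

```python
import numpy as np

def interval_costs(S, h, mu1, mu2):
    """Expected cost per interval: S_j * [h_j * mu1_j + (1 - h_j) * mu2_j]."""
    S, h, mu1, mu2 = (np.asarray(a, dtype=float) for a in (S, h, mu1, mu2))
    return S * (h * mu1 + (1.0 - h) * mu2)

# Hypothetical predictions for one person over three intervals.
h = np.array([0.10, 0.20, 0.50])               # hazard of dying in interval j
mu1 = np.array([50000.0, 60000.0, 70000.0])    # predicted cost if death occurs in j
mu2 = np.array([10000.0, 12000.0, 15000.0])    # predicted cost if the person survives j

# Assumption: S_j is survival to the START of interval j, so S_1 = 1
# and S_j is the product of (1 - h_k) over all earlier intervals k < j.
S = np.concatenate(([1.0], np.cumprod(1.0 - h)[:-1]))

mu = interval_costs(S, h, mu1, mu2)
total_cost = mu.sum()  # assumption: cumulative cost is the sum over intervals
```

Because the survival terms (S, h) and the cost terms (mu1, mu2) enter separately, one can change a covariate’s effect on one set while holding the other fixed to see how much of the change in total cost runs through each channel.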


There are a number of benefits of using this framework. First, Basu and Manning show that this estimator can “decompose the covariate effects on total costs into part mediated by survival effects and another mediated by intensity of use.” Second, this method allows for death to take place at any time during each interval rather than solely at the end of an interval. Third, the model “allows for separate estimators to be used for end-of-life and non-end-of-life periods.” The separate estimators are especially useful in cases where end-of-life cost differs significantly from the regular course of care. For instance, one study demonstrated that there is a U-shaped pattern of cost history among cancer patients, with the left side of the U corresponding to initial treatment and the right side reflecting a substantial spike in costs during the last 6 months of life.

For those interested, Basu and Manning also provide a simulation and empirical application to demonstrate the utility of their econometric specification compared to earlier models.