survivalstan package


survivalstan.models module

survivalstan.sim module

Functions to simulate failure-time data for testing & model checking purposes

survivalstan.sim.sim_data_exp(N, censor_time, rate)[source]

Simulate true lifetimes (t) according to an exponential model

N: (int) number of observations
censor_time: (float) uniform censor time for each observation
rate: (float, positive) hazard rate used to parameterize failure times

Returns a pandas DataFrame with N observations and 3 columns:
  • true_t: “actual” simulated failure time
  • t: observed failure/censor time, given censor_time
  • event: boolean indicating whether the failure event was observed (TRUE) or censored (FALSE)
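The relationship between true_t, t and event can be illustrated with a small standard-library sketch. It is not the package's implementation: the function name sim_exp_sketch, the seed argument and the list-of-dicts return value (in place of a pandas DataFrame) are assumptions made for illustration.

```python
import random


def sim_exp_sketch(n, censor_time, rate, seed=None):
    """Simulate n exponential lifetimes, all censored at the same censor_time."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        true_t = rng.expovariate(rate)   # "actual" simulated failure time
        event = true_t <= censor_time    # True if the failure was observed
        t = min(true_t, censor_time)     # observed failure/censor time
        rows.append({"true_t": true_t, "t": t, "event": event})
    return rows


# Hypothetical usage: 200 subjects, hazard rate 0.1, follow-up ends at t = 20.
rows = sim_exp_sketch(n=200, censor_time=20.0, rate=0.1, seed=1)
```

Censored records keep their true_t, which is what makes simulated data useful for model checking: the fitted model only sees t and event.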

survivalstan.sim.sim_data_exp_correlated(N, censor_time, rate_form='1 + age + sex', rate_coefs=[-3, 0.3, 0])[source]

Simulate true lifetimes (t) according to an exponential model whose rate depends on simulated covariates

N: (int) number of observations
censor_time: (float) uniform censor time for each observation
rate_form: names of variables to use when estimating rate. Defaults to ‘1 + age + sex’
rate_coefs: inputs to rate-calc (coefs used to estimate log-rate). Defaults to [-3, 0.3, 0]

Returns a pandas DataFrame with N observations and 6 columns:
  • true_t: “actual” simulated failure time
  • t: observed failure/censor time, given censor_time
  • event: boolean indicating whether the failure event was observed (TRUE) or censored (FALSE)
  • age: simulated age in years (Poisson random variable, expectation = 55)
  • sex: simulated sex, as ‘female’ or ‘male’ (uniform 50/50 split)
  • rate: simulated rate value for each obs
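The idea of a covariate-dependent rate can be sketched with the standard library alone. This is an illustration under loud assumptions, not the package's code: the names poisson_draw and sim_exp_correlated_sketch are hypothetical, the linear predictor is hand-coded rather than built from a formula, sex enters as a male indicator, and age is standardized so that the example coefficients give moderate rates (the library may treat age differently).

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sim_exp_correlated_sketch(n, censor_time, coefs=(-3.0, 0.3, 0.0), seed=None):
    """Simulate lifetimes whose log-rate is linear in age and sex."""
    rng = random.Random(seed)
    b0, b_age, b_male = coefs
    rows = []
    for _ in range(n):
        age = poisson_draw(rng, 55)
        sex = rng.choice(["female", "male"])
        # ASSUMPTION: age is standardized here; this is an illustrative choice.
        age_std = (age - 55) / math.sqrt(55)
        rate = math.exp(b0 + b_age * age_std + b_male * (sex == "male"))
        true_t = rng.expovariate(rate)
        rows.append({
            "true_t": true_t,
            "t": min(true_t, censor_time),
            "event": true_t <= censor_time,
            "age": age,
            "sex": sex,
            "rate": rate,
        })
    return rows


sim = sim_exp_correlated_sketch(n=100, censor_time=10.0, seed=2)
```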

survivalstan.sim.sim_data_jointmodel(N, p=0.5, **kwargs)[source]

Simulate data for joint model

Dictionary of 4 objects:

  • params: parameter values used to simulate data
  • covars: dataframe of covariates per subject_id
  • events: dataframe of multiple-event data, per subject_id
  • biomarker: dataframe of longitudinal biomarker values simulated

survivalstan.survivalstan module

class survivalstan.survivalstan.SurvivalStanData(df, formula, event_col, time_col=None, sample_id_col=None, sample_col=None, group_id_col=None, group_col=None, timepoint_id_col=None, timepoint_end_col=None, drop_intercept=True, **kwargs)[source]

Bases: object

Input data representing a survival model in survivalstan


Create x_df and df_nonmiss

survivalstan.survivalstan.extract_baseline_hazard(results, element='baseline', timepoint_id_col='timepoint_id', timepoint_end_col='end_time')[source]

If model results contain a baseline object, extract & summarize it

survivalstan.survivalstan.extract_grp_baseline_hazard(results, timepoint_id_col='timepoint_id', timepoint_end_col='end_time')[source]

If model results contain a grp_baseline object, extract & summarize it

survivalstan.survivalstan.fit_stan_survival_model(df=None, formula=None, event_col=None, model_code=None, file=None, model_cohort='survival model', time_col=None, sample_id_col=None, sample_col=None, group_id_col=None, group_col=None, timepoint_id_col=None, timepoint_end_col=None, make_inits=None, stan_data={}, grp_coef_type=None, FIT_FUN=<function fit>, drop_intercept=True, input_data=None, *args, **kwargs)[source]

Prepare data & fit a survival model using Stan

This function wraps a number of steps into one function:

  1. Prepares the input data dictionary for Stan: calls SurvivalStanData with the user-provided formula & df (can be overridden using the input_data parameter)
  2. Compiles & optionally caches compiled stan code
  3. Fits model to data
  4. Tries the following functions on the resulting fit object:
       • stanity.psisloo to summarize model fit using LOO-PSIS approximation
       • extract posterior draws for beta coefficients (if model contains beta parameter)
       • extract posterior draws for grouped-beta coefficients (if applicable)

df (pandas DataFrame): The data frame containing input data to Survival model.
formula (chr): Patsy formula to use for covariates. E.g. ‘met_status + pd_l1’
event_col (chr): name of column containing event status. Will be coerced to boolean
model_code (chr): stan model code to use.
file (chr): path to stan file (if model_code not given)
*args, **kwargs: passed to FIT_FUN (or replacement)

model_cohort (chr): description of this model fit, to be used when plotting or summarizing output
time_col (chr): name of column containing event time – used for parametric models
sample_id_col (chr): name of column containing numeric sample ids (1-indexed & sequential)
sample_col (chr): name of column containing sample descriptions - will be converted to an ID
group_id_col (chr): name of column containing numeric group ids (1-indexed & sequential)
group_col (chr): name of column containing group descriptions - will be converted to an ID
timepoint_id_col (chr): name of column containing timepoint ids (1-indexed & sequential)
timepoint_end_col (chr): name of column containing end times for each timepoint (will be converted to an ID)
stan_data (dict): extra params passed to stan data object
grp_coef_type (chr): type of group coef specified, if using a varying-coef model. Can be one of:
  • ‘None’ (default): guess group coef orientation from data. Works except in case where M (num covariates) == G (num groups)
  • ‘matrix’: grp_beta defined as matrix[M, G] grp_beta;
  • ‘vector-of-vectors’: grp_beta defined as vector[M] grp_beta[G];

drop_intercept (bool): whether to drop the intercept term from the model matrix (default: True)
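Several of the parameters above distinguish a description column from a numeric ID column that must be 1-indexed and sequential. A minimal sketch of such a conversion is shown here. The helper add_sequential_ids is hypothetical, and assigning IDs in sorted label order is an assumption; the package may order IDs differently.

```python
def add_sequential_ids(labels):
    """Map each distinct label to a 1-indexed, sequential integer ID."""
    lookup = {}
    for label in sorted(set(labels)):
        lookup[label] = len(lookup) + 1   # IDs start at 1, with no gaps
    return [lookup[label] for label in labels], lookup


# Hypothetical sample descriptions, with one subject appearing twice.
sample_ids, lookup = add_sequential_ids(["p07", "p02", "p07", "p11"])
# sample_ids == [2, 1, 2, 3]
```

Sequential 1-based IDs matter because Stan arrays are indexed from 1, so an ID is typically used directly as an array index inside the model.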


Returns a dictionary of results objects:

df: Pandas data frame containing input data, filtered to non-missing obs & with ID variables created
x_df: Covariate matrix passed to Stan
x_names: Column names for the covariate matrix passed to Stan
data: List passed to Stan - contains dimensions, etc.
fit: pystan fit object returned from Stan call
coefs: posterior draws for coefficient values
loo: psis-loo object returned for fit model. Used for model comparison & summary
model_cohort: description of this model and/or cohort on which the model was fit
df_all: input df given, with calculated values included
sample_col: name of column (in df_all) used to identify the sample
sample_id_col: name of column containing numeric id derived from the sample
timepoint_end_col: name of column (in df_all) used to determine end-time of ‘long’ data, if relevant
timepoint_id_col: name of column containing numeric id derived from timepoint_end_col

Raises: AttributeError, KeyError


>>> testfit = fit_stan_survival_model(
...     model_file = stanmodels.stan.pem_survival_model,
...     formula = '~ met_status + pd_l1',
...     df = dflong,
...     sample_col = 'patient_id',
...     timepoint_end_col = 'end_time',
...     event_col = 'end_failure',
...     model_cohort = 'PEM survival model',
...     iter = 30000,
...     chains = 4,
... )
>>> print(testfit['fit'])
>>> seaborn.boxplot(x = 'value', y = 'variable', data = testfit['coefs'])
survivalstan.survivalstan.prep_data_long_surv(df, time_col, event_col, sample_col=None, event_name=None)[source]

Convert wide survival dataframe (df) to long format, in preparation for modeling using PEM models.

Returns a pandas DataFrame with original records duplicated for each unique failure time observed.
Each record will have two new columns: ‘end_failure’ and ‘end_time’, indicating the event status (end_failure) for each unique timepoint (end_time).
df (pandas.DataFrame):
Input data containing survival time & status for each subject
time_col (str):
name of column containing time to censor/event
event_col (str or list of strings):
name of column containing status (1 or True: event, 0 or False: censor). If a list is provided, these will be processed as multiple event types.
sample_col (str):
(optional) column containing sample or subject identifier. If given, result will be de-duped so that multiple events within a sample are handled correctly.
event_name (str):
(optional) column containing description of event type, if more than one type of event is observed. If given, then multiple events per subject will be processed.

pandas.DataFrame with original records duplicated for each unique failure time observed.

Each record will _include all original covariate values_, plus two new columns: ‘end_failure’ and ‘end_time’, indicating the timepoint-specific event status for each record.

If multiple events are given (either via a list of event_cols or by providing an event_name), the result will contain multiple end_failure columns, one for each event type.
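The wide-to-long conversion for a single event type can be sketched in plain Python. This is an illustration rather than the package's code: to_long_sketch is a hypothetical name, records are dicts instead of DataFrame rows, and the set of timepoints is assumed to be every unique value in the time column.

```python
def to_long_sketch(records, time_col, event_col):
    """Duplicate each record once per timepoint at which the subject is at risk."""
    # ASSUMPTION: every unique observed time is used as an interval end point.
    end_times = sorted({r[time_col] for r in records})
    long_rows = []
    for r in records:
        for end_time in end_times:
            if end_time > r[time_col]:
                break                      # subject is no longer under observation
            row = dict(r)                  # keep all original covariate values
            row["end_time"] = end_time
            # The event is flagged only at the subject's own event time.
            row["end_failure"] = bool(r[event_col]) and end_time == r[time_col]
            long_rows.append(row)
    return long_rows


wide = [
    {"id": "a", "t": 2.0, "event": True, "age": 50},
    {"id": "b", "t": 5.0, "event": False, "age": 61},
    {"id": "c", "t": 3.0, "event": True, "age": 47},
]
long_rows = to_long_sketch(wide, "t", "event")
```

In this example subject a contributes one row, c two rows and b three rows, and only the final rows of a and c have end_failure set.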

survivalstan.utils module

survivalstan.utils.extract_params_long(models, element, rename_vars=None, varnames=None)[source]

Helper function to extract & reformat params

models (list):
List of model objects
element (string):
Which element to extract, e.g. ‘coefs’. Other options (depending on model type) include ‘grp_coefs’ and ‘baseline_hazard’.
rename_vars (dict, optional):
  • dictionary mapping from integer positions (0, 1, 2) to variable names
varnames (list of strings, optional):
  • list of variable names to apply to columns from the extracted object

Pandas dataframe containing posterior draws per iteration

survivalstan.utils.extract_time_betas(models, element='beta_time', value_name='beta', **kwargs)[source]

Extract posterior draws for values of time-varying element from each model given in the list of models.

Returns a pandas.DataFrame containing one record for each posterior draw of each parameter, where the parameter varies over time.

Columns include:

  • model_cohort: description of the model or cohort from which the draw was taken

  • <value-column>: the value of the posterior draw, named according to given parameter value_name

  • coef: description of the coefficient estimated, as per patsy formula provided

  • iter: integer indicator of the draw from which that estimate was taken

  • <timepoint-id-column>: integer identifier for each unique time at which betas are estimated (default column name is set by fit_stan_survival_model, typically as “timepoint_id”)

  • <timepoint-end-column>: time at which this beta was estimated (default column name is set by fit_stan_survival_model, typically as “end_time”)

** Parameters **:

models (list):
list of model-fit objects returned by survivalstan.fit_stan_survival_model.
element (str):
name of parameter to extract. Defaults to “beta_time”, the parameter name used in the example time-varying stan model.
value_name:
what you would like the “value” column called in the resulting dataframe
**kwargs:
passed to _extract_time_betas_single_model, allowing the user to customize “default” values which would otherwise be read from each model object. Examples include: coefs, timepoint_id_col, and timepoint_end_col.

** Returns **:

pandas.DataFrame containing posterior draws of parameter values.
survivalstan.utils.filter_stan_summary(stan_fit, pars=None, remove_nan=False)[source]

Filter stan fit summary, for the set of parameters in pars. See ?pystan.summary for details about summary stats given.

stan_fit:
StanFit object whose posterior draws are to be summarized
pars: (list, optional)
list of strings used to filter parameters. Passed directly to pystan.summary. default: return all parameters
remove_nan: (bool, optional)
whether to remove (and report on) NaN values for Rhat. These are problematic for distplot.

pandas dataframe containing summary stats for posterior draws of selected parameters
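The effect of remove_nan can be pictured with a small sketch that drops, and reports on, rows whose Rhat is NaN. The row layout (dicts with hypothetical "param" and "Rhat" keys) and the helper name drop_nan_rhat are assumptions for illustration; the real function works on a pandas dataframe built from the pystan summary.

```python
import math


def drop_nan_rhat(summary_rows):
    """Return rows with a finite Rhat, printing a note about any rows dropped."""
    dropped = [row["param"] for row in summary_rows if math.isnan(row["Rhat"])]
    if dropped:
        print("Dropped {} parameter(s) with NaN Rhat: {}".format(
            len(dropped), ", ".join(dropped)))
    return [row for row in summary_rows if not math.isnan(row["Rhat"])]


# Hypothetical summary rows; a constant parameter typically has an undefined Rhat.
summary = [
    {"param": "beta[1]", "Rhat": 1.01},
    {"param": "fixed_value", "Rhat": float("nan")},
]
kept = drop_nan_rhat(summary)
```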

survivalstan.utils.get_sample_ids(models, sample_col='patient_id')[source]
survivalstan.utils.plot_coefs(models, element='coefs', force_direction=None, trans=None, **kwargs)[source]

Plot coefficients for models listed

models (list):
List of model objects
element (string, optional):
Which element to plot. Defaults to ‘coefs’. Other options (depending on model type) include:
  • ‘grp_coefs’
  • ‘baseline’
  • ‘beta_time’
force_direction (string, optional):
Takes values ‘h’ or ‘v’
  • if ‘h’: forces horizontal orientation, (variable names along the x axis)
  • if ‘v’: forces vertical orientation (variable names along the y axis)

if None (default), coef plots default to ‘v’ for all plots except baseline hazard.

trans (function, optional):
If present, transforms value of value column
  • example: np.exp to plot exp(beta)

if None (default), plots raw value
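As a rough illustration of what trans does to the plotted values, the sketch below applies an optional function to a list of draws. The helper apply_trans and the draw values are hypothetical, and math.exp stands in for np.exp so the example needs only the standard library.

```python
import math


def apply_trans(values, trans=None):
    """Apply trans to every value, or return the raw values if trans is None."""
    if trans is None:
        return list(values)
    return [trans(v) for v in values]


beta_draws = [-0.5, 0.0, 0.7]                       # hypothetical posterior draws
raw = apply_trans(beta_draws)                       # plotted as-is
exp_beta = apply_trans(beta_draws, trans=math.exp)  # plotted as exp(beta)
```

In a proportional-hazards model exp(beta) is commonly read as a hazard ratio, which is why this transform is a frequent choice.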

survivalstan.utils.plot_observed_survival(df, event_col, time_col, label='observed', *args, **kwargs)[source]
survivalstan.utils.plot_pp_survival(models, time_element='y_hat_time', event_element='y_hat_event', num_ticks=10, step_size=None, ticks_at=None, time_col='event_time', event_col='event_status', fill=True, by=None, alpha=0.5, pal=None, subplot=None, **kwargs)[source]

Plot KM curve estimates from posterior-predicted values by group, for each model given in the list of models.

See prep_pp_survival_data for details regarding process of extracting posterior-predicted values.

** Parameters controlling data extraction **:

models:
list of fit_stan_survival_model results from which to extract posterior-predicted values
by (str or list of strings):
additional column or columns by which to summarize posterior-predicted values. Default is None, which results in draws summarized by [iter and model_cohort]. Values can include any covariates provided in the original df.
time_element:
(optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time.
event_element:
(optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event.
event_col:
(optional) name to use for column containing posterior draw for event_status
time_col:
(optional) name to use for column containing posterior draw for time to event
**kwargs:
passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable merge with original dataframe.

** Parameters controlling plot orientation/presentation **:

pal (list of colors, matching length of by groups):
(optional) palette to use for plotting.
ticks_at:
(optional) exact locations for placement of ticks
num_ticks:
(optional) control number of ticks, if ticks_at not given.
step_size:
(optional) control tick spacing, if ticks_at or num_ticks not given
alpha:
(optional) level of transparency for boxplots
fill:
(optional) whether to fill in boxplots or just show outlines. Defaults to True
subplot:
(optional) pyplot.subplots object to use, if provided. Useful if you want to overlay observed or true survival on the same plot.
xlabel:
(optional) label for x-axis (defaults to “Days”)
ylabel:
(optional) label for y-axis (defaults to “Survival %”)
label:
(optional) legend-label for this plot group (defaults to “posterior predictions”, model-cohort, or by-group label, depending on options)
**kwargs:
(optional) args passed to set properties of boxes, medians & whiskers (e.g. color)

** Returns **:

Nothing. Plotted object is a side-effect.
survivalstan.utils.plot_stan_summary(stan_fit, pars=None, metric='Rhat')[source]

Plot distribution of values in stan fit summary, for the set of parameters in pars.

Primary use case is to summarize Rhat estimates for set of parameters, as a quick check of convergence.

stan_fit:
StanFit object whose posterior draws are to be summarized
pars: (list of str, optional)
list of strings used to filter parameters. Passed directly to pystan.summary. default: return all parameters
metric: (str, optional)
the name of the metric to plot, as one of: [‘mean’, ‘se_mean’, ‘sd’, ‘2.5%’, ‘50%’, ‘97.5%’, ‘Rhat’]. Default: Rhat
survivalstan.utils.plot_time_betas(models=None, df=None, element='beta_time', y='beta', trans=None, coefs=None, x='timepoint_end_col', by=['model_cohort', 'coef'], timepoint_id_col=None, timepoint_end_col=None, subplot=None, ticks_at=None, ylabel=None, xlabel='time', num_ticks=10, step_size=None, fill=True, alpha=0.5, pal=None, value_name='beta', **kwargs)[source]

Plot posterior draws of time-varying parameters (element) from each model given in the list of models.

See also: extract_time_betas, which returns the dataframe used by this function to plot data.

Note: this function can optionally take a df argument (the result of extract_time_betas) to support data-extraction & plotting in a two-step operation.

** Parameters controlling data extraction **:

models (list):
list of model-fit objects returned by survivalstan.fit_stan_survival_model.
element (str):
name of parameter to extract. Defaults to “beta_time”, the parameter name used in the example time-varying stan model.
value_name:
what you would like the “value” column called in the resulting dataframe
coefs:
(optional) parameter passed to extract_time_betas, to override coefficient names captured in fit_stan_survival_model.
timepoint_id_col:
(optional) parameter passed to extract_time_betas, to override timepoint_id_col captured in fit_stan_survival_model.
timepoint_end_col:
(optional) parameter passed to extract_time_betas, to override timepoint_end_col captured in fit_stan_survival_model.

** Parameters controlling plot orientation/presentation **:

trans (function):
(optional) function to transform y-values plotted. Example: np.log
by (list):
(optional) list of columns by which to aggregate & color boxplots. Defaults to: [‘model_cohort’, ‘coef’]
pal (list of colors, matching length of by groups):
(optional) palette to use for plotting.
y (str):
(optional) column to put on the y-axis. Defaults to ‘beta’
x (str):
(optional) column to put on the x-axis. Defaults to ‘timepoint_end_col’
num_ticks:
(optional) how many ticks to show on the x-axis. See _plot_time_betas for details.
alpha:
(optional) level of transparency for boxplots
fill:
(optional) whether to fill in boxplots or just show outlines. Defaults to True
subplot:
(optional) pyplot.subplots object to use, if provided. Useful if you want to overlay multiple values on the same plot.

** Returns **:

Nothing. Plotted object is a side-effect.
survivalstan.utils.prep_pp_data(models, time_element='y_hat_time', event_element='y_hat_event', event_col='event_status', time_col='event_time', **kwargs)[source]
Extract posterior-predicted values from each model included in the list of models given, optionally merged with covariates & meta-data provided in the input df.

** Parameters **:

models:
list of fit_stan_survival_model results from which to extract posterior-predicted values
time_element:
(optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time.
event_element:
(optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event.
event_col:
(optional) name to use for column containing posterior draw for event_status
time_col:
(optional) name to use for column containing posterior draw for time to event
**kwargs:
passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable merge with original dataframe.

** Returns **:

pandas.DataFrame with one record per posterior draw (iter) for each subject, from each model, optionally joined with original input data.
survivalstan.utils.prep_pp_survival_data(models, time_element='y_hat_time', event_element='y_hat_event', time_col='event_time', event_col='event_status', by=None, **kwargs)[source]
Summarize posterior-predicted values into KM survival/censor rates by group, for each model given in the list of models.

See prep_pp_data for details regarding process of extracting posterior-predicted values.

** Parameters **:

models:
list of fit_stan_survival_model results from which to extract posterior-predicted values
by (str or list of strings):
additional column or columns by which to summarize posterior-predicted values. Default is None, which results in draws summarized by [iter and model_cohort]. Values can include any covariates provided in the original df.
time_element:
(optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time.
event_element:
(optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event.
event_col:
(optional) name to use for column containing posterior draw for event_status
time_col:
(optional) name to use for column containing posterior draw for time to event
**kwargs:
passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable merge with original dataframe.

** Returns **:

pandas.DataFrame with one record per posterior draw (iter), timepoint, model_cohort, and by-groups.
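Summarizing posterior-predicted values into KM survival rates amounts to computing one Kaplan-Meier curve per posterior draw. The sketch below shows that calculation in plain Python. The function names, the dict-of-draws input layout and the example values are assumptions for illustration; the package itself works on pandas dataframes.

```python
def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) pairs at event times."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)   # events plus censorings at t
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def km_by_draw(draws):
    """Compute one KM curve per posterior draw (keyed by iteration number)."""
    return {
        it: km_curve(d["event_time"], d["event_status"])
        for it, d in draws.items()
    }


# Hypothetical posterior-predicted values for two draws of four subjects each.
draws = {
    0: {"event_time": [2, 3, 3, 5], "event_status": [True, True, False, True]},
    1: {"event_time": [1, 4, 6, 6], "event_status": [True, False, True, True]},
}
curves = km_by_draw(draws)
```

The spread of these per-draw curves at each timepoint is what a posterior-predictive survival plot summarizes.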
survivalstan.utils.print_stan_summary(stan_fit, pars=None)[source]

Convenience function to print stan fit summary, for the set of parameters in pars.

stan_fit:
StanFit object whose posterior draws are to be summarized
pars: (optional)
list of strings used to filter parameters. Passed directly to pystan.summary. default: return all parameters
survivalstan.utils.read_files(path, pattern='*.stan', encoding='utf-8', resource=None)[source]

Reads file contents from a directory path into memory. Returns a dictionary of file names: file contents.

Is intended to be used to load a directory of stan files into an object.

path (string):
directory path (can be relative or absolute)
pattern (string, optional):
pattern applied to file names on import; defaults to “*.stan” (a glob-style wildcard)
encoding (string, optional):
encoding to use when importing files; defaults to “UTF-8”
resource (string, optional):
if given, path is relative to the package install root. Used to load stan files provided by packages (e.g. those within a package library)

The specifics of the return type depend on the value of resource:
  • if resource is None, returns contents of file as a character string
  • otherwise, returns a “resource_string”, which acts as a character string but technically isn’t one.
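The behaviour for the resource=None case can be approximated with the standard library. The sketch below is an assumption about how such a loader might look, not the package's code: read_files_sketch is a hypothetical name, matching uses fnmatch, and subdirectories are walked recursively.

```python
import fnmatch
import os
import tempfile


def read_files_sketch(path, pattern="*.stan", encoding="utf-8"):
    """Return {file name: file contents} for files under path matching pattern."""
    contents = {}
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in fnmatch.filter(filenames, pattern):
            with open(os.path.join(dirpath, name), encoding=encoding) as handle:
                contents[name] = handle.read()
    return contents


# Demonstration on a temporary directory holding one Stan file and one other file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "weibull.stan"), "w", encoding="utf-8") as handle:
        handle.write("data { int N; }")
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as handle:
        handle.write("not a model")
    models = read_files_sketch(tmp)
```

Keying the dictionary by file name means two files with the same name in different subdirectories would overwrite each other in this sketch.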

Module contents