survivalstan package

Submodules

survivalstan.models module

survivalstan.sim module

Functions to simulate failure-time data for testing & model checking purposes

survivalstan.sim.sim_data_exp(N, censor_time, rate)[source]

Simulate true lifetimes (t) according to an exponential model.

Parameters:
N: (int) number of observations
censor_time: (float) uniform censor time for each observation
rate: (float, positive) hazard rate used to parameterize failure times
Returns:
pandas DataFrame with N observations and 3 columns:
  • true_t: “actual” simulated failure time

  • t: observed failure/censor time, given censor_time

  • event: boolean indicating whether the failure event was observed (True) or censored (False)
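The relationship between the three columns can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the function name `simulate_exponential` and the `seed` argument are hypothetical, and plain dictionaries stand in for the pandas DataFrame that sim_data_exp returns.

```python
import random


def simulate_exponential(n, censor_time, rate, seed=None):
    """Simulate n exponential lifetimes with one shared censor time (sketch)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        true_t = rng.expovariate(rate)   # "actual" failure time, hazard = rate
        event = true_t <= censor_time    # True if the failure is observed
        t = min(true_t, censor_time)     # observed failure/censor time
        rows.append({"true_t": true_t, "t": t, "event": event})
    return rows


rows = simulate_exponential(n=200, censor_time=20.0, rate=0.1, seed=1)
```

Every record with event False has t equal to censor_time, which is how censoring shows up in the observed data.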

survivalstan.sim.sim_data_exp_correlated(N, censor_time, rate_form='1 + age + sex', rate_coefs=[-3, 0.3, 0])[source]

Simulate true lifetimes (t) according to an exponential model whose rate depends on simulated covariates.

Parameters:
N: (int) number of observations
censor_time: (float) uniform censor time for each observation
rate_form: names of variables to use when estimating rate. Defaults to ‘1 + age + sex’
rate_coefs: inputs to rate-calc (coefs used to estimate log-rate). Defaults to [-3, 0.3, 0]
Returns:
pandas DataFrame with N observations and 6 columns:
  • true_t: “actual” simulated failure time

  • t: observed failure/censor time, given censor_time

  • event: boolean indicating whether the failure event was observed (True) or censored (False)

  • age: simulated age in years (Poisson random variable, expectation = 55)

  • sex: simulated sex, as ‘female’ or ‘male’ (uniform 50/50 split)

  • rate: simulated rate value for each obs
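Because rate_coefs are described as coefficients on the log-rate, the per-observation rate is presumably the exponential of a linear predictor. The sketch below shows only that arithmetic. The helper `log_linear_rate` is hypothetical, and the column order of the design row is an assumption made for illustration; how the terms in rate_form map to columns is decided by the library and is not shown here.

```python
import math


def log_linear_rate(design_row, coefs):
    """Rate = exp(sum of coefficient * design value); one coef per column."""
    if len(design_row) != len(coefs):
        raise ValueError("one coefficient is needed per design-matrix column")
    return math.exp(sum(c * x for c, x in zip(coefs, design_row)))


coefs = [-3, 0.3, 0]  # the documented default for rate_coefs

# Hypothetical column order: [intercept, first covariate, second covariate]
baseline = log_linear_rate([1, 0, 0], coefs)
shifted = log_linear_rate([1, 1, 0], coefs)
unchanged = log_linear_rate([1, 0, 7], coefs)  # zero coefficient: no effect
```

A one-unit change in a covariate multiplies the rate by exp(coefficient), so a coefficient of 0 leaves the rate unchanged.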

survivalstan.sim.sim_data_jointmodel(N, p=0.5, **kwargs)[source]

Simulate data for joint model

Returns a dictionary of 4 objects:

  • params: parameter values used to simulate data
  • covars: dataframe of covariates per subject_id
  • events: dataframe of multiple-event data, per subject_id
  • biomarker: dataframe of longitudinal biomarker values simulated

survivalstan.survivalstan module

class survivalstan.survivalstan.SurvivalStanData(df, formula, event_col, time_col=None, sample_id_col=None, sample_col=None, group_id_col=None, group_col=None, timepoint_id_col=None, timepoint_end_col=None, drop_intercept=True, **kwargs)[source]

Bases: object

Input data representing a survival model in survivalstan

get_group_names()[source]
prep_df_nonmiss()[source]

Create x_df and df_nonmiss

prep_input_data(**kwargs)[source]
survivalstan.survivalstan.extract_baseline_hazard(results, element='baseline', timepoint_id_col='timepoint_id', timepoint_end_col='end_time')[source]

If model results contain a baseline object, extract & summarize it

survivalstan.survivalstan.extract_grp_baseline_hazard(results, timepoint_id_col='timepoint_id', timepoint_end_col='end_time')[source]

If model results contain a grp_baseline object, extract & summarize it

survivalstan.survivalstan.fit_stan_survival_model(df=None, formula=None, event_col=None, model_code=None, file=None, model_cohort='survival model', time_col=None, sample_id_col=None, sample_col=None, group_id_col=None, group_col=None, timepoint_id_col=None, timepoint_end_col=None, make_inits=None, stan_data={}, grp_coef_type=None, FIT_FUN=<function fit>, drop_intercept=True, input_data=None, *args, **kwargs)[source]

Prepare data & fit a survival model using Stan

This function wraps a number of steps into one function:

  1. Prepares the input data dictionary for Stan: calls SurvivalStanData with the user-provided formula & df (can be overridden using the input_data parameter)
  2. Compiles & optionally caches the compiled Stan code
  3. Fits the model to the data
  4. Tries the following functions on the resulting fit object:
     • stanity.psisloo to summarize model fit using the PSIS-LOO approximation
     • extract posterior draws for beta coefficients (if the model contains a beta parameter)
     • extract posterior draws for grouped-beta coefficients (if applicable)
Parameters:

df (pandas DataFrame): the data frame containing input data to the survival model
formula (str): Patsy formula to use for covariates, e.g. ‘met_status + pd_l1’
event_col (str): name of column containing event status. Will be coerced to boolean
model_code (str): Stan model code to use
file (str): path to Stan file (if model_code not given)
*args, **kwargs: passed to FIT_FUN (stanity.fit or replacement)
model_cohort (str): description of this model fit, to be used when plotting or summarizing output
time_col (str): name of column containing event time (used for parametric models)
sample_id_col (str): name of column containing numeric sample ids (1-indexed & sequential)
sample_col (str): name of column containing sample descriptions (will be converted to an ID)
group_id_col (str): name of column containing numeric group ids (1-indexed & sequential)
group_col (str): name of column containing group descriptions (will be converted to an ID)
timepoint_id_col (str): name of column containing timepoint ids (1-indexed & sequential)
timepoint_end_col (str): name of column containing end times for each timepoint (will be converted to an ID)
stan_data (dict): extra params passed to stan data object
grp_coef_type (str): type of group coef specified, if using a varying-coef model. Can be one of:
  • None (default): guess group coef orientation from data. Works except in the case where M (num covariates) == G (num groups)
  • ‘matrix’: grp_beta defined as matrix[M, G] grp_beta;
  • ‘vector-of-vectors’: grp_beta defined as vector[M] grp_beta[G];
drop_intercept (bool): whether to drop the intercept term from the model matrix (default: True)
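The reason grp_coef_type cannot be guessed when M == G can be seen from the shape of the extracted draws. The sketch below is not the library's code. It assumes draws arrive as an array whose first dimension indexes posterior draws (as with PyStan's extract), so matrix[M, G] yields trailing dimensions (M, G) and vector[M] grp_beta[G] yields (G, M). The function name `guess_grp_coef_type` is hypothetical.

```python
def guess_grp_coef_type(draw_shape, M, G):
    """Guess how grp_beta was declared from the shape of its draws (sketch).

    draw_shape is assumed to be (n_draws, dim1, dim2).
    """
    dims = tuple(draw_shape[1:])
    if M == G:
        # (M, G) and (G, M) are indistinguishable, so the caller must say
        raise ValueError("ambiguous: M == G, grp_coef_type must be given")
    if dims == (M, G):
        return "matrix"             # matrix[M, G] grp_beta;
    if dims == (G, M):
        return "vector-of-vectors"  # vector[M] grp_beta[G];
    raise ValueError("unexpected shape for grp_beta: %r" % (draw_shape,))


kind_a = guess_grp_coef_type((4000, 3, 5), M=3, G=5)
kind_b = guess_grp_coef_type((4000, 5, 3), M=3, G=5)
```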

Returns:

dictionary of results objects.

Contents:
  • df: pandas data frame containing input data, filtered to non-missing obs & with ID variables created
  • x_df: covariate matrix passed to Stan
  • x_names: column names for the covariate matrix passed to Stan
  • data: dictionary passed to Stan (contains dimensions, etc.)
  • fit: pystan fit object returned from the Stan call
  • coefs: posterior draws for coefficient values
  • loo: psis-loo object returned for the fit model. Used for model comparison & summary
  • model_cohort: description of this model and/or the cohort on which the model was fit
  • df_all: input df given, with calculated values included
  • sample_col: name of column (in df_all) used to identify the sample
  • sample_id_col: name of column containing the numeric id derived from the sample
  • timepoint_end_col: name of column (in df_all) used to determine the end-time of ‘long’ data, if relevant
  • timepoint_id_col: name of column containing the numeric id derived from timepoint_end_col
Raises:
AttributeError, KeyError

Example:

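>>> import seaborn
>>> from survivalstan import fit_stan_survival_model
>>> # stanmodels and dflong are assumed to be defined already
>>> testfit = fit_stan_survival_model(
...     model_code = stanmodels.stan.pem_survival_model,
...     formula = '~ met_status + pd_l1',
...     df = dflong,
...     sample_col = 'patient_id',
...     timepoint_end_col = 'end_time',
...     event_col = 'end_failure',
...     model_cohort = 'PEM survival model',
...     iter = 30000,
...     chains = 4,
... )
>>> print(testfit['fit'])
>>> # boxplot of the posterior draws for each coefficient
>>> seaborn.boxplot(x = 'value', y = 'variable', data = testfit['coefs'])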
survivalstan.survivalstan.make_weibull_survival_model_inits(stan_input_dict)[source]
survivalstan.survivalstan.prep_data_long_surv(df, time_col, event_col, sample_col=None, event_name=None)[source]

Convert wide survival dataframe (df) to long format, in preparation for modeling using PEM models.

Returns a pandas DataFrame with original records duplicated for each unique failure time observed.
Each record will have two new columns: ‘end_failure’ and ‘end_time’, indicating the event status (end_failure) for each unique timepoint (end_time).
Parameters:
df (pandas.DataFrame):
Input data containing survival time & status for each subject
time_col (str):
name of column containing time to censor/event
event_col (str or list of strings):
name of column containing status (1 or True: event, 0 or False: censor). If a list is provided, these will be processed as multiple event types.
sample_col (str):
(optional) column containing sample or subject identifier. If given, result will be de-duped so that multiple events within a sample are handled correctly.
event_name (str):
(optional) column containing description of event type, if more than one type of event is observed. If given, then multiple events per subject will be processed.
Returns:

pandas.DataFrame with original records duplicated for each unique failure time observed.

Each record will _include all original covariate values_, plus two new columns: ‘end_failure’ and ‘end_time’, indicating the timepoint-specific event status for each record.

If multiple events are given (either via a list of event_cols or by providing an event_name), the result will contain multiple end_failure columns, one for each event type.
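The wide-to-long conversion described above can be sketched for the single-event case with plain dictionaries. This is an illustration under stated assumptions, not the library's code: the helper name `to_long_format` is hypothetical, and it assumes a subject contributes one record per unique failure time up to and including its own observed time.

```python
def to_long_format(rows, time_col, event_col):
    """Duplicate each record once per unique failure time it is at risk for."""
    failure_times = sorted({r[time_col] for r in rows if r[event_col]})
    long_rows = []
    for r in rows:
        for end_time in failure_times:
            if end_time > r[time_col]:
                break  # subject is no longer under observation
            new_row = dict(r)  # keep all original covariate values
            new_row["end_time"] = end_time
            # the event is flagged only at the subject's own failure time
            new_row["end_failure"] = bool(r[event_col]) and end_time == r[time_col]
            long_rows.append(new_row)
    return long_rows


wide = [
    {"id": "A", "time": 2, "event": True},
    {"id": "B", "time": 3, "event": False},
    {"id": "C", "time": 5, "event": True},
]
long_rows = to_long_format(wide, time_col="time", event_col="event")
```

Here the unique failure times are 2 and 5, so subject A yields one record, subject B (censored at 3) yields one record for end_time 2, and subject C yields two records, with end_failure True only at end_time 5.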

survivalstan.utils module

survivalstan.utils.extract_params_long(models, element, rename_vars=None, varnames=None)[source]

Helper function to extract & reformat params

Parameters:
models (list):
List of model objects
element (string, optional):
Which element to extract. Defaults to ‘coefs’. Other options (depending on model type) include:
  • ‘grp_coefs’
  • ‘baseline_hazard’
rename_vars (dict, optional):
dictionary mapping from integer positions (0, 1, 2) to variable names
varnames (list of strings, optional):
list of variable names to apply to columns from the extracted object
Returns:
Pandas dataframe containing posterior draws per iteration
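The long format returned here can be pictured as one record per draw and variable. The sketch below is a stand-in using plain lists and dictionaries rather than pandas. The helper `draws_to_long` is hypothetical, the draw values are made up for illustration, and the column names (iter, variable, value, model_cohort) are taken from other parts of this page rather than from the function's source.

```python
def draws_to_long(draws, rename_vars=None, model_cohort="survival model"):
    """Reshape draws[iter][position] into one record per draw and variable."""
    rename_vars = rename_vars or {}
    records = []
    for i, draw in enumerate(draws):
        for pos, value in enumerate(draw):
            records.append({
                "iter": i,
                # fall back to the integer position if no name is given
                "variable": rename_vars.get(pos, pos),
                "value": value,
                "model_cohort": model_cohort,
            })
    return records


# Made-up draws: 3 iterations of 2 coefficients
draws = [[0.10, -0.52], [0.14, -0.47], [0.08, -0.55]]
records = draws_to_long(draws, rename_vars={0: "met_status", 1: "pd_l1"})
```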

survivalstan.utils.extract_time_betas(models, element='beta_time', value_name='beta', **kwargs)[source]

Extract posterior draws for values of time-varying element from each model given in the list of models.

Returns a pandas.DataFrame containing one record for each posterior draw of each parameter, where the parameter varies over time.

Columns include:

  • model_cohort: description of the model or cohort from which the draw was taken

  • <value-column>: the value of the posterior draw, named according to given parameter value_name

  • coef: description of the coefficient estimated, as per patsy formula provided

  • iter: integer indicator of the draw from which that estimate was taken

  • <timepoint-id-column>: integer identifier for each unique time at which betas are estimated (default column name is set by fit_stan_survival_model, typically as “timepoint_id”)

  • <timepoint-end-column>: time at which this beta was estimated (default column name is set by fit_stan_survival_model, typically as “end_time”)

Parameters:
models (list):
list of model-fit objects returned by survivalstan.fit_stan_survival_model.
element (str):
name of parameter to extract. Defaults to “beta_time”, the parameter name used in the example time-varying stan model.
value_name (str):
what you would like the “value” column called in the resulting dataframe
**kwargs:
passed to _extract_time_betas_single_model, allowing the user to customize “default” values which would otherwise be read from each model object. Examples include: coefs, timepoint_id_col, and timepoint_end_col.
Returns:
pandas.DataFrame containing posterior draws of parameter values.
survivalstan.utils.filter_stan_summary(stan_fit, pars=None, remove_nan=False)[source]

Filter stan fit summary, for the set of parameters in pars. See ?pystan.summary for details about summary stats given.

Parameters:
stan_fit:
StanFit object whose posterior draws are to be summarized
pars (list, optional):
list of strings used to filter parameters. Passed directly to pystan.summary. Default: return all parameters
remove_nan (bool, optional):
whether to remove (and report on) NaN values for Rhat. These are problematic for distplot.
Returns:
pandas dataframe containing summary stats for posterior draws of selected parameters
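The two filters (by parameter name and by NaN Rhat) can be sketched on a list of summary rows. This is only an illustration: the row layout, the helper name `filter_summary`, and the rule that ‘beta[1]’ belongs to parameter ‘beta’ are assumptions, and the numbers are made up.

```python
import math


def filter_summary(summary_rows, pars=None, remove_nan=False):
    """Keep rows for the requested parameters; optionally drop NaN Rhat rows."""
    kept = []
    for row in summary_rows:
        base_name = row["name"].split("[")[0]  # 'beta[1]' -> 'beta'
        if pars is not None and base_name not in pars:
            continue
        if remove_nan and math.isnan(row["Rhat"]):
            print("dropping %s: Rhat is NaN" % row["name"])  # report removal
            continue
        kept.append(row)
    return kept


# Made-up summary rows for illustration
summary_rows = [
    {"name": "beta[1]", "mean": 0.12, "Rhat": 1.01},
    {"name": "beta[2]", "mean": -0.50, "Rhat": float("nan")},
    {"name": "alpha", "mean": 1.30, "Rhat": 1.00},
]
betas = filter_summary(summary_rows, pars=["beta"], remove_nan=True)
```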

survivalstan.utils.get_sample_ids(models, sample_col='patient_id')[source]
survivalstan.utils.plot_coefs(models, element='coefs', force_direction=None, trans=None, **kwargs)[source]

Plot coefficients for models listed

Parameters:
models (list):
List of model objects
element (string, optional):
Which element to plot. Defaults to ‘coefs’. Other options (depending on model type) include:
  • ‘grp_coefs’
  • ‘baseline’
  • ‘beta_time’
force_direction (string, optional):
Takes values ‘h’ or ‘v’
  • if ‘h’: forces horizontal orientation (variable names along the x axis)
  • if ‘v’: forces vertical orientation (variable names along the y axis)
  • if None (default): coef plots default to ‘v’ for all plots except baseline hazard
trans (function, optional):
If present, transforms the value of the value column
  • example: np.exp to plot exp(beta)
  • if None (default): plots the raw value
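The effect of trans can be shown without any plotting. In this sketch the helper `transform_values` and the record layout are hypothetical, and math.exp stands in for np.exp so that the example needs only the standard library.

```python
import math


def transform_values(records, trans=None):
    """Apply trans to the 'value' field of each record; None keeps raw values."""
    if trans is None:
        return [dict(r) for r in records]
    return [dict(r, value=trans(r["value"])) for r in records]


# Made-up coefficient draws on the log scale
records = [{"variable": "pd_l1", "value": 0.0},
           {"variable": "pd_l1", "value": math.log(2.0)}]
on_ratio_scale = transform_values(records, trans=math.exp)
```

With an exponential transform, a coefficient of 0 maps to 1 (no effect), which is often easier to read than the raw log-scale value.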

survivalstan.utils.plot_observed_survival(df, event_col, time_col, label='observed', *args, **kwargs)[source]
survivalstan.utils.plot_pp_survival(models, time_element='y_hat_time', event_element='y_hat_event', num_ticks=10, step_size=None, ticks_at=None, time_col='event_time', event_col='event_status', fill=True, by=None, alpha=0.5, pal=None, subplot=None, **kwargs)[source]

Plot KM curve estimates from posterior-predicted values by group, for each model given in the list of models.

See prep_pp_survival_data for details regarding process of extracting posterior-predicted values.

Parameters controlling data extraction:
models (list):
list of fit_stan_survival_model results from which to extract posterior-predicted values
by (str or list of strings):
additional column or columns by which to summarize posterior-predicted values. Default is None, which results in draws summarized by [iter and model_cohort]. Values can include any covariates provided in the original df.
time_element (str):
(optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time.
event_element (str):
(optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event.
event_col (str):
(optional) name to use for column containing posterior draw for event_status
time_col (str):
(optional) name to use for column containing posterior draw for time to event
**kwargs:
passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable the merge with the original dataframe.

Parameters controlling plot orientation/presentation:
pal (list of colors, matching length of by groups):
(optional) palette to use for plotting.
ticks_at:
(optional) exact locations for placement of ticks
num_ticks:
(optional) control number of ticks, if ticks_at not given.
step_size:
(optional) control tick spacing, if ticks_at or num_ticks not given
alpha:
(optional) level of transparency for boxplots
fill:
(optional) whether to fill in boxplots or just show outlines. Defaults to True
subplot:
(optional) pyplot.subplots object to use, if provided. Useful if you want to overlay observed or true survival on the same plot.
xlabel:
(optional) label for x-axis (defaults to “Days”)
ylabel:
(optional) label for y-axis (defaults to “Survival %”)
label:
(optional) legend-label for this plot group (defaults to “posterior predictions”, model-cohort, or by-group label depending on options)
**kwargs:
(optional) args passed to set properties of boxes, medians & whiskers (e.g. color)

Returns:
Nothing. Plotted object is a side-effect.
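The documented precedence among ticks_at, num_ticks and step_size can be read as a simple fallback chain. The sketch below is one plausible reading, not the library's implementation: the helper `resolve_ticks`, its `max_time` argument, and the choice of evenly spaced ticks starting at 0 are all assumptions.

```python
def resolve_ticks(max_time, ticks_at=None, num_ticks=10, step_size=None):
    """Pick tick locations: ticks_at first, then num_ticks, then step_size."""
    if ticks_at is not None:
        return list(ticks_at)                    # exact locations win
    if num_ticks is not None:
        step = max_time / float(num_ticks)       # evenly spaced ticks
        return [step * i for i in range(num_ticks + 1)]
    if step_size is not None:
        n_steps = int(max_time // step_size)     # fixed spacing from 0
        return [step_size * i for i in range(n_steps + 1)]
    return []


default_ticks = resolve_ticks(100)
exact_ticks = resolve_ticks(100, ticks_at=[0, 50])
spaced_ticks = resolve_ticks(10, num_ticks=None, step_size=2.5)
```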
survivalstan.utils.plot_stan_summary(stan_fit, pars=None, metric='Rhat')[source]

Plot distribution of values in stan fit summary, for the set of parameters in pars.

Primary use case is to summarize Rhat estimates for set of parameters, as a quick check of convergence.

Parameters:
stan_fit:
StanFit object whose posterior draws are to be summarized
pars (list of str, optional):
list of strings used to filter parameters. Passed directly to pystan.summary. Default: return all parameters
metric (str, optional):
the name of the metric to plot, as one of: [‘mean’, ‘se_mean’, ‘sd’, ‘2.5%’, ‘50%’, ‘97.5%’, ‘Rhat’]. Default: Rhat
survivalstan.utils.plot_time_betas(models=None, df=None, element='beta_time', y='beta', trans=None, coefs=None, x='timepoint_end_col', by=['model_cohort', 'coef'], timepoint_id_col=None, timepoint_end_col=None, subplot=None, ticks_at=None, ylabel=None, xlabel='time', num_ticks=10, step_size=None, fill=True, alpha=0.5, pal=None, value_name='beta', **kwargs)[source]

Plot posterior draws of time-varying parameters (element) from each model given in the list of models.

See also

extract_time_betas to return the dataframe used by this function to plot data.

Note

this function can optionally take a df argument (the result of extract_time_betas) to support data-extraction & plotting in a two-step operation.

Parameters controlling data extraction:
models (list):
list of model-fit objects returned by survivalstan.fit_stan_survival_model.
element (str):
name of parameter to extract. Defaults to “beta_time”, the parameter name used in the example time-varying stan model.
value_name (str):
what you would like the “value” column called in the resulting dataframe
coefs:
(optional) parameter passed to extract_time_betas, to override coefficient names captured in fit_stan_survival_model.
timepoint_id_col:
(optional) parameter passed to extract_time_betas, to override timepoint_id_col captured in fit_stan_survival_model.
timepoint_end_col:
(optional) parameter passed to extract_time_betas, to override timepoint_end_col captured in fit_stan_survival_model.

Parameters controlling plot orientation/presentation:
trans (function):
(optional) function to transform y-values plotted. Example: np.log
by (list):
(optional) list of columns by which to aggregate & color boxplots. Defaults to: [‘model_cohort’, ‘coef’]
pal (list of colors, matching length of by groups):
(optional) palette to use for plotting.
y (str):
(optional) column to put on the y-axis. Defaults to ‘beta’
x (str):
(optional) column to put on the x-axis. Defaults to ‘timepoint_end_col’
num_ticks:
(optional) how many ticks to show on the x-axis. See _plot_time_betas for details.
alpha:
(optional) level of transparency for boxplots
fill:
(optional) whether to fill in boxplots or just show outlines. Defaults to True
subplot:
(optional) pyplot.subplots object to use, if provided. Useful if you want to overlay multiple values on the same plot.

Returns:
Nothing. Plotted object is a side-effect.
survivalstan.utils.prep_pp_data(models, time_element='y_hat_time', event_element='y_hat_event', event_col='event_status', time_col='event_time', **kwargs)[source]

Extract posterior-predicted values from each model included in the list of models given, optionally merged with covariates & meta-data provided in the input df.

Parameters:
models (list):
list of fit_stan_survival_model results from which to extract posterior-predicted values
time_element (str):
(optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time.
event_element (str):
(optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event.
event_col (str):
(optional) name to use for column containing posterior draw for event_status
time_col (str):
(optional) name to use for column containing posterior draw for time to event
**kwargs:
passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable the merge with the original dataframe.

Returns:
pandas.DataFrame with one record per posterior draw (iter) for each subject, from each model, optionally joined with the original input data.
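The shape of the result (one record per draw and subject) can be sketched with nested lists. This is an illustration only: the helper `pp_draws_to_long` is hypothetical, the inputs are assumed to be indexed as [draw][subject], the values are made up, and plain dictionaries replace the pandas.DataFrame.

```python
def pp_draws_to_long(y_hat_time, y_hat_event, sample_ids,
                     time_col="event_time", event_col="event_status"):
    """Return one record per posterior draw (iter) and subject."""
    records = []
    for i, (times, events) in enumerate(zip(y_hat_time, y_hat_event)):
        for sample_id, t, e in zip(sample_ids, times, events):
            records.append({
                "iter": i,
                "sample_id": sample_id,
                time_col: t,            # posterior-predicted time
                event_col: bool(e),     # posterior-predicted status
            })
    return records


# Made-up predictions: 2 draws for 3 subjects
y_hat_time = [[4.2, 9.1, 1.5], [3.8, 10.0, 2.2]]
y_hat_event = [[1, 0, 1], [1, 1, 1]]
pp_records = pp_draws_to_long(y_hat_time, y_hat_event, ["p1", "p2", "p3"])
```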
survivalstan.utils.prep_pp_survival_data(models, time_element='y_hat_time', event_element='y_hat_event', time_col='event_time', event_col='event_status', by=None, **kwargs)[source]

Summarize posterior-predicted values into KM survival/censor rates by group, for each model given in the list of models.

See prep_pp_data for details regarding process of extracting posterior-predicted values.

Parameters:
models (list):
list of fit_stan_survival_model results from which to extract posterior-predicted values
by (str or list of strings):
additional column or columns by which to summarize posterior-predicted values. Default is None, which results in draws summarized by [iter and model_cohort]. Values can include any covariates provided in the original df.
time_element (str):
(optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time.
event_element (str):
(optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event.
event_col (str):
(optional) name to use for column containing posterior draw for event_status
time_col (str):
(optional) name to use for column containing posterior draw for time to event
**kwargs:
passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable the merge with the original dataframe.

Returns:
pandas.DataFrame with one record per posterior draw (iter), timepoint, model_cohort, and by-groups.
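For readers unfamiliar with the KM (Kaplan-Meier) summary, the sketch below computes the product-limit survival estimate for a single set of times and event indicators. It is a generic textbook estimator, not the library's code; the function name `kaplan_meier` is hypothetical. In the setting described above, such an estimate would be computed once per posterior draw and by-group.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with an observed event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # gather everyone whose observed time equals t
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / float(n_at_risk)
            curve.append((t, survival))
        n_at_risk -= removed  # events and censored leave the risk set
    return curve


curve = kaplan_meier(times=[1, 2, 2, 3, 4],
                     events=[True, True, False, True, False])
```

In this example the estimate drops at times 1, 2 and 3 only; the censored observations at times 2 and 4 reduce the risk set without producing a drop.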
survivalstan.utils.print_stan_summary(stan_fit, pars=None)[source]

Convenience function to print stan fit summary, for the set of parameters in pars.

Parameters:
stan_fit:
StanFit object whose posterior draws are to be summarized
pars (list, optional):
list of strings used to filter parameters. Passed directly to pystan.summary. Default: return all parameters
survivalstan.utils.read_files(path, pattern='*.stan', encoding='utf-8', resource=None)[source]

Reads file contents from a directory path into memory. Returns a dictionary mapping file names to file contents.

Intended for loading a directory of Stan files into an object.

Parameters:
path (string):
directory path (can be relative or absolute)
pattern (string, optional):
glob-style pattern used to select files on import. Defaults to “*.stan”
encoding (string, optional):
encoding to use when importing files. Defaults to “utf-8”
resource (string, optional):
if given, path is relative to the package install root. Used to load Stan files provided by packages (e.g. those within a package library)
Returns:
The type of each value in the dictionary depends on resource:
  • if resource is None, the contents of each file are returned as a character string
  • otherwise, each value is a “resource_string”, which acts as a character string but technically isn’t one.
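The directory-loading behavior for the resource=None case can be sketched with the standard library. This is not the library's implementation: the helper `read_matching_files` is hypothetical, matching with fnmatch is an assumption, and whether the real dictionary keys keep the file extension is not stated on this page.

```python
import fnmatch
import io
import os
import tempfile


def read_matching_files(path, pattern="*.stan", encoding="utf-8"):
    """Map file name -> file contents for files in path matching pattern."""
    contents = {}
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path) and fnmatch.fnmatch(name, pattern):
            with io.open(full_path, encoding=encoding) as handle:
                contents[name] = handle.read()
    return contents


# Build a throwaway directory with one Stan file and one unrelated file
demo_dir = tempfile.mkdtemp()
for name, text in [("weibull.stan", "data { }"), ("notes.txt", "ignore me")]:
    with io.open(os.path.join(demo_dir, name), "w", encoding="utf-8") as handle:
        handle.write(text)

models = read_matching_files(demo_dir)
```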

Module contents