survivalstan package

Submodules

survivalstan.models module

survivalstan.sim module

Functions to simulate failure-time data for testing & model checking purposes

survivalstan.sim.sim_data_exp(N, censor_time, rate)

Simulate true lifetimes (t) according to an exponential model.

Parameters:

N (int): number of observations
censor_time (float): uniform censor time for each observation
rate (float, positive): hazard rate used to parameterize failure times

Returns:

pandas DataFrame with N observations and 3 columns:

true_t: “actual” simulated failure time
t: observed failure/censor time, given censor_time
event: boolean indicating whether the failure event was observed (True) or censored (False)
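As a rough illustration of the data-generating process described above, the following plain-Python sketch draws exponential lifetimes and applies a single censor time. It is not the library's implementation: `sim_exp_sketch` is a hypothetical helper, it returns a list of dicts rather than a pandas DataFrame, and counting a lifetime exactly equal to `censor_time` as observed is an assumption.

```python
import random


def sim_exp_sketch(N, censor_time, rate, seed=None):
    """Hypothetical stand-in for sim_data_exp using only the standard library."""
    rng = random.Random(seed)
    rows = []
    for _ in range(N):
        true_t = rng.expovariate(rate)   # "actual" failure time, mean 1 / rate
        t = min(true_t, censor_time)     # observed failure/censor time
        event = true_t <= censor_time    # True if failure observed (assumed tie rule)
        rows.append({"true_t": true_t, "t": t, "event": event})
    return rows


rows = sim_exp_sketch(N=5, censor_time=20.0, rate=0.1, seed=42)
for row in rows:
    print(row)
```

With `rate=0.1` the mean lifetime is 10, so a censor time of 20 leaves most simulated failures observed.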

survivalstan.sim.sim_data_exp_correlated(N, censor_time, rate_form='1 + age + sex', rate_coefs=[-3, 0.3, 0])

Simulate true lifetimes (t) according to an exponential model, with a rate that depends on covariates.

Parameters:

N (int): number of observations
censor_time (float): uniform censor time for each observation
rate_form: names of variables to use when estimating rate. Defaults to ‘1 + age + sex’
rate_coefs: inputs to rate-calc (coefs used to estimate log-rate). Defaults to [-3, 0.3, 0]

Returns:

pandas DataFrame with N observations and 6 columns:

true_t: “actual” simulated failure time
t: observed failure/censor time, given censor_time
event: boolean indicating whether the failure event was observed (True) or censored (False)
age: simulated age in years (Poisson random variable, expectation = 55)
sex: simulated sex, as ‘female’ or ‘male’ (uniform 50/50 split)
rate: simulated rate value for each observation

survivalstan.sim.sim_data_jointmodel(N, p=0.5, **kwargs)

Simulate data for a joint model.

Returns:

Dictionary of 4 objects:

params: parameter values used to simulate data
covars: dataframe of covariates per subject_id
events: dataframe of multiple-event data, per subject_id
biomarker: dataframe of longitudinal biomarker values simulated

survivalstan.survivalstan module

class survivalstan.survivalstan.SurvivalStanData(df, formula, event_col, time_col=None, sample_id_col=None, sample_col=None, group_id_col=None, group_col=None, timepoint_id_col=None, timepoint_end_col=None, drop_intercept=True, **kwargs)

Bases: object

Input data representing a survival model in survivalstan

get_group_names()

prep_df_nonmiss()

Create x_df and df_nonmiss

prep_input_data(**kwargs)

survivalstan.survivalstan.extract_baseline_hazard(results, element='baseline', timepoint_id_col='timepoint_id', timepoint_end_col='end_time')

If model results contain a baseline object, extract & summarize it

survivalstan.survivalstan.extract_grp_baseline_hazard(results, timepoint_id_col='timepoint_id', timepoint_end_col='end_time')

If model results contain a grp_baseline object, extract & summarize it

survivalstan.survivalstan.fit_stan_survival_model(df=None, formula=None, event_col=None, model_code=None, file=None, model_cohort='survival model', time_col=None, sample_id_col=None, sample_col=None, group_id_col=None, group_col=None, timepoint_id_col=None, timepoint_end_col=None, make_inits=None, stan_data={}, grp_coef_type=None, FIT_FUN=<function fit>, drop_intercept=True, input_data=None, *args, **kwargs)

Prepare data & fit a survival model using Stan

This function wraps a number of steps into one function:

1. Prepares the input data dictionary for Stan
   - calls SurvivalStanData with user-provided formulas & df
   - (can be overridden using the input_data parameter)
2. Compiles & optionally caches compiled Stan code
3. Fits the model to the data
4. Tries the following functions on the resulting fit object:
   - stanity.psisloo to summarize model fit using the PSIS-LOO approximation
   - extract posterior draws for beta coefficients (if the model contains a beta parameter)
   - extract posterior draws for grouped-beta coefficients (if applicable)

Parameters:

df (pandas DataFrame): the data frame containing input data to the survival model
formula (chr): Patsy formula to use for covariates, e.g. ‘met_status + pd_l1’
event_col (chr): name of column containing event status. Will be coerced to boolean
model_code (chr): Stan model code to use
file (chr): path to Stan file (if model_code not given)
*args, **kwargs: passed to FIT_FUN (stanity.fit or replacement)
model_cohort (chr): description of this model fit, to be used when plotting or summarizing output
time_col (chr): name of column containing event time; used for parametric models
sample_id_col (chr): name of column containing numeric sample ids (1-indexed & sequential)
sample_col (chr): name of column containing sample descriptions; will be converted to an ID
group_id_col (chr): name of column containing numeric group ids (1-indexed & sequential)
group_col (chr): name of column containing group descriptions; will be converted to an ID
timepoint_id_col (chr): name of column containing timepoint ids (1-indexed & sequential)
timepoint_end_col (chr): name of column containing end times for each timepoint (will be converted to an ID)
stan_data (dict): extra params passed to the Stan data object
grp_coef_type (chr): type of group coef specified, if using a varying-coef model. Can be one of:
- None (default): guess group coef orientation from data. Works except in the case where M (num covariates) == G (num groups)
- ‘matrix’: grp_beta defined as matrix[M, G] grp_beta;
- ‘vector-of-vectors’: grp_beta defined as vector[M] grp_beta[G];
drop_intercept (bool): whether to drop the intercept term from the model matrix (default: True)
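To see why the default grp_coef_type guess breaks down when M == G, consider a shape-based check like the sketch below. This is a hypothetical illustration, not the library's code. `guess_grp_coef_type` is an invented name, and it assumes that one posterior draw of `matrix[M, G] grp_beta` has shape (M, G) while one draw of `vector[M] grp_beta[G]` has shape (G, M).

```python
def guess_grp_coef_type(draw_shape, M, G):
    """Guess how grp_beta was declared from the shape of a single posterior draw.

    draw_shape is the extracted array's shape with the leading
    iterations dimension removed (an assumption of this sketch).
    """
    if M == G:
        # Both declarations give an (M, M) draw, so the shape carries no information.
        raise ValueError("ambiguous: M == G, pass grp_coef_type explicitly")
    if draw_shape == (M, G):
        return "matrix"              # matrix[M, G] grp_beta;
    if draw_shape == (G, M):
        return "vector-of-vectors"   # vector[M] grp_beta[G];
    raise ValueError("shape %r matches neither declaration" % (draw_shape,))


print(guess_grp_coef_type((3, 5), M=3, G=5))  # prints "matrix"
print(guess_grp_coef_type((5, 3), M=3, G=5))  # prints "vector-of-vectors"
```

When the number of covariates equals the number of groups, passing grp_coef_type explicitly removes the ambiguity.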

Returns:

dictionary of results objects.

Contents:

df: pandas data frame containing input data, filtered to non-missing obs & with ID variables created
x_df: covariate matrix passed to Stan
x_names: column names for the covariate matrix passed to Stan
data: list passed to Stan; contains dimensions, etc.
fit: pystan fit object returned from the Stan call
coefs: posterior draws for coefficient values
loo: psis-loo object returned for the fit model. Used for model comparison & summary
model_cohort: description of this model and/or the cohort on which the model was fit
df_all: input df given, with calculated values included
sample_col: name of column (in df_all) used to identify the sample
sample_id_col: name of column containing numeric id derived from the sample
timepoint_end_col: name of column (in df_all) used to determine end-time of ‘long’ data, if relevant
timepoint_id_col: name of column containing numeric id derived from timepoint_end_col

Raises:

AttributeError, KeyError

Example:

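>>> import seaborn
>>> import survivalstan
>>> testfit = survivalstan.fit_stan_survival_model(
...     # assumes survivalstan.models exposes pem_survival_model as Stan code
...     model_code = survivalstan.models.pem_survival_model,
...     formula = '~ met_status + pd_l1',
...     df = dflong,  # long-format data, e.g. from prep_data_long_surv
...     sample_col = 'patient_id',
...     timepoint_end_col = 'end_time',
...     event_col = 'end_failure',
...     model_cohort = 'PEM survival model',
...     iter = 30000,
...     chains = 4,
... )
>>> print(testfit['fit'])
>>> seaborn.boxplot(x = 'value', y = 'variable', data = testfit['coefs'])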

survivalstan.survivalstan.make_weibull_survival_model_inits(stan_input_dict)

survivalstan.survivalstan.prep_data_long_surv(df, time_col, event_col, sample_col=None, event_name=None)

Convert wide survival dataframe (df) to long format, in preparation for modeling using PEM models.

Returns a pandas DataFrame with original records duplicated for each unique failure time observed.

Each record will have two new columns: ‘end_failure’ and ‘end_time’, indicating the event status (end_failure) for each unique timepoint (end_time).

Parameters:

df (pandas.DataFrame): input data containing survival time & status for each subject
time_col (str): name of column containing time to censor/event
event_col (str or list of strings): name of column containing status (1 or True: event, 0 or False: censor). If a list is provided, these will be processed as multiple event types.
sample_col (str): (optional) column containing sample or subject identifier. If given, the result will be de-duped so that multiple events within a sample are handled correctly.
event_name (str): (optional) column containing description of event type, if more than one type of event is observed. If given, then multiple events per subject will be processed.

Returns:

pandas.DataFrame with original records duplicated for each unique failure time observed.

Each record will _include all original covariate values_, plus two new columns: ‘end_failure’ and ‘end_time’, indicating the timepoint-specific event status for each record.

If multiple events are given (either via a list of event_cols or by providing an event_name), the result will contain multiple end_failure columns, one for each event type.
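The wide-to-long conversion described above can be pictured with a small plain-Python sketch for a single event type. This is one reading of the description, not the library's implementation. `to_long_sketch` is a hypothetical helper working on a list of dicts instead of a DataFrame, and it takes the timepoints to be the unique times at which a failure was observed.

```python
def to_long_sketch(records, time_col, event_col):
    """Hypothetical illustration of the wide-to-long step used for PEM models."""
    # Unique times at which a failure (not a censor) was observed.
    failure_times = sorted({r[time_col] for r in records if r[event_col]})
    long_rows = []
    for r in records:
        for end_time in failure_times:
            if end_time > r[time_col]:
                break  # subject is no longer at risk after its own time
            row = dict(r)  # keep all original covariate values
            row["end_time"] = end_time
            # The event is attributed only to the subject's own failure time.
            row["end_failure"] = bool(r[event_col]) and end_time == r[time_col]
            long_rows.append(row)
    return long_rows


wide = [
    {"id": "a", "time": 2, "event": True, "age": 50},
    {"id": "b", "time": 5, "event": False, "age": 61},
    {"id": "c", "time": 7, "event": True, "age": 47},
]
for row in to_long_sketch(wide, time_col="time", event_col="event"):
    print(row["id"], row["end_time"], row["end_failure"])
```

Here subject c contributes two rows (end_time 2 and 7), with end_failure True only on the row for its own failure time.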

survivalstan.utils module

survivalstan.utils.extract_params_long(models, element, rename_vars=None, varnames=None)

Helper function to extract & reformat params

Parameters:

models (list): list of model objects
element (string): which element to extract, e.g. ‘coefs’. Other options (depending on model type) include ‘grp_coefs’ and ‘baseline_hazard’
rename_vars (dict, optional): dictionary mapping from integer positions (0, 1, 2) to variable names
varnames (list of strings, optional): list of variable names to apply to columns from the extracted object

Returns:

pandas DataFrame containing posterior draws per iteration

survivalstan.utils.extract_time_betas(models, element='beta_time', value_name='beta', **kwargs)

Extract posterior draws for values of time-varying element from each model given in the list of models.

Returns a pandas.DataFrame containing one record for each posterior draw of each parameter, where the parameter varies over time.

Columns include:

model_cohort: description of the model or cohort from which the draw was taken
<value-column>: the value of the posterior draw, named according to the given parameter value_name
coef: description of the coefficient estimated, as per the patsy formula provided
iter: integer indicator of the draw from which that estimate was taken
<timepoint-id-column>: integer identifier for each unique time at which betas are estimated (default column name is set by fit_stan_survival_model, typically as “timepoint_id”)
<timepoint-end-column>: time at which this beta was estimated (default column name is set by fit_stan_survival_model, typically as “end_time”)

Parameters:

models (list): list of model-fit objects returned by survivalstan.fit_stan_survival_model
element (str): name of parameter to extract. Defaults to “beta_time”, the parameter name used in the example time-varying stan model
value_name (str): what you would like the “value” column called in the resulting dataframe
**kwargs: passed to _extract_time_betas_single_model, allowing the user to customize “default” values which would otherwise be read from each model object. Examples include: coefs, timepoint_id_col, and timepoint_end_col

Returns:

pandas.DataFrame containing posterior draws of parameter values.

survivalstan.utils.filter_stan_summary(stan_fit, pars=None, remove_nan=False)

Filter the stan fit summary for the set of parameters in pars. See the documentation for pystan.summary for details about the summary stats given.

Parameters:

stan_fit: StanFit object whose posterior draws are to be summarized
pars (list, optional): list of strings used to filter parameters. Passed directly to pystan.summary. Default: return all parameters
remove_nan (bool, optional): whether to remove (and report on) NaN values for Rhat. These are problematic for distplot.

Returns:

pandas DataFrame containing summary stats for posterior draws of selected parameters

survivalstan.utils.get_sample_ids(models, sample_col='patient_id')

survivalstan.utils.plot_coefs(models, element='coefs', force_direction=None, trans=None, **kwargs)

Plot coefficients for models listed

Parameters:

models (list): list of model objects
element (string, optional): which element to plot. Defaults to ‘coefs’. Other options (depending on model type) include ‘grp_coefs’, ‘baseline’ and ‘beta_time’
force_direction (string, optional): takes values ‘h’ or ‘v’
- if ‘h’: forces horizontal orientation (variable names along the x axis)
- if ‘v’: forces vertical orientation (variable names along the y axis)
- if None (default): coef plots default to ‘v’ for all plots except baseline hazard
trans (function, optional): if present, transforms the value of the value column
- example: np.exp to plot exp(beta)
- if None (default): plots the raw value
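The effect of trans can be pictured with a plain-Python sketch. The draws below are made-up numbers, `apply_trans` is a hypothetical helper rather than part of survivalstan, and `math.exp` stands in for `np.exp`. Reading exp(beta) as a hazard ratio assumes the coefficient is on the log-hazard scale.

```python
import math

# Hypothetical posterior draws of a coefficient (beta), for illustration only.
beta_draws = [-0.2, 0.0, 0.3, 0.5]


def apply_trans(values, trans=None):
    """Return values unchanged when trans is None, else trans(value) for each."""
    if trans is None:
        return list(values)
    return [trans(v) for v in values]


raw = apply_trans(beta_draws)                   # what is plotted by default
exp_beta = apply_trans(beta_draws, math.exp)    # what is plotted with trans=exp
print(raw)
print([round(v, 3) for v in exp_beta])
```

A draw of 0.0 maps to exactly 1.0 after exponentiation, which is the usual "no effect" reference line on the transformed scale.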

survivalstan.utils.plot_observed_survival(df, event_col, time_col, label='observed', *args, **kwargs)

survivalstan.utils.plot_pp_survival(models, time_element='y_hat_time', event_element='y_hat_event', num_ticks=10, step_size=None, ticks_at=None, time_col='event_time', event_col='event_status', fill=True, by=None, alpha=0.5, pal=None, subplot=None, **kwargs)

Plot KM curve estimates from posterior-predicted values by group, for each model given in the list of models.

See prep_pp_survival_data for details regarding process of extracting posterior-predicted values.

Parameters controlling data extraction:

models (list): list of fit_stan_survival_model results from which to extract posterior-predicted values
by (str or list of strings): additional column or columns by which to summarize posterior-predicted values. Default is None, which results in draws summarized by [iter and model_cohort]. Values can include any covariates provided in the original df
time_element (str): (optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time
event_element (str): (optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event
event_col (str): (optional) name to use for column containing posterior draw for event_status
time_col (str): (optional) name to use for column containing posterior draw for time to event
**kwargs: passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable the merge with the original dataframe

Parameters controlling plot orientation/presentation:

pal (list of colors, matching length of by groups): (optional) palette to use for plotting
ticks_at: (optional) exact locations for placement of ticks
num_ticks: (optional) control number of ticks, if ticks_at not given
step_size: (optional) control tick spacing, if ticks_at or num_ticks not given
alpha: (optional) level of transparency for boxplots
fill: (optional) whether to fill in boxplots or just show outlines. Defaults to True
subplot: (optional) pyplot.subplots object to use, if provided. Useful if you want to overlay observed or true survival on the same plot
xlabel: (optional) label for x-axis (defaults to “Days”)
ylabel: (optional) label for y-axis (defaults to “Survival %”)
label: (optional) legend-label for this plot group (defaults to “posterior predictions”, model-cohort, or by-group label, depending on options)
**kwargs: (optional) args passed to set properties of boxes, medians & whiskers (e.g. color)

Returns:

Nothing. Plotted object is a side-effect.

survivalstan.utils.plot_stan_summary(stan_fit, pars=None, metric='Rhat')

Plot distribution of values in stan fit summary, for the set of parameters in pars.

Primary use case is to summarize Rhat estimates for set of parameters, as a quick check of convergence.

Parameters:

stan_fit: StanFit object whose posterior draws are to be summarized
pars (list of str, optional): list of strings used to filter parameters. Passed directly to pystan.summary. Default: return all parameters
metric (str, optional): the name of the metric to plot, as one of: [‘mean’, ‘se_mean’, ‘sd’, ‘2.5%’, ‘50%’, ‘97.5%’, ‘Rhat’]. Default: Rhat

survivalstan.utils.plot_time_betas(models=None, df=None, element='beta_time', y='beta', trans=None, coefs=None, x='timepoint_end_col', by=['model_cohort', 'coef'], timepoint_id_col=None, timepoint_end_col=None, subplot=None, ticks_at=None, ylabel=None, xlabel='time', num_ticks=10, step_size=None, fill=True, alpha=0.5, pal=None, value_name='beta', **kwargs)

Plot posterior draws of time-varying parameters (element) from each model given in the list of models.

See also

extract_time_betas to return the dataframe used by this function to plot data.

Note

this function can optionally take a df argument (the result of extract_time_betas) to support data-extraction & plotting in a two-step operation.

Parameters controlling data extraction:

models (list): list of model-fit objects returned by survivalstan.fit_stan_survival_model
element (str): name of parameter to extract. Defaults to “beta_time”, the parameter name used in the example time-varying stan model
value_name (str): what you would like the “value” column called in the resulting dataframe
coefs: (optional) parameter passed to extract_time_betas, to override coefficient names captured in fit_stan_survival_model
timepoint_id_col: (optional) parameter passed to extract_time_betas, to override timepoint_id_col captured in fit_stan_survival_model
timepoint_end_col: (optional) parameter passed to extract_time_betas, to override timepoint_end_col captured in fit_stan_survival_model

Parameters controlling plot orientation/presentation:

trans (function): (optional) function to transform y-values plotted. Example: np.log
by (list): (optional) list of columns by which to aggregate & color boxplots. Defaults to [‘model_cohort’, ‘coef’]
pal (list of colors, matching length of by groups): (optional) palette to use for plotting
y (str): (optional) column to put on the y-axis. Defaults to ‘beta’
x (str): (optional) column to put on the x-axis. Defaults to ‘timepoint_end_col’
num_ticks: (optional) how many ticks to show on the x-axis. See _plot_time_betas for details
alpha: (optional) level of transparency for boxplots
fill: (optional) whether to fill in boxplots or just show outlines. Defaults to True
subplot: (optional) pyplot.subplots object to use, if provided. Useful if you want to overlay multiple values on the same plot

Returns:

Nothing. Plotted object is a side-effect.

survivalstan.utils.prep_pp_data(models, time_element='y_hat_time', event_element='y_hat_event', event_col='event_status', time_col='event_time', **kwargs)

Extract posterior-predicted values from each model included in the list of models given, optionally merged with covariates & meta-data provided in the input df.

Parameters:

models (list): list of fit_stan_survival_model results from which to extract posterior-predicted values
time_element (str): (optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time
event_element (str): (optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event
event_col (str): (optional) name to use for column containing posterior draw for event_status
time_col (str): (optional) name to use for column containing posterior draw for time to event
**kwargs: passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable the merge with the original dataframe

Returns:

pandas.DataFrame with one record per posterior draw (iter) for each subject, from each model, optionally joined with the original input data.

survivalstan.utils.prep_pp_survival_data(models, time_element='y_hat_time', event_element='y_hat_event', time_col='event_time', event_col='event_status', by=None, **kwargs)

Summarize posterior-predicted values into KM survival/censor rates by group, for each model given in the list of models.

See prep_pp_data for details regarding process of extracting posterior-predicted values.

Parameters:

models (list): list of fit_stan_survival_model results from which to extract posterior-predicted values
by (str or list of strings): additional column or columns by which to summarize posterior-predicted values. Default is None, which results in draws summarized by [iter and model_cohort]. Values can include any covariates provided in the original df
time_element (str): (optional) name of parameter containing posterior-predicted event time for each subject. Defaults to the standard used in survivalstan models: y_hat_time
event_element (str): (optional) name of parameter containing posterior-predicted event status for each subject. Defaults to the standard used in survivalstan models: y_hat_event
event_col (str): (optional) name to use for column containing posterior draw for event_status
time_col (str): (optional) name to use for column containing posterior draw for time to event
**kwargs: passed to _prep_pp_data_single_model, allowing the user to override or specify default values given in the original call to fit_stan_survival_model. Parameters include sample_col and sample_id_col, to define names of sample description & id columns, as well as join_with, giving the name of the dataframe to join with (options include df_nonmiss, x_df, or None). Use join_with = None to disable the merge with the original dataframe

Returns:

pandas.DataFrame with one record per posterior draw (iter), timepoint, model_cohort, and by-groups.
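For readers unfamiliar with the KM (Kaplan-Meier) summary mentioned above, the sketch below computes the estimator for one set of times and event indicators, such as the posterior-predicted values from a single draw. It is a generic textbook version in plain Python, not the library's code, and `km_survival` is a hypothetical name.

```python
def km_survival(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each distinct event time."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose observed time equals t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censored subjects both leave the risk set
    return curve


curve = km_survival([2, 3, 3, 5, 8], [True, True, False, True, False])
print([(t, round(s, 2)) for t, s in curve])  # prints [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Repeating this per posterior draw and per group would give one survival curve for each combination, which is the kind of per-iter, per-timepoint record described in the return value.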

survivalstan.utils.print_stan_summary(stan_fit, pars=None)

Convenience function to print stan fit summary, for the set of parameters in pars.

Parameters:

stan_fit: StanFit object whose posterior draws are to be summarized
pars (list, optional): list of strings used to filter parameters. Passed directly to pystan.summary. Default: return all parameters

survivalstan.utils.read_files(path, pattern='*.stan', encoding='utf-8', resource=None)

Reads file contents from a directory path into memory. Returns a dictionary of file names: file contents.

Is intended to be used to load a directory of stan files into an object.

Parameters:

path (string): directory path (can be relative or absolute)
pattern (string, optional): filename pattern applied to files on import. Defaults to “*.stan” (a glob-style wildcard)
encoding (string, optional): encoding to use when importing files. Defaults to “UTF-8”
resource (string, optional): if given, path is relative to the package install root. Used to load stan files provided by packages (e.g. those within a package library)

Returns:

The specifics of the return type depend on the value of resource:

if resource is None, returns contents of file as a character string
otherwise, returns a “resource_string”, which acts as a character string but technically isn’t one
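The directory-to-dictionary behaviour described above can be sketched with the standard library alone. This is an illustration of the idea for the resource=None case, not the library's implementation. `read_files_sketch` is a hypothetical name, and matching the pattern with `fnmatch` is an assumption.

```python
import fnmatch
import os
import tempfile


def read_files_sketch(path, pattern="*.stan", encoding="utf-8"):
    """Hypothetical stand-in: map each matching file name to its contents."""
    contents = {}
    for name in sorted(fnmatch.filter(os.listdir(path), pattern)):
        with open(os.path.join(path, name), encoding=encoding) as handle:
            contents[name] = handle.read()
    return contents


# Demonstrate on a throwaway directory holding one Stan file and one other file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.stan"), "w", encoding="utf-8") as handle:
        handle.write("data { int N; }")
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as handle:
        handle.write("not a model")
    models = read_files_sketch(tmp)
    print(models)  # prints {'demo.stan': 'data { int N; }'}
```

Only the file matching the pattern ends up in the dictionary, keyed by its file name.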
