Lymphatic Progression Models
The lymph module implements the core classes to model lymphatic tumor progression.
- class lymph.models.Unilateral(graph_dict: dict[tuple[str, str], list[str]], named_params: Sequence[str] | None = None, tumor_state: int | None = None, allowed_states: list[int] | None = None, max_time: int = 10, **_kwargs)
Bases: Composite, Composite, Model
Class that models metastatic progression in a unilateral lymphatic system.
It does this by representing the lymphatic system as a directed acyclic graph (DAG), which is stored in and managed by the attribute `graph`. The progression itself can be modelled via hidden Markov models (HMM) or Bayesian networks (BN). In both cases, instances of this class allow one to calculate the probability of a certain hidden pattern of involvement, given an individual diagnosis of a patient.
- __init__(graph_dict: dict[tuple[str, str], list[str]], named_params: Sequence[str] | None = None, tumor_state: int | None = None, allowed_states: list[int] | None = None, max_time: int = 10, **_kwargs) -> None
Create a new instance of the `Unilateral` class.
The `graph_dict` that represents the lymphatic system should be given as a dictionary. Its keys are tuples of the form `("tumor", "<tumor_name>")` or `("lnl", "<lnl_name>")`. The values are lists of strings with the names of the nodes to which the node given by the key has a directed edge.
Note
Do make sure the values in the dictionary are of type `list` and not `set`. Sets do not preserve the order of their elements and thus not the order of the edges in the graph. This may lead to inconsistencies in the model.
For example, the following graph represents a lymphatic system with one tumor and three lymph node levels (LNLs):
```python
graph = {
    ("tumor", "T"): ["II", "III", "IV"],
    ("lnl", "II"): ["III"],
    ("lnl", "III"): ["IV"],
    ("lnl", "IV"): [],
}
```
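The structural rules for `graph_dict` can be checked before building a model. The helper `check_graph_dict` below is hypothetical (not part of `lymph`) and is only a sketch: it tests that keys are `("tumor", ...)` or `("lnl", ...)` tuples and that values are lists, and it additionally assumes that edges only ever point to LNLs, which holds for the example graph.

```python
# Hypothetical helper (not part of lymph): sanity-check a graph_dict.
def check_graph_dict(graph_dict: dict[tuple[str, str], list[str]]) -> list[str]:
    """Return the LNL names in order, raising ValueError if the dict is malformed."""
    lnl_names = [name for kind, name in graph_dict if kind == "lnl"]
    for (kind, name), targets in graph_dict.items():
        if kind not in ("tumor", "lnl"):
            raise ValueError(f"unknown node type {kind!r} for node {name!r}")
        if not isinstance(targets, list):
            # Sets would scramble the order of the edges (see the note above).
            raise ValueError(f"edges of {name!r} must be a list, not {type(targets).__name__}")
        for target in targets:
            # Assumption of this sketch: edges only ever point to LNLs.
            if target not in lnl_names:
                raise ValueError(f"edge {name!r} -> {target!r} points to an unknown LNL")
    return lnl_names


graph = {
    ("tumor", "T"): ["II", "III", "IV"],
    ("lnl", "II"): ["III"],
    ("lnl", "III"): ["IV"],
    ("lnl", "IV"): [],
}
print(check_graph_dict(graph))  # ['II', 'III', 'IV']
```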
The `tumor_state` is the initial (and unchangeable) state of the tumor. Typically, it can be omitted and is then set to the maximum of the `allowed_states`, i.e., the states the LNLs can take on. The default is a binary representation with `allowed_states=[0, 1]`; for this, one can also use the classmethod `binary()`. For a trinary representation with `allowed_states=[0, 1, 2]`, use the classmethod `trinary()`.
The `max_time` parameter defines the latest possible time step for a diagnosis. In the HMM case, the probability distribution over all hidden states is evolved from \(t=0\) to `max_time`. In the BN case, this parameter has no effect.
- classmethod binary(graph_dict: dict[tuple[str, str], list[str]], **kwargs) -> Unilateral
Create an instance of the `Unilateral` class with binary LNLs.
- classmethod trinary(graph_dict: dict[tuple[str, str], list[str]], **kwargs) -> Unilateral
Create an instance of the `Unilateral` class with trinary LNLs.
- property is_trinary: bool
Return whether the model is trinary.
- property is_binary: bool
Return whether the model is binary.
- get_t_stages(which: Literal['valid', 'distributions', 'data'] = 'valid') -> list[str]
Return the T-stages of the model.
- get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Get the parameters of the tumor spread edges.
- get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Get the parameters of the LNL spread edges.
In the trinary case, this includes the growth parameters as well as the microscopic modification parameters.
- get_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Get the parameters of the spread edges.
- get_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Get the parameters of the model.
If `as_dict` is `True`, the parameters are returned as a dictionary. If `as_flat` is `True`, the dictionary is flattened, i.e., all nested dictionaries are merged into one, using `flatten()`.
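To illustrate what flattening means here, a minimal sketch that joins nested keys with an underscore, which is the naming pattern visible in the parameter names on this page (e.g. `ipsi_TtoII_spread`). The function `flatten_params` is hypothetical and not the library's `flatten()`.

```python
# Hypothetical sketch: merge nested dicts into one, joining keys with "_".
def flatten_params(nested: dict, parent_key: str = "", sep: str = "_") -> dict:
    flat = {}
    for key, value in nested.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_params(value, new_key, sep))  # recurse into sub-dicts
        else:
            flat[new_key] = value
    return flat


nested = {
    "ipsi": {"TtoII": {"spread": 0.0}, "TtoIII": {"spread": 0.2}},
    "IItoIII": {"spread": 0.8},
}
print(flatten_params(nested))
# {'ipsi_TtoII_spread': 0.0, 'ipsi_TtoIII_spread': 0.2, 'IItoIII_spread': 0.8}
```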
- set_tumor_spread_params(*args: float, **kwargs: float) -> tuple[float]
Assign new parameters to the tumor spread edges.
- set_lnl_spread_params(*args: float, **kwargs: float) -> tuple[float]
Assign new parameters to the LNL spread edges.
- set_spread_params(*args: float, **kwargs: float) -> tuple[float]
Assign new parameters to the spread edges.
- set_params(*args: float, **kwargs: float) -> tuple[float]
Assign new parameters to the model.
The parameters can be provided either via positional arguments or via keyword arguments. The positional arguments are used up one by one, first by the `lymph.graph.set_params()` method and then by the `lymph.models.Unilateral.set_distribution_params()` method.
The keyword arguments can be of the format `"<edge_name>_<param_name>"` or `"<t_stage>_<param_name>"` for the distributions over diagnosis times. If only a `"<param_name>"` is provided, it is assumed to be a global parameter and is sent to all edges or distributions. The more specific keyword arguments override the global ones, which in turn override the positional arguments. Keyword arguments that match no parameter, like `AtoB_param` below, are ignored.
```python
>>> graph = {
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... }
>>> model = Unilateral.trinary(
...     graph_dict=graph,
...     is_micro_mod_shared=True,
...     is_growth_shared=True,
... )
>>> model.set_params(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.99, AtoB_param="not_used")
(0.99,)
>>> model.get_params(as_dict=True)
{'TtoII_spread': 0.1, 'TtoIII_spread': 0.2, 'II_growth': 0.3, 'IItoIII_spread': 0.4, 'IItoIII_micro': 0.5, 'III_growth': 0.6}
>>> _ = model.set_params(growth=0.123)
>>> model.get_params(as_dict=True)
{'TtoII_spread': 0.1, 'TtoIII_spread': 0.2, 'II_growth': 0.123, 'IItoIII_spread': 0.4, 'IItoIII_micro': 0.5, 'III_growth': 0.123}
```
- transition_prob(new_state: list[int], assign: bool = False) -> float
Compute the probability to transition to `new_state`, given the model's current state.
The probability is computed as the product of the transition probabilities of the individual LNLs. If `assign` is `True`, the new state is assigned to the model using the method `lymph.graph.Representation.set_state()`.
- diagnosis_prob(diagnosis: Series | dict[str, dict[str, bool]]) -> float
Compute the probability to observe a diagnosis given the current state.
The `diagnosis` is either a pandas `Series` object corresponding to one row of a patient data table, or a dictionary whose keys are diagnostic modalities and whose values are dictionaries holding the observation of each LNL under the respective modality.
It returns the probability of observing this particular combination of diagnoses, given the current state of the system.
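For binary LNLs, the probability of a diagnosis can be sketched as a product over modalities and LNLs. This sketch assumes the observations are conditionally independent given the hidden state, and that each modality is characterized by its specificity and sensitivity (as passed to `set_modality()` in the examples on this page). Missing observations (`None`) contribute a factor of one. The function is a hypothetical illustration, not the library's implementation.

```python
# Hypothetical sketch for binary LNLs: observations are assumed to be
# conditionally independent given the hidden state, and each modality is
# described by a (specificity, sensitivity) pair.
def diagnosis_prob(state, diagnosis, modalities):
    """state: {lnl: 0 or 1}; diagnosis: {modality: {lnl: True/False/None}};
    modalities: {modality: (spec, sens)}."""
    prob = 1.0
    for modality, observations in diagnosis.items():
        spec, sens = modalities[modality]
        for lnl, observed in observations.items():
            if observed is None:
                continue  # missing observation: contributes a factor of 1
            if state[lnl] == 1:
                prob *= sens if observed else 1.0 - sens  # true positive or false negative
            else:
                prob *= 1.0 - spec if observed else spec  # false positive or true negative
    return prob


state = {"II": 1, "III": 0}
print(diagnosis_prob(state, {"CT": {"II": True, "III": None}}, {"CT": (0.8, 0.8)}))  # 0.8
```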
- property obs_list
Return the list of all possible observations.
They are ordered like the `graph.Representation.state_list`, but additionally by modality. E.g., for two LNLs II and III and two modalities CT and pathology, the list would look like this:
```python
>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_modality("CT", spec=0.8, sens=0.8)
>>> model.set_modality("pathology", spec=1.0, sens=1.0)
>>> model.obs_list
array([[0, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 1, 1],
       ...
       [1, 1, 0, 1],
       [1, 1, 1, 0],
       [1, 1, 1, 1]])
```
The first two columns correspond to the observation of LNLs II and III under the CT modality, and the last two columns to the same LNLs under the pathology modality.
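The ordering shown in the `obs_list` doctest matches a lexicographic enumeration of one binary column per (modality, LNL) pair, grouped by modality. A minimal sketch with `itertools.product`, assuming binary observations and that the library's order coincides with this lexicographic one (as the doctest output suggests):

```python
import itertools

lnls = ["II", "III"]
modalities = ["CT", "pathology"]

# One binary column per (modality, LNL) pair, grouped by modality first.
columns = [(modality, lnl) for modality in modalities for lnl in lnls]

# Lexicographic enumeration of all 2 ** (N * M) possible observations.
obs_list = list(itertools.product((0, 1), repeat=len(columns)))

print(columns)
# [('CT', 'II'), ('CT', 'III'), ('pathology', 'II'), ('pathology', 'III')]
print(len(obs_list), obs_list[:4], obs_list[-1])
# 16 [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1)] (1, 1, 1, 1)
```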
- transition_matrix() -> ndarray
Matrix encoding the probabilities to transition from one state to another.
This is the crucial object for modelling the evolution of the probabilistic system in the context of the hidden Markov model. For binary LNLs, it has the shape \(2^N \times 2^N\), where \(N\) is the number of LNLs in the graph. The element in the \(i\)-th row and \(j\)-th column encodes the probability to transition from the \(i\)-th state to the \(j\)-th state. The states are ordered as in the `graph.Representation.state_list`.
See also
`generate_transition()`: The function actually computing the transition matrix.
```python
>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_params(0.7, 0.3, 0.2)
()
>>> model.transition_matrix()
array([[0.21, 0.09, 0.49, 0.21],
       [0.  , 0.3 , 0.  , 0.7 ],
       [0.  , 0.  , 0.56, 0.44],
       [0.  , 0.  , 0.  , 1.  ]])
```
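The numbers in the `transition_matrix()` doctest can be reproduced with a small pure-Python sketch for binary LNLs. It assumes that involved LNLs never heal and that a healthy LNL stays healthy only if neither the tumor nor any involved parent LNL spreads to it, with all edges acting independently. The function name and argument layout are hypothetical; in the library, `generate_transition()` does this work.

```python
import itertools

# Hypothetical sketch for binary LNLs (no healing, independent spread edges).
def transition_matrix(lnls, tumor_spread, lnl_spread):
    states = list(itertools.product((0, 1), repeat=len(lnls)))  # 00, 01, 10, 11, ...
    matrix = []
    for old in states:
        row = []
        for new in states:
            prob = 1.0
            for i, lnl in enumerate(lnls):
                if old[i] == 1:  # involved LNLs stay involved
                    prob *= float(new[i] == 1)
                    continue
                stay = 1.0 - tumor_spread[lnl]  # tumor does not spread here
                for j, parent in enumerate(lnls):
                    if old[j] == 1:  # each involved parent may spread independently
                        stay *= 1.0 - lnl_spread.get((parent, lnl), 0.0)
                prob *= stay if new[i] == 0 else 1.0 - stay
            row.append(prob)
        matrix.append(row)
    return matrix


# Same graph and parameters as the doctest: TtoII=0.7, TtoIII=0.3, IItoIII=0.2.
matrix = transition_matrix(["II", "III"], {"II": 0.7, "III": 0.3}, {("II", "III"): 0.2})
for row in matrix:
    print([round(p, 2) for p in row])
# [0.21, 0.09, 0.49, 0.21]
# [0.0, 0.3, 0.0, 0.7]
# [0.0, 0.0, 0.56, 0.44]
# [0.0, 0.0, 0.0, 1.0]
```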
- observation_matrix() -> ndarray
Get the matrix encoding the probabilities to observe a certain diagnosis.
Every element in this matrix holds the probability to observe a certain diagnosis (or combination of diagnoses, when using multiple diagnostic modalities) given the current state of the system. For binary LNLs, it has the shape \(2^N \times 2^{N \times M}\), where \(N\) is the number of LNLs in the graph and \(M\) is the number of diagnostic modalities.
See also
`generate_observation()`: The function actually computing the observation matrix.
- data_matrix(t_stage: str | None = None) -> ndarray
Extract the data matrix for a given `t_stage`.
The data matrix is a binary encoding of the patient data. For every patient, it encodes which observational states could have led to the observed diagnosis. If a diagnosis is complete, i.e., there is an observation for every diagnostic modality and every LNL, the patient's encoding is a one-hot encoding of the observed diagnosis. Otherwise, it may contain multiple 1s, indicating over which observational states one should marginalize.
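A minimal sketch of such an encoding for a single patient, assuming binary observations ordered lexicographically as in `obs_list`. The function `encode_diagnosis` is hypothetical and stands in for the library's `matrix.generate_data_encoding()`; a missing value (`None`) is compatible with both outcomes and therefore produces several 1s.

```python
import itertools

# Hypothetical sketch: encode a (possibly incomplete) diagnosis as a 0/1 vector
# over all possible observations, ordered as in obs_list.
def encode_diagnosis(diagnosis, columns):
    """diagnosis: {modality: {lnl: True/False/None}}; columns: [(modality, lnl), ...]"""
    encoding = []
    for obs in itertools.product((0, 1), repeat=len(columns)):
        matches = all(
            diagnosis.get(modality, {}).get(lnl) is None  # unknown: matches anything
            or bool(diagnosis[modality][lnl]) == bool(value)
            for (modality, lnl), value in zip(columns, obs)
        )
        encoding.append(int(matches))
    return encoding


columns = [("CT", "II"), ("CT", "III")]
print(encode_diagnosis({"CT": {"II": True, "III": False}}, columns))  # [0, 0, 1, 0]
print(encode_diagnosis({"CT": {"II": True, "III": None}}, columns))   # [0, 0, 1, 1]
```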
The data matrix is used to compute the `diagnosis_matrix`, which in turn is used to compute the likelihood of the model given the patient data.
See also
`matrix.generate_data_encoding()`: This function actually computes the data encoding.
- diagnosis_matrix(t_stage: str | None = None) -> ndarray
Extract the diagnosis matrix for a given `t_stage`.
For every patient, this matrix stores the probability to observe this patient's diagnosis, given each of the possible hidden states of the model. It is computed by multiplying the `data_matrix()` with the `observation_matrix()`.
- load_patient_data(patient_data: DataFrame, side: str = 'ipsi', mapping: Callable[[int], Any] | dict[int, Any] | None = None) -> None
Load patient data in LyProX format into the model.
Since the LyProX data format contains information on both sides (i.e., ipsi- and contralateral) of the neck, the `side` parameter is used to select for which of the two sides to store the involvement data.
With the `mapping` function or dictionary, the reported T-stages (usually 0, 1, 2, 3, and 4) can be mapped to any keys that are also used to access the corresponding distributions over diagnosis times. The default mapping maps 0, 1, and 2 to "early" and 3 and 4 to "late".
What this method essentially does is copy the entire data frame, check that all necessary information is present, and add a new top-level header `"_model"` to the data frame. Under this header, columns are assembled that contain all the information necessary to compute the observation and diagnosis matrices.
- property patient_data: DataFrame
Return the patient data loaded into the model.
After successfully loading the data with `load_patient_data()`, the copied patient data contains the additional top-level header `"_model"`. Under it, the observed per-LNL involvement is listed for every diagnostic modality in the dictionary returned by `get_all_modalities()` and for each of the LNLs in the list `graph.Representation.lnls`.
It also contains information on the patient's T-stage under the header `("_model", "core", "t_stage")`.
Additionally, it holds the data encoding and the probability of the diagnosis given the hidden states for each patient under the headers `("_model", "_encoding", <obs_state>)` and `("_model", "_diagnosis_prob", <hidden_state>)`, respectively.
- evolve(state_dist: ndarray, num_steps: int) -> ndarray
Evolve the `state_dist` of possible states over `num_steps` time steps.
This is done by multiplying the transition matrix from the left with the `state_dist` (treated as a row vector) `num_steps` times. The result is a new distribution over possible states at a new time step \(t' = t + n\), where \(n\) is the number of steps `num_steps`.
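A minimal sketch of this step with plain Python lists in place of NumPy arrays. Passing the transition matrix explicitly is an assumption made to keep the example self-contained; the matrix is the one from the `transition_matrix()` doctest.

```python
# Minimal sketch of evolve(): the row vector state_dist is multiplied onto
# the transition matrix once per time step (state_dist @ T, num_steps times).
def evolve(state_dist, transition_matrix, num_steps):
    num_states = len(transition_matrix)
    for _ in range(num_steps):
        state_dist = [
            sum(state_dist[i] * transition_matrix[i][j] for i in range(num_states))
            for j in range(num_states)
        ]
    return state_dist


# Transition matrix from the transition_matrix() doctest (two binary LNLs).
T = [
    [0.21, 0.09, 0.49, 0.21],
    [0.0, 0.3, 0.0, 0.7],
    [0.0, 0.0, 0.56, 0.44],
    [0.0, 0.0, 0.0, 1.0],
]
start = [1.0, 0.0, 0.0, 0.0]  # all LNLs healthy
print([round(p, 4) for p in evolve(start, T, 2)])  # [0.0441, 0.0459, 0.3773, 0.5327]
```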
- state_dist_evo() -> ndarray
Compute an evolution of the model's state distribution over time steps.
This returns a matrix with the distribution over the possible states for each time step from \(t = 0\) to \(t = T\), where \(T\) is the maximum diagnosis time stored in the model's attribute `max_time`.
Note that at this point, the distributions are not weighted with the distributions over diagnosis times that are stored and managed for each T-stage in the dictionary returned by `get_all_distributions()`.
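A minimal sketch of such an evolution for a toy model with a single binary LNL. It assumes the system starts with all LNLs healthy at \(t = 0\) (state index 0); the function and its arguments are hypothetical. The result has `max_time + 1` rows, one distribution per time step.

```python
# Hypothetical sketch, assuming the system starts in the all-healthy state.
def state_dist_evo(transition_matrix, max_time):
    num_states = len(transition_matrix)
    dist = [1.0] + [0.0] * (num_states - 1)  # t = 0: all LNLs healthy
    evo = [dist]
    for _ in range(max_time):
        dist = [
            sum(dist[i] * transition_matrix[i][j] for i in range(num_states))
            for j in range(num_states)
        ]
        evo.append(dist)
    return evo  # max_time + 1 rows, one distribution per time step


T = [[0.7, 0.3], [0.0, 1.0]]  # toy example: a single binary LNL
for t, dist in enumerate(state_dist_evo(T, 3)):
    print(t, [round(p, 3) for p in dist])
# 0 [1.0, 0.0]
# 1 [0.7, 0.3]
# 2 [0.49, 0.51]
# 3 [0.343, 0.657]
```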
- state_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the distribution over possible states.
When `mode` is set to `"HMM"`, this is done for the given `t_stage` by marginalizing the evolution of the state distribution, as computed by `state_dist_evo()`, over the distribution of diagnosis times for that T-stage from the dictionary returned by `get_all_distributions()`.
When `mode` is set to `"BN"`, the distribution over states is computed for the Bayesian network instead. In that case, the `t_stage` parameter is ignored.
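A minimal sketch of the HMM-mode marginalization, assuming the evolution is given as a list of per-time-step distributions (as `state_dist_evo()` computes). The binomial prior over diagnosis times and both function names are hypothetical choices for illustration only.

```python
from math import comb


# Hypothetical diagnosis-time prior: a binomial distribution over t = 0..max_time.
def binomial_time_prior(max_time, p):
    return [comb(max_time, t) * p**t * (1 - p) ** (max_time - t) for t in range(max_time + 1)]


# Weight the state distribution at each time step with the probability of
# being diagnosed at that time, then sum over all time steps.
def marginalize_over_time(evo, time_prior):
    num_states = len(evo[0])
    return [sum(w * dist[s] for w, dist in zip(time_prior, evo)) for s in range(num_states)]


evo = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]  # toy evolution, max_time = 2
prior = binomial_time_prior(max_time=2, p=0.5)  # [0.25, 0.5, 0.25]
print(marginalize_over_time(evo, prior))  # [0.5625, 0.4375]
```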
- obs_dist(given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the distribution over all possible observations for a given T-stage.
Returns an array of probabilities for each possible complete observation. This entails multiplying the distribution over states, as returned by the `state_dist()` method, with the `observation_matrix()`.
Note that since the `observation_matrix` can become very large, this method is not very efficient for inference. For computing the likelihood, the `diagnosis_matrix()` is instead computed from the `observation_matrix()` and the `data_matrix()` and used directly.
- patient_likelihoods(t_stage: str, mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the likelihood of each patient individually.
- likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, t_stage: str | None = None, mode: Literal['HMM', 'BN'] = 'HMM') -> float
Compute the (log-)likelihood of the stored data given the model (and params).
See the documentation of `lymph.types.Model.likelihood()` for more information on how to use the `given_params` parameter.
Returns the log-likelihood if `log` is set to `True`. The `mode` parameter determines whether the likelihood is computed for the hidden Markov model (`"HMM"`) or the Bayesian network (`"BN"`).
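The likelihood combines the state distribution with the per-patient diagnosis probabilities. A minimal sketch, assuming a `diagnosis_matrix` with one row per patient and one column per hidden state (the actual orientation in `lymph` may differ) and a time-marginalized `state_dist`. Summing the per-patient logarithms gives the log-likelihood.

```python
import math


# Hypothetical sketch: diagnosis_matrix[p][s] = P(diagnosis of patient p | state s).
def patient_likelihoods(state_dist, diagnosis_matrix):
    return [sum(p_s * p_d for p_s, p_d in zip(state_dist, row)) for row in diagnosis_matrix]


def log_likelihood(state_dist, diagnosis_matrix):
    return sum(math.log(lh) for lh in patient_likelihoods(state_dist, diagnosis_matrix))


state_dist = [0.5, 0.5]
diagnosis_matrix = [[0.8, 0.2], [0.2, 0.8]]  # two toy patients
print(patient_likelihoods(state_dist, diagnosis_matrix))  # [0.5, 0.5]
```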
- compute_encoding(given_diagnosis: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None) -> ndarray
Compute the one-hot vector encoding of a given diagnosis.
- posterior_state_dist(given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None, t_stage: str | int = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the posterior distribution over hidden states given a diagnosis.
The `given_diagnosis` is a dictionary of diagnoses for each modality. E.g., it could look like this:
```python
given_diagnosis = {
    "MRI": {"II": True, "III": False, "IV": False},
    "PET": {"II": True, "III": True, "IV": None},
}
```
The `t_stage` parameter determines the T-stage for which the posterior is computed. The `mode` parameter determines whether the posterior is computed for the hidden Markov model (`"HMM"`) or the Bayesian network (`"BN"`). In the Bayesian network mode, the `t_stage` parameter is ignored.
Warning
To speed up repetitive computations, one can provide precomputed state distributions via the `given_state_dist` parameter. When provided, the method ignores the `given_params`, `t_stage`, and `mode` arguments, but computes the posterior much faster.
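At its core, the posterior follows from Bayes' rule. A minimal sketch with plain lists, where the prior over hidden states plays the role of `given_state_dist` and `diagnosis_probs` (a hypothetical name) holds the probability of the given diagnosis for each hidden state. This illustrates the principle only and is not the library's implementation.

```python
# Sketch of Bayes' rule: posterior(s) is proportional to prior(s) * P(diagnosis | s).
# A precomputed prior does not need to be re-evolved for every diagnosis,
# which is where the speed-up of passing given_state_dist comes from.
def posterior_state_dist(prior, diagnosis_probs):
    joint = [p * d for p, d in zip(prior, diagnosis_probs)]
    evidence = sum(joint)  # probability of the diagnosis, marginalized over states
    return [j / evidence for j in joint]


prior = [0.5, 0.3, 0.2]
diagnosis_probs = [0.1, 0.6, 0.3]  # P(diagnosis | state), e.g. from sens/spec
print([round(p, 3) for p in posterior_state_dist(prior, diagnosis_probs)])
# [0.172, 0.621, 0.207]
```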
- marginalize(involvement: dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None], given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> float
Marginalize `given_state_dist` over matching `involvement` patterns.
Any state that matches the provided `involvement` pattern is marginalized over. For this, the `matrix.compute_encoding()` function is used.
If `given_state_dist` is `None`, it is computed by calling `state_dist()` with the given `t_stage` and `mode`. These arguments are ignored if `given_state_dist` is provided.
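The marginalization amounts to summing the probabilities of all states compatible with the pattern. A minimal sketch for binary LNLs with lexicographically ordered states, where `None` (unknown involvement) acts as a wildcard. It stands in for the encoding computed by `matrix.compute_encoding()` and is not the library's code.

```python
import itertools


# Hypothetical sketch, assuming binary LNLs and states ordered as 00, 01, 10, 11, ...
def marginalize(involvement, state_dist, lnls):
    total = 0.0
    states = itertools.product((0, 1), repeat=len(lnls))
    for state, prob in zip(states, state_dist):
        if all(
            involvement.get(lnl) is None or int(bool(involvement[lnl])) == value
            for lnl, value in zip(lnls, state)
        ):
            total += prob  # state matches the involvement pattern
    return total


lnls = ["II", "III"]
state_dist = [0.1, 0.2, 0.3, 0.4]  # states 00, 01, 10, 11
print(round(marginalize({"II": True}, state_dist, lnls), 10))  # 0.7
```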
- risk(involvement: dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None], given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> float
Compute the risk of a certain `involvement`, using the `given_diagnosis`.
If an `involvement` pattern of interest is provided, this method computes the risk of seeing just that pattern for the set of given parameters and a dictionary of diagnoses for each modality.
If no `involvement` is provided, this simply returns the posterior distribution over hidden states, given the diagnosis, as computed by the `posterior_state_dist()` method. See its documentation for more details about the arguments and the return value.
- draw_diagnosis(diag_times: list[int], rng: Generator | None = None, seed: int = 42) -> ndarray
Given some `diag_times`, draw a diagnosis for each LNL.
```python
>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_modality("CT", spec=0.8, sens=0.8)
>>> model.draw_diagnosis([0, 1, 2, 3, 4])
array([[False,  True],
       [False,  True],
       [ True,  True],
       [ True,  True],
       [False,  True]])
>>> draw_diagnosis(  # this is the same as the previous example
...     diagnosis_times=[0, 1, 2, 3, 4],
...     state_evolution=model.state_dist_evo(),
...     observation_matrix=model.observation_matrix(),
...     possible_diagnosis=model.obs_list,
... )
array([[False,  True],
       [False,  True],
       [ True,  True],
       [ True,  True],
       [False,  True]])
```
- draw_patients(num: int, stage_dist: Iterable[float], rng: Generator | None = None, seed: int = 42, **_kwargs) -> DataFrame
Draw `num` random patients from the model.
For this, a `stage_dist`, i.e., a distribution over the T-stages, needs to be defined. This must be an iterable of probabilities with as many elements as there are T-stages defined in the model (accessible via `get_all_distributions()`).
A random number generator can be provided as `rng`. If it is `None`, a new one is initialized with the given `seed` (42 by default).
See also
`lymph.diagnosis_times.Distribution.draw_diag_times()`: Method to draw diagnosis times from a distribution.
`lymph.models.Unilateral.draw_diagnosis()`: Method to draw individual diagnoses.
`lymph.models.Bilateral.draw_patients()`: The corresponding bilateral method.
- class lymph.models.HPVUnilateral(graph_dict: dict[tuple[str, str], list[str]], named_params: Sequence[str] | None = None, uni_kwargs: dict[str, Any] | None = None, hpv_kwargs: dict[str, Any] | None = None, nohpv_kwargs: dict[str, Any] | None = None, **_kwargs)
Bases: Composite, Composite, Model
Class that models metastatic progression in the lymphatic systems of HPV+ and HPV- patients.
This is achieved by creating two instances of the `Unilateral` model, one for the HPV+ and one for the HPV- patients.
See also
`Unilateral`: Two instances of this class are created as attributes, one for the HPV+ and one for the HPV- model.
Warning
This class is still somewhat experimental and not thoroughly tested. In particular, it may cause issues if one wants to use, e.g., a bilateral model composed of two `HPVUnilateral` models.
- __init__(graph_dict: dict[tuple[str, str], list[str]], named_params: Sequence[str] | None = None, uni_kwargs: dict[str, Any] | None = None, hpv_kwargs: dict[str, Any] | None = None, nohpv_kwargs: dict[str, Any] | None = None, **_kwargs) -> None
Initialize a unilateral HPV model.
The `graph_dict` is a dictionary of tuples as keys and lists of strings as values. It is passed to both `models.Unilateral` instances, which in turn pass it to the `graph.Representation` class that stores the graph.
- classmethod binary(*args, **kwargs) -> HPVUnilateral
Initialize a binary HPV model.
This is a convenience method that sets the `allowed_states` of the `uni_kwargs` to `[0, 1]`. All other `args` and `kwargs` are passed to the `__init__()` method.
- classmethod trinary(*args, **kwargs) -> HPVUnilateral
Initialize a trinary HPV model.
This is a convenience method that sets the `allowed_states` of the `uni_kwargs` to `[0, 1, 2]`. All other `args` and `kwargs` are passed to the `__init__()` method.
- property is_trinary: bool
Return whether the model is trinary.
- property is_binary: bool
Return whether the model is binary.
- get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model's spread from tumor to LNLs.
- get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model's spread between LNLs.
This works similarly to the `get_tumor_spread_params()` method. However, since the spread between LNLs is shared by the HPV+ and HPV- models, their parameters are identical and only one set is returned.
- get_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model's spread edges.
This is consistent with how the `set_params()` method expects its keyword arguments.
- get_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model.
It returns the combination of the calls to `Unilateral.get_params()` of the HPV+ and HPV- models. For the use of the `as_dict` and `as_flat` arguments, see the documentation of the `types.Model.get_params()` method.
Also see the `get_spread_params()` method to understand how the symmetry settings affect the return value.
- set_tumor_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model's spread from tumor to LNLs.
- set_lnl_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model's spread between LNLs.
- set_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model's spread edges.
- load_patient_data(patient_data: pd.DataFrame, side: str = 'ipsi', mapping: callable | dict[int, Any] = <function early_late_mapping>) -> None
Load patient data into the model.
Amounts to calling the `load_patient_data()` method of both the HPV+ and the HPV- model.
- likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, t_stage: str | None = None, mode: Literal['HMM', 'BN'] = 'HMM')
Compute the (log-)likelihood of the stored data given the model (and params).
See the documentation of `types.Model.likelihood()` for more information on how to use the `given_params` parameter.
Returns the log-likelihood if `log` is set to `True`. The `mode` parameter determines whether the likelihood is computed for the hidden Markov model (`"HMM"`) or the Bayesian network (`"BN"`).
Note
The computation is much faster if no parameters are given, since then the transition matrix does not need to be recomputed.
See also
`Unilateral.likelihood()`: The corresponding unilateral function.
- state_dist(model: Unilateral, *args, **kwargs) -> ndarray
Compute the distribution over possible states.
See `models.Unilateral.state_dist()` for more information.
- posterior_state_dist(model: Unilateral, *args, **kwargs) -> ndarray
Compute the posterior distribution over hidden states given a diagnosis.
See `models.Unilateral.posterior_state_dist()` for more information.
- marginalize(model: Unilateral, *args, **kwargs) -> ndarray
Marginalize `given_state_dist` over matching `involvement` patterns.
See `models.Unilateral.marginalize()` for more information.
- risk(model: Unilateral, *args, **kwargs) -> ndarray
Compute the risk of a certain `involvement`, using the `given_diagnosis`.
See `models.Unilateral.risk()` for more information.
- class lymph.models.Bilateral(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, named_params: Sequence[str] | None = None, uni_kwargs: dict[str, Any] | None = None, ipsi_kwargs: dict[str, Any] | None = None, contra_kwargs: dict[str, Any] | None = None, **_kwargs)
Bases: Composite, Composite, Model
Class that models metastatic progression in a bilateral lymphatic system.
This is achieved by creating two instances of the `Unilateral` model, one for the ipsi- and one for the contralateral side of the neck. The two sides are assumed to be independent of each other, given the diagnosis time over which we marginalize.
See also
`Unilateral`: Two instances of this class are created as attributes, one for the ipsi- and one for the contralateral side of the neck.
- __init__(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, named_params: Sequence[str] | None = None, uni_kwargs: dict[str, Any] | None = None, ipsi_kwargs: dict[str, Any] | None = None, contra_kwargs: dict[str, Any] | None = None, **_kwargs) -> None
Initialize both sides of the neck as `models.Unilateral`.
The `graph_dict` is a dictionary of tuples as keys and lists of strings as values. It is passed to both `models.Unilateral` instances, which in turn pass it to the `graph.Representation` class that stores the graph.
With the dictionary `is_symmetric`, the user can specify which aspects of the model are symmetric. Valid keys are `"tumor_spread"` and `"lnl_spread"`. The values are booleans, with `True` meaning that the aspect is symmetric.
Note
The symmetries of tumor and LNL spread are only guaranteed if the respective parameters are set via the `set_params()` method of this bilateral model. It is still possible to set different parameters for the ipsi- and contralateral side by using their respective `Unilateral.set_params()` methods.
The `uni_kwargs` are passed to both instances of the unilateral model, while the `ipsi_kwargs` and `contra_kwargs` are passed to the ipsi- and contralateral side, respectively. The ipsi- and contralateral kwargs override the unilateral kwargs and may also override the `graph_dict`. This allows the user to specify different graphs for the two sides of the neck.
- classmethod binary(*args, **kwargs) -> Bilateral
Initialize a binary bilateral model.
This is a convenience method that sets the `allowed_states` of the `uni_kwargs` to `[0, 1]`. All other `args` and `kwargs` are passed to the `__init__()` method.
- classmethod trinary(*args, **kwargs) -> Bilateral
Initialize a trinary bilateral model.
This is a convenience method that sets the `allowed_states` of the `uni_kwargs` to `[0, 1, 2]`. All other `args` and `kwargs` are passed to the `__init__()` method.
- property is_trinary: bool
Return whether the model is trinary.
- property is_binary: bool
Return whether the model is binary.
- get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model's spread from tumor to LNLs.
If the attribute dictionary `is_symmetric` stores the key-value pair `"tumor_spread": True`, the parameters are returned as a single dictionary, since they are the same ipsi- and contralaterally. Otherwise, the parameters are returned as a dictionary with two keys, `"ipsi"` and `"contra"`.
- get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model's spread between LNLs.
Similarly to the `get_tumor_spread_params()` method, this returns only one dictionary if the attribute dictionary `is_symmetric` stores the key-value pair `"lnl_spread": True`. Otherwise, the parameters are returned as a dictionary with two keys, `"ipsi"` and `"contra"`.
- get_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model's spread edges.
Depending on the symmetries (i.e., the `is_symmetric` attribute), this returns different results:
If `is_symmetric["tumor_spread"]` is `False`, the flattened (`as_flat=True`) dictionary (`as_dict=True`) contains keys of the form `ipsi_Tto<lnl>_spread` and `contra_Tto<lnl>_spread`, where `<lnl>` is the name of the lymph node level. However, if the tumor spread is set to be symmetric, the leading `ipsi_` or `contra_` is omitted, since the parameter is valid for both sides.
This is consistent with how the `set_params()` method expects the keyword arguments for the respective symmetry configurations.
```python
>>> model = Bilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> num_dims = model.get_num_dims()
>>> num_dims
5
>>> model.set_spread_params(
...     *np.round(np.linspace(0., 1., num_dims+1), 2),
... )
(1.0,)
>>> model.get_spread_params(as_flat=False)
{'ipsi': {'TtoII': {'spread': 0.0}, 'TtoIII': {'spread': 0.2}}, 'contra': {'TtoII': {'spread': 0.4}, 'TtoIII': {'spread': 0.6}}, 'IItoIII': {'spread': 0.8}}
>>> model.get_spread_params(as_flat=True)
{'ipsi_TtoII_spread': 0.0, 'ipsi_TtoIII_spread': 0.2, 'contra_TtoII_spread': 0.4, 'contra_TtoIII_spread': 0.6, 'IItoIII_spread': 0.8}
```
- get_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model.
It returns the combination of the calls to `Unilateral.get_params()` of the ipsi- and contralateral side. For the use of the `as_dict` and `as_flat` arguments, see the documentation of the `types.Model.get_params()` method.
Also see the `get_spread_params()` method to understand how the symmetry settings affect the return value.
- set_tumor_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model's spread from tumor to LNLs.
- set_lnl_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model's spread between LNLs.
- set_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model's spread edges.
- set_params(*args: float, **kwargs: float) -> tuple[float]
Set new parameters for the model.
This works almost exactly like the unilateral model's `Unilateral.set_params()` method. However, this one allows the user to set the parameters of individual sides of the neck by prefixing the keyword arguments' names with `"ipsi_"` or `"contra_"`.
Anything not prefixed by `"ipsi_"` or `"contra_"` is passed to both sides of the neck. This obviously does not work with positional arguments.
When setting the parameters via positional arguments, the order is important:
1. The parameters of the edges from tumor to LNLs:
   - first the ipsilateral parameters,
   - then, if `is_symmetric["tumor_spread"]` is `False`, the contralateral parameters. Otherwise, the ipsilateral parameters are used for both sides.
2. The parameters of the edges between LNLs:
   - again, first the ipsilateral parameters,
   - then, if `is_symmetric["lnl_spread"]` is `False`, the contralateral parameters. Otherwise, the ipsilateral parameters are used for both sides.
3. The parameters of the parametric distributions for marginalizing over diagnosis times.

If any positional arguments remain after that, they are returned in a tuple.
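A minimal sketch of this ordering for spread parameters only, assuming the edge names and symmetry settings are passed in explicitly (`assign_spread_params` is hypothetical, and distribution parameters are left out). With asymmetric tumor spread and symmetric LNL spread, it yields the same names and values as the `get_spread_params()` doctest.

```python
# Hypothetical sketch: map positional spread parameters to names in the order
# tumor spread (ipsi, then contra unless symmetric), then LNL spread (same rule).
def assign_spread_params(tumor_edges, lnl_edges, is_symmetric, *args):
    names = []
    for key, edges in (("tumor_spread", tumor_edges), ("lnl_spread", lnl_edges)):
        if is_symmetric[key]:
            names += [f"{edge}_spread" for edge in edges]
        else:
            names += [f"{side}_{edge}_spread" for side in ("ipsi", "contra") for edge in edges]
    params = dict(zip(names, args))
    return params, tuple(args[len(names):])  # leftover positional arguments


params, rest = assign_spread_params(
    ["TtoII", "TtoIII"], ["IItoIII"],
    {"tumor_spread": False, "lnl_spread": True},
    0.0, 0.2, 0.4, 0.6, 0.8, 1.0,
)
print(params)
# {'ipsi_TtoII_spread': 0.0, 'ipsi_TtoIII_spread': 0.2, 'contra_TtoII_spread': 0.4, 'contra_TtoIII_spread': 0.6, 'IItoIII_spread': 0.8}
print(rest)  # (1.0,)
```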
- load_patient_data(patient_data: pd.DataFrame, mapping: callable | dict[int, Any] = <function early_late_mapping>) -> None
Load patient data into the model.
Amounts to calling the `load_patient_data()` method of both sides of the neck.
- state_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') → ndarray
Compute the joint distribution over the ipsi- & contralateral hidden states.
This computes the state distributions of both sides and returns their outer product. In case mode is "HMM" (default), the outer product is formed at every time step and then marginalized over the diagnosis time distribution of the respective t_stage, because both sides share the same diagnosis time.
See also
Unilateral.state_dist(): The corresponding unilateral function. Note that this method returns a 2D array, because it computes the probability of any possible combination of ipsi- and contralateral states.
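Because both sides share the same (unknown) diagnosis time, the outer product has to be taken per time step before weighting with the diagnosis-time distribution. A minimal NumPy sketch of that idea, assuming the per-side state evolutions are given as arrays of shape (num_times, num_states); the function name joint_state_dist and the toy numbers are hypothetical, not taken from the library:

```python
import numpy as np


def joint_state_dist(
    ipsi_evo: np.ndarray, contra_evo: np.ndarray, diag_time_pmf: np.ndarray
) -> np.ndarray:
    """Joint distribution over (ipsi state, contra state), marginalized over time.

    ipsi_evo[t, i]:   probability of ipsilateral state i at time step t
    contra_evo[t, j]: probability of contralateral state j at time step t
    diag_time_pmf[t]: probability of being diagnosed at time step t
    """
    # sum_t pmf[t] * outer(ipsi_evo[t], contra_evo[t]),
    # equivalent to ipsi_evo.T @ np.diag(diag_time_pmf) @ contra_evo
    return np.einsum("t,ti,tj->ij", diag_time_pmf, ipsi_evo, contra_evo)


# toy example: one binary LNL per side, diagnosis at t=0 or t=1
ipsi_evo = np.array([[1.0, 0.0], [0.6, 0.4]])
contra_evo = np.array([[1.0, 0.0], [0.9, 0.1]])
pmf = np.array([0.5, 0.5])

joint = joint_state_dist(ipsi_evo, contra_evo, pmf)
naive = np.outer(pmf @ ipsi_evo, pmf @ contra_evo)  # ignores the shared diagnosis time
print(joint)  # approx. [[0.77, 0.03], [0.18, 0.02]]
print(naive)  # approx. [[0.76, 0.04], [0.19, 0.01]]
```

The difference between joint and naive shows why the two sides cannot simply be marginalized over time independently.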
- obs_dist(given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') → ndarray
Compute the joint distribution over the ipsi- & contralateral observations.
See also
Unilateral.obs_dist(): The corresponding unilateral function. Note that this method returns a 2D array, because it computes the probability of any possible combination of ipsi- and contralateral observations.
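One way to see where the 2D shape comes from: given a joint hidden-state distribution and a per-side observation matrix (rows indexed by hidden state, columns by observation), the joint observation distribution is a matrix sandwich. This is a sketch under that assumption; joint_obs_dist and the sensitivity/specificity values are hypothetical:

```python
import numpy as np


def joint_obs_dist(
    joint_state: np.ndarray, ipsi_obs: np.ndarray, contra_obs: np.ndarray
) -> np.ndarray:
    """P(ipsi observation, contra observation) from a joint hidden-state distribution.

    ipsi_obs[s, o] = P(observation o | hidden state s) on the ipsilateral side
    (each row sums to 1); contra_obs likewise for the contralateral side.
    """
    # result[a, b] = sum_{i, j} ipsi_obs[i, a] * joint_state[i, j] * contra_obs[j, b]
    return ipsi_obs.T @ joint_state @ contra_obs


joint_state = np.array([[0.77, 0.03], [0.18, 0.02]])
# hypothetical modality with specificity 0.9 and sensitivity 0.8
obs_matrix = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = joint_obs_dist(joint_state, obs_matrix, obs_matrix)
print(obs.shape, obs.sum())  # (2, 2) and a total probability of (approx.) 1
```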
- patient_likelihoods(t_stage: str, mode: Literal['HMM', 'BN'] = 'HMM') → ndarray
Compute the likelihood of each patient individually.
- likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, t_stage: str | None = None, mode: Literal['HMM', 'BN'] = 'HMM')
Compute the (log-)likelihood of the stored data given the model (and params).
See the documentation of types.Model.likelihood() for more information on how to use the given_params parameter.
Returns the log-likelihood if log is set to True. The mode parameter determines whether the likelihood is computed for the hidden Markov model ("HMM") or the Bayesian network ("BN").
Note
The computation is much faster if no parameters are given, since then the transition matrix does not need to be recomputed.
See also
Unilateral.likelihood(): The corresponding unilateral function.
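Conceptually, each patient’s likelihood sums the joint state distribution over all ipsi/contra state pairs, weighted by how well each side’s state explains that patient’s diagnosis. The sketch below assumes per-patient diagnosis probabilities stored as (num_patients, num_states) arrays; the function bilateral_log_likelihood and this data layout are illustrative assumptions, not the library’s internals:

```python
import numpy as np


def bilateral_log_likelihood(
    ipsi_diag: np.ndarray, joint_state: np.ndarray, contra_diag: np.ndarray
) -> float:
    """Sum of per-patient log-likelihoods.

    ipsi_diag[p, i]   = P(ipsilateral diagnosis of patient p | ipsi state i)
    contra_diag[p, j] = P(contralateral diagnosis of patient p | contra state j)
    joint_state[i, j] = joint (time-marginalized) hidden-state distribution
    """
    per_patient = np.einsum("pi,ij,pj->p", ipsi_diag, joint_state, contra_diag)
    return float(np.sum(np.log(per_patient)))


joint_state = np.array([[0.77, 0.03], [0.18, 0.02]])
# two patients with perfectly observed states: (ipsi 0, contra 0) and (ipsi 1, contra 0)
ipsi_diag = np.array([[1.0, 0.0], [0.0, 1.0]])
contra_diag = np.array([[1.0, 0.0], [1.0, 0.0]])
print(bilateral_log_likelihood(ipsi_diag, joint_state, contra_diag))  # log(0.77) + log(0.18)
```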
- posterior_state_dist(given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]]] | None = None, t_stage: str | int = 'early', mode: Literal['HMM', 'BN'] = 'HMM') → ndarray
Compute the joint posterior distribution over ipsi- & contralateral states, given the given_diagnosis.
given_diagnosis is a dictionary storing one types.DiagnosisType each for the "ipsi" and "contra" side of the neck.
Essentially, this is the risk for any possible combination of ipsi- and contralateral involvement, given the provided diagnosis.
Warning
As in the Unilateral.posterior_state_dist() method, one may provide a precomputed (joint) state distribution via the given_state_dist argument (which should be a square matrix). In this case, the given_params are ignored and the model does not need to recompute e.g. the transition_matrix() or state_dist(), making the computation much faster.
However, this also means that t_stage and mode are ignored, since these are only used to compute the state distribution.
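The posterior is Bayes’ rule applied to the joint prior: multiply each state pair by the probability of the observed diagnoses under that pair, then normalize. This sketch works directly on a precomputed joint state distribution, as the warning above describes; joint_posterior and the diagnostic probabilities are hypothetical:

```python
import numpy as np


def joint_posterior(
    joint_state: np.ndarray, ipsi_diag_probs: np.ndarray, contra_diag_probs: np.ndarray
) -> np.ndarray:
    """Bayes' rule on a precomputed joint state distribution.

    ipsi_diag_probs[i]   = P(given ipsilateral diagnosis | ipsi state i)
    contra_diag_probs[j] = P(given contralateral diagnosis | contra state j)
    """
    # prior times the likelihood of both sides' diagnoses, then normalize
    unnormalized = ipsi_diag_probs[:, None] * joint_state * contra_diag_probs[None, :]
    return unnormalized / unnormalized.sum()


prior = np.array([[0.77, 0.03], [0.18, 0.02]])
# hypothetical: ipsilateral LNL observed as involved (sens. 0.8, spec. 0.9), contra unobserved
posterior = joint_posterior(prior, np.array([0.1, 0.8]), np.ones(2))
print(posterior.sum(axis=1))  # approx. [0.333, 0.667], the ipsilateral posterior
```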
- marginalize(involvement: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]], given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') → float
Marginalize given_state_dist over matching involvement patterns.
Any state that matches the provided involvement pattern is marginalized over. For this, the matrix.compute_encoding() function is used.
If given_state_dist is None, it will be computed by calling state_dist() with the given t_stage and mode. These arguments are ignored if given_state_dist is provided.
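The marginalization can be pictured as summing the joint distribution over all rows and columns whose states match the respective side’s pattern. The sketch below stands in for matrix.compute_encoding() with its own hypothetical encode function, restricted to a binary model with two hypothetical LNLs; None (or a missing LNL) means "any value":

```python
from __future__ import annotations

import itertools

import numpy as np

LNLS = ["II", "III"]  # hypothetical binary model with two LNLs per side
STATES = list(itertools.product([0, 1], repeat=len(LNLS)))  # (0,0), (0,1), (1,0), (1,1)


def encode(pattern: dict[str, bool | None]) -> np.ndarray:
    """Boolean mask over STATES; a missing or None entry means 'any value'."""
    mask = np.ones(len(STATES), dtype=bool)
    for k, lnl in enumerate(LNLS):
        wanted = pattern.get(lnl)
        if wanted is not None:
            mask &= np.array([state[k] == int(wanted) for state in STATES])
    return mask


def marginalize(joint_state: np.ndarray, ipsi_pattern: dict, contra_pattern: dict) -> float:
    """Total probability of all (ipsi, contra) state pairs matching both patterns."""
    ipsi_mask = encode(ipsi_pattern).astype(float)
    contra_mask = encode(contra_pattern).astype(float)
    return float(ipsi_mask @ joint_state @ contra_mask)


uniform = np.full((4, 4), 1 / 16)
print(marginalize(uniform, {"II": True}, {}))  # 0.5: 2 of 4 ipsi states, any contra state
```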
- risk(involvement: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]], given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]]] | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') → float
Compute the risk of the involvement patterns, given parameters and diagnosis.
The involvement of interest is expected to be a PatternType for each side of the neck ("ipsi" and "contra"). This method then marginalizes over those posterior state probabilities that match the involvement patterns.
If involvement is not provided, the method returns the posterior state distribution as computed by the posterior_state_dist() method. See its docstring for more details on the remaining arguments.
- draw_patients(num: int, stage_dist: Iterable[float], rng: Generator | None = None, seed: int = 42, **_kwargs) → DataFrame
Draw num random patients from the parametrized model.
See also
diagnosis_times.Distribution.draw_diag_times(): Method to draw diagnosis times from a distribution.
Unilateral.draw_diagnosis(): Method to draw individual diagnoses from a unilateral model.
Unilateral.draw_patients(): The unilateral method to draw a synthetic dataset.
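The first steps of such a sampler can be sketched with NumPy: pick a T-stage per patient according to stage_dist, then a diagnosis time from that stage’s distribution. This is only an illustration under assumptions: the function draw_stages_and_times, the dictionary of per-stage PMFs, and the "seed is only used when no rng is passed" behavior are hypothetical, and the remaining steps (evolving hidden states, drawing diagnoses per modality) are omitted.

```python
from __future__ import annotations

import numpy as np


def draw_stages_and_times(
    num: int,
    stage_dist: list[float],
    diag_time_pmfs: dict[str, list[float]],
    rng: np.random.Generator | None = None,
    seed: int = 42,
) -> tuple[np.ndarray, np.ndarray]:
    """Draw a T-stage per patient, then a diagnosis time from that stage's PMF."""
    if rng is None:
        rng = np.random.default_rng(seed)  # assumption: seed only used without rng
    t_stages = list(diag_time_pmfs)  # stage_dist follows this order
    stages = rng.choice(t_stages, size=num, p=stage_dist)
    times = np.array(
        [rng.choice(len(diag_time_pmfs[s]), p=diag_time_pmfs[s]) for s in stages]
    )
    return stages, times


pmfs = {"early": [0.5, 0.5, 0.0, 0.0], "late": [0.0, 0.0, 0.5, 0.5]}
stages, times = draw_stages_and_times(100, stage_dist=[0.6, 0.4], diag_time_pmfs=pmfs)
```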
- class lymph.models.Midline(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, use_mixing: bool = True, use_central: bool = False, use_midext_evo: bool = True, named_params: Sequence[str] | None = None, marginalize_unknown: bool = True, uni_kwargs: dict[str, Any] | None = None, **_kwargs)
Bases: Composite, Composite, Model
Models metastatic progression bilaterally with tumor lateralization.
Model a bilateral lymphatic system where an additional risk factor can be provided in the data: whether or not the primary tumor extends over the mid-sagittal line, or is located on it.
It is reasonable to assume (and supported by data) that an extension of the primary tumor over the mid-sagittal line significantly increases the risk for metastatic spread to the contralateral side of the neck. This class attempts to capture this using a simple assumption: We assume that the probability of spread to the contralateral side for patients with midline extension is larger than for patients without it, but smaller than the probability of spread to the ipsilateral side. Formally:
\[b_c^{\in} = \alpha \cdot b_i + (1 - \alpha) \cdot b_c^{\not\in}\]
where \(b_c^{\in}\) is the probability of spread from the primary tumor to the contralateral side for patients with midline extension, and \(b_c^{\not\in}\) for patients without. \(\alpha\) is the linear mixing parameter.
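The mixing formula is a convex combination, so for \(\alpha\) between 0 and 1 the contralateral spread with midline extension lands between the two other probabilities, matching the stated assumption. A tiny sketch for a single tumor-to-LNL edge; the function name, the explicit range check, and the numbers are hypothetical:

```python
def mixed_contra_spread(ipsi_spread: float, contra_noext_spread: float, alpha: float) -> float:
    """b_c^ext = alpha * b_i + (1 - alpha) * b_c^noext for one tumor-to-LNL edge."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("the mixing parameter alpha must lie in [0, 1]")
    return alpha * ipsi_spread + (1.0 - alpha) * contra_noext_spread


b_i, b_c_noext = 0.4, 0.05
b_c_ext = mixed_contra_spread(b_i, b_c_noext, alpha=0.3)
print(b_c_ext)  # approx. 0.155, between b_c_noext and b_i
```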
- __init__(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, use_mixing: bool = True, use_central: bool = False, use_midext_evo: bool = True, named_params: Sequence[str] | None = None, marginalize_unknown: bool = True, uni_kwargs: dict[str, Any] | None = None, **_kwargs)
Initialize the model.
The class is constructed in a similar fashion to the Bilateral: That class contains one Unilateral for each side of the neck, while this class contains several instances of Bilateral, covering the cases a) no midline extension, b) midline extension, and c) central tumor location (only if use_central is True).
The keyword arguments added in this constructor are use_mixing, which controls whether to use the mixture of tumor-to-LNL spread parameters described above, and use_central, which controls whether to use a third Bilateral model for the case of a central tumor location.
The parameter use_midext_evo decides whether the tumor’s midline extension should be considered a random variable, in which case it is evolved like the state of the LNLs, or not.
With marginalize_unknown (default: True), the model will also load patients with unknown midline extension status into the model and marginalize over their state of midline extension when computing the likelihood. This extra data is stored in a Bilateral instance accessible via the attribute "unknown". Note that this bilateral instance does not get updated parameters or any other kind of attention. It is solely used to store the data and generate diagnosis matrices for those data.
The uni_kwargs are passed to all bilateral models.
See also
Bilateral: Two to four of these are held as attributes by this class: one for the case of a mid-sagittal extension of the primary tumor, one for the case of no such extension, (possibly) one for the case of a central/symmetric tumor, and (possibly) one for the case of unknown midline extension status.
- property is_trinary: bool
Return whether the model is trinary.
- property midext_prob: float
Return the probability of midline extension.
- property mixing_param: float | None
Return the mixing parameter.
- property use_mixing: bool
Return whether the model uses a mixing parameter.
- property use_central: bool
Return whether the model uses a central model.
- property marginalize_unknown: bool
Return whether the model marginalizes over unknown midline extension.
- get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) → dict[str, float] | Iterable[float]
Return the tumor spread parameters of the model.
If the model uses the mixing parameter, the returned params will contain the ipsilateral spread from tumor to LNLs, the contralateral ones for the case of no midline extension, and the mixing parameter. Otherwise, it will contain the contralateral params for the cases of present and absent midline extension.
- get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) → dict[str, float] | Iterable[float]
Return the LNL spread parameters of the model.
Depending on the value of is_symmetric["lnl_spread"], the returned params may contain only one set of spread parameters (if True) or one for the ipsi- and one for the contralateral side (if False).
- get_spread_params(as_dict: bool = True, as_flat: bool = True) → dict[str, float] | Iterable[float]
Return the spread parameters of the model.
This combines the returned values from the calls to get_tumor_spread_params() and get_lnl_spread_params().
- get_params(as_dict: bool = True, as_flat: bool = True) → Iterable[float] | dict[str, float]
Return all the parameters of the model.
Includes the spread parameters from the call to get_spread_params() and the distribution parameters from the call to get_distribution_params(). It also appends the probability of midline extension to the end of the returned params.
- set_tumor_spread_params(*args: float, **kwargs: float) → Iterable[float] | dict[str, float]
Set the tumor spread parameters of the midline model.
In analogy to the get_tumor_spread_params() method, this method sets the parameters describing how the tumor spreads to the LNLs. How many parameters to provide depends on the values of the use_mixing and use_central attributes. Have a look at what the get_tumor_spread_params() method returns to see what you can provide.
- set_lnl_spread_params(*args: float, **kwargs: float) → Iterable[float]
Set the LNL spread parameters of the midline model.
For the user, this works exactly like Bilateral.set_lnl_spread_params(), but under the hood, the parameters also need to be distributed to two or three instances of Bilateral, depending on the value of the use_central attribute.
- set_spread_params(*args: float, **kwargs: float) → Iterable[float]
Set the spread parameters of the midline model.
- set_params(*args: float, **kwargs: float) → Iterable[float]
Set all parameters of the model.
Combines the calls to set_spread_params() and set_distribution_params(). Additionally, it sets the probability of midline extension. Note that this parameter is always set last, after the spread and distribution parameters.
- load_patient_data(patient_data: DataFrame, mapping: callable = <function early_late_mapping>) → None
Load patient data into the model.
This amounts to sorting the patients into three bins:
1. Patients whose tumor is clearly lateralized, meaning the column ("tumor", "core", "extension") reports False. These get assigned to the noext attribute.
2. Those with a central tumor, indicated by True in the column ("tumor", "core", "central"). If the use_central attribute is set to True, these patients are assigned to the central model. Otherwise, they are assigned to the ext model.
3. The rest, which amounts to patients whose tumor extends over the mid-sagittal line but is not central, i.e., not symmetric w.r.t. the mid-sagittal line. These are assigned to the ext model.
The split data is sent to the Bilateral.load_patient_data() method of the respective models.
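The binning can be sketched with pandas boolean masks on the three-level column index. The function split_by_midline is hypothetical; the extra "unknown" bin for missing extension status follows the constructor’s description of marginalize_unknown, and the toy data frame contains only the two columns named above:

```python
from __future__ import annotations

import pandas as pd


def split_by_midline(patient_data: pd.DataFrame, use_central: bool = False) -> dict[str, pd.DataFrame]:
    """Hypothetical helper: sort patients into the bins described above."""
    extension = patient_data[("tumor", "core", "extension")]
    # .eq() evaluates missing entries (None/NaN) as False
    is_central = patient_data[("tumor", "core", "central")].eq(True)
    is_noext = extension.eq(False) & ~is_central
    is_ext = extension.eq(True) & ~is_central
    is_unknown = extension.isna() & ~is_central

    bins = {"noext": patient_data[is_noext]}
    if use_central:
        bins["ext"] = patient_data[is_ext]
        bins["central"] = patient_data[is_central]
    else:
        # without a dedicated central model, central tumors count as extending
        bins["ext"] = patient_data[is_ext | is_central]
    bins["unknown"] = patient_data[is_unknown]  # only relevant with marginalize_unknown
    return bins


columns = pd.MultiIndex.from_tuples(
    [("tumor", "core", "extension"), ("tumor", "core", "central")]
)
data = pd.DataFrame(
    [[False, False], [True, False], [True, True], [None, False]], columns=columns
)
print({name: list(df.index) for name, df in split_by_midline(data).items()})
# {'noext': [0], 'ext': [1, 2], 'unknown': [3]}
```

In the library itself, each bin is then handed to the respective model’s Bilateral.load_patient_data().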
- contra_state_dist_evo() → tuple[ndarray, ndarray]
Evolve the contralateral side as a mixture of with & without midline extension.
This computes the evolution of the contralateral state distribution for both absent and present midline extension and returns them as a tuple.
The first element of the tuple is the evolution of the contralateral state distribution while having no midline extension. This means that, e.g., the value at index [t,i] is the probability of being in state i at time t AND not having midline extension after these t time steps.
The second element of the tuple is the evolution of the contralateral state distribution where midline extension occurs at some time point. For example, the value at index [t,i] is the probability of being in state i at time t AND having developed midline extension at some earlier time point.
To compute this second evolution, we need to mix the model without and with midline extension at each time step, following a recursion formula.
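A recursion of this kind can be sketched as follows, assuming that in every time step a patient without midline extension develops it with probability midext_prob and then immediately evolves with the transition matrix for the extension case. This ordering is an assumption of the sketch and may differ from the library’s exact recursion; all names and matrices are hypothetical:

```python
import numpy as np


def contra_state_dist_evo(
    initial: np.ndarray,
    trans_noext: np.ndarray,
    trans_ext: np.ndarray,
    midext_prob: float,
    max_time: int,
) -> tuple[np.ndarray, np.ndarray]:
    """Joint evolution of the contra state and the midline extension status.

    noext[t, i]: P(contra state i at time t AND no midline extension yet)
    ext[t, i]:   P(contra state i at time t AND midline extension developed)
    """
    noext = np.zeros((max_time + 1, len(initial)))
    ext = np.zeros((max_time + 1, len(initial)))
    noext[0] = initial  # assumption: no patient starts with midline extension
    for t in range(max_time):
        # stays without extension (prob. 1 - p), evolves with the "noext" matrix
        noext[t + 1] = (1.0 - midext_prob) * noext[t] @ trans_noext
        # already extended, or extends now (prob. p): evolves with the "ext" matrix
        ext[t + 1] = (ext[t] + midext_prob * noext[t]) @ trans_ext
    return noext, ext


trans_noext = np.array([[0.9, 0.1], [0.0, 1.0]])  # hypothetical single binary LNL
trans_ext = np.array([[0.7, 0.3], [0.0, 1.0]])
noext, ext = contra_state_dist_evo(np.array([1.0, 0.0]), trans_noext, trans_ext, 0.2, 10)
print(noext.sum(axis=1) + ext.sum(axis=1))  # all ones: total probability is conserved
```

Because both transition matrices are row-stochastic, the mass leaving the "noext" branch in each step reappears in the "ext" branch, which is what the printed check confirms.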
- state_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM', central: bool = False) → ndarray
Compute the joint distribution over ipsi- & contralateral hidden states and midline extension.
If central=False, the result has shape (2, num_states, num_states), where the first axis is for the midline extension status, the second for the ipsilateral state, and the third for the contralateral state.
If central=True, the result will be the state distribution of the central model’s Bilateral.state_dist() method.
- obs_dist(given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM', central: bool = False) → ndarray
Compute the joint distribution over the ipsi- & contralateral observations.
If given_state_dist is provided, t_stage, mode, and central are ignored. The provided state distribution may be 2D or 3D. The returned distribution will have the same dimensionality.
See also
Unilateral.obs_dist(): The corresponding unilateral function. Note that this method returns an (at least) 2D array, because it computes the probability of any possible combination of ipsi- and contralateral observations.
- patient_likelihoods(t_stage: str | None = None, mode: Literal['HMM', 'BN'] = 'HMM') → ndarray
Compute the likelihood of each patient individually.
- likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, t_stage: str | None = None, mode: Literal['HMM', 'BN'] = 'HMM') → float
Compute the (log-)likelihood of the stored data given the model (and params).
See the documentation of types.Model.likelihood() for more information on how to use the given_params parameter.
Returns the log-likelihood if log is set to True. Note that in contrast to the Bilateral model, the midline model does not support the Bayesian network mode.
Note
The computation is faster if no parameters are given, since then the transition matrix does not need to be recomputed.
See also
Unilateral.likelihood(): The corresponding unilateral function.
- posterior_state_dist(given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]]] | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM', midext: bool | None = None, central: bool = False) → float
Compute the posterior state distribution.
Using either the given_params or the given_state_dist argument, this method computes the posterior state distribution of the model for the given_diagnosis, a specific t_stage, whether the tumor extends over the mid-sagittal line (midext), and whether it is central (central, only used if use_central is True).
See also
types.Model.posterior_state_dist(): The corresponding method in the base class.
Bilateral.posterior_state_dist(): The bilateral method that is ultimately called by this one.
- marginalize(involvement: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None, given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM', midext: bool | None = None, central: bool = False) → float
Marginalize given_state_dist over matching involvement patterns.
Any state that matches the provided involvement pattern is marginalized over. For this, the matrix.compute_encoding() function is used.
The arguments t_stage, mode, and central are only used if given_state_dist is None. In this case, they are passed to the state_dist() method.
- risk(involvement: dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None] | None = None, given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]]] | None = None, t_stage: str = 'early', midext: bool | None = None, central: bool = False, mode: Literal['HMM', 'BN'] = 'HMM') → float
Compute the risk of nodal involvement, given the given_diagnosis.
In addition to the arguments of the Bilateral.risk() method, this also allows specifying whether the patient’s tumor extended over the mid-sagittal line (midext=True) or was even located right on that line (central=True).
For logical reasons, midext=False makes no sense if central=True and is thus ignored.
Warning
As in the Bilateral.posterior_state_dist() method, you may provide a precomputed (joint) state distribution in the given_state_dist argument. Here, this given_state_dist may be a 2D array, in which case it is assumed you know how it was computed, and the arguments t_stage, midext, central, and mode are ignored. If it is 3D, it should have the shape (2, num_states, num_states) and be the output of the Midline.state_dist() method. In this case, the midext argument is not ignored: it may be used to select the correct state distribution (when True or False) or to marginalize over the midline extension status (when midext=None).
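Handling a 3D given_state_dist can be sketched as either selecting one slice along the first axis or summing over it, followed by normalization. Two assumptions here are not stated above and are labeled in the code: index 0 of the first axis means "no midline extension" and index 1 means "extension", and the result is normalized (which does not change a subsequent posterior, since that is normalized again). The function reduce_midext is hypothetical:

```python
from __future__ import annotations

import numpy as np


def reduce_midext(state_dist_3d: np.ndarray, midext: bool | None) -> np.ndarray:
    """Turn a (2, S, S) distribution into a normalized (S, S) one."""
    if midext is None:
        joint = state_dist_3d.sum(axis=0)  # marginalize over the midline extension status
    else:
        joint = state_dist_3d[int(midext)]  # assumption: index 0 = no ext., 1 = ext.
    return joint / joint.sum()  # condition on the selected status


dist_3d = np.array([
    [[0.3, 0.1], [0.1, 0.1]],  # without midline extension
    [[0.1, 0.1], [0.1, 0.1]],  # with midline extension
])
print(reduce_midext(dist_3d, midext=True))  # every entry 0.25
```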