Lymphatic Progression Models
This module implements the core classes to model lymphatic tumor progression.
- class lymph.models.Unilateral(graph_dict: dict[tuple[str, str], list[str]], tumor_state: int | None = None, allowed_states: list[int] | None = None, max_time: int = 10, **_kwargs)
Bases: Composite, Composite, Model
Class that models metastatic progression in a unilateral lymphatic system.
It does this by representing the lymphatic system as a directed acyclic graph (DAG), which is stored in and managed by the attribute graph. The progression itself can be modelled via hidden Markov models (HMM) or Bayesian networks (BN). In both cases, instances of this class allow calculating the probability of a certain hidden pattern of involvement, given an individual diagnosis of a patient.
- __init__(graph_dict: dict[tuple[str, str], list[str]], tumor_state: int | None = None, allowed_states: list[int] | None = None, max_time: int = 10, **_kwargs) -> None
Create a new instance of the Unilateral class.
The graph_dict that represents the lymphatic system should be given as a dictionary. Its keys are tuples of the form ("tumor", "<tumor_name>") or ("lnl", "<lnl_name>"). The values are lists of strings naming the nodes that the node given by the key is connected to, i.e., spreads to.
Note
Do make sure the values in the dictionary are of type list and not set. Sets do not preserve the order of their elements and thus the order of the edges in the graph. This may lead to inconsistencies in the model.
For example, the following graph represents a lymphatic system with one tumor and three lymph node levels:
graph = {
    ("tumor", "T"): ["II", "III", "IV"],
    ("lnl", "II"): ["III"],
    ("lnl", "III"): ["IV"],
    ("lnl", "IV"): [],
}
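To illustrate why the order of the lists matters, the following minimal sketch walks through a graph_dict in iteration order and derives one name per edge. The helper edge_names() is hypothetical. The "<start>to<end>" naming is an assumption modelled on parameter names such as 'TtoII_spread' and 'IItoIII_spread' in the get_params() output shown for set_params(), and the library's internal edge ordering may differ.

```python
def edge_names(graph_dict: dict[tuple[str, str], list[str]]) -> list[str]:
    """Hypothetical helper: name every edge "<start>to<end>" in dict/list order."""
    names = []
    for (_kind, start), targets in graph_dict.items():
        # Lists keep their order, so the resulting edge order is reproducible.
        # With a set here, the order of string elements can change between runs.
        for end in targets:
            names.append(f"{start}to{end}")
    return names


graph = {
    ("tumor", "T"): ["II", "III", "IV"],
    ("lnl", "II"): ["III"],
    ("lnl", "III"): ["IV"],
    ("lnl", "IV"): [],
}
print(edge_names(graph))  # ['TtoII', 'TtoIII', 'TtoIV', 'IItoIII', 'IIItoIV']
```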
The tumor_state is the initial (and unchangeable) state of the tumor. Typically, it can be omitted and is then set to the maximum of the allowed_states, i.e., the states the LNLs can take on. The default is a binary representation with allowed_states=[0, 1]. For this, one can also use the classmethod binary(). For a trinary representation with allowed_states=[0, 1, 2], use the classmethod trinary().
The max_time parameter defines the latest possible time step for a diagnosis. In the HMM case, the probability distribution over all hidden states is evolved from \(t=0\) to max_time. In the BN case, this parameter has no effect.
The is_micro_mod_shared and is_growth_shared parameters determine whether the microscopic involvement and growth parameters are shared among all LNLs. If they are set to True, the parameters are set globally for all LNLs. If they are set to False, the parameters are set individually for each LNL.
- classmethod binary(graph_dict: dict[tuple[str, str], list[str]], **kwargs) -> Unilateral
Create an instance of the Unilateral class with binary LNLs.
- classmethod trinary(graph_dict: dict[tuple[str, str], list[str]], **kwargs) -> Unilateral
Create an instance of the Unilateral class with trinary LNLs.
- property is_trinary: bool
Return whether the model is trinary.
- property is_binary: bool
Return whether the model is binary.
- get_t_stages(which: Literal['valid', 'distributions', 'data'] = 'valid') -> list[str]
Return the T-stages of the model.
- get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Get the parameters of the tumor spread edges.
- get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Get the parameters of the LNL spread edges.
In the trinary case, this includes the growth parameters as well as the microscopic modification parameters.
- get_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Get the parameters of the spread edges.
- get_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Get the parameters of the model.
If as_dict is True, the parameters are returned as a dictionary. If as_flat is True, the dictionary is flattened, i.e., all nested dictionaries are merged into one, using flatten().
- set_tumor_spread_params(*args: float, **kwargs: float) -> tuple[float]
Assign new parameters to the tumor spread edges.
- set_lnl_spread_params(*args: float, **kwargs: float) -> tuple[float]
Assign new parameters to the LNL spread edges.
- set_spread_params(*args: float, **kwargs: float) -> tuple[float]
Assign new parameters to the spread edges.
- set_params(*args: float, **kwargs: float) -> tuple[float]
Assign new parameters to the model.
The parameters can be provided either via positional arguments or via keyword arguments. The positional arguments are used up one by one, first by the lymph.graph.set_params() method and then by the lymph.models.Unilateral.set_distribution_params() method.
The keyword arguments can be of the format "<edge_name>_<param_name>" or "<t_stage>_<param_name>" for the distributions over diagnose times. If only a "<param_name>" is provided, it is assumed to be a global parameter and is sent to all edges or distributions. The more specific keyword arguments override the global ones, which in turn override the positional arguments.
>>> graph = {
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... }
>>> model = Unilateral.trinary(
...     graph_dict=graph,
...     is_micro_mod_shared=True,
...     is_growth_shared=True,
... )
>>> model.set_params(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.99, AtoB_param="not_used")
(0.99,)
>>> model.get_params(as_dict=True)
{'TtoII_spread': 0.1, 'TtoIII_spread': 0.2, 'II_growth': 0.3, 'IItoIII_spread': 0.4, 'IItoIII_micro': 0.5, 'III_growth': 0.6}
>>> _ = model.set_params(growth=0.123)
>>> model.get_params(as_dict=True)
{'TtoII_spread': 0.1, 'TtoIII_spread': 0.2, 'II_growth': 0.123, 'IItoIII_spread': 0.4, 'IItoIII_micro': 0.5, 'III_growth': 0.123}
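The precedence rule can be summarized in a small, self-contained sketch. resolve_params() is a hypothetical helper, not part of lymph. It assumes that a global keyword is the suffix after the last underscore of a parameter name (e.g. growth for II_growth), and it silently ignores unknown keywords, like AtoB_param in the example above.

```python
def resolve_params(param_names, *args, **kwargs):
    """Sketch of the precedence: positional < global keyword < specific keyword."""
    values = {}
    remaining = list(args)
    # 1. Positional arguments are used up one by one, in order.
    for name in param_names:
        if remaining:
            values[name] = remaining.pop(0)
    for name in param_names:
        global_key = name.rsplit("_", 1)[-1]  # e.g. "growth" for "II_growth"
        # 2. A global keyword overrides the positional value...
        if global_key in kwargs:
            values[name] = kwargs[global_key]
        # 3. ...and a specific keyword overrides both.
        if name in kwargs:
            values[name] = kwargs[name]
    # Leftover positional arguments are handed back, like set_params() does.
    return values, tuple(remaining)


values, rest = resolve_params(
    ["TtoII_spread", "II_growth", "III_growth"],
    0.1, 0.2, 0.3, 0.9,
    growth=0.5, III_growth=0.7,
)
```

Here growth=0.5 overrides the positional 0.2 of II_growth, while III_growth=0.7 wins over both the positional 0.3 and the global value.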
- transition_prob(newstate: list[int], assign: bool = False) -> float
Compute the probability to transition to newstate, given the current state.
The probability is computed as the product of the transition probabilities of the individual LNLs. If assign is True, the new state is assigned to the model using the method lymph.graph.Representation.set_state().
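A minimal sketch of this product for a binary model. It assumes, consistent with the numbers in the transition_matrix() example, that an involved LNL stays involved and that a healthy LNL stays healthy only if neither the tumor nor any involved parent LNL spreads to it. All names are hypothetical, and the dictionaries stand in for the model's edge parameters.

```python
from math import prod


def binary_transition_prob(state, newstate, tumor_spread, lnl_spread):
    """Sketch: P(state -> newstate) for binary LNLs, as a product over LNLs.

    state, newstate: {lnl: 0 or 1}; tumor_spread: {lnl: prob of spread from
    the tumor}; lnl_spread: {(parent, child): prob}. All names are hypothetical.
    """
    factors = []
    for lnl, old in state.items():
        new = newstate[lnl]
        if old == 1:
            # Assumption: involved LNLs never become healthy again.
            factors.append(1.0 if new == 1 else 0.0)
            continue
        # Probability that neither the tumor nor any involved parent spreads here.
        stay_healthy = 1.0 - tumor_spread.get(lnl, 0.0)
        for (parent, child), prob in lnl_spread.items():
            if child == lnl and state[parent] == 1:
                stay_healthy *= 1.0 - prob
        factors.append(stay_healthy if new == 0 else 1.0 - stay_healthy)
    return prod(factors)


tumor = {"II": 0.7, "III": 0.3}
lnl = {("II", "III"): 0.2}
# II already involved, III becomes involved: 1 - (1 - 0.3) * (1 - 0.2), about 0.44
print(binary_transition_prob({"II": 1, "III": 0}, {"II": 1, "III": 1}, tumor, lnl))
```

With the parameters 0.7, 0.3 and 0.2, this reproduces entries such as 0.49 and 0.44 of the transition matrix printed in the transition_matrix() example.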
- diagnose_prob(diagnoses: Series | dict[str, dict[str, bool]]) -> float
Compute the probability to observe a diagnosis given the current state.
The diagnoses argument is either a pandas Series object corresponding to one row of a patient data table, or a dictionary with diagnostic modalities as keys and, as values, dictionaries holding the observation for each LNL under the respective modality.
It returns the probability of observing this particular combination of diagnoses, given the current state of the system.
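A sketch of how such a probability can be assembled from the sensitivity and specificity of each modality, assuming binary LNLs and the usual definitions: P(positive | involved) = sens and P(negative | healthy) = spec. Modalities are given as hypothetical (spec, sens) pairs, mirroring set_modality(spec=..., sens=...). Treating None as "not observed" (a factor of 1) is likewise an assumption of this sketch.

```python
def diagnose_prob(state, diagnoses, modalities):
    """Sketch: P(diagnoses | state) for binary LNLs.

    state: {lnl: 0 or 1}; diagnoses: {modality: {lnl: True, False, or None}};
    modalities: {modality: (spec, sens)}. None marks a missing observation.
    """
    prob = 1.0
    for modality, observed in diagnoses.items():
        spec, sens = modalities[modality]
        for lnl, obs in observed.items():
            if obs is None:
                continue  # missing observation: contributes a factor of 1
            if state[lnl]:
                prob *= sens if obs else (1.0 - sens)
            else:
                prob *= (1.0 - spec) if obs else spec
    return prob


modalities = {"CT": (0.8, 0.8), "pathology": (1.0, 1.0)}
state = {"II": 1, "III": 0}
# CT sees II correctly (0.8) and III as a false positive (1 - 0.8): about 0.16
p = diagnose_prob(state, {"CT": {"II": True, "III": True}}, modalities)
```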
- property obs_list
Return the list of all possible observations.
They are ordered the same way as the graph.Representation.state_list, but additionally by modality. E.g., for two LNLs II, III and two modalities CT, pathology, the list would look like this:
>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_modality("CT", spec=0.8, sens=0.8)
>>> model.set_modality("pathology", spec=1.0, sens=1.0)
>>> model.obs_list
array([[0, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 1, 1],
       ...
       [1, 1, 0, 1],
       [1, 1, 1, 0],
       [1, 1, 1, 1]])
The first two columns correspond to the observation of LNLs II and III under modality CT, the second two columns correspond to the same LNLs under the pathology modality.
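The ordering above is what itertools.product yields when every column is an independent binary observation, with the last column varying fastest. A sketch, assuming binary observations and columns grouped first by modality and then by LNL; obs_list() here is a standalone illustration, not the property itself.

```python
from itertools import product


def obs_list(lnls, modalities):
    """Sketch: all binary observations, columns grouped by modality, then LNL."""
    num_columns = len(lnls) * len(modalities)
    return [list(obs) for obs in product([0, 1], repeat=num_columns)]


rows = obs_list(["II", "III"], ["CT", "pathology"])
print(len(rows))  # 16 = 2 ** (2 LNLs * 2 modalities)
```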
- transition_matrix() -> ndarray
Matrix encoding the probabilities to transition from one state to another.
This is the crucial object for modelling the evolution of the probabilistic system in the context of the hidden Markov model. For binary LNLs, it has the shape \(2^N \times 2^N\), where \(N\) is the number of LNLs in the graph (for trinary LNLs, \(3^N \times 3^N\)). The element in the \(i\)-th row and \(j\)-th column encodes the probability to transition from the \(i\)-th state to the \(j\)-th state. The states are ordered as in the graph.Representation.state_list.
See also
generate_transition()
The function actually computing the transition matrix.
>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_params(0.7, 0.3, 0.2)
()
>>> model.transition_matrix()
array([[0.21, 0.09, 0.49, 0.21],
       [0.  , 0.3 , 0.  , 0.7 ],
       [0.  , 0.  , 0.56, 0.44],
       [0.  , 0.  , 0.  , 1.  ]])
- observation_matrix() -> ndarray
The matrix encoding the probabilities to observe a certain diagnosis.
Every element in this matrix holds a probability to observe a certain diagnosis (or combination of diagnoses, when using multiple diagnostic modalities) given the current state of the system. For binary LNLs, it has the shape \(2^N \times 2^{N \times M}\), where \(N\) is the number of LNLs in the graph and \(M\) is the number of diagnostic modalities (for trinary LNLs, the number of rows is \(3^N\)).
See also
generate_observation()
The function actually computing the observation matrix.
- data_matrix(t_stage: str | None = None) -> ndarray
Extract the data matrix for a given t_stage.
The data matrix is a binary encoding of the patient data. For every patient, it encodes which observational states could have led to the observed diagnosis. If a diagnosis is complete, i.e., for every diagnostic modality and every LNL we have an observation, the data matrix is a one-hot encoding of the observed diagnoses. Otherwise, it may contain multiple 1s, indicating over which observational states one should marginalize.
The data matrix is used to compute the diagnose_matrix, which in turn is used to compute the likelihood of the model given the patient data.
See also
matrix.generate_data_encoding()
This function actually computes the data encoding.
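A sketch of the encoding idea for a single patient. It assumes the diagnosis is flattened into the same column order as obs_list and that missing observations are given as None. An observational state gets a 1 if it agrees with the patient on every observed entry. encode() is a hypothetical name.

```python
from itertools import product


def encode(diagnosis):
    """Sketch: encode one patient's diagnosis over all observational states.

    diagnosis is a flat list of 0, 1, or None (missing), ordered like the
    columns of obs_list. Missing entries are compatible with both 0 and 1.
    """
    observations = product([0, 1], repeat=len(diagnosis))
    return [
        int(all(d is None or d == o for d, o in zip(diagnosis, obs)))
        for obs in observations
    ]


complete = encode([1, 0, 1, 0])        # one-hot: a single 1
partial = encode([1, None, 1, None])   # 2 ** 2 compatible states get a 1
```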
- diagnose_matrix(t_stage: str | None = None) -> ndarray
Extract the diagnose matrix for a given t_stage.
For every patient, this matrix stores the probability to observe this patient’s diagnosis, given each of the possible hidden states of the model. It is computed by multiplying the data_matrix() with the observation_matrix().
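Both matrices are plain nested lists in the following sketch. It assumes one row per patient in the data matrix and one row per hidden state in the observation matrix, which may not match the library's orientation. Entry [p][s] sums the observation probabilities of all observational states compatible with patient p's diagnosis.

```python
def diagnose_matrix(observation_matrix, data_matrix):
    """Sketch: entry [p][s] = sum_o observation_matrix[s][o] * data_matrix[p][o].

    Assumes one row per patient in data_matrix and one row per hidden state
    in observation_matrix (plain nested lists).
    """
    return [
        [sum(o * d for o, d in zip(obs_row, patient)) for obs_row in observation_matrix]
        for patient in data_matrix
    ]


# One LNL, one modality with spec = 0.9 and sens = 0.7.
O = [[0.9, 0.1],   # hidden state 0 (healthy): P(obs 0), P(obs 1)
     [0.3, 0.7]]   # hidden state 1 (involved)
D = [[0, 1],       # patient observed as involved
     [1, 1]]       # patient without observation: both observations compatible
M = diagnose_matrix(O, D)
```

The second patient's row consists of ones, so it does not favour any hidden state. This is how missing information drops out of the likelihood.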
- load_patient_data(patient_data: pd.DataFrame, side: str = 'ipsi', mapping: callable | dict[int, Any] | None = None) -> None
Load patient data in LyProX format into the model.
Since the LyProX data format contains information on both sides (i.e., ipsi- and contralateral) of the neck, the side parameter is used to select for which of the two sides to store the involvement data.
With the mapping function or dictionary, the reported T-stages (usually 0, 1, 2, 3, and 4) can be mapped to any keys also used to access the corresponding distribution over diagnose times. The default mapping is to map 0, 1, and 2 to “early” and 3 and 4 to “late”.
What this method essentially does is to copy the entire data frame, check that all necessary information is present, and add a new top-level header "_model" to the data frame. Under this header, columns are assembled that contain all the information necessary to compute the observation and diagnose matrices.
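A sketch of the default mapping described above, written both as a function and as the equivalent dictionary (the two forms the mapping argument accepts). It is an illustrative re-implementation, not the library's own early_late_mapping.

```python
def early_late(t_stage: int) -> str:
    """Map T0 to T2 to "early" and T3 to T4 to "late" (illustrative sketch)."""
    return "early" if t_stage <= 2 else "late"


# The same mapping expressed as a dictionary.
early_late_dict = {0: "early", 1: "early", 2: "early", 3: "late", 4: "late"}
```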
- property patient_data: DataFrame
Return the patient data loaded into the model.
After successfully loading the data with the method load_patient_data(), the copied patient data contains the additional top-level header "_model". Under it, the observed per-LNL involvement is listed for every diagnostic modality in the dictionary returned by get_all_modalities() and for each of the LNLs in the list graph.Representation.lnls.
It also contains information on the patient’s T-stage under the header ("_model", "#", "t_stage").
Additionally, it holds the data encodings and the probability of the diagnosis given the hidden states for each patient under the headers ("_model", "_encoding", <obs_state>) and ("_model", "_diagnose_prob", <hidden_state>), respectively.
- evolve(state_dist: ndarray, num_steps: int) -> ndarray
Evolve the state_dist of possible states over num_steps.
This is done by multiplying the state_dist (as a row vector) from the left onto the transition matrix, num_steps times. The result is a new distribution over possible states at a new time step \(t' = t + n\), where \(n\) is the number of steps num_steps.
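Written out in pure Python, one step is a row-vector-times-matrix product. The sketch below reuses the transition matrix from the transition_matrix() example and starts from the state in which all LNLs are healthy. It is an illustration, not the library's implementation.

```python
def evolve(state_dist, transition_matrix, num_steps):
    """Sketch: apply state_dist @ transition_matrix num_steps times (pure Python)."""
    for _ in range(num_steps):
        state_dist = [
            sum(p * transition_matrix[i][j] for i, p in enumerate(state_dist))
            for j in range(len(transition_matrix[0]))
        ]
    return state_dist


# Transition matrix from the transition_matrix() example (states of II, III:
# [0, 0], [0, 1], [1, 0], [1, 1]).
T = [
    [0.21, 0.09, 0.49, 0.21],
    [0.0, 0.3, 0.0, 0.7],
    [0.0, 0.0, 0.56, 0.44],
    [0.0, 0.0, 0.0, 1.0],
]
start = [1.0, 0.0, 0.0, 0.0]  # at t = 0, all LNLs are healthy
after_two_steps = evolve(start, T, 2)
```

After one step, the distribution equals the first row of T. After two steps, the probability that both LNLs are still healthy is 0.21 squared, i.e., 0.0441.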
- state_dist_evo() -> ndarray
Compute an evolution of the model’s state distribution over time steps.
This returns a matrix with the distribution over the possible states for each time step from \(t = 0\) to \(t = T\), where \(T\) is the maximum diagnose time stored in the model’s attribute max_time.
Note that at this point, the distributions are not weighted with the distribution over diagnose times that are stored and managed for each T-stage in the dictionary returned by get_all_distributions().
- state_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the distribution over possible states.
When mode is set to "HMM", do this for a given t_stage. This is essentially a marginalization of the evolution over the possible states, as computed by state_dist_evo(), with the distribution over diagnose times for the given T-stage from the dictionary returned by get_all_distributions().
Or, when mode is set to "BN", compute the distribution over states for the Bayesian network. In that case, the t_stage parameter is ignored.
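The HMM marginalization amounts to a weighted sum of the rows of the evolution matrix. In the sketch, the evolution and the prior over diagnose times are made-up numbers. In the model, they would come from state_dist_evo() and from the T-stage's distribution in get_all_distributions().

```python
def marginalize_over_time(evolution, time_prior):
    """Sketch: state_dist[s] = sum_t P(t) * evolution[t][s].

    evolution[t] is the state distribution at time step t (one row of what
    state_dist_evo() returns); time_prior[t] is P(diagnosis at t) for one
    T-stage. Both are plain lists here.
    """
    num_states = len(evolution[0])
    return [
        sum(p_t * evolution[t][s] for t, p_t in enumerate(time_prior))
        for s in range(num_states)
    ]


# Hypothetical numbers: a 2-state chain, diagnosed at t = 0, 1, or 2.
evolution = [[1.0, 0.0], [0.6, 0.4], [0.36, 0.64]]
time_prior = [0.2, 0.5, 0.3]  # made-up prior over diagnose times
state_dist = marginalize_over_time(evolution, time_prior)  # about [0.608, 0.392]
```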
- obs_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the distribution over all possible observations for a given T-stage.
Returns an array of probabilities for each possible complete observation. This entails multiplying the distribution over states, as returned by the state_dist() method, with the observation_matrix().
Note that since the observation_matrix can become very large, this method is not very efficient for inference. Instead, we compute the diagnose_matrix() from the observation_matrix() and the data_matrix() and use these to compute the likelihood.
- likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, mode: Literal['HMM', 'BN'] = 'HMM', for_t_stage: str | None = None) -> float
Compute the (log-)likelihood of the stored data given the model (and params).
See the documentation of lymph.types.Model.likelihood() for more information on how to use the given_params parameter.
Returns the log-likelihood if log is set to True. The mode parameter determines whether the likelihood is computed for the hidden Markov model ("HMM") or the Bayesian network ("BN").
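Given a state distribution and a diagnose matrix, the likelihood of each patient is a dot product, and the log-likelihood is the sum of their logarithms. A sketch with plain lists: the flag is called use_log here only to avoid shadowing math.log, and the orientation of the diagnose matrix (patients by states) is an assumption.

```python
from math import log, prod


def likelihood(state_dist, diagnose_matrix, use_log=True):
    """Sketch: combine a state distribution with per-patient diagnose probs.

    diagnose_matrix[p][s] = P(diagnosis of patient p | hidden state s), so
    patient p's likelihood is sum_s state_dist[s] * diagnose_matrix[p][s].
    """
    per_patient = [
        sum(p_s * p_d for p_s, p_d in zip(state_dist, row)) for row in diagnose_matrix
    ]
    if use_log:
        return sum(log(p) for p in per_patient)
    return prod(per_patient)


state_dist = [0.608, 0.392]
diag = [[0.1, 0.7],   # observed as involved (spec 0.9, sens 0.7)
        [1.0, 1.0]]   # no observation: likelihood 1 for every state
llh = likelihood(state_dist, diag)  # log(0.0608 + 0.2744) = log(0.3352)
```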
- compute_encoding(given_diagnoses: dict[str, dict[str, bool | str | NAType | None]] | None = None) -> ndarray
Compute one-hot vector encoding of a given diagnosis.
- posterior_state_dist(given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnoses: dict[str, dict[str, bool | str | NAType | None]] | None = None, t_stage: str | int = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the posterior distribution over hidden states given a diagnosis.
The given_diagnoses is a dictionary of diagnoses for each modality. E.g., this could look like this:
given_diagnoses = {
    "MRI": {"II": True, "III": False, "IV": False},
    "PET": {"II": True, "III": True, "IV": None},
}
The t_stage parameter determines the T-stage for which the posterior is computed. The mode parameter determines whether the posterior is computed for the hidden Markov model ("HMM") or the Bayesian network ("BN"). In case of the Bayesian network mode, the t_stage parameter is ignored.
Warning
To speed up repetitive computations, one can provide precomputed state distributions via the given_state_dist parameter. When provided, the method will ignore the given_params, t_stage, and mode arguments, but compute the posterior much faster.
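The posterior follows from Bayes' rule: multiply the prior state distribution by the probability of the diagnosis under each state and renormalize. The sketch also shows why a precomputed given_state_dist saves time: the prior is simply an input and is not recomputed. Names and numbers are illustrative.

```python
def posterior_state_dist(state_dist, diagnose_probs):
    """Sketch of Bayes' rule over hidden states.

    state_dist[s] is the prior P(s) (e.g. from state_dist()), and
    diagnose_probs[s] is P(diagnosis | s). Returns P(s | diagnosis).
    """
    joint = [p * d for p, d in zip(state_dist, diagnose_probs)]
    evidence = sum(joint)  # P(diagnosis), the normalization constant
    return [j / evidence for j in joint]


posterior = posterior_state_dist([0.608, 0.392], [0.1, 0.7])
```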
- risk(involvement: dict[str, bool | str | NAType | None] | None = None, given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnoses: dict[str, dict[str, bool | str | NAType | None]] | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> float | ndarray
Compute the risk of a certain involvement, using the given_diagnoses.
If an involvement pattern of interest is provided, this method computes the risk of seeing just that pattern for the set of given parameters and a dictionary of diagnoses for each modality.
If no involvement is provided, this will simply return the posterior distribution over hidden states, given the diagnoses, as computed by the posterior_state_dist() method. See its documentation for more details about the arguments and the return value.
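For binary LNLs, the marginalization over matching states can be sketched as follows. The sketch assumes the posterior is ordered like itertools.product([0, 1], repeat=N), in line with the state ordering in the transition_matrix() example. It treats None, or a missing key, as "either state is fine". risk() here is a standalone illustration, not the method itself.

```python
from itertools import product


def risk(posterior, lnls, involvement):
    """Sketch: sum the posterior over all hidden states matching a pattern.

    involvement maps LNL names to True/False, or None for "don't care";
    states are ordered like itertools.product([0, 1], repeat=len(lnls)).
    """
    states = product([0, 1], repeat=len(lnls))
    total = 0.0
    for prob, state in zip(posterior, states):
        matches = all(
            involvement.get(lnl) is None or bool(s) == involvement[lnl]
            for lnl, s in zip(lnls, state)
        )
        if matches:
            total += prob
    return total


posterior = [0.1, 0.2, 0.3, 0.4]  # states [0,0], [0,1], [1,0], [1,1] of II, III
risk_II = risk(posterior, ["II", "III"], {"II": True})  # 0.3 + 0.4
```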
- draw_diagnoses(diag_times: list[int], rng: Generator | None = None, seed: int = 42) -> ndarray
Given some diag_times, draw diagnoses for each LNL.
>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_modality("CT", spec=0.8, sens=0.8)
>>> model.draw_diagnoses([0, 1, 2, 3, 4])
array([[False,  True],
       [False, False],
       [ True, False],
       [False,  True],
       [False, False]])
>>> draw_diagnoses(  # this is the same as the previous example
...     diagnose_times=[0, 1, 2, 3, 4],
...     state_evolution=model.state_dist_evo(),
...     observation_matrix=model.observation_matrix(),
...     possible_diagnoses=model.obs_list,
... )
array([[False,  True],
       [False, False],
       [ True, False],
       [False,  True],
       [False, False]])
- draw_patients(num: int, stage_dist: Iterable[float], rng: Generator | None = None, seed: int = 42, **_kwargs) -> DataFrame
Draw num random patients from the model.
For this, a stage_dist, i.e., a distribution over the T-stages, needs to be defined. This must be an iterable of probabilities with as many elements as there are T-stages defined in the model (accessible via get_all_distributions()).
A random number generator can be provided as rng. If None, a new one is initialized with the given seed (or 42, by default).
See also
lymph.diagnose_times.Distribution.draw_diag_times()
Method to draw diagnose times from a distribution.
lymph.models.Unilateral.draw_diagnoses()
Method to draw individual diagnoses.
lymph.models.Bilateral.draw_patients()
The corresponding bilateral method.
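The first step of drawing synthetic patients, assigning each one a T-stage according to stage_dist, can be sketched with Python's random module. This sketch uses random.Random instead of a NumPy Generator, so the drawn values will not match the library's. The helper name and its t_stages argument are hypothetical, and the subsequent steps (diagnose times, diagnoses) are those listed under See also.

```python
import random


def draw_t_stages(num, stage_dist, t_stages, seed=42):
    """Sketch: assign each of num synthetic patients a T-stage.

    stage_dist needs one probability per T-stage; t_stages lists the
    T-stage names (e.g. the keys of get_all_distributions()).
    """
    if len(stage_dist) != len(t_stages):
        raise ValueError("stage_dist needs one probability per T-stage")
    rng = random.Random(seed)  # reproducible, analogous to the seed argument
    return rng.choices(t_stages, weights=stage_dist, k=num)


stages = draw_t_stages(10, [0.6, 0.4], ["early", "late"])
```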
- class lymph.models.Bilateral(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, uni_kwargs: dict[str, Any] | None = None, ipsi_kwargs: dict[str, Any] | None = None, contra_kwargs: dict[str, Any] | None = None, **_kwargs)
Bases: Composite, Composite, Model
Class that models metastatic progression in a bilateral lymphatic system.
This is achieved by creating two instances of the Unilateral model, one for the ipsi- and one for the contralateral side of the neck. The two sides are assumed to be independent of each other, given the diagnose time over which we marginalize.
See also
Unilateral
Two instances of this class are created as attributes. One for the ipsi- and one for the contralateral side of the neck.
- __init__(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, uni_kwargs: dict[str, Any] | None = None, ipsi_kwargs: dict[str, Any] | None = None, contra_kwargs: dict[str, Any] | None = None, **_kwargs) -> None
Initialize both sides of the neck as models.Unilateral.
The graph_dict is a dictionary of tuples as keys and lists of strings as values. It is passed to both models.Unilateral instances, which in turn pass it to the graph.Representation class that stores the graph.
With the dictionary is_symmetric, the user can specify which aspects of the model are symmetric. Valid keys are "tumor_spread" and "lnl_spread". The values are booleans, with True meaning that the aspect is symmetric.
Note
The symmetries of tumor and LNL spread are only guaranteed if the respective parameters are set via the set_params() method of this bilateral model. It is still possible to set different parameters for the ipsi- and contralateral side by using their respective Unilateral.set_params() method.
The uni_kwargs are passed to both instances of the unilateral model, while the ipsi_kwargs and contra_kwargs are passed to the ipsi- and contralateral side, respectively. The ipsi- and contralateral kwargs override the unilateral kwargs and may also override the graph_dict. This allows the user to specify different graphs for the two sides of the neck.
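The override order can be pictured as a plain dictionary merge in which later entries win. This is a sketch under that assumption. side_kwargs() is a hypothetical helper, not part of lymph.

```python
def side_kwargs(graph_dict, uni_kwargs=None, side_specific=None):
    """Sketch: merge constructor arguments for one side; later sources win."""
    return {
        "graph_dict": graph_dict,
        **(uni_kwargs or {}),      # shared by both sides
        **(side_specific or {}),   # ipsi_kwargs or contra_kwargs override
    }


graph = {("tumor", "T"): ["II"], ("lnl", "II"): []}
other_graph = {("tumor", "T"): ["II", "III"], ("lnl", "II"): [], ("lnl", "III"): []}
ipsi = side_kwargs(graph, {"max_time": 10}, {"max_time": 5})
contra = side_kwargs(graph, {"max_time": 10}, {"graph_dict": other_graph})
```

Here the ipsilateral side keeps the shared graph but uses its own max_time, while the contralateral side keeps the shared max_time but gets a different graph.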
- classmethod binary(*args, **kwargs) -> Bilateral
Initialize a binary bilateral model.
This is a convenience method that sets the allowed_states of the uni_kwargs to [0, 1]. All other args and kwargs are passed to the __init__() method.
- classmethod trinary(*args, **kwargs) -> Bilateral
Initialize a trinary bilateral model.
This is a convenience method that sets the allowed_states of the uni_kwargs to [0, 1, 2]. All other args and kwargs are passed to the __init__() method.
- property is_trinary: bool
Return whether the model is trinary.
- property is_binary: bool
Return whether the model is binary.
- get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model’s spread from tumor to LNLs.
If the attribute dictionary is_symmetric stores the key-value pair "tumor_spread": True, the parameters are returned as a single dictionary, since they are the same ipsi- and contralaterally. Otherwise, the parameters are returned as a dictionary with two keys, "ipsi" and "contra".
- get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model’s spread between LNLs.
Similarly to the get_tumor_spread_params() method, this returns only one dictionary if the attribute dictionary is_symmetric stores the key-value pair "lnl_spread": True. Otherwise, the parameters are returned as a dictionary with two keys, "ipsi" and "contra".
- get_spread_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model’s spread edges.
Depending on the symmetries (i.e., the is_symmetric attribute), this returns different results:
If is_symmetric["tumor_spread"] = False, the flattened (as_flat=True) dictionary (as_dict=True) will contain keys of the form ipsi_Tto<lnl>_spread and contra_Tto<lnl>_spread, where <lnl> is the name of the lymph node level. However, if the tumor spread is set to be symmetric, the leading ipsi_ or contra_ is omitted, since it is valid for both sides.
This is consistent with how the set_params() method expects the keyword arguments in case of the symmetry configurations.
>>> import numpy as np
>>> model = Bilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> num_dims = model.get_num_dims()
>>> model.set_spread_params(*np.round(np.linspace(0., 1., num_dims+1), 2))
(1.0,)
>>> model.get_spread_params(as_flat=False)
{'ipsi': {'TtoII': {'spread': 0.0}, 'TtoIII': {'spread': 0.2}}, 'contra': {'TtoII': {'spread': 0.4}, 'TtoIII': {'spread': 0.6}}, 'IItoIII': {'spread': 0.8}}
>>> model.get_spread_params(as_flat=True)
{'ipsi_TtoII_spread': 0.0, 'ipsi_TtoIII_spread': 0.2, 'contra_TtoII_spread': 0.4, 'contra_TtoIII_spread': 0.6, 'IItoIII_spread': 0.8}
- get_params(as_dict: bool = True, as_flat: bool = True) -> Iterable[float] | dict[str, float]
Return the parameters of the model.
It returns the combination of the calls to Unilateral.get_params() of the ipsi- and contralateral side. For the use of the as_dict and as_flat arguments, see the documentation of the types.Model.get_params() method.
Also see the get_spread_params() method to understand how the symmetry settings affect the return value.
- set_tumor_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model’s spread from tumor to LNLs.
- set_lnl_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model’s spread between LNLs.
- set_spread_params(*args: float, **kwargs: float) -> tuple[float]
Set the parameters of the model’s spread edges.
- set_params(*args: float, **kwargs: float) -> tuple[float]
Set new parameters to the model.
This works almost exactly like the unilateral model’s Unilateral.set_params() method. However, this one allows the user to set the parameters of individual sides of the neck by prefixing the keyword arguments’ names with "ipsi_" or "contra_".
Anything not prefixed by "ipsi_" or "contra_" is passed to both sides of the neck. This obviously does not work with positional arguments.
When setting the parameters via positional arguments, the order is important:
1. The parameters of the edges from tumor to LNLs: first the ipsilateral parameters, then, if is_symmetric["tumor_spread"] is False, the contralateral parameters. Otherwise, the ipsilateral parameters are used for both sides.
2. The parameters of the edges between LNLs: again, first the ipsilateral parameters, then, if is_symmetric["lnl_spread"] is False, the contralateral parameters. Otherwise, the ipsilateral parameters are used for both sides.
3. The parameters of the parametric distributions for marginalizing over diagnose times.
When some positional arguments still remain after that, they are returned in a tuple.
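The consumption order can be sketched with plain lists. split_positional() is hypothetical and only counts parameters. The numbers in the usage correspond to the get_spread_params() example: two tumor edges, one LNL edge, asymmetric tumor spread, symmetric LNL spread, and no distribution parameters.

```python
def split_positional(args, num_tumor, num_lnl, num_dist, is_symmetric):
    """Sketch of the documented order in which positional args are used up.

    num_tumor / num_lnl: number of tumor->LNL and LNL->LNL edges per side;
    num_dist: number of distribution parameters. Returns the assignment and
    the leftover arguments (the tuple that set_params() returns).
    """
    remaining = list(args)

    def take(n):
        taken = remaining[:n]
        del remaining[:n]
        return taken

    ipsi_tumor = take(num_tumor)
    contra_tumor = ipsi_tumor if is_symmetric["tumor_spread"] else take(num_tumor)
    ipsi_lnl = take(num_lnl)
    contra_lnl = ipsi_lnl if is_symmetric["lnl_spread"] else take(num_lnl)
    dist = take(num_dist)
    assigned = {
        "ipsi": {"tumor": ipsi_tumor, "lnl": ipsi_lnl},
        "contra": {"tumor": contra_tumor, "lnl": contra_lnl},
        "distributions": dist,
    }
    return assigned, tuple(remaining)


assigned, rest = split_positional(
    (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    num_tumor=2, num_lnl=1, num_dist=0,
    is_symmetric={"tumor_spread": False, "lnl_spread": True},
)
```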
- load_patient_data(patient_data: pd.DataFrame, mapping: callable | dict[int, Any] = <function early_late_mapping>) -> None
Load patient data into the model.
This amounts to calling the load_patient_data() method of both sides of the neck.
- state_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the joint distribution over the ipsi- & contralateral hidden states.
This computes the state distributions of both sides and combines them via their outer product. In case mode is "HMM" (default), the outer products are formed per time step and weighted with the diagnose time distribution of the respective t_stage, since the two sides are only independent given the diagnose time.
See also
Unilateral.state_dist()
The corresponding unilateral function. Note that this method returns a 2D array, because it computes the probability of any possible combination of ipsi- and contralateral states.
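Because the two sides are only assumed to be independent given the diagnose time (see the class description), a sketch of the HMM case takes the outer product at every time step and only then weights it with the diagnose time distribution. The per-side evolutions below are made-up numbers. In the model, they would come from each side's state_dist_evo().

```python
def joint_state_dist(ipsi_evo, contra_evo, time_prior):
    """Sketch: sum_t P(t) * outer(ipsi_evo[t], contra_evo[t]).

    The sides are independent only *given* the diagnose time t, so the
    outer product is taken per time step before marginalizing over t.
    """
    n_ipsi, n_contra = len(ipsi_evo[0]), len(contra_evo[0])
    joint = [[0.0] * n_contra for _ in range(n_ipsi)]
    for p_t, ipsi_t, contra_t in zip(time_prior, ipsi_evo, contra_evo):
        for i in range(n_ipsi):
            for j in range(n_contra):
                joint[i][j] += p_t * ipsi_t[i] * contra_t[j]
    return joint


ipsi_evo = [[1.0, 0.0], [0.5, 0.5]]    # made-up: one LNL, t = 0 and t = 1
contra_evo = [[1.0, 0.0], [0.9, 0.1]]
time_prior = [0.5, 0.5]
joint = joint_state_dist(ipsi_evo, contra_evo, time_prior)
```

With these inputs, the result is [[0.725, 0.025], [0.225, 0.025]]. Marginalizing each side over time first and multiplying afterwards would instead give [[0.7125, 0.0375], [0.2375, 0.0125]], which ignores that both sides share the same diagnose time.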
- obs_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the joint distribution over the ipsi- & contralateral observations.
See also
Unilateral.obs_dist()
The corresponding unilateral function. Note that this method returns a 2D array, because it computes the probability of any possible combination of ipsi- and contralateral observations.
- patient_likelihoods(t_stage: str, mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the likelihood of each patient individually.
- likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, mode: Literal['HMM', 'BN'] = 'HMM', for_t_stage: str | None = None)
Compute the (log-)likelihood of the stored data given the model (and params).
See the documentation of types.Model.likelihood() for more information on how to use the given_params parameter.
Returns the log-likelihood if log is set to True. The mode parameter determines whether the likelihood is computed for the hidden Markov model ("HMM") or the Bayesian network ("BN").
Note
The computation is much faster if no parameters are given, since then the transition matrix does not need to be recomputed.
See also
Unilateral.likelihood()
The corresponding unilateral function.
- posterior_state_dist(given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnoses: dict[str, dict[str, dict[str, bool | str | NAType | None]]] | None = None, t_stage: str | int = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> ndarray
Compute the joint posterior distribution over ipsi- and contralateral states, given the given_diagnoses.
The given_diagnoses is a dictionary storing one types.DiagnoseType each for the "ipsi" and "contra" side of the neck.
Essentially, this is the risk for any possible combination of ipsi- and contralateral involvement, given the provided diagnoses.
Warning
As in the Unilateral.posterior_state_dist() method, one may provide a precomputed (joint) state distribution via the given_state_dist argument (should be a square matrix). In this case, the given_params are ignored and the model does not need to recompute, e.g., the transition_matrix() or state_dist(), making the computation much faster.
However, this means that t_stage and mode are also ignored, since these are only used to compute the state distribution.
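Assuming each side's diagnosis depends only on that side's hidden state, the joint posterior is the joint prior multiplied by the two per-side diagnose probabilities and renormalized. A sketch with illustrative numbers; joint_posterior() is a hypothetical name.

```python
def joint_posterior(joint_prior, ipsi_diag_probs, contra_diag_probs):
    """Sketch: P(i, c | diagnoses) is proportional to P(i, c) * P(d_i | i) * P(d_c | c)."""
    unnormalized = [
        [prior_ic * p_i * p_c for prior_ic, p_c in zip(row, contra_diag_probs)]
        for row, p_i in zip(joint_prior, ipsi_diag_probs)
    ]
    evidence = sum(sum(row) for row in unnormalized)  # P(both diagnoses)
    return [[value / evidence for value in row] for row in unnormalized]


joint_prior = [[0.725, 0.025], [0.225, 0.025]]  # rows: ipsi state, cols: contra state
post = joint_posterior(joint_prior, [0.1, 0.7], [1.0, 1.0])
```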
- risk(involvement: dict[str, dict[str, bool | str | NAType | None]] | None = None, given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnoses: dict[str, dict[str, dict[str, bool | str | NAType | None]]] | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') -> float
Compute the risk of the involvement patterns, given parameters and diagnoses.
The involvement of interest is expected to be a PatternType for each side of the neck ("ipsi" and "contra"). This method then marginalizes over those posterior state probabilities that match the involvement patterns.
If involvement is not provided, the method returns the posterior state distribution as computed by the posterior_state_dist() method. See its docstring for more details on the remaining arguments.
- draw_patients(num: int, stage_dist: Iterable[float], rng: Generator | None = None, seed: int = 42, **_kwargs) -> DataFrame
Draw num random patients from the parametrized model.
See also
diagnose_times.Distribution.draw_diag_times()
Method to draw diagnose times from a distribution.
Unilateral.draw_diagnoses()
Method to draw individual diagnoses from a unilateral model.
Unilateral.draw_patients()
The unilateral method to draw a synthetic dataset.
- class lymph.models.Midline(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, use_mixing: bool = True, use_central: bool = True, use_midext_evo: bool = True, marginalize_unknown: bool = True, uni_kwargs: dict[str, Any] | None = None, **_kwargs)
Bases: Composite, Composite, Model
Models metastatic progression bilaterally with tumor lateralization.
Model a bilateral lymphatic system where an additional risk factor can be provided in the data: whether or not the primary tumor extends over the mid-sagittal line, or is located on it.
It is reasonable to assume (and supported by data) that a midline extension of the primary tumor significantly increases the risk for metastatic spread to the contralateral side of the neck. This class attempts to capture this using a simple assumption: We assume that the probability of spread to the contralateral side for patients with midline extension is larger than for patients without it, but smaller than the probability of spread to the ipsilateral side. Formally:
\[b_c^{\in} = \alpha \cdot b_i + (1 - \alpha) \cdot b_c^{\not\in}\]
where \(b_c^{\in}\) is the probability of spread from the primary tumor to the contralateral side for patients with midline extension, \(b_c^{\not\in}\) the same for patients without, and \(b_i\) the probability of spread to the ipsilateral side. \(\alpha\) is the linear mixing parameter. For \(\alpha \in [0, 1]\), \(b_c^{\in}\) always lies between \(b_c^{\not\in}\) and \(b_i\).
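The mixing formula is a convex combination, which the following direct transcription makes explicit. The function and argument names are hypothetical.

```python
def midext_contra_spread(b_ipsi, b_contra_noext, alpha):
    """b_c^in = alpha * b_i + (1 - alpha) * b_c^notin (linear mixing)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * b_ipsi + (1.0 - alpha) * b_contra_noext


# alpha = 0 recovers the no-extension value, alpha = 1 the ipsilateral one.
b_mixed = midext_contra_spread(b_ipsi=0.4, b_contra_noext=0.05, alpha=0.3)  # about 0.155
```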
- __init__(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, use_mixing: bool = True, use_central: bool = True, use_midext_evo: bool = True, marginalize_unknown: bool = True, uni_kwargs: dict[str, Any] | None = None, **_kwargs)[source]#
Initialize the model.
The class is constructed in a similar fashion to the Bilateral: That class contains one Unilateral for each side of the neck, while this class will contain several instances of Bilateral. Conceptually, this amounts to one ipsilateral side and two to three contralateral sides, covering the cases a) no midline extension, b) midline extension, and c) central tumor location.
Added keyword arguments in this constructor are use_mixing, which controls whether to use the mixture of spread parameters from tumor to the LNLs described above, and use_central, which controls whether to use a third Bilateral model for the case of a central tumor location.
The parameter use_midext_evo decides whether the tumor's midline extension should be considered a random variable, in which case it is evolved like the state of the LNLs, or not.
With marginalize_unknown (default: True), the model will also load patients with unknown midline extension status and marginalize over their state of midline extension when computing the likelihood. This extra data is stored in a Bilateral instance accessible via the attribute "unknown". Note that this bilateral instance does not receive updated parameters or any other processing: it is solely used to store the data and generate diagnosis matrices for those patients.
The uni_kwargs are passed to all bilateral models.
See also
Bilateral
Two to four of these are held as attributes by this class: one for the case of a mid-sagittal extension of the primary tumor, one for the case of no such extension, (possibly) one for the case of a central/symmetric tumor, and (possibly) one for the case of unknown midline extension status.
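To make the "two to four" count concrete, here is a hypothetical helper (not part of lymph) that lists which Bilateral sub-models such a model would hold. It assumes the attribute names ext, noext, central, and unknown that appear in this documentation and the constructor defaults use_central=True and marginalize_unknown=True.

```python
def bilateral_submodels(use_central: bool = True, marginalize_unknown: bool = True) -> list[str]:
    """Hypothetical helper: names of the Bilateral instances held by the model."""
    # Lateralized tumors and tumors crossing the midline always get their own model.
    names = ["ext", "noext"]
    if use_central:
        # Optional third model for tumors located right on the mid-sagittal line.
        names.append("central")
    if marginalize_unknown:
        # Data-only container for patients with unknown midline extension status.
        names.append("unknown")
    return names
```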
- property is_trinary: bool#
Return whether the model is trinary.
- property midext_prob: float#
Return the probability of midline extension.
- property mixing_param: float | None#
Return the mixing parameter.
- property use_mixing: bool#
Return whether the model uses a mixing parameter.
- property use_central: bool#
Return whether the model uses a central model.
- property marginalize_unknown: bool#
Return whether the model marginalizes over unknown midline extension.
- get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) dict[str, float] | Iterable[float] [source]#
Return the tumor spread parameters of the model.
If the model uses the mixing parameter, the returned params will contain the ipsilateral spread from tumor to LNLs, the contralateral ones for the case of no midline extension, and the mixing parameter. Otherwise, they will contain the contralateral params for the cases of present and absent midline extension.
- get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) dict[str, float] | Iterable[float] [source]#
Return the LNL spread parameters of the model.
Depending on the value of is_symmetric["lnl_spread"], the returned params may contain only one set of spread parameters (if True) or one for the ipsi- and one for the contralateral side (if False).
- get_spread_params(as_dict: bool = True, as_flat: bool = True) dict[str, float] | Iterable[float] [source]#
Return the spread parameters of the model.
This combines the returned values from the calls to get_tumor_spread_params() and get_lnl_spread_params().
- get_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float] [source]#
Return all the parameters of the model.
This includes the spread parameters from the call to get_spread_params() and the distribution parameters from the call to get_distribution_params().
- set_tumor_spread_params(*args: float, **kwargs: float) Iterable[float] | dict[str, float] [source]#
Set the tumor spread parameters of the midline model.
In analogy to the get_tumor_spread_params() method, this method sets the parameters describing how the tumor spreads to the LNLs. How many params to provide to this model depends on the values of the use_mixing and use_central attributes. Have a look at what the get_tumor_spread_params() method returns to see which parameters you can provide.
- set_lnl_spread_params(*args: float, **kwargs: float) Iterable[float] [source]#
Set the LNL spread parameters of the midline model.
This works exactly like Bilateral.set_lnl_spread_params() for the user, but under the hood, the parameters also need to be distributed to two or three instances of Bilateral, depending on the value of the use_central attribute.
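A minimal sketch of that fan-out, using a hypothetical stub class instead of the real Bilateral and a made-up parameter name IItoIII_spread. In line with the constructor description, the data-only unknown container is deliberately not updated.

```python
from dataclasses import dataclass, field


@dataclass
class StubBilateral:
    """Hypothetical stand-in for a Bilateral model that only stores LNL spread params."""

    lnl_spread: dict[str, float] = field(default_factory=dict)

    def set_lnl_spread_params(self, **kwargs: float) -> None:
        self.lnl_spread.update(kwargs)


def distribute_lnl_spread(
    submodels: dict[str, StubBilateral],
    use_central: bool,
    **kwargs: float,
) -> None:
    """Forward the same LNL spread parameters to every active sub-model."""
    targets = ["ext", "noext"] + (["central"] if use_central else [])
    for name in targets:
        submodels[name].set_lnl_spread_params(**kwargs)


models = {name: StubBilateral() for name in ("ext", "noext", "central", "unknown")}
distribute_lnl_spread(models, use_central=True, IItoIII_spread=0.2)
```

Keeping a single entry point avoids the sub-models drifting apart: every Bilateral instance that takes part in the likelihood sees identical LNL spread values.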
- set_spread_params(*args: float, **kwargs: float) Iterable[float] [source]#
Set the spread parameters of the midline model.
- set_params(*args: float, **kwargs: float) Iterable[float] | dict[str, float] [source]#
Set all parameters of the model.
Combines the calls to set_spread_params() and set_distribution_params().
- load_patient_data(patient_data: ~pandas.core.frame.DataFrame, mapping: callable = <function early_late_mapping>) None [source]#
Load patient data into the model.
This amounts to sorting the patients into three bins:
Patients whose tumor is clearly lateralized, meaning the column ("tumor", "1", "extension") reports False. These get assigned to the noext attribute.
Those with a central tumor, indicated by True in the column ("tumor", "1", "central"). If the use_central attribute is set to True, these patients are assigned to the central model. Otherwise, they are assigned to the ext model.
The rest, which amounts to patients whose tumor extends over the mid-sagittal line but is not central (i.e., not symmetric w.r.t. the mid-sagittal line). These are assigned to the ext model.
The split data is sent to the Bilateral.load_patient_data() method of the respective models.
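The binning rules above can be sketched with pandas boolean masks on the three-level column index. This is a hypothetical helper, not lymph's implementation. It follows the order of the bins as listed (lateralized first, then central), and rows with a missing extension value simply fall into the ext bin here, whereas the real model may set them aside (see marginalize_unknown).

```python
import pandas as pd

# Column keys as documented above (three-level MultiIndex).
EXT_COL = ("tumor", "1", "extension")
CENTRAL_COL = ("tumor", "1", "central")


def split_by_midline(
    patient_data: pd.DataFrame, use_central: bool = True
) -> dict[str, pd.DataFrame]:
    """Hypothetical helper: sort patients into the noext / central / ext bins."""
    # Clearly lateralized: the extension column reports False.
    is_noext = patient_data[EXT_COL].eq(False)
    # Central tumors, among those not already classified as lateralized.
    is_central = patient_data[CENTRAL_COL].eq(True) & ~is_noext
    # Everything else: extension over the midline, but not central.
    is_ext = ~is_noext & ~is_central

    bins = {"noext": patient_data[is_noext]}
    if use_central:
        bins["central"] = patient_data[is_central]
        bins["ext"] = patient_data[is_ext]
    else:
        # Without a dedicated central model, central tumors count as midline extension.
        bins["ext"] = patient_data[is_central | is_ext]
    return bins


df = pd.DataFrame({EXT_COL: [False, True, True, True], CENTRAL_COL: [False, False, True, False]})
bins = split_by_midline(df)
```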
- contra_state_dist_evo() tuple[ndarray, ndarray] [source]#
Evolve contra side as mixture of with & without midline extension.
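One way to read "mixture of with & without midline extension" is sketched below. It is a hypothetical standalone function, not lymph's implementation. It assumes two contralateral transition matrices trans_noext and trans_ext (rows summing to 1), that state 0 means all contralateral LNLs are healthy, and, when use_midext_evo=True, that the tumor crosses the midline at each time step with probability midext_prob and never recedes.

```python
import numpy as np


def contra_state_dist_evo(
    trans_noext: np.ndarray,
    trans_ext: np.ndarray,
    midext_prob: float,
    max_time: int,
    use_midext_evo: bool = True,
) -> tuple[np.ndarray, np.ndarray]:
    """Sketch: contralateral state distributions over time, split by midline extension.

    Row t of each array is the joint probability of the contralateral hidden state
    and the tumor (not) having crossed the midline at time step t. Together, the
    rows of both arrays sum to 1.
    """
    num_states = trans_noext.shape[0]
    noext = np.zeros((max_time + 1, num_states))
    ext = np.zeros((max_time + 1, num_states))

    if use_midext_evo:
        # Every tumor starts lateralized, with all contralateral LNLs healthy (state 0).
        noext[0, 0] = 1.0
    else:
        # Midline extension is present from the start with probability midext_prob.
        noext[0, 0] = 1.0 - midext_prob
        ext[0, 0] = midext_prob

    for t in range(max_time):
        if use_midext_evo:
            # Tumor stays lateralized during this step ...
            noext[t + 1] = (1.0 - midext_prob) * (noext[t] @ trans_noext)
            # ... or crosses the midline and spreads with the "ext" transition matrix.
            ext[t + 1] = midext_prob * (noext[t] @ trans_ext)
        else:
            noext[t + 1] = noext[t] @ trans_noext
        # Once the midline is crossed, the tumor stays extended.
        ext[t + 1] += ext[t] @ trans_ext

    return noext, ext
```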
- state_dist(t_stage: str = 'early', central: bool = False, mode: Literal['HMM', 'BN'] = 'HMM') ndarray [source]#
Compute the joint over ipsi- & contralateral hidden states and midline extension.
If central=False, the result has shape (2, num_states, num_states), where the first axis is for the midline extension status, the second for the ipsilateral state, and the third for the contralateral state.
If central=True, the result will be the state distribution of the central model's Bilateral.state_dist() method.
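How such a (2, num_states, num_states) array could be assembled is sketched below for the HMM case. This is a hypothetical helper under stated assumptions, not lymph's code: ipsilateral and contralateral states are conditionally independent given the time step, the ipsilateral evolution does not depend on midline extension, the contralateral evolutions are split as in contra_state_dist_evo() (index 0 = no extension, 1 = extension), and time_prior is the distribution over diagnosis times for one T-stage.

```python
import numpy as np


def midline_state_dist(
    ipsi_evo: np.ndarray,
    noext_contra_evo: np.ndarray,
    ext_contra_evo: np.ndarray,
    time_prior: np.ndarray,
) -> np.ndarray:
    """Sketch: joint over midline extension, ipsilateral and contralateral state.

    ipsi_evo, *_contra_evo: shape (max_time + 1, num_states), one row per time step
    time_prior:             shape (max_time + 1,), P(diagnosis at time t) for one T-stage
    Returns shape (2, num_states, num_states); axis 0 = midline extension (0 = no, 1 = yes).
    """
    # Scale each time step's ipsilateral distribution by the time prior; this equals
    # ipsi_evo.T @ np.diag(time_prior) without building the diagonal matrix.
    weighted_ipsi = ipsi_evo.T * time_prior
    # Marginalize over time: entry [i, j] = sum_t P(t) P(ipsi=i | t) P(contra=j, midext | t)
    return np.stack([weighted_ipsi @ noext_contra_evo, weighted_ipsi @ ext_contra_evo])
```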
- likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, mode: Literal['HMM', 'BN'] = 'HMM', for_t_stage: str | None = None) float [source]#
Compute the (log-)likelihood of the stored data given the model (and params).
See the documentation of types.Model.likelihood() for more information on how to use the given_params parameter.
Returns the log-likelihood if log is set to True. Note that, in contrast to the Bilateral model, the midline model does not support the Bayesian network mode.
Note
The computation is faster if no parameters are given, since then the transition matrix does not need to be recomputed.
See also
Unilateral.likelihood()
The corresponding unilateral function.
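A hypothetical sketch of how a joint state distribution as returned by state_dist() could enter per-patient likelihoods, including the marginalization over unknown midline extension status. It assumes diagnosis matrices of shape (num_states, num_patients) holding P(diagnosis | hidden state), and index 0 = no extension, 1 = extension along the first axis of the joint. It is not lymph's implementation.

```python
from typing import Optional, Sequence

import numpy as np


def patient_log_likelihoods(
    state_dist: np.ndarray,
    ipsi_diag: np.ndarray,
    contra_diag: np.ndarray,
    midext: Sequence[Optional[bool]],
) -> np.ndarray:
    """Per-patient log-likelihoods under a joint midline state distribution.

    state_dist:  shape (2, S, S), joint over midline extension, ipsi and contra state
    ipsi_diag:   shape (S, P), P(ipsilateral diagnosis of patient p | hidden state)
    contra_diag: shape (S, P), the same for the contralateral side
    midext:      one entry per patient: True, False, or None (unknown)
    """
    llhs = np.empty(len(midext))
    for p, status in enumerate(midext):
        if status is None:
            # Unknown status: marginalize over the midline extension axis.
            joint = state_dist.sum(axis=0)
        else:
            # Known status: use the matching slice (a joint with the midext status).
            joint = state_dist[int(status)]
        # Sum over all pairs of hidden states, weighted by how well they explain the diagnosis.
        llhs[p] = np.log(ipsi_diag[:, p] @ joint @ contra_diag[:, p])
    return llhs
```

Summing the returned values gives the total log-likelihood for one T-stage. For patients with known status, each term is the probability of observing both the diagnosis and the midline extension status; for unknown status, only the diagnosis.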
- risk(involvement: dict[str, bool | str | NAType | None] | None = None, given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnoses: dict[str, dict[str, dict[str, bool | str | NAType | None]]] | None = None, t_stage: str = 'early', midext: bool | None = None, central: bool = False, mode: Literal['HMM', 'BN'] = 'HMM') float [source]#
Compute the risk of nodal involvement given the diagnoses in given_diagnoses.
In addition to the arguments of the Bilateral.risk() method, this also allows specifying whether the patient's tumor extended over the mid-sagittal line (midext=True) or was even located right on that line (central=True).
For logical reasons, midext=False makes no sense if central=True and is thus ignored.
Warning
As in the Bilateral.posterior_state_dist() method, you may provide a precomputed (joint) state distribution in the given_state_dist argument. Here, given_state_dist may be a 2D array, in which case it is assumed you know how it was computed, and the arguments t_stage, midext, central, and mode are ignored. If it is 3D, it should have the shape (2, num_states, num_states) and be the output of the Midline.state_dist() method. In this case, the midext argument is not ignored: it is used to select the correct state distribution (when True or False) or to marginalize over the midline extension status (when midext=None).
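The handling of given_state_dist described in the warning can be sketched as follows. This is a hypothetical helper, not lymph's code; it assumes index 0 = no extension and 1 = extension along the first axis. Because a slice of the joint still carries the probability of the midline extension status itself, the sketch renormalizes it to obtain a distribution conditioned on that status.

```python
from typing import Optional

import numpy as np


def select_state_dist(given_state_dist: np.ndarray, midext: Optional[bool]) -> np.ndarray:
    """Reduce a precomputed state distribution to a 2D (ipsi x contra) array."""
    if given_state_dist.ndim == 2:
        # Assumed to be ready to use: nothing to select.
        return given_state_dist
    if given_state_dist.ndim != 3 or given_state_dist.shape[0] != 2:
        raise ValueError("expected shape (2, num_states, num_states)")
    if midext is None:
        # Marginalize over the midline extension status.
        return given_state_dist.sum(axis=0)
    # Condition on the known status: renormalize the selected slice.
    selected = given_state_dist[int(midext)]
    return selected / selected.sum()
```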