Lymphatic Progression Models


This module implements the core classes to model lymphatic tumor progression.

class lymph.models.Unilateral(graph_dict: dict[tuple[str, str], list[str]], tumor_state: int | None = None, allowed_states: list[int] | None = None, max_time: int = 10, **_kwargs)[source]#

Bases: Composite, Composite, Model

Class that models metastatic progression in a unilateral lymphatic system.

It does this by representing the lymphatic system as a directed acyclic graph (DAG), which is stored in and managed by the attribute graph. The progression itself can be modelled via hidden Markov models (HMM) or Bayesian networks (BN). In both cases, instances of this class make it possible to calculate the probability of a certain hidden pattern of involvement, given an individual diagnosis of a patient.

__init__(graph_dict: dict[tuple[str, str], list[str]], tumor_state: int | None = None, allowed_states: list[int] | None = None, max_time: int = 10, **_kwargs) None[source]#

Create a new instance of the Unilateral class.

The graph_dict that represents the lymphatic system should be given as a dictionary. Its keys are tuples of the form ("tumor", "<tumor_name>") or ("lnl", "<lnl_name>"). The values are lists of strings that represent the names of the nodes that are connected to the node given by the key.

Note

Do make sure the values in the dictionary are of type list and not set. Sets do not preserve the order of the elements and thus the order of the edges in the graph. This may lead to inconsistencies in the model.

For example, the following graph represents a lymphatic system with one tumor and three lymph node levels:

graph = {
    ("tumor", "T"): ["II", "III", "IV"],
    ("lnl", "II"): ["III"],
    ("lnl", "III"): ["IV"],
    ("lnl", "IV"): [],
}
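As a minimal sketch of the note above, the following check reports values that are not lists before a graph dictionary is handed to the model. The helper name `find_graph_dict_problems` is hypothetical and not part of the lymph package.

```python
def find_graph_dict_problems(graph_dict):
    """Return a list of human-readable problems found in `graph_dict`."""
    problems = []
    for key, children in graph_dict.items():
        kind, name = key  # keys are 2-tuples: (node type, node name)
        if kind not in ("tumor", "lnl"):
            problems.append(f"unknown node type {kind!r} for node {name!r}")
        if not isinstance(children, list):
            # Sets do not preserve order, so only lists are accepted here.
            problems.append(
                f"children of {name!r} must be a list, got {type(children).__name__}"
            )
    return problems


graph = {
    ("tumor", "T"): ["II", "III", "IV"],
    ("lnl", "II"): ["III"],
    ("lnl", "III"): ["IV"],
    ("lnl", "IV"): [],
}
problems = find_graph_dict_problems(graph)                       # no problems
set_problems = find_graph_dict_problems({("lnl", "II"): {"III"}})  # one problem
```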

The tumor_state is the initial (and unchangeable) state of the tumor. Typically, this can be omitted and is then set to be the maximum of the allowed_states, which are the states the LNLs can take on. The default is a binary representation with allowed_states=[0, 1]. For this, one can also use the classmethod binary(). For a trinary representation with allowed_states=[0, 1, 2] use the classmethod trinary().

The max_time parameter defines the latest possible time step for a diagnosis. In the HMM case, the probability distribution over all hidden states is evolved from \(t=0\) to max_time. In the BN case, this parameter has no effect.

The is_micro_mod_shared and is_growth_shared parameters determine whether the microscopic involvement and growth parameters are shared among all LNLs. If they are set to True, the parameters are set globally for all LNLs. If they are set to False, the parameters are set individually for each LNL.

classmethod binary(graph_dict: dict[tuple[str, str], list[str]], **kwargs) Unilateral[source]#

Create an instance of the Unilateral class with binary LNLs.

classmethod trinary(graph_dict: dict[tuple[str, str], list[str]], **kwargs) Unilateral[source]#

Create an instance of the Unilateral class with trinary LNLs.

property is_trinary: bool#

Return whether the model is trinary.

property is_binary: bool#

Return whether the model is binary.

get_t_stages(which: Literal['valid', 'distributions', 'data'] = 'valid') list[str][source]#

Return the T-stages of the model.

get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Get the parameters of the tumor spread edges.

get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Get the parameters of the LNL spread edges.

In the trinary case, this includes the growth parameters as well as the microscopic modification parameters.

get_spread_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Get the parameters of the spread edges.

get_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Get the parameters of the model.

If as_dict is True, the parameters are returned as a dictionary. If as_flat is True, the dictionary is flattened, i.e., all nested dictionaries are merged into one, using flatten().
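To illustrate what flattening means here, the sketch below merges nested dictionaries into one. Joining the keys with underscores is inferred from the key names shown in the examples on this page; the helper `flatten_params` is hypothetical and may differ from the package's own flatten().

```python
def flatten_params(nested, prefix=""):
    """Merge nested dictionaries into one, joining the keys with underscores."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the joined key as prefix.
            flat.update(flatten_params(value, prefix=name))
        else:
            flat[name] = value
    return flat


nested = {"TtoII": {"spread": 0.1}, "II": {"growth": 0.3}}
flat = flatten_params(nested)
```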

set_tumor_spread_params(*args: float, **kwargs: float) tuple[float][source]#

Assign new parameters to the tumor spread edges.

set_lnl_spread_params(*args: float, **kwargs: float) tuple[float][source]#

Assign new parameters to the LNL spread edges.

set_spread_params(*args: float, **kwargs: float) tuple[float][source]#

Assign new parameters to the spread edges.

set_params(*args: float, **kwargs: float) tuple[float][source]#

Assign new parameters to the model.

The parameters can be provided either via positional arguments or via keyword arguments. The positional arguments are used up one by one first by the lymph.graph.set_params() method and then by the lymph.models.Unilateral.set_distribution_params() method.

The keyword arguments can be of the format "<edge_name>_<param_name>" or "<t_stage>_<param_name>" for the distributions over diagnosis times. If only a "<param_name>" is provided, it is assumed to be a global parameter and is sent to all edges or distributions. But the more specific keyword arguments override the global ones, which in turn override the positional arguments.

>>> graph = {
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... }
>>> model = Unilateral.trinary(
...     graph_dict=graph,
...     is_micro_mod_shared=True,
...     is_growth_shared=True,
... )
>>> model.set_params(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.99, AtoB_param="not_used")
(0.99,)
>>> model.get_params(as_dict=True)  
{'TtoII_spread': 0.1,
 'TtoIII_spread': 0.2,
 'II_growth': 0.3,
 'IItoIII_spread': 0.4,
 'IItoIII_micro': 0.5,
 'III_growth': 0.6}
>>> _ = model.set_params(growth=0.123)
>>> model.get_params(as_dict=True)  
{'TtoII_spread': 0.1,
 'TtoIII_spread': 0.2,
 'II_growth': 0.123,
 'IItoIII_spread': 0.4,
 'IItoIII_micro': 0.5,
 'III_growth': 0.123}

transition_prob(newstate: list[int], assign: bool = False) float[source]#

Compute the probability of transitioning to newstate, given the current state.

The probability is computed as the product of the transition probabilities of the individual LNLs. If assign is True, the new state is assigned to the model using the method lymph.graph.Representation.set_state().
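The product over LNLs can be sketched for a binary model as follows. This is an illustration under stated assumptions, not the package's implementation: an involved LNL stays involved, a healthy LNL stays healthy only if neither the tumor nor any involved parent LNL spreads to it, and all names (`TUMOR_SPREAD`, `LNL_SPREAD`, `binary_transition_prob`) are hypothetical.

```python
LNLS = ["II", "III"]
TUMOR_SPREAD = {"II": 0.7, "III": 0.3}   # spread probabilities T -> LNL
LNL_SPREAD = {("II", "III"): 0.2}        # spread probabilities LNL -> LNL


def binary_transition_prob(current, new):
    """Probability to go from state `current` to state `new` in one time step."""
    total = 1.0
    for lnl, old, nxt in zip(LNLS, current, new):
        if old == 1:
            # An involved LNL cannot become healthy again.
            total *= 1.0 if nxt == 1 else 0.0
            continue
        # Probability that nothing spreads to this (healthy) LNL.
        stay_healthy = 1.0 - TUMOR_SPREAD.get(lnl, 0.0)
        for (parent, child), spread in LNL_SPREAD.items():
            if child == lnl and current[LNLS.index(parent)] == 1:
                stay_healthy *= 1.0 - spread
        total *= stay_healthy if nxt == 0 else 1.0 - stay_healthy
    return total


p_first = binary_transition_prob([0, 0], [1, 0])    # 0.7 * 0.7
p_second = binary_transition_prob([1, 0], [1, 1])   # 1 - 0.7 * 0.8
```

With the spread parameters 0.7, 0.3 and 0.2, these two values agree with the corresponding entries of the transition matrix example shown for transition_matrix().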

diagnosis_prob(diagnosis: Series | dict[str, dict[str, bool]]) float[source]#

Compute the probability to observe a diagnosis given the current state.

The diagnosis is either a pandas Series object corresponding to one row of a patient data table, or a dictionary with keys of diagnostic modalities and values of dictionaries holding the observation for each LNL under the respective key.

It returns the probability of observing this particular combination of diagnoses, given the current state of the system.

property obs_list#

Return the list of all possible observations.

They are ordered the same way as the graph.Representation.state_list, but additionally by modality. E.g., for two LNLs II, III and two modalities CT, pathology, the list would look like this:

>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II" , "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_modality("CT", spec=0.8, sens=0.8)
>>> model.set_modality("pathology", spec=1.0, sens=1.0)
>>> model.obs_list  
array([[0, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 1, 1],
       ...
       [1, 1, 0, 1],
       [1, 1, 1, 0],
       [1, 1, 1, 1]])

The first two columns correspond to the observation of LNLs II and III under modality CT, the second two columns correspond to the same LNLs under the pathology modality.
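The ordering described above can be reproduced with the standard library. The following sketch only illustrates the enumeration order and does not touch the model itself.

```python
from itertools import product

lnls = ["II", "III"]
modalities = ["CT", "pathology"]

# One binary observation per (modality, LNL) pair; the modality varies slowest.
columns = [(mod, lnl) for mod in modalities for lnl in lnls]

# Enumerate all 2 ** 4 = 16 combinations in the same order as shown above.
observations = [list(obs) for obs in product([0, 1], repeat=len(columns))]
```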

transition_matrix() ndarray[source]#

Matrix encoding the probabilities to transition from one state to another.

This is the crucial object for modelling the evolution of the probabilistic system in the context of the hidden Markov model. It has the shape \(2^N \times 2^N\) where \(N\) is the number of LNLs in the graph. The \(i\)-th row and \(j\)-th column encodes the probability to transition from the \(i\)-th state to the \(j\)-th state. The states are ordered as in the graph.Representation.state_list.

See also

generate_transition()

The function actually computing the transition matrix.

>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_params(0.7, 0.3, 0.2)  
()
>>> model.transition_matrix()
array([[0.21, 0.09, 0.49, 0.21],
       [0.  , 0.3 , 0.  , 0.7 ],
       [0.  , 0.  , 0.56, 0.44],
       [0.  , 0.  , 0.  , 1.  ]])

observation_matrix() ndarray[source]#

The matrix encoding the probabilities to observe a certain diagnosis.

Every element in this matrix holds a probability to observe a certain diagnosis (or combination of diagnoses, when using multiple diagnostic modalities) given the current state of the system. It has the shape \(2^N \times 2^{N \times M}\) where \(N\) is the number of LNLs in the graph and \(M\) is the number of diagnostic modalities.

See also

generate_observation()

The function actually computing the observation matrix.
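A rough sketch of how such a matrix can be assembled for a binary model is given below. It assumes that each modality is described by a 2 × 2 confusion matrix built from its specificity and sensitivity, and that observations are independent across LNLs and modalities given the hidden state. The function names are hypothetical.

```python
from itertools import product


def confusion(spec, sens):
    """Rows: true state (healthy, involved); columns: observation (negative, positive)."""
    return [[spec, 1.0 - spec], [1.0 - sens, sens]]


def observation_matrix_sketch(num_lnls, modalities):
    """Build P(observation | hidden state) for all states and observations."""
    states = list(product([0, 1], repeat=num_lnls))
    observations = list(product([0, 1], repeat=num_lnls * len(modalities)))
    matrix = []
    for state in states:
        row = []
        for obs in observations:
            prob = 1.0
            for m, conf in enumerate(modalities):
                for k, true_value in enumerate(state):
                    # Observation of LNL k under modality m sits at column m * N + k.
                    prob *= conf[true_value][obs[m * num_lnls + k]]
            row.append(prob)
        matrix.append(row)
    return matrix


# Two LNLs, observed with a CT-like and a perfect pathology-like modality.
obs_matrix = observation_matrix_sketch(2, [confusion(0.8, 0.8), confusion(1.0, 1.0)])
```

The result has 4 rows and 16 columns, and each row sums to one because every hidden state must produce exactly one complete observation.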

data_matrix(t_stage: str | None = None) ndarray[source]#

Extract the data matrix for a given t_stage.

The data matrix is a binary encoding of the patient data. For every patient, it encodes the information which observational state could have led to the observed diagnosis. If a diagnosis is complete, i.e., for every diagnostic modality and every LNL we have an observation, the data matrix is a one-hot encoding of the observed diagnosis. Otherwise it may contain multiple 1s, indicating over which observational state one should marginalize.

The data matrix is used to compute the diagnosis_matrix, which in turn is used to compute the likelihood of the model given the patient data.

See also

matrix.generate_data_encoding()

This function actually computes the data encoding.
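The idea of the encoding can be sketched for a single modality and binary observations. The helper `encode_diagnosis` is hypothetical; it marks every complete observation that is compatible with a possibly incomplete diagnosis, where None stands for a missing observation.

```python
from itertools import product


def encode_diagnosis(diagnosis, lnls):
    """Return 1 for every complete observation compatible with `diagnosis`.

    `diagnosis` maps LNL names to True, False, or None (not observed).
    """
    encoding = []
    for obs in product([False, True], repeat=len(lnls)):
        matches = all(
            diagnosis.get(lnl) is None or diagnosis[lnl] == value
            for lnl, value in zip(lnls, obs)
        )
        encoding.append(int(matches))
    return encoding


# A complete diagnosis yields a one-hot vector ...
complete = encode_diagnosis({"II": True, "III": False}, ["II", "III"])
# ... while a missing observation for LNL III leaves two candidates.
partial = encode_diagnosis({"II": True, "III": None}, ["II", "III"])
```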

diagnosis_matrix(t_stage: str | None = None) ndarray[source]#

Extract the diagnosis matrix for a given t_stage.

For every patient this matrix stores the probability to observe this patient’s diagnosis, given one of the possible hidden states of the model. It is computed by multiplying the data_matrix() with the observation_matrix().

load_patient_data(patient_data: pd.DataFrame, side: str = 'ipsi', mapping: callable | dict[int, Any] | None = None) None[source]#

Load patient data in LyProX format into the model.

Since the LyProX data format contains information on both sides (i.e., ipsi- and contralateral) of the neck, the side parameter is used to select for which of the two sides the involvement data is stored.

With the mapping function or dictionary, the reported T-stages (usually 0, 1, 2, 3, and 4) can be mapped to any keys also used to access the corresponding distribution over diagnosis times. The default mapping is to map 0, 1, and 2 to “early” and 3 and 4 to “late”.
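The default mapping described above can be written either as a function or as a dictionary. Both variants below are illustrative; the name `early_late` is chosen for this sketch only.

```python
def early_late(t_stage):
    """Map numeric T-stages 0, 1, 2 to "early" and 3, 4 to "late"."""
    return "early" if t_stage <= 2 else "late"


# The same mapping written as a dictionary.
mapping_dict = {0: "early", 1: "early", 2: "early", 3: "late", 4: "late"}
```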

What this method essentially does is to copy the entire data frame, check all necessary information is present, and add a new top-level header "_model" to the data frame. Under this header, columns are assembled that contain all the information necessary to compute the observation and diagnosis matrices.

property patient_data: DataFrame#

Return the patient data loaded into the model.

After successfully loading the data with the method load_patient_data(), the copied patient data now contains the additional top-level header "_model". Under it, the observed per LNL involvement is listed for every diagnostic modality in the dictionary returned by get_all_modalities() and for each of the LNLs in the list graph.Representation.lnls.

It also contains information on the patient’s T-stage under the header ("_model", "#", "t_stage").

Additionally, it holds the data encodings and probability of diagnosis given the hidden states for each patient under the headers ("_model", "_encoding", <obs_state>) and ("_model", "_diagnosis_prob", <hidden_state>), respectively.

evolve(state_dist: ndarray, num_steps: int) ndarray[source]#

Evolve the state_dist of possible states over num_steps.

This is done by multiplying the state_dist with the transition matrix from the left num_steps times. The result is a new distribution over possible states at a new time-step \(t' = t + n\), where \(n\) is the number of steps num_steps.
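The repeated multiplication can be sketched in plain Python. The transition matrix below is the one from the two-LNL example with spread parameters 0.7, 0.3 and 0.2; the function `evolve_sketch` is a hypothetical stand-in, not the method itself.

```python
def evolve_sketch(state_dist, transition_matrix, num_steps):
    """Apply the row-vector product `state_dist @ transition_matrix` num_steps times."""
    for _ in range(num_steps):
        state_dist = [
            sum(state_dist[i] * transition_matrix[i][j] for i in range(len(state_dist)))
            for j in range(len(transition_matrix[0]))
        ]
    return state_dist


T = [
    [0.21, 0.09, 0.49, 0.21],
    [0.00, 0.30, 0.00, 0.70],
    [0.00, 0.00, 0.56, 0.44],
    [0.00, 0.00, 0.00, 1.00],
]
start = [1.0, 0.0, 0.0, 0.0]   # every LNL is healthy at t = 0
after_one = evolve_sketch(start, T, num_steps=1)
after_two = evolve_sketch(start, T, num_steps=2)
```

After one step the distribution equals the first row of the matrix; after two steps the probability of still being completely healthy has dropped to 0.21 squared.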

state_dist_evo() ndarray[source]#

Compute an evolution of the model’s state distribution over time steps.

This returns a matrix with the distribution over the possible states for each time step from \(t = 0\) to \(t = T\), where \(T\) is the maximum diagnosis time stored in the model’s attribute max_time.

Note that at this point, the distributions are not weighted with the distribution over diagnosis times that are stored and managed for each T-stage in the dictionary returned by get_all_distributions().

state_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') ndarray[source]#

Compute the distribution over possible states.

When mode is set to "HMM", do this for the given t_stage: the evolution over the possible states, as computed by state_dist_evo(), is marginalized using the distribution over diagnosis times for that T-stage from the dictionary returned by get_all_distributions().

Or, when mode is set to "BN", compute the distribution over states for the Bayesian network. In that case, the t_stage parameter is ignored.
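For the "HMM" mode, the marginalization over diagnosis times amounts to a weighted sum of the per-time-step state distributions. The sketch below uses a binomial distribution over diagnosis times purely as an assumed example; the toy evolution and all function names are hypothetical.

```python
from math import comb


def binomial_pmf(max_time, p):
    """Probability of being diagnosed at t = 0, ..., max_time."""
    return [
        comb(max_time, t) * p**t * (1 - p) ** (max_time - t)
        for t in range(max_time + 1)
    ]


def marginalize_over_time(state_dist_evo, time_dist):
    """Weight each time step's state distribution and sum over time."""
    num_states = len(state_dist_evo[0])
    return [
        sum(p_t * dist_t[s] for p_t, dist_t in zip(time_dist, state_dist_evo))
        for s in range(num_states)
    ]


# Toy evolution over three time steps and two states (healthy, involved).
evo = [[1.0, 0.0], [0.7, 0.3], [0.49, 0.51]]
time_dist = binomial_pmf(max_time=2, p=0.5)   # weights 0.25, 0.5, 0.25
state_dist = marginalize_over_time(evo, time_dist)
```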

obs_dist(given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') ndarray[source]#

Compute the distribution over all possible observations for a given T-stage.

Returns an array of probabilities for each possible complete observation. This entails multiplying the distribution over states as returned by the state_dist() method with the observation_matrix().

Note that since the observation_matrix can become very large, this method is not very efficient for inference. Instead, we compute the diagnosis_matrix() from the observation_matrix() and the data_matrix() and use these to compute the likelihood.

likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, t_stage: str | None = None, mode: Literal['HMM', 'BN'] = 'HMM') float[source]#

Compute the (log-)likelihood of the stored data given the model (and params).

See the documentation of lymph.types.Model.likelihood() for more information on how to use the given_params parameter.

Returns the log-likelihood if log is set to True. The mode parameter determines whether the likelihood is computed for the hidden Markov model ("HMM") or the Bayesian network ("BN").

compute_encoding(given_diagnosis: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None) ndarray[source]#

Compute one-hot vector encoding of a given diagnosis.

posterior_state_dist(given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None, t_stage: str | int = 'early', mode: Literal['HMM', 'BN'] = 'HMM') ndarray[source]#

Compute the posterior distribution over hidden states given a diagnosis.

The given_diagnosis is a dictionary holding one diagnosis for each modality. E.g., this could look like this:

given_diagnosis = {
    "MRI": {"II": True, "III": False, "IV": False},
    "PET": {"II": True, "III": True, "IV": None},
}

The t_stage parameter determines the T-stage for which the posterior is computed. The mode parameter determines whether the posterior is computed for the hidden Markov model ("HMM") or the Bayesian network ("BN"). In case of the Bayesian network mode, the t_stage parameter is ignored.

Warning

To speed up repetitive computations, one can provide precomputed state distributions via the given_state_dist parameter. When provided, the method will ignore the given_params, t_stage, and mode arguments, but compute the posterior much quicker.
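Conceptually, the posterior follows from Bayes' rule applied to a prior state distribution and the probability of the diagnosis under each hidden state. The sketch below assumes both are already available as plain lists; `posterior_sketch` is a hypothetical helper and the numbers are made up for illustration.

```python
def posterior_sketch(prior, diagnosis_given_state):
    """Bayes' rule: P(state | diagnosis) is proportional to P(diagnosis | state) * P(state)."""
    joint = [p * lik for p, lik in zip(prior, diagnosis_given_state)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hidden states [0, 0], [0, 1], [1, 0], [1, 1] for the LNLs (II, III).
prior = [0.5, 0.2, 0.2, 0.1]

# Probability of a positive finding in II and a negative one in III under a
# modality with sensitivity = specificity = 0.8, for each hidden state.
likelihoods = [0.2 * 0.8, 0.2 * 0.2, 0.8 * 0.8, 0.8 * 0.2]

posterior = posterior_sketch(prior, likelihoods)
```

Passing a precomputed prior in this way mirrors why given_state_dist makes the parameters, T-stage, and mode irrelevant: they would only be needed to compute that prior.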

marginalize(involvement: dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None] | None = None, given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') float[source]#

Marginalize given_state_dist over matching involvement patterns.

Any state that matches the provided involvement pattern is marginalized over. For this, the matrix.compute_encoding() function is used.

If given_state_dist is None, it will be computed by calling state_dist() with the given t_stage and mode. These arguments are ignored if given_state_dist is provided.
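For a binary model, this marginalization can be sketched as a sum over all states that agree with the involvement pattern, where None means "do not care". The helper `marginalize_sketch` and the example distribution are hypothetical.

```python
from itertools import product


def marginalize_sketch(involvement, state_dist, lnls):
    """Sum the probabilities of all binary states compatible with `involvement`."""
    total = 0.0
    states = product([False, True], repeat=len(lnls))
    for state, prob in zip(states, state_dist):
        if all(
            involvement.get(lnl) is None or involvement[lnl] == value
            for lnl, value in zip(lnls, state)
        ):
            total += prob
    return total


# Distribution over the states [0, 0], [0, 1], [1, 0], [1, 1] for (II, III).
state_dist = [0.5, 0.2, 0.2, 0.1]
prob_iii = marginalize_sketch({"II": None, "III": True}, state_dist, ["II", "III"])
```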

risk(involvement: dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None] | None = None, given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') float[source]#

Compute risk of a certain involvement, using the given_diagnosis.

If an involvement pattern of interest is provided, this method computes the risk of seeing just that pattern for the set of given parameters and a dictionary of diagnosis for each modality.

If no involvement is provided, this will simply return the posterior distribution over hidden states, given the diagnosis, as computed by the posterior_state_dist() method. See its documentation for more details about the arguments and the return value.

draw_diagnosis(diag_times: list[int], rng: Generator | None = None, seed: int = 42) ndarray[source]#

Given some diag_times, draw diagnosis for each LNL.

>>> model = Unilateral(graph_dict={
...     ("tumor", "T"): ["II" , "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> model.set_modality("CT", spec=0.8, sens=0.8)
>>> model.draw_diagnosis([0, 1, 2, 3, 4])       
array([[False,  True],
       [False, False],
       [ True, False],
       [False,  True],
       [False, False]])
>>> draw_diagnosis(                   # this is the same as the previous example
...     diagnosis_times=[0, 1, 2, 3, 4],
...     state_evolution=model.state_dist_evo(),
...     observation_matrix=model.observation_matrix(),
...     possible_diagnosis=model.obs_list,
... )
array([[False,  True],
       [False, False],
       [ True, False],
       [False,  True],
       [False, False]])

draw_patients(num: int, stage_dist: Iterable[float], rng: Generator | None = None, seed: int = 42, **_kwargs) DataFrame[source]#

Draw num random patients from the model.

For this, a stage_dist, i.e., a distribution over the T-stages, needs to be defined. This must be an iterable of probabilities with as many elements as there are defined T-stages in the model (accessible via get_all_distributions()).

A random number generator can be provided as rng. If None, a new one is initialized with the given seed (or 42, by default).

See also

lymph.diagnosis_times.Distribution.draw_diag_times()

Method to draw diagnosis times from a distribution.

lymph.models.Unilateral.draw_diagnosis()

Method to draw individual diagnosis.

lymph.models.Bilateral.draw_patients()

The corresponding bilateral method.

class lymph.models.Bilateral(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, uni_kwargs: dict[str, Any] | None = None, ipsi_kwargs: dict[str, Any] | None = None, contra_kwargs: dict[str, Any] | None = None, **_kwargs)[source]#

Bases: Composite, Composite, Model

Class that models metastatic progression in a bilateral lymphatic system.

This is achieved by creating two instances of the Unilateral model, one for the ipsi- and one for the contralateral side of the neck. The two sides are assumed to be independent of each other, given the diagnosis time over which we marginalize.
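One way to read this conditional independence is sketched below: for every diagnosis time, the joint distribution is the outer product of the two sides' state distributions, and these are then averaged with the distribution over diagnosis times. This is an illustration of the stated assumption with toy numbers, not necessarily how the class computes it; `joint_state_dist` is a hypothetical name.

```python
def joint_state_dist(ipsi_evo, contra_evo, time_dist):
    """Sum over t of p(t) * outer(ipsi distribution at t, contra distribution at t)."""
    n_ipsi, n_contra = len(ipsi_evo[0]), len(contra_evo[0])
    joint = [[0.0] * n_contra for _ in range(n_ipsi)]
    for p_t, ipsi_t, contra_t in zip(time_dist, ipsi_evo, contra_evo):
        for i in range(n_ipsi):
            for j in range(n_contra):
                joint[i][j] += p_t * ipsi_t[i] * contra_t[j]
    return joint


# Toy evolutions over two time steps and two states per side (healthy, involved).
ipsi_evo = [[1.0, 0.0], [0.6, 0.4]]
contra_evo = [[1.0, 0.0], [0.9, 0.1]]
joint = joint_state_dist(ipsi_evo, contra_evo, time_dist=[0.5, 0.5])
```

Rows of the result index ipsilateral states and columns index contralateral states.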

See also

Unilateral

Two instances of this class are created as attributes. One for the ipsi- and one for the contralateral side of the neck.

__init__(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, uni_kwargs: dict[str, Any] | None = None, ipsi_kwargs: dict[str, Any] | None = None, contra_kwargs: dict[str, Any] | None = None, **_kwargs) None[source]#

Initialize both sides of the neck as models.Unilateral.

The graph_dict is a dictionary of tuples as keys and lists of strings as values. It is passed to both models.Unilateral instances, which in turn pass it to the graph.Representation class that stores the graph.

With the dictionary is_symmetric the user can specify which aspects of the model are symmetric. Valid keys are "tumor_spread" and "lnl_spread". The values are booleans, with True meaning that the aspect is symmetric.

Note

The symmetries of tumor and LNL spread are only guaranteed if the respective parameters are set via the set_params() method of this bilateral model. It is still possible to set different parameters for the ipsi- and contralateral side by using their respective Unilateral.set_params() method.

The uni_kwargs are passed to both instances of the unilateral model, while the ipsi_kwargs and contra_kwargs are passed to the ipsi- and contralateral side, respectively. The ipsi- and contralateral kwargs override the unilateral kwargs and may also override the graph_dict. This allows the user to specify different graphs for the two sides of the neck.
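The override order can be pictured as a dictionary merge in which later entries win. The helper `side_kwargs` below is a hypothetical illustration of that precedence and not the constructor's actual code.

```python
def side_kwargs(graph_dict, uni_kwargs=None, side_specific=None):
    """Later entries win: side-specific kwargs override the shared ones."""
    return {"graph_dict": graph_dict, **(uni_kwargs or {}), **(side_specific or {})}


graph = {("tumor", "T"): ["II"], ("lnl", "II"): []}
contra_graph = {
    ("tumor", "T"): ["II", "III"],
    ("lnl", "II"): ["III"],
    ("lnl", "III"): [],
}

# The ipsilateral side only receives the shared settings ...
ipsi = side_kwargs(graph, uni_kwargs={"max_time": 10})
# ... while the contralateral side overrides max_time and even the graph.
contra = side_kwargs(
    graph,
    uni_kwargs={"max_time": 10},
    side_specific={"max_time": 5, "graph_dict": contra_graph},
)
```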

classmethod binary(*args, **kwargs) Bilateral[source]#

Initialize a binary bilateral model.

This is a convenience method that sets the allowed_states of the uni_kwargs to [0, 1]. All other args and kwargs are passed to the __init__() method.

classmethod trinary(*args, **kwargs) Bilateral[source]#

Initialize a trinary bilateral model.

This is a convenience method that sets the allowed_states of the uni_kwargs to [0, 1, 2]. All other args and kwargs are passed to the __init__() method.

property is_trinary: bool#

Return whether the model is trinary.

property is_binary: bool#

Return whether the model is binary.

get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Return the parameters of the model’s spread from tumor to LNLs.

If the attribute dictionary is_symmetric stores the key-value pair "tumor_spread": True, the parameters are returned as a single dictionary, since they are the same ipsi- and contralaterally. Otherwise, the parameters are returned as a dictionary with two keys, "ipsi" and "contra".

get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Return the parameters of the model’s spread between LNLs.

Similarly to the get_tumor_spread_params() method, this returns only one dictionary if the attribute dictionary is_symmetric stores the key-value pair "lnl_spread": True. Otherwise, the parameters are returned as a dictionary with two keys, "ipsi" and "contra".

get_spread_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Return the parameters of the model’s spread edges.

Depending on the symmetries (i.e. the is_symmetric attribute), this returns different results:

If is_symmetric["tumor_spread"] = False, the flattened (as_flat=True) dictionary (as_dict=True) will contain keys of the form ipsi_Tto<lnl>_spread and contra_Tto<lnl>_spread, where <lnl> is the name of the lymph node level. However, if the tumor spread is set to be symmetric, the leading ipsi_ or contra_ is omitted, since it’s valid for both sides.

This is consistent with how the set_params() method expects the keyword arguments in case of the symmetry configurations.

>>> model = Bilateral(graph_dict={
...     ("tumor", "T"): ["II", "III"],
...     ("lnl", "II"): ["III"],
...     ("lnl", "III"): [],
... })
>>> num_dims = model.get_num_dims()
>>> model.set_spread_params(*np.round(np.linspace(0., 1., num_dims+1), 2))
(1.0,)
>>> model.get_spread_params(as_flat=False)   
{'ipsi':    {'TtoII': {'spread': 0.0},
             'TtoIII': {'spread': 0.2}},
 'contra':  {'TtoII': {'spread': 0.4},
             'TtoIII': {'spread': 0.6}},
 'IItoIII': {'spread': 0.8}}
>>> model.get_spread_params(as_flat=True)    
{'ipsi_TtoII_spread': 0.0,
 'ipsi_TtoIII_spread': 0.2,
 'contra_TtoII_spread': 0.4,
 'contra_TtoIII_spread': 0.6,
 'IItoIII_spread': 0.8}

get_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Return the parameters of the model.

It returns the combination of the call to the Unilateral.get_params() of the ipsi- and contralateral side. For the use of the as_dict and as_flat arguments, see the documentation of the types.Model.get_params() method.

Also see the get_spread_params() method to understand how the symmetry settings affect the return value.

set_tumor_spread_params(*args: float, **kwargs: float) tuple[float][source]#

Set the parameters of the model’s spread from tumor to LNLs.

set_lnl_spread_params(*args: float, **kwargs: float) tuple[float][source]#

Set the parameters of the model’s spread between LNLs.

set_spread_params(*args: float, **kwargs: float) tuple[float][source]#

Set the parameters of the model’s spread edges.

set_params(*args: float, **kwargs: float) tuple[float][source]#

Set new parameters to the model.

This works almost exactly as the unilateral model’s Unilateral.set_params() method. However, this one allows the user to set the parameters of individual sides of the neck by prefixing the keyword arguments’ names with "ipsi_" or "contra_".

Anything not prefixed by "ipsi_" or "contra_" is passed to both sides of the neck. Naturally, this prefixing is not available for positional arguments.

When setting the parameters via positional arguments, the order is important:

  1. The parameters of the edges from tumor to LNLs:

    1. first the ipsilateral parameters,

    2. if is_symmetric["tumor_spread"] is False, the contralateral parameters. Otherwise, the ipsilateral parameters are used for both sides.

  2. The parameters of the edges between LNLs:

    1. again, first the ipsilateral parameters,

    2. if is_symmetric["lnl_spread"] is False, the contralateral parameters. Otherwise, the ipsilateral parameters are used for both sides.

  3. The parameters of the parametric distributions for marginalizing over diagnosis times.

If any positional arguments remain after that, they are returned in a tuple.
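The consumption order of the spread parameters can be sketched as follows for a binary model. The symmetry setting (asymmetric tumor spread, symmetric LNL spread) and the parameter names are taken from the get_spread_params() example on this page; the helper `positional_order` is hypothetical and ignores the distribution parameters of step 3.

```python
def positional_order(tumor_edges, lnl_edges, is_symmetric):
    """List spread parameter names in the order positional arguments are consumed."""
    names = []
    for edges, key in ((tumor_edges, "tumor_spread"), (lnl_edges, "lnl_spread")):
        if is_symmetric[key]:
            # One shared parameter per edge, used for both sides.
            names += [f"{edge}_spread" for edge in edges]
        else:
            # First all ipsilateral, then all contralateral parameters.
            names += [f"ipsi_{edge}_spread" for edge in edges]
            names += [f"contra_{edge}_spread" for edge in edges]
    return names


order = positional_order(
    tumor_edges=["TtoII", "TtoIII"],
    lnl_edges=["IItoIII"],
    is_symmetric={"tumor_spread": False, "lnl_spread": True},
)
values = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
assigned = dict(zip(order, values))
unused = tuple(values[len(order):])   # left-over positional arguments
```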

load_patient_data(patient_data: pd.DataFrame, mapping: callable | dict[int, Any] = <function early_late_mapping>) None[source]#

Load patient data into the model.

This amounts to calling the load_patient_data() method of both sides of the neck.

state_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') ndarray[source]#

Compute the joint distribution over the ipsi- & contralateral hidden states.

This computes the state distributions of both sides and returns their outer product. In case mode is "HMM" (default), the state distributions are first marginalized over the diagnosis time distributions of the respective t_stage.

See also

Unilateral.state_dist()

The corresponding unilateral function. Note that this method returns a 2D array, because it computes the probability of any possible combination of ipsi- and contralateral states.

obs_dist(given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') ndarray[source]#

Compute the joint distribution over the ipsi- & contralateral observations.

See also

Unilateral.obs_dist()

The corresponding unilateral function. Note that this method returns a 2D array, because it computes the probability of any possible combination of ipsi- and contralateral observations.

patient_likelihoods(t_stage: str, mode: Literal['HMM', 'BN'] = 'HMM') ndarray[source]#

Compute the likelihood of each patient individually.

likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, t_stage: str | None = None, mode: Literal['HMM', 'BN'] = 'HMM')[source]#

Compute the (log-)likelihood of the stored data given the model (and params).

See the documentation of types.Model.likelihood() for more information on how to use the given_params parameter.

Returns the log-likelihood if log is set to True. The mode parameter determines whether the likelihood is computed for the hidden Markov model ("HMM") or the Bayesian network ("BN").

Note

The computation is much faster if no parameters are given, since then the transition matrix does not need to be recomputed.

See also

Unilateral.likelihood()

The corresponding unilateral function.

posterior_state_dist(given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]]] | None = None, t_stage: str | int = 'early', mode: Literal['HMM', 'BN'] = 'HMM') ndarray[source]#

Compute the joint posterior distribution over ipsi- and contralateral states, given a diagnosis.

The given_diagnosis is a dictionary storing one types.DiagnosisType each for the "ipsi" and "contra" side of the neck.

Essentially, this is the risk for any possible combination of ipsi- and contralateral involvement, given the provided diagnosis.

Warning

As in the Unilateral.posterior_state_dist() method, one may provide a precomputed (joint) state distribution via the given_state_dist argument (should be a square matrix). In this case, the given_params are ignored and the model does not need to recompute e.g. the transition_matrix() or state_dist(), making the computation much faster.

However, this will mean that t_stage and mode are also ignored, since these are only used to compute the state distribution.

marginalize(involvement: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None, given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') float[source]#

Marginalize given_state_dist over matching involvement patterns.

Any state that matches the provided involvement pattern is marginalized over. For this, the matrix.compute_encoding() function is used.

If given_state_dist is None, it will be computed by calling state_dist() with the given t_stage and mode. These arguments are ignored if given_state_dist is provided.

risk(involvement: dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]] | None = None, given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]]] | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM') float[source]#

Compute risk of the involvement patterns, given parameters and diagnosis.

The involvement of interest is expected to be a PatternType for each side of the neck ("ipsi" and "contra"). This method then marginalizes over those posterior state probabilities that match the involvement patterns.

If involvement is not provided, the method returns the posterior state distribution as computed by the posterior_state_dist() method. See its docstring for more details on the remaining arguments.

draw_patients(num: int, stage_dist: Iterable[float], rng: Generator | None = None, seed: int = 42, **_kwargs) DataFrame[source]#

Draw num random patients from the parametrized model.

See also

diagnosis_times.Distribution.draw_diag_times()

Method to draw diagnosis times from a distribution.

Unilateral.draw_diagnosis()

Method to draw individual diagnosis from a unilateral model.

Unilateral.draw_patients()

The unilateral method to draw a synthetic dataset.

class lymph.models.Midline(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, use_mixing: bool = True, use_central: bool = True, use_midext_evo: bool = True, marginalize_unknown: bool = True, uni_kwargs: dict[str, Any] | None = None, **_kwargs)[source]#

Bases: Composite, Composite, Model

Models metastatic progression bilaterally with tumor lateralization.

Model a bilateral lymphatic system where an additional risk factor can be provided in the data: whether the primary tumor extends over the mid-sagittal line or is located on it.

It is reasonable to assume (and supported by data) that an extension of the primary tumor significantly increases the risk for metastatic spread to the contralateral side of the neck. This class attempts to capture this using a simple assumption: We assume that the probability of spread to the contralateral side for patients with midline extension is larger than for patients without it, but smaller than the probability of spread to the ipsilateral side. Formally:

\[b_c^{\in} = \alpha \cdot b_i + (1 - \alpha) \cdot b_c^{\not\in}\]

where \(b_c^{\in}\) is the probability of spread from the primary tumor to the contralateral side for patients with midline extension, and \(b_c^{\not\in}\) for patients without. \(\alpha\) is the linear mixing parameter.
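As a small numerical illustration of the mixing formula, the following sketch computes the contralateral spread probability for patients with midline extension. The function name `mix_contra_spread` and all parameter values are made up for this example and are not part of the library's API.

```python
def mix_contra_spread(b_ipsi: float, b_contra_noext: float, alpha: float) -> float:
    """Linearly mix ipsilateral and contralateral (no extension) spread.

    Hypothetical helper implementing
    b_c_ext = alpha * b_i + (1 - alpha) * b_c_noext.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * b_ipsi + (1.0 - alpha) * b_contra_noext


# Illustrative values only, not fitted parameters.
b_ipsi = 0.4
b_contra_noext = 0.05
b_contra_ext = mix_contra_spread(b_ipsi, b_contra_noext, alpha=0.3)
```

For a mixing parameter between 0 and 1 and a contralateral spread that does not exceed the ipsilateral one, the mixed value always lies between the two, which is exactly the ordering assumed in the text.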

__init__(graph_dict: dict[tuple[str, str], list[str]], is_symmetric: dict[str, bool] | None = None, use_mixing: bool = True, use_central: bool = True, use_midext_evo: bool = True, marginalize_unknown: bool = True, uni_kwargs: dict[str, Any] | None = None, **_kwargs)[source]#

Initialize the model.

The class is constructed in a similar fashion to Bilateral: That class contains one Unilateral for each side of the neck, while this class will contain several instances of Bilateral, one for the ipsilateral side and two to three for the contralateral side covering the cases a) no midline extension, b) midline extension, and c) central tumor location.

This constructor adds two keyword arguments: use_mixing controls whether to use the above described mixture of spread parameters from tumor to the LNLs, and use_central controls whether to use a third Bilateral model for the case of a central tumor location.

The parameter use_midext_evo decides whether the tumor’s midline extension should be considered a random variable, in which case it is evolved like the state of the LNLs, or not.

With marginalize_unknown (default: True), the model will also load patients with unknown midline extension status into the model and marginalize over their state of midline extension when computing the likelihood. This extra data is stored in a Bilateral instance accessible via the attribute "unknown". Note that this bilateral instance does not get updated parameters or any other kind of attention. It is solely used to store the data and generate diagnosis matrices for those data.

The uni_kwargs are passed to all bilateral models.

See also

Bilateral: Two to four of these are held as attributes by this class. One for the case of a mid-sagittal extension of the primary tumor, one for the case of no such extension, (possibly) one for the case of a central/symmetric tumor, and (possibly) one for the case of unknown midline extension status.

classmethod trinary(*args, **kwargs) Midline[source]#

Create a trinary model.

property is_trinary: bool#

Return whether the model is trinary.

property midext_prob: float#

Return the probability of midline extension.

property mixing_param: float | None#

Return the mixing parameter.

property use_mixing: bool#

Return whether the model uses a mixing parameter.

property use_central: bool#

Return whether the model uses a central model.

property central: Bilateral#

Return the central model.

property marginalize_unknown: bool#

Return whether the model marginalizes over unknown midline extension.

property unknown: Bilateral#

Return the model storing the patients with unknown midline extension.

get_tumor_spread_params(as_dict: bool = True, as_flat: bool = True) dict[str, float] | Iterable[float][source]#

Return the tumor spread parameters of the model.

If the model uses the mixing parameter, the returned params will contain the ipsilateral spread from tumor to LNLs, the contralateral ones for the case of no midline extension, and the mixing parameter. Otherwise, it will contain the contralateral params for the cases of present and absent midline extension.

get_lnl_spread_params(as_dict: bool = True, as_flat: bool = True) dict[str, float] | Iterable[float][source]#

Return the LNL spread parameters of the model.

Depending on the value of is_symmetric["lnl_spread"], the returned params may contain only one set of spread parameters (if True) or one for the ipsi- and one for the contralateral side (if False).

get_spread_params(as_dict: bool = True, as_flat: bool = True) dict[str, float] | Iterable[float][source]#

Return the spread parameters of the model.

This combines the returned values from the calls to get_tumor_spread_params() and get_lnl_spread_params().

get_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Return all the parameters of the model.

This includes the spread parameters from the call to get_spread_params() and the distribution parameters from the call to get_distribution_params().

set_tumor_spread_params(*args: float, **kwargs: float) Iterable[float] | dict[str, float][source]#

Set the spread parameters of the midline model.

In analogy to the get_tumor_spread_params() method, this method sets the parameters describing how the tumor spreads to the LNLs. How many params to provide to this method depends on the values of the use_mixing and use_central attributes. Have a look at what the get_tumor_spread_params() method returns for insight into what you can provide.

set_lnl_spread_params(*args: float, **kwargs: float) Iterable[float][source]#

Set the LNL spread parameters of the midline model.

This works exactly like the Bilateral.set_lnl_spread_params() for the user, but under the hood, the parameters also need to be distributed to two or three instances of Bilateral depending on the value of the use_central attribute.

set_spread_params(*args: float, **kwargs: float) Iterable[float][source]#

Set the spread parameters of the midline model.

set_params(*args: float, **kwargs: float) Iterable[float] | dict[str, float][source]#

Set all parameters of the model.

Combines the calls to set_spread_params() and set_distribution_params().

load_patient_data(patient_data: ~pandas.core.frame.DataFrame, mapping: callable = <function early_late_mapping>) None[source]#

Load patient data into the model.

This amounts to sorting the patients into three bins:

  1. Patients whose tumor is clearly lateralized, meaning the column ("tumor", "1", "extension") reports False. These get assigned to the noext attribute.

  2. Those with a central tumor, indicated by True in the column ("tumor", "1", "central"). If the use_central attribute is set to True, these patients are assigned to the central model. Otherwise, they are assigned to the ext model.

  3. The rest, which amounts to patients whose tumor extends over the mid-sagittal line but is not central, i.e., symmetric with respect to the mid-sagittal line. These are assigned to the ext model.

The split data is sent to the Bilateral.load_patient_data() method of the respective models.
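The three-way split can be sketched as follows. This is only an illustration of the rules listed above, not the library's code: `sort_into_bins` is a hypothetical helper, and each patient is a plain dict whose `"extension"` and `"central"` entries stand in for the DataFrame columns ("tumor", "1", "extension") and ("tumor", "1", "central"). Records with a missing extension status are not treated specially in this sketch.

```python
def sort_into_bins(patients, use_central=True):
    """Split patient records into 'noext', 'ext', and 'central' bins.

    Hypothetical helper mirroring the three rules in the text.
    """
    bins = {"noext": [], "ext": [], "central": []}
    for patient in patients:
        if patient["extension"] is False:
            # Rule 1: clearly lateralized tumor.
            bins["noext"].append(patient)
        elif patient["central"] is True:
            # Rule 2: central tumor; falls back to 'ext' without a central model.
            bins["central" if use_central else "ext"].append(patient)
        else:
            # Rule 3: everything else is treated as midline extension.
            bins["ext"].append(patient)
    return bins


patients = [
    {"id": 1, "extension": False, "central": False},
    {"id": 2, "extension": True, "central": True},
    {"id": 3, "extension": True, "central": False},
]
with_central = sort_into_bins(patients, use_central=True)
without_central = sort_into_bins(patients, use_central=False)
```

With `use_central=False`, the central patient ends up in the `"ext"` bin together with the patient whose tumor merely extends over the midline.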

midext_evo() ndarray[source]#

Evolve only the state of the midline extension.

contra_state_dist_evo() tuple[ndarray, ndarray][source]#

Evolve the contralateral side as a mixture of the cases with and without midline extension.

state_dist(t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM', central: bool = False) ndarray[source]#

Compute the joint over ipsi- & contralateral hidden states and midline ext.

If central=False, the result has shape (2, num_states, num_states), where the first axis is for the midline extension status, the second for the ipsilateral state, and the third for the contralateral state.

If central=True, the result will be the state distribution of the central model’s Bilateral.state_dist() method.

obs_dist(given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM', central: bool = False) ndarray[source]#

Compute the joint distribution over the ipsi- & contralateral observations.

If given_state_dist is provided, t_stage, mode, and central are ignored. The provided state distribution may be 2D or 3D. The returned distribution will have the same dimensionality.

See also

Unilateral.obs_dist()

The corresponding unilateral function. Note that this method returns a 2D array, because it computes the probability of any possible combination of ipsi- and contralateral observations.

likelihood(given_params: Iterable[float] | dict[str, float] | None = None, log: bool = True, t_stage: str | None = None, mode: Literal['HMM', 'BN'] = 'HMM') float[source]#

Compute the (log-)likelihood of the stored data given the model (and params).

See the documentation of types.Model.likelihood() for more information on how to use the given_params parameter.

Returns the log-likelihood if log is set to True. Note that in contrast to the Bilateral model, the midline model does not support the Bayesian network mode.

Note

The computation is faster if no parameters are given, since then the transition matrix does not need to be recomputed.

See also

Unilateral.likelihood()

The corresponding unilateral function.

posterior_state_dist(given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]]] | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM', midext: bool | None = None, central: bool = False) float[source]#

Compute the posterior state distribution.

Using either the given_params or the given_state_dist argument, this method computes the posterior state distribution of the model for the given_diagnosis, a specific t_stage, whether the tumor extends over the mid-sagittal line (midext), and whether it is central (central, only used if use_central is True).

See also

types.Model.posterior_state_dist()

The corresponding method in the base class.

Bilateral.posterior_state_dist()

The bilateral method that is ultimately called by this one.

marginalize(involvement: dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None] | None = None, given_state_dist: ndarray | None = None, t_stage: str = 'early', mode: Literal['HMM', 'BN'] = 'HMM', midext: bool | None = None, central: bool = False) float[source]#

Marginalize given_state_dist over matching involvement patterns.

Any state that matches the provided involvement pattern is marginalized over. For this, the matrix.compute_encoding() function is used.

The arguments t_stage, mode, and central are only used if given_state_dist is None. In this case they are passed to the state_dist() method.

risk(involvement: dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None] | None = None, given_params: Iterable[float] | dict[str, float] | None = None, given_state_dist: ndarray | None = None, given_diagnosis: dict[str, dict[str, dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro'] | None]]] | None = None, t_stage: str = 'early', midext: bool | None = None, central: bool = False, mode: Literal['HMM', 'BN'] = 'HMM') float[source]#

Compute the risk of nodal involvement given_diagnosis.

In addition to the arguments of the Bilateral.risk() method, this also allows specifying if the patient’s tumor extended over the mid-sagittal line (midext=True) or if it was even located right on that line (central=True).

For logical reasons, midext=False makes no sense if central=True and is thus ignored.

Warning

As in the Bilateral.posterior_state_dist() method, you may provide a precomputed (joint) state distribution in the given_state_dist argument. Here, this given_state_dist may be a 2D array, in which case it is assumed you know how it was computed and the arguments t_stage, midext, central, and mode are ignored. If it is 3D, it should have the shape (2, num_states, num_states) and be the output of the Midline.state_dist() method. In this case, the midext argument is not ignored: It may be used to select the correct state distribution (when True or False), or marginalize over the midline extension status (when midext=None).
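The handling of a 3D given_state_dist described in this warning can be pictured with the sketch below. It is an assumption-laden illustration rather than the library's implementation: `reduce_midext_axis` is a hypothetical helper, nested lists stand in for the `ndarray`, and the selected slice is returned without renormalization because this section does not say whether or where the library renormalizes. With NumPy, the two branches would correspond to `state_dist.sum(axis=0)` and `state_dist[int(midext)]`.

```python
def reduce_midext_axis(state_dist_3d, midext=None):
    """Collapse the leading midline-extension axis of a joint distribution.

    ``state_dist_3d[m][i][c]`` is the joint probability of midline
    extension status m (0 or 1), ipsilateral state i, and contralateral
    state c. Hypothetical helper for illustration only.
    """
    if midext is None:
        # Marginalize: add the 'no extension' and 'extension' slices.
        noext, ext = state_dist_3d
        return [
            [p_no + p_ext for p_no, p_ext in zip(row_no, row_ext)]
            for row_no, row_ext in zip(noext, ext)
        ]
    # Select the slice matching the known midline extension status.
    return state_dist_3d[int(midext)]


# Toy joint distribution with two hidden states per side (sums to 1).
joint = [
    [[0.40, 0.10], [0.10, 0.10]],  # midext = False
    [[0.10, 0.05], [0.05, 0.10]],  # midext = True
]
marginal = reduce_midext_axis(joint)            # midext unknown
ext_slice = reduce_midext_axis(joint, True)     # midext known to be True
noext_slice = reduce_midext_axis(joint, False)  # midext known to be False
```

The resulting 2D array can then be treated like a bilateral joint distribution over ipsi- and contralateral states.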

draw_patients(num: int, stage_dist: Iterable[float], rng: Generator | None = None, seed: int = 42) DataFrame[source]#

Draw num patients from the parameterized model.