Components#

Below we document the core components implementing the Bayesian network and hidden Markov model we use to compute probabilistic predictions of lymphatic tumor spread.

Diagnostic Modalities#

Module implementing management of the diagnostic modalities.

This allows the user to define diagnostic modalities and their sensitivity/specificity values. This is necessary to compute the likelihood of a dataset (that was created by recording the output of diagnostic modalities), given the model and its parameters (which we want to learn).

class lymph.modalities.Modality(spec: float, sens: float, is_trinary: bool = False)[source]#

Bases: object

Stores the confusion matrix of a diagnostic modality.

__init__(spec: float, sens: float, is_trinary: bool = False) None[source]#
__hash__() int[source]#

Return a hash of the modality.

This is computed from the confusion matrix of the modality.

property spec: float#

Return the specificity of the modality.

property sens: float#

Return the sensitivity of the modality.

compute_confusion_matrix() ndarray[source]#

Compute the confusion matrix of the modality.

property confusion_matrix: ndarray#

Return the confusion matrix of the modality.

check_confusion_matrix(value: ndarray) None[source]#

Check if the confusion matrix is valid.
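
To illustrate how a sensitivity and a specificity define a confusion matrix, here is a minimal sketch in plain Python. The function name `binary_confusion_matrix` and the layout (rows are the true state, columns are the observation) are assumptions made for this example and are not taken from the library.

```python
def binary_confusion_matrix(spec: float, sens: float) -> list[list[float]]:
    """Build a 2x2 confusion matrix (hypothetical helper, assumed layout).

    Rows: true state (healthy, involved).
    Columns: observation (negative, positive).
    """
    for name, value in (("spec", spec), ("sens", sens)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return [
        [spec, 1.0 - spec],  # healthy: true negative, false positive
        [1.0 - sens, sens],  # involved: false negative, true positive
    ]


matrix = binary_confusion_matrix(spec=0.8, sens=0.6)
```

Each row is a probability distribution over the possible observations for one true state, so each row sums to one.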

class lymph.modalities.Clinical(spec: float, sens: float, is_trinary: bool = False)[source]#

Bases: Modality

Stores the confusion matrix of a clinical modality.

compute_confusion_matrix() ndarray[source]#

Compute the confusion matrix of the clinical modality.

class lymph.modalities.Pathological(spec: float, sens: float, is_trinary: bool = False)[source]#

Bases: Modality

Stores the confusion matrix of a pathological modality.

compute_confusion_matrix() ndarray[source]#

Compute the confusion matrix of the pathological modality.

class lymph.modalities.Composite(modality_children: dict[str, Composite] | None = None, is_modality_leaf: bool = False)[source]#

Bases: ABC

Abstract base class implementing the composite pattern for diagnostic modalities.

Any class inheriting from this class should be able to handle the definition of diagnostic modalities and their sensitivity/specificity values.

__init__(modality_children: dict[str, Composite] | None = None, is_modality_leaf: bool = False) None[source]#

Initialize the modality composite.

abstract property is_trinary: bool#

Return whether the modality is trinary.

modalities_hash() int[source]#

Compute a hash from all stored modalities.

See the Modality.__hash__() method for more information.

get_modality(name: str) Modality[source]#

Return the modality with the given name.

get_all_modalities() dict[str, Modality][source]#

Return all modalities of the composite.

This will issue a warning if it finds that not all modalities of the composite are equal. Note that it will always return the modalities of the first child. This means one should NOT try to set the modalities via the returned dictionary of this method. Instead, use the set_modality() method.

set_modality(name: str, spec: float, sens: float, kind: Literal['clinical', 'pathological'] = 'clinical') None[source]#

Set the modality with the given name.

del_modality(name: str) None[source]#

Delete the modality with the given name.

replace_all_modalities(modalities: dict[str, Modality]) None[source]#

Replace all modalities of the composite with new modalities.

clear_modalities() None[source]#

Clear all modalities of the composite.
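
The following sketch illustrates the composite pattern described above and why the dictionary returned by get_all_modalities() should not be used for setting modalities. The class `ModalityNode` is a hypothetical stand-in written for this example. It is not the library's Composite, and it stores plain `(spec, sens)` tuples instead of Modality objects.

```python
class ModalityNode:
    """Hypothetical stand-in for a class inheriting from Composite."""

    def __init__(self, children=None, is_leaf=False):
        self.children = children or {}
        self.is_leaf = is_leaf
        self.modalities = {}  # only used by leaves

    def set_modality(self, name, spec, sens):
        """Store the modality in a leaf or pass it on to all children."""
        if self.is_leaf:
            self.modalities[name] = (spec, sens)
        else:
            for child in self.children.values():
                child.set_modality(name, spec, sens)

    def get_all_modalities(self):
        """Return the modalities of the first leaf that is reached."""
        if self.is_leaf:
            return self.modalities
        first_child = next(iter(self.children.values()))
        return first_child.get_all_modalities()


left = ModalityNode(is_leaf=True)
right = ModalityNode(is_leaf=True)
root = ModalityNode(children={"left": left, "right": right})

# Setting via the composite reaches every leaf.
root.set_modality("CT", spec=0.76, sens=0.81)

# Writing into the returned dictionary only reaches the first leaf.
root.get_all_modalities()["MRI"] = (0.63, 0.81)
```

After these calls both leaves know about `"CT"`, but only the first leaf contains `"MRI"`, which is the kind of inconsistency the warning is meant to catch.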

Marginalization over Diagnosis Times#

Module for marginalizing over diagnosis times.

The hidden Markov model we implement assumes that every patient started off with a healthy neck, meaning no lymph node levels harboured any metastases. This is a valid assumption, but it raises the question of how long ago this likely was.

This module allows the user to define a distribution over abstract time-steps that indicate for different T-categories how probable a diagnosis at this time-step was. That allows us to treat T1 and T4 patients fundamentally in the same way, even with the same parameters, except for the parametrization of their respective distribution over the time of diagnosis.
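
As a rough illustration of this idea, the sketch below marginalizes the state of a toy two-state model (healthy, involved) over the diagnosis time. The transition matrix, the two time priors and all function names are invented for this example. Only the principle, same transition parameters but a different distribution over the time of diagnosis, reflects the text above.

```python
def vec_mat(vec, mat):
    """Multiply a row vector with a matrix (both as plain lists)."""
    return [
        sum(vec[i] * mat[i][j] for i in range(len(vec)))
        for j in range(len(mat[0]))
    ]


def marginalize_over_time(start, transition, time_prior):
    """Sum the evolved state distributions, weighted by the time prior."""
    state = list(start)
    marginal = [0.0] * len(start)
    for prob_t in time_prior:
        marginal = [m + prob_t * s for m, s in zip(marginal, state)]
        state = vec_mat(state, transition)  # advance by one time-step
    return marginal


# Toy model: a healthy LNL becomes involved with probability 0.3 per step.
transition = [[0.7, 0.3], [0.0, 1.0]]
start = [1.0, 0.0]  # every patient starts out healthy

early_prior = [0.5, 0.3, 0.2, 0.0]  # e.g. early T-category
late_prior = [0.0, 0.2, 0.3, 0.5]   # e.g. advanced T-category

p_early = marginalize_over_time(start, transition, early_prior)
p_late = marginalize_over_time(start, transition, late_prior)
```

With identical transition probabilities, the later time prior yields a higher marginal probability of involvement.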

exception lymph.diagnosis_times.SupportError[source]#

Bases: Exception

Error that is raised when no support for a distribution is provided.

class lymph.diagnosis_times.Distribution(distribution: Iterable[float] | callable, max_time: int | None = None, **kwargs)[source]#

Bases: object

Class that provides a way of storing distributions over diagnosis times.

__init__(distribution: Iterable[float] | callable, max_time: int | None = None, **kwargs) None[source]#

Initialize a distribution over diagnosis times.

This object can either be created by passing a parametrized function (e.g., scipy.stats distribution) or by passing a list of probabilities for each diagnosis time.

The signature of the function must be func(support, **kwargs), where support is the support of the distribution from 0 to max_time. The function must return a list of probabilities for each diagnosis time.

Note

All arguments except support must have default values and if some parameters have bounds (like the binomial distribution’s p), the function must raise a ValueError if the parameter is invalid.

Since max_time specifies the support of the distribution (ranging from 0 to max_time), it must be provided if a parametrized function is passed. If a list of probabilities is passed, max_time is inferred from the length of the list and can be omitted. If it is provided anyway, an error is raised when the length of the list does not equal max_time + 1.
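
A parametrized function that follows the rules above could look like the sketch below. It implements a binomial PMF with the standard library only. The name `binom_pmf` and the default `p=0.3` are choices made for this example.

```python
from math import comb


def binom_pmf(support, p: float = 0.3):
    """Binomial PMF over the support 0, 1, ..., max_time.

    All arguments except ``support`` have default values, and an invalid
    parameter raises a ``ValueError``, as the note above requires.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p must be between 0 and 1, got {p}")
    max_time = len(support) - 1
    return [
        comb(max_time, k) * p**k * (1.0 - p) ** (max_time - k)
        for k in support
    ]


max_time = 10
support = list(range(max_time + 1))
pmf = binom_pmf(support, p=0.4)
```

Going by the constructor signature, such a function would presumably be passed along with the maximum time, as in `Distribution(binom_pmf, max_time=10)`.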

static extract_kwargs(distribution: callable) dict[str, Any][source]#

Extract the keyword arguments from the provided parametric distribution.

The signature of the provided parametric distribution must be func(support, **kwargs). The first argument is the support of the distribution, which is a list or array of integers from 0 to max_time. The **kwargs are keyword parameters that are passed to the function to update it.

__hash__() int[source]#

Return a hash of the distribution.

This is computed from the stored frozen distribution and, if is_updateable is True, the stored keyword arguments of the parametric distribution.

property max_time: int#

Return the maximum time for the distribution.

static normalize(distribution: ndarray) ndarray[source]#

Normalize a distribution.

property pmf: ndarray#

Return the probability mass function of the distribution if it is frozen.

property is_updateable: bool#

True if the instance can be updated via set_params().

get_params(as_dict: bool = True, **_kwargs) Iterable[float] | dict[str, float][source]#

If updateable, return the distribution’s parameter value(s), or all parameters in a dictionary if as_dict is True.

See also

lymph.diagnosis_times.DistributionsUserDict.get_params()

lymph.graph.Edge.get_params()

lymph.models.Unilateral.get_params()

lymph.models.Bilateral.get_params()

set_params(*args: float, **kwargs: float) tuple[float][source]#

Update distribution by setting its parameters and storing the frozen PMF.

Parameters can be set via positional arguments - which are used up one by one in the order they are provided and are then returned - or keyword arguments. Keyword arguments override positional arguments. If the distribution is not updateable, a warning is issued and all args and kwargs are returned.

If any of the parameters is invalid, a ValueError is raised and the original parameters are restored.

draw_diag_times(num: int | None = None, rng: Generator | None = None, seed: int = 42) ndarray[source]#

Draw num samples of diagnosis times from the stored PMF.

A random number generator can be provided as rng. If None, a new one is initialized with the given seed (or 42, by default).
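
The idea of drawing seeded samples from a PMF can be sketched with the standard library. This is not the library's implementation: it uses `random.Random` in place of a NumPy `Generator`, and the function name `draw_times` is invented for this example.

```python
import random


def draw_times(pmf, num, seed=42):
    """Draw ``num`` time-steps with probabilities given by ``pmf``."""
    rng = random.Random(seed)  # a fresh, seeded generator
    return rng.choices(range(len(pmf)), weights=pmf, k=num)


samples = draw_times([0.1, 0.2, 0.4, 0.3], num=1000)
```

Because the generator is seeded, repeated calls with the same arguments return the same samples.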

class lymph.diagnosis_times.Composite(max_time: int | None = None, distribution_children: dict[str, Composite] | None = None, is_distribution_leaf: bool = False)[source]#

Bases: ABC

Abstract base class implementing the composite pattern for distributions.

Any class inheriting from this class should be able to handle the definition of distributions over diagnosis times.

>>> class MyComposite(Composite):
...     pass
>>> leaf1 = MyComposite(is_distribution_leaf=True, max_time=1)
>>> leaf2 = MyComposite(is_distribution_leaf=True, max_time=1)
>>> leaf3 = MyComposite(is_distribution_leaf=True, max_time=1)
>>> branch1 = MyComposite(distribution_children={"L1": leaf1, "L2": leaf2})
>>> branch2 = MyComposite(distribution_children={"L3": leaf3})
>>> root = MyComposite(distribution_children={"B1": branch1, "B2": branch2})
>>> root.set_distribution("T1", Distribution([0.1, 0.9]))
>>> root.get_distribution("T1")
Distribution([0.1, 0.9])
>>> leaf1.get_distribution("T1")
Distribution([0.1, 0.9])
__init__(max_time: int | None = None, distribution_children: dict[str, Composite] | None = None, is_distribution_leaf: bool = False) None[source]#

Initialize the distribution composite.

property max_time: int#

Return the maximum time for the distributions.

property t_stages: list[str]#

Return the T-stages for which distributions are defined.

get_distribution(t_stage: str) Distribution[source]#

Return the distribution for the given t_stage.

get_all_distributions() dict[str, Distribution][source]#

Return all distributions.

This will issue a warning if it finds that not all distributions of the composite are equal. Note that it will always return the distributions of the first child. This means one should NOT try to set the distributions via the returned dictionary of this method. Instead, use the set_distribution() method.

set_distribution(t_stage: str, distribution: Distribution | Iterable[float] | callable) None[source]#

Set/update the distribution for the given t_stage.

del_distribution(t_stage: str) None[source]#

Delete the distribution for the given t_stage.

replace_all_distributions(distributions: dict[str, Distribution]) None[source]#

Replace all distributions with the given ones.

clear_distributions() None[source]#

Remove all distributions.

distributions_hash() int[source]#

Return a hash of all distributions.

get_distribution_params(as_dict: bool = True, as_flat: bool = True) Iterable[float] | dict[str, float][source]#

Return the parameters of all distributions.

set_distribution_params(*args: float, **kwargs: float) tuple[float][source]#

Set the parameters of all distributions.

Matrices#

Methods & classes to manage matrices of the Unilateral class.

lymph.matrix.generate_transition(lnls: Iterable[LymphNodeLevel], num_states: int) ndarray[source]#

Compute the transition matrix of the lymph model.

lymph.matrix.generate_observation(modalities: Iterable[Modality], num_lnls: int, base: int = 2) ndarray[source]#

Generate the observation matrix of the lymph model.

lymph.matrix.compute_encoding(lnls: list[str], pattern: Series | dict[str, Literal[False, 0, 'healthy', True, 1, 'involved', 'micro', 'macro', 'notmacro']], base: int = 2) ndarray[source]#

Compute the encoding of a particular pattern of involvement.

A pattern holds information about the involvement of each LNL. The function transforms this into a boolean array that is True for all possible complete states/diagnoses that are compatible with the given pattern.

In the binary case (base=2), the value behind pattern[lnl] can be one of the following things:

  • False: The LNL is healthy.

  • "healthy": The LNL is healthy.

  • True: The LNL is involved.

  • "involved": The LNL is involved.

  • pd.isna(pattern[lnl]) == True: The involvement of the LNL is unknown.

In the trinary case (base=3), the value behind pattern[lnl] can be one of these things:

  • False: The LNL is healthy.

  • "healthy": The LNL is healthy.

  • True: The LNL is involved (micro- or macroscopic).

  • "involved": The LNL is involved (micro- or macroscopic).

  • "micro": The LNL is involved microscopically only.

  • "macro": The LNL is involved macroscopically only.

  • "notmacro": The LNL is healthy or involved microscopically.

Missing values are treated as unknown involvement.

>>> compute_encoding(["II", "III"], {"II": True, "III": False})
array([False, False,  True, False])
>>> compute_encoding(["II", "III"], {"II": "involved"})
array([False, False,  True,  True])
>>> compute_encoding(
...     lnls=["II", "III"],
...     pattern={"II": True, "III": False},
...     base=3,
... )
array([False, False, False,  True, False, False,  True, False, False])
>>> compute_encoding(
...     lnls=["II", "III"],
...     pattern={"II": "micro", "III": "notmacro"},
...     base=3,
... )
array([False, False, False,  True,  True, False, False, False, False])
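
The logic behind these outputs can be sketched in plain Python. The helpers `allowed_states` and `encode_pattern` are hypothetical and return lists instead of arrays. The sketch assumes that complete states are enumerated with the first LNL as the most significant digit, and that 0, 1 and 2 stand for healthy, microscopic and macroscopic involvement. This ordering is consistent with the examples above.

```python
from itertools import product
from math import isnan


def allowed_states(value, base):
    """Return the set of single-LNL states compatible with ``value``."""
    if value is None or (isinstance(value, float) and isnan(value)):
        return set(range(base))  # unknown involvement: everything allowed
    if isinstance(value, str):
        named = {
            "healthy": {0},
            "involved": set(range(1, base)),
            "micro": {1},
            "macro": {2},
            "notmacro": {0, 1},
        }
        return named[value]
    # booleans and 0/1: healthy or involved
    return set(range(1, base)) if value else {0}


def encode_pattern(lnls, pattern, base=2):
    """Mark every complete state that is compatible with the pattern."""
    allowed = [allowed_states(pattern.get(lnl), base) for lnl in lnls]
    return [
        all(s in ok for s, ok in zip(state, allowed))
        for state in product(range(base), repeat=len(lnls))
    ]


binary = encode_pattern(["II", "III"], {"II": True, "III": False})
trinary = encode_pattern(
    ["II", "III"], {"II": "micro", "III": "notmacro"}, base=3
)
```

Here `binary` is True only for the state "II involved, III healthy". `trinary` is True for the two states in which II is microscopically involved and III is either healthy or microscopically involved.
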
lymph.matrix.generate_data_encoding(patient_data: DataFrame, modalities: dict[str, Modality], lnls: list[str]) ndarray[source]#

Generate the data matrix for a specific T-stage from patient data.

The models.Unilateral.patient_data needs to contain the column "_model", which is constructed when loading the data into the model. From this, a data matrix is constructed for all present diagnostic modalities.

The returned matrix has the shape \(2^{N \cdot \mathcal{O}} \times M\), where \(N\) is the number of lymph node levels, \(\mathcal{O}\) is the number of diagnostic modalities and \(M\) is the number of patients.
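
As a quick check of this formula, the sketch below computes the shape for a hypothetical setup. The helper name and the numbers are invented for this example.

```python
def data_matrix_shape(num_lnls, num_modalities, num_patients):
    """Shape of the data matrix: 2^(N * O) rows, one column per patient."""
    return 2 ** (num_lnls * num_modalities), num_patients


# 4 LNLs observed with 2 modalities in 150 patients
shape = data_matrix_shape(num_lnls=4, num_modalities=2, num_patients=150)
```

The number of rows doubles with every additional LNL per modality, so in this example the matrix already has 256 rows.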

lymph.matrix.evolve_midext(max_time: int, midext_prob: int) ndarray[source]#

Compute the evolution over the state of a tumor’s midline extension.
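
One way to picture such an evolution is a two-state Markov chain. The sketch below is an assumption about the underlying idea, not the library's code: it assumes that the tumor starts without midline extension, that the extension develops with a constant probability per time-step, and that it is irreversible. The function name `evolve_two_state` is invented.

```python
def evolve_two_state(max_time, midext_prob):
    """Return the state distribution for every time-step 0..max_time.

    Each entry is [P(no midline extension), P(midline extension)].
    """
    transition = [[1.0 - midext_prob, midext_prob], [0.0, 1.0]]
    states = [[1.0, 0.0]]  # assumed start: no midline extension
    for _ in range(max_time):
        prev = states[-1]
        states.append([
            prev[0] * transition[0][0] + prev[1] * transition[1][0],
            prev[0] * transition[0][1] + prev[1] * transition[1][1],
        ])
    return states


midext_states = evolve_two_state(max_time=5, midext_prob=0.2)
```

Under these assumptions, the probability of having no midline extension after t steps is (1 - midext_prob) to the power of t.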

lymph.matrix.fast_trace(left: ndarray, right: ndarray) ndarray[source]#

Compute the trace of a product of two matrices (left and right).

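This is based on the observation that the trace of a product of two matrices equals the sum over the element-wise product of the first matrix with the transpose of the second, \(\operatorname{tr}(AB) = \sum_{i,j} A_{ij} B_{ji}\), which avoids computing the full matrix product. See Wikipedia and StackOverflow for more information.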
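
The identity can be verified with a small example in plain Python. The helper names are chosen for this sketch, which only demonstrates the mathematical shortcut and does not reproduce the library function.

```python
def matmul(a, b):
    """Naive matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(m):
    """Sum of the diagonal elements of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def trace_of_product(left, right):
    """Trace of ``left @ right`` without forming the product."""
    return sum(
        left[i][j] * right[j][i]
        for i in range(len(left))
        for j in range(len(right))
    )


left = [[1, 2, 3], [4, 5, 6]]          # shape 2 x 3
right = [[7, 8], [9, 10], [11, 12]]    # shape 3 x 2

slow = trace(matmul(left, right))
fast = trace_of_product(left, right)
```

Both approaches give the same value, but the shortcut only touches the entries that contribute to the diagonal of the product.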