Detailed API

The human lymphatic system (or rather parts of it) is modelled as a directed graph here. Hence, a System consists of multiple Node and Edge instances, each of which is represented by a Python class.

Lymph system

class lymph.System(graph={})

Class that describes a whole lymphatic system with its lymph node levels (LNLs) and the connections between them.

Parameters

graph (dict) – For every key in the dictionary, the system will create a node that represents a binary random variable. The value for each key should be a list of the names of the nodes to which edges from that key's node should be created.
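To illustrate the expected shape of graph, the sketch below turns a small dictionary into a list of directed edges and lines up a flat parameter vector with those edges in dictionary order (the order described under get_theta()). The node names, the numbers and the helper edges_from_graph are hypothetical; the real System builds Node and Edge instances rather than tuples.

```python
from typing import Dict, List, Tuple

def edges_from_graph(graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (start, end) pairs in the order they appear in the dictionary."""
    return [(start, end) for start, ends in graph.items() for end in ends]

# Hypothetical graph: tumour "T" drains into levels I, II and III,
# and each level drains into the next one. Every node is a key,
# even if it has no outgoing edges.
graph = {
    "T": ["I", "II", "III"],
    "I": ["II"],
    "II": ["III"],
    "III": [],
}

nodes = list(graph.keys())
edges = edges_from_graph(graph)

# One parameter per edge, in dictionary order (made-up values).
theta = [0.2, 0.3, 0.1, 0.4, 0.5]
assert len(theta) == len(edges)
for (start, end), t in zip(edges, theta):
    print(f"{start} -> {end}: {t}")
```

Note that the first three parameters belong to tumour-to-LNL edges (base probabilities in the paper's notation) and the remaining ones to LNL-to-LNL edges.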

combined_likelihood(theta, t_stage=['early', 'late'], time_prior_dict={}, T_max=10)

Likelihood for learning both the system’s parameters and the center of a binomially shaped time prior.

Parameters
  • theta (ndarray) – Set of parameters, consisting of the base probabilities \(b\) (as many as the system has nodes), the transition probabilities \(t\) (as many as the system has edges) and - in this particular case - the binomial parameters for all but the first T-stage’s time prior.

  • t_stage (List[str]) – Keywords of T-stages that are present in the dictionary of C matrices. (default: ["early", "late"])

  • time_prior_dict (dict) – Dictionary with keys of T-stages in t_stage and values of time priors for each of those T-stages.

  • T_max (int) – Maximum number of time steps.

Return type

float

Returns

The combined likelihood of observing patients with different T-stages, given the spread probabilities as well as the parameters of the binomial time priors for all T-stages except the first.
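A binomially shaped time prior assigns a probability to each time step, with its center at T_max · p. A minimal sketch, assuming the prior runs over the time steps 0, ..., T_max and uses the standard binomial probability mass function with success probability p; the function name binomial_time_prior is hypothetical and not part of lymph.

```python
from math import comb
from typing import List

def binomial_time_prior(p: float, T_max: int = 10) -> List[float]:
    """Binomial pmf over the time steps t = 0, ..., T_max with success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return [comb(T_max, t) * p**t * (1.0 - p) ** (T_max - t) for t in range(T_max + 1)]

early = binomial_time_prior(0.3)  # mass concentrated around early time steps
late = binomial_time_prior(0.5)   # center shifted towards later time steps
print(sum(early), sum(late))      # both are (up to rounding) 1
```

In combined_likelihood, the p of every T-stage except the first would be read from theta, which lets later T-stages shift their mass towards later time steps.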

find_edge(startname, endname)

Finds and returns the edge that starts at the node named startname and ends at the node named endname, or None if there is no such edge.

Return type

Optional[Edge]

find_node(name)

Finds and returns the node named name, or None if there is no such node.

Return type

Optional[Node]
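Both lookups return None when nothing matches, which is what the Optional return types signal. A minimal sketch with simplified, hypothetical stand-ins for Node and Edge (not the lymph classes themselves), showing that edges are directed:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    name: str
    state: int = 0

@dataclass
class Edge:
    start: Node
    end: Node
    t: float = 0.0

def find_node(nodes: List[Node], name: str) -> Optional[Node]:
    """Return the first node called `name`, or None if there is none."""
    return next((n for n in nodes if n.name == name), None)

def find_edge(edges: List[Edge], startname: str, endname: str) -> Optional[Edge]:
    """Return the edge from `startname` to `endname`, or None if there is none."""
    return next(
        (e for e in edges if e.start.name == startname and e.end.name == endname),
        None,
    )

tumour, lnl_ii = Node("T"), Node("II")
edges = [Edge(tumour, lnl_ii, 0.3)]
print(find_edge(edges, "T", "II").t)  # 0.3
print(find_edge(edges, "II", "T"))    # None, because edges are directed
```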

get_graph()

Returns the graph dictionary as it was provided when the system was created.

Return type

dict

get_theta()

Returns the parameters currently set. It will return the transition probabilities in the order they appear in the graph dictionary. This deviates somewhat from the notation in the paper, where base and transition probabilities are distinguished as probabilities along edges from primary tumour to LNL and from LNL to LNL respectively.

Return type

List[float]

likelihood(theta, t_stage=[1, 2, 3, 4], time_prior_dict={}, mode='HMM')

Computes the likelihood of a set of parameters, given the data already stored in the system.

Parameters
  • theta (ndarray) – Set of parameters, consisting of the base probabilities \(b\) (as many as the system has nodes) and the transition probabilities \(t\) (as many as the system has edges).

  • t_stage (List[int]) – List of T-stages that should be included in the learning process. (default: [1,2,3,4])

  • time_prior_dict (dict) – Dictionary with keys of T-stages in t_stage and values of time priors for each of those T-stages.

  • mode (str) – "HMM" for hidden Markov model and "BN" for Bayesian network. (default: "HMM")

Return type

float

Returns

The log-likelihood of a parameter sample.
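In HMM mode the hidden state distribution after t time steps is \(\boldsymbol{\pi} \cdot \mathbf{A}^t\), and a time prior turns this into a mixture over t. A minimal NumPy sketch, assuming every patient starts in the all-healthy state and that the time prior's first entry belongs to t = 0; the 2-state transition matrix for a single LNL and the function name are made up for illustration.

```python
import numpy as np

def state_distribution(A: np.ndarray, time_prior: np.ndarray) -> np.ndarray:
    """Return sum_t time_prior[t] * (pi @ A^t), starting from the healthy state."""
    pi = np.zeros(A.shape[0])
    pi[0] = 1.0  # assumption: state 0 means "no LNL involved"
    result = np.zeros_like(pi)
    current = pi.copy()
    for weight in time_prior:
        result += weight * current
        current = current @ A  # advance one time step
    return result

# One LNL: state 0 = healthy, 1 = involved; involvement is never undone.
A = np.array([[0.8, 0.2],
              [0.0, 1.0]])
time_prior = np.array([0.5, 0.5])  # t = 0 or t = 1, equally likely
print(state_distribution(A, time_prior))  # [0.9 0.1]
```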

list_edges()

Lists all edges of the system with their corresponding start and end nodes.

Return type

List[Edge]

load_data(data, t_stage=[1, 2, 3, 4], spsn_dict={'path': [1.0, 1.0]}, mode='HMM')

Generates the matrix C that marginalizes over multiple states for data with incomplete observation, as well as how often these observations occur in the dataset. In the end the computation \(\mathbf{p} = \boldsymbol{\pi} \cdot \mathbf{A}^t \cdot \mathbf{B} \cdot \mathbf{C}\) results in an array of probabilities that can, together with the frequencies \(f\), be used to compute the likelihood. This also works for the Bayesian network case: \(\mathbf{p} = \mathbf{a} \cdot \mathbf{C}\) where \(\mathbf{a}\) is an array containing the probability for each state.

Parameters
  • data (DataFrame) – Contains rows of patient data. The columns must include the T-stage and at least one diagnostic modality.

  • t_stage (List[int]) – List of T-stages that should be included in the learning process. (default: [1,2,3,4])

  • spsn_dict (Dict[str, List[float]]) – Dictionary of specificity \(s_P\) and sensitivity \(s_N\) (in that order) for each observational/diagnostic modality. (default: {"path": [1., 1.]})

  • mode (str) – "HMM" for hidden Markov model and "BN" for Bayesian network. (default: "HMM")
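Once the state distribution is known, \(\mathbf{p}\) follows from B and C, and the log-likelihood weights the logarithms of its entries with the frequencies \(f\). A sketch for one LNL and one modality, assuming B has rows for the true states and columns for the observations; the third column of C marginalizes over both possible observations for patients whose LNL was not observed. All matrices, frequencies and the helper name are made up.

```python
import numpy as np

def log_likelihood(dist: np.ndarray, B: np.ndarray, C: np.ndarray, f: np.ndarray) -> float:
    """Compute sum_k f_k * log(p_k) with p = dist @ B @ C."""
    p = dist @ B @ C
    return float(f @ np.log(p))

sp, sn = 0.9, 0.8
B = np.array([[sp, 1.0 - sp],    # rows: true state (healthy, involved)
              [1.0 - sn, sn]])   # columns: observation (negative, positive)
# Columns of C: observed negative, observed positive, not observed.
C = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
f = np.array([10, 5, 3])         # how often each pattern occurs in the data
dist = np.array([0.9, 0.1])      # e.g. pi @ A^t, mixed over a time prior

print(dist @ B @ C)              # last entry is (numerically) 1
print(log_likelihood(dist, B, C, f))
```

The unobserved patients contribute \(\log 1 = 0\), so missing diagnoses carry no information about the parameters, as expected.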

obs_prob(diagnoses_dict, log=False)

Computes the probability to see certain diagnoses, given the system’s current state.

Parameters
  • diagnoses_dict (Dict[str, List[int]]) – Dictionary of diagnoses (one for each diagnostic modality). A diagnosis must be an array of integers with one entry per LNL in the system.

  • log (bool) – If True, the log probability is computed. (default: False)

Return type

float

Returns

The probability to see the given diagnoses.
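Assuming diagnoses of different LNLs and modalities are conditionally independent given the hidden state, this probability is a product of per-LNL table entries. A minimal sketch; the table layout (rows: true state, columns: observation), the modality name "CT" and the extra tables argument are assumptions for illustration, since the real method reads the system's own state and stored modalities.

```python
import math
from typing import Dict, List

Table = List[List[float]]  # table[true_state][observation]

def observation_table(sp: float, sn: float) -> Table:
    """2x2 table built from specificity sp and sensitivity sn."""
    return [[sp, 1.0 - sp], [1.0 - sn, sn]]

def obs_prob(state: List[int], diagnoses_dict: Dict[str, List[int]],
             tables: Dict[str, Table], log: bool = False) -> float:
    """Probability of all diagnoses, given the hidden state of every LNL."""
    p = math.prod(
        tables[modality][s][o]
        for modality, diagnosis in diagnoses_dict.items()
        for s, o in zip(state, diagnosis)
    )
    return math.log(p) if log else p

tables = {"CT": observation_table(0.8, 0.7), "path": observation_table(1.0, 1.0)}
state = [1, 0]                            # first LNL involved, second healthy
diagnoses = {"CT": [1, 1], "path": [1, 0]}
print(obs_prob(state, diagnoses, tables))  # = 0.7 * (1 - 0.8) * 1 * 1
```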

print_graph()

Print info about the structure and parameters of the graph.

risk(inv, obs, time_prior=[], mode='HMM')

Computes the risk for involvement (or no involvement), given some observations and a time distribution for the Markov model (and the Bayesian network).

Parameters
  • inv (ndarray) – Pattern of involvement that we want to compute the risk for. Entries can be 0 (negative), 1 (positive) or None if we don’t care whether this LNL is involved or not.

  • obs (Dict[str, ndarray]) – Holds a diagnosis of the same form as inv for each diagnostic modality. An incomplete diagnosis can be filled with None.

  • time_prior (List[float]) – Discrete distribution over the time steps. Must hence sum to 1.

  • mode (str) – "HMM" for hidden Markov model and "BN" for Bayesian network. (default: "HMM")

Return type

float

Returns

The risk for the involvement of interest, given an observation.
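The risk is a posterior probability obtained with Bayes' rule: the joint probability of state and observation, summed over all states that match inv, divided by the total probability of the observation. A minimal sketch in which the state probabilities (e.g. the time-marginalized \(\boldsymbol{\pi} \cdot \mathbf{A}^t\) in HMM mode) and the observation likelihoods per state are passed in as made-up numbers; the function signature is hypothetical.

```python
import itertools
from typing import Optional, Sequence

import numpy as np

def risk(inv: Sequence[Optional[int]], state_probs: np.ndarray,
         obs_likelihoods: np.ndarray) -> float:
    """P(involvement pattern | observation) via Bayes' rule.

    Entry k of both arrays refers to the k-th state in
    itertools.product([0, 1], repeat=len(inv)) order.
    """
    states = list(itertools.product([0, 1], repeat=len(inv)))
    joint = state_probs * obs_likelihoods  # P(state) * P(obs | state)
    # None in inv means "don't care", so it matches both 0 and 1.
    matches = np.array([
        all(i is None or i == s for i, s in zip(inv, state)) for state in states
    ])
    return float(joint[matches].sum() / joint.sum())

# Two LNLs; states ordered (0,0), (0,1), (1,0), (1,1).
state_probs = np.array([0.6, 0.1, 0.2, 0.1])
obs_likelihoods = np.array([0.1, 0.1, 0.7, 0.7])  # observation favours the first LNL
print(risk([1, None], state_probs, obs_likelihoods))  # risk that the first LNL is involved
```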

set_modalities(spsn_dict={'path': [1.0, 1.0]})

Given the specificity and sensitivity of each diagnostic modality (and hence a 2x2 matrix per modality), computes the observation matrix \(\mathbf{B}\) and stores the details of the diagnostic modalities.
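For a single modality, the observation probability of a full state factorizes over LNLs, so B can be assembled from the per-LNL 2x2 matrices with a Kronecker product. A sketch under the assumptions that rows index the true state, columns index the observation, and states are ordered as binary numbers with the first LNL most significant; the specificity and sensitivity values and the helper names are hypothetical, and combining several modalities is left out for brevity.

```python
from functools import reduce

import numpy as np

def observation_table(sp: float, sn: float) -> np.ndarray:
    """2x2 matrix: rows = true state (healthy, involved), columns = observation (neg, pos)."""
    return np.array([[sp, 1.0 - sp],
                     [1.0 - sn, sn]])

def observation_matrix(sp: float, sn: float, num_lnls: int) -> np.ndarray:
    """B for one modality: B[state, obs] = prod_i table[state_i, obs_i].

    np.kron indexes states and observations by reading the LNL bits as a
    binary number, with the first LNL as the most significant bit.
    """
    table = observation_table(sp, sn)
    return reduce(np.kron, [table] * num_lnls)

B = observation_matrix(sp=0.76, sn=0.81, num_lnls=2)
print(B.shape)        # (4, 4)
print(B.sum(axis=1))  # every row sums to 1 (up to rounding)
```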

set_state(newstate)

Sets the state of the system to newstate.

set_theta(theta, mode='HMM')

Fills the system with new base and transition probabilities and, if mode is “HMM”, also recomputes the transition matrix A.

Parameters
  • theta (ndarray) – The new parameters that should be fed into the system. They all represent the transition probabilities along the edges of the network and will be set in the order they appear in the graph dictionary. As mentioned in the get_theta() function, this includes the spread probabilities from the primary tumour to the LNLs, as well as the spread among the LNLs.

  • mode (str) – If one is in “BN” mode (Bayesian network), then it is not necessary to compute the transition matrix A again, so it is skipped. (default: "HMM")

trans_prob(newstate, log=False, acquire=False)

Computes the probability to transition to newstate, given its current state.

Parameters
  • newstate (List[int]) – List of new states for each LNL in the lymphatic system. The transition probability \(t\) will be computed from the current states to these states.

  • log (bool) – If True, the log-probability is computed. (default: False)

  • acquire (bool) – If True, the system also updates its own state to newstate after computing the probability. (default: False)

Return type

float

Returns

Transition probability \(t\).
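The system-wide transition probability is a product of per-LNL probabilities, and collecting it for every pair of states gives the transition matrix A that set_theta() recomputes. The sketch below assumes a noisy-OR style model: a healthy LNL becomes involved unless every involved parent (including the tumour node "T", assumed to be always involved) fails to spread along its edge, and involvement is never undone. The helper names and numbers are hypothetical, not the lymph implementation.

```python
import itertools
from typing import Dict, List, Tuple

import numpy as np

def node_trans_prob(state: int, parent_states: List[int], ts: List[float]) -> np.ndarray:
    """Return [P(next state 0), P(next state 1)] for one LNL (assumed noisy-OR model)."""
    if state == 1:
        return np.array([0.0, 1.0])  # assumption: involvement is never undone
    # The LNL stays healthy only if no involved parent spreads to it.
    p_stay = float(np.prod([1.0 - t for s, t in zip(parent_states, ts) if s == 1]))
    return np.array([p_stay, 1.0 - p_stay])

def transition_matrix(lnls: List[str], edges: Dict[Tuple[str, str], float]) -> np.ndarray:
    """A[i, j] = P(state j at the next step | state i now), states in binary order."""
    states = list(itertools.product([0, 1], repeat=len(lnls)))
    A = np.zeros((len(states), len(states)))
    for i, old in enumerate(states):
        current = dict(zip(lnls, old), T=1)  # tumour node "T" is always involved
        for j, new in enumerate(states):
            prob = 1.0
            for lnl, new_state in zip(lnls, new):
                parents = [(current[s], t) for (s, e), t in edges.items() if e == lnl]
                probs = node_trans_prob(current[lnl], [s for s, _ in parents],
                                        [t for _, t in parents])
                prob *= probs[new_state]
            A[i, j] = prob
    return A

lnls = ["I", "II"]
edges = {("T", "I"): 0.3, ("T", "II"): 0.1, ("I", "II"): 0.5}
A = transition_matrix(lnls, edges)
print(A.round(3))
# In this sketch, one transition probability is a single entry of A,
# e.g. from state [1, 0] to state [1, 1]:
print(A[0b10, 0b11])
```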

unparametrized_epoch(t_stage=[1, 2, 3, 4], time_prior_dict={}, T=1.0, scale=0.01)

An attempt at unparametrized sampling, where the algorithm samples A from the full solution space of row-stochastic matrices with zeros where transitions are forbidden.

Parameters
  • t_stage (List[int]) – List of T-stages that should be included in the learning process. (default: [1,2,3,4])

  • time_prior_dict (dict) – Dictionary with keys of T-stages in t_stage and values of time priors for each of those T-stages.

  • T (float) – Temperature of the epoch. Can be reduced from a starting value down to almost zero for an annealing approach to sampling. (default: 1.)

Return type

float

Returns

The log-likelihood of the epoch.
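The description above does not specify the proposal or acceptance step, nor what scale controls, so the sketch below only illustrates the space being sampled: row-stochastic matrices that are zero wherever a transition is forbidden. Drawing each row from a flat Dirichlet distribution over its allowed entries is a choice made here for illustration, not necessarily what unparametrized_epoch does, and the mask assumes involvement can never be undone.

```python
import numpy as np

def sample_row_stochastic(mask: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw a random row-stochastic matrix that is zero wherever mask is False.

    Each row is drawn from a flat Dirichlet distribution over its allowed entries.
    """
    A = np.zeros(mask.shape)
    for i, allowed in enumerate(mask):
        idx = np.flatnonzero(allowed)
        if idx.size == 0:
            raise ValueError(f"row {i} has no allowed transition")
        A[i, idx] = rng.dirichlet(np.ones(idx.size)) if idx.size > 1 else 1.0
    return A

rng = np.random.default_rng(seed=0)
idx = np.arange(4)  # two LNLs -> four states in binary order
# Allowed: no LNL may go from involved (1) back to healthy (0).
mask = (idx[:, None] & idx[None, :]) == idx[:, None]
A = sample_row_stochastic(mask, rng)
print(A.round(3))
```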

Edge

Represents a lymphatic drainage pathway and therefore a spread probability.

class lymph.Edge(start, end, t=0.0)

Class for the connections between lymph node levels (LNLs) represented by the Node class.

Parameters
  • start (Node) – Parent node

  • end (Node) – Child node

  • t (float) – Transition probability along this edge, which applies when the start node is in state 1 (microscopic involvement).

report()

Prints a short summary of the edge.

Node

Represents a lymph node level (LNL) or rather a random variable associated with it. It encodes the microscopic involvement of the LNL and - if involved - might spread along outgoing edges.

class lymph.Node(name, state=0, typ=None)

Class for lymph node levels (LNLs) in a lymphatic system.

Parameters
  • name (str) – Name of the node

  • state (int) – Current state this LNL is in. Can be in {0, 1}

  • typ (Optional[str]) – Can be either “lnl”, “tumor” or None. If it is the latter, the type will be inferred from the name of the node: a node whose name starts with a t (case-insensitive) becomes a tumor node, otherwise a lymph node level (lnl). (default: None)
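The naming rule for typ=None can be written out as a small function. This is a sketch of the rule as described, not the lymph source; the function name infer_type is hypothetical.

```python
from typing import Optional

def infer_type(name: str, typ: Optional[str] = None) -> str:
    """Return "tumor" or "lnl", inferring it from the name if typ is None."""
    if typ is not None:
        if typ not in ("lnl", "tumor"):
            raise ValueError('typ must be "lnl", "tumor" or None')
        return typ
    return "tumor" if name.lower().startswith("t") else "lnl"

print(infer_type("T"))       # tumor
print(infer_type("tumour"))  # tumor
print(infer_type("II"))      # lnl
```

One consequence of the rule: an LNL whose name happens to start with a t needs typ="lnl" set explicitly.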

bn_prob(log=False)

Computes the conditional probability of a node being in the state it is in, given its parents are in the states they are in.

Parameters

log (bool) – If True, returns the log-probability. (default: False)

Return type

float

Returns

The conditional (log-)probability.
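In BN mode, the probability of a complete state is the product of the conditional probabilities of all nodes given their parents, which yields the vector \(\mathbf{a}\) used in \(\mathbf{p} = \mathbf{a} \cdot \mathbf{C}\). The sketch assumes a noisy-OR model in which an LNL stays healthy only if every involved parent (including the tumour node "T", assumed always involved) fails to spread to it; the helper names and numbers are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple

import numpy as np

def bn_prob(state: int, parent_states: List[int], ts: List[float]) -> float:
    """P(node has `state` | parent states), assuming a noisy-OR model."""
    p_healthy = float(np.prod([1.0 - t for s, t in zip(parent_states, ts) if s == 1]))
    return p_healthy if state == 0 else 1.0 - p_healthy

def state_probabilities(lnls: List[str], edges: Dict[Tuple[str, str], float]) -> np.ndarray:
    """Vector a: joint probability of every state as a product of bn_prob terms."""
    states = list(itertools.product([0, 1], repeat=len(lnls)))
    a = np.empty(len(states))
    for k, state in enumerate(states):
        current = dict(zip(lnls, state), T=1)  # tumour node "T" is always involved
        prob = 1.0
        for lnl in lnls:
            parents = [(current[s], t) for (s, e), t in edges.items() if e == lnl]
            prob *= bn_prob(current[lnl], [s for s, _ in parents], [t for _, t in parents])
        a[k] = prob
    return a

lnls = ["I", "II"]
edges = {("T", "I"): 0.3, ("T", "II"): 0.1, ("I", "II"): 0.5}
a = state_probabilities(lnls, edges)
print(a, a.sum())  # the probabilities of all states sum to 1
```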

obs_prob(obs, obstable=array([[1.0, 0.0], [0.0, 1.0]]), log=False)

Computes the probability of observing a certain diagnosis, given the node's current state.

Parameters
  • obs (int) – Diagnosis/observation for the node.

  • obstable (ndarray) – 2x2 matrix containing info about the sensitivity and specificity of the observational/diagnostic modality from which obs was obtained.

  • log (bool) – If True, the method returns the log-probability. (default: False)

Return type

float

Returns

The probability of observing the given diagnosis.

report()

Prints a short summary of the node.

trans_prob(log=False)

Computes the transition probabilities from the current state to all other possible states.

Parameters

log (bool) – If True, the method returns the log-probability. (default: False)

Return type

float

Returns

The transition probabilities from the current state to each of the two possible states.