Detailed API
The human lymphatic system (or rather parts of it) is modelled as a directed graph here. Hence, a System consists of multiple Node and Edge instances, each of which is represented by a Python class.
Lymph system
- class lymph.System(graph={})
Class that describes a whole lymphatic system with its lymph node levels (LNLs) and the connections between them.
- Parameters
graph (dict) – For every key in the dictionary, the system will create a node that represents a binary random variable. The values in the dictionary should then be a list of names to which edges from the current key should be created.
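As a rough illustration of the shape the graph argument is described to have, the sketch below builds such a dictionary in plain Python and enumerates the edges it implies. The node names (T, I, II, III) and the helper edges_from_graph are hypothetical and not part of the lymph package.

```python
from typing import Dict, List, Tuple

# Hypothetical example graph: keys are node names, values list the
# names of the nodes that receive an edge from the key.
graph: Dict[str, List[str]] = {
    "T": ["I", "II", "III"],
    "I": ["II"],
    "II": ["III"],
    "III": [],
}


def edges_from_graph(graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (start, end) pairs in the order they appear in the dictionary."""
    return [(start, end) for start, ends in graph.items() for end in ends]


edges = edges_from_graph(graph)
```

A dictionary of this shape is what the description above suggests passing as graph; here it would yield four nodes and five edges.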
- combined_likelihood(theta, t_stage=['early', 'late'], time_prior_dict={}, T_max=10)
Likelihood for learning both the system’s parameters and the center of a Binomially shaped time prior.
- Parameters
theta (ndarray) – Set of parameters, consisting of the base probabilities \(b\) (as many as the system has nodes), the transition probabilities \(t\) (as many as the system has edges) and, in this particular case, the binomial parameters for all but the first T-stage’s time prior.
t_stage (List[str]) – Keywords of T-stages that are present in the dictionary of C matrices. (default: ["early", "late"])
time_prior_dict (dict) – Dictionary with keys of T-stages in t_stage and values of time priors for each of those T-stages.
T_max (int) – Maximum number of time steps.
- Return type
float
- Returns
The combined likelihood of observing patients with different T-stages, given the spread probabilities as well as the parameters for the later (except the first) T-stage’s binomial time prior.
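To make the notion of a binomially shaped time prior more concrete, here is a minimal sketch of such a distribution over discrete time steps. It assumes the prior is a plain binomial PMF with support 0..T_max; how the package actually parametrizes it is not stated here, and the function name binomial_time_prior is hypothetical.

```python
from math import comb
from typing import List


def binomial_time_prior(p: float, t_max: int = 10) -> List[float]:
    """Binomial PMF over the time steps 0..t_max (an assumed parametrization)."""
    return [comb(t_max, t) * p**t * (1 - p) ** (t_max - t) for t in range(t_max + 1)]


# Illustrative values only: a fixed prior for the first T-stage and a
# prior whose binomial parameter would be learned along with theta.
early_prior = binomial_time_prior(0.3)
late_prior = binomial_time_prior(0.7)
```

A larger binomial parameter shifts the probability mass to later time steps, which is how a later T-stage could be encoded under this assumption.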
- find_edge(startname, endname)
Finds and returns the edge instance which has a parent node named startname and ends at node endname.
- Return type
Optional[Edge]
- get_graph()
Lists the graph as it was provided when the system was created
- Return type
dict
- get_theta()
Returns the parameters currently set. It will return the transition probabilities in the order they appear in the graph dictionary. This deviates somewhat from the notation in the paper, where base and transition probabilities are distinguished as probabilities along edges from primary tumour to LNL and from LNL to LNL respectively.
- Return type
List[float]
- likelihood(theta, t_stage=[1, 2, 3, 4], time_prior_dict={}, mode='HMM')
Computes the likelihood of a set of parameters, given the already stored data(set).
- Parameters
theta (ndarray) – Set of parameters, consisting of the base probabilities \(b\) (as many as the system has nodes) and the transition probabilities \(t\) (as many as the system has edges).
t_stage (List[int]) – List of T-stages that should be included in the learning process. (default: [1, 2, 3, 4])
time_prior_dict (dict) – Dictionary with keys of T-stages in t_stage and values of time priors for each of those T-stages.
mode (str) – "HMM" for hidden Markov model and "BN" for Bayesian network. (default: "HMM")
- Return type
float
- Returns
The log-likelihood of a parameter sample.
- list_edges()
Lists all edges of the system with their corresponding start and end states.
- Return type
List[Edge]
- load_data(data, t_stage=[1, 2, 3, 4], spsn_dict={'path': [1.0, 1.0]}, mode='HMM')
Generates the matrix \(\mathbf{C}\) that marginalizes over multiple states for data with incomplete observation, and records how often these observations occur in the dataset. In the end the computation \(\mathbf{p} = \boldsymbol{\pi} \cdot \mathbf{A}^t \cdot \mathbf{B} \cdot \mathbf{C}\) results in an array of probabilities that can - together with the frequencies \(f\) - be used to compute the likelihood. This also works for the Bayesian network case: \(\mathbf{p} = \mathbf{a} \cdot \mathbf{C}\) where \(\mathbf{a}\) is an array containing the probability for each state.
- Parameters
data (DataFrame) – Contains rows of patient data. The columns must include the T-stage and at least one diagnostic modality.
t_stage (List[int]) – List of T-stages that should be included in the learning process. (default: [1, 2, 3, 4])
spsn_dict (Dict[str, List[float]]) – Dictionary of specificity \(s_P\) and sensitivity \(s_N\) (in that order) for each observational/diagnostic modality. (default: {"path": [1., 1.]})
mode (str) – "HMM" for hidden Markov model and "BN" for Bayesian network. (default: "HMM")
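The product \(\mathbf{p} = \boldsymbol{\pi} \cdot \mathbf{A}^t \cdot \mathbf{B} \cdot \mathbf{C}\) can be followed by hand for a toy system with a single LNL. The sketch below is dependency-free and purely illustrative: all numbers, the fixed number of time steps (t = 3, without marginalizing over a time prior), the orientation of B and C, and the form of the log-likelihood \(\sum_i f_i \log p_i\) are assumptions, not the package's implementation.

```python
from math import log
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product, to keep the sketch dependency-free."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def matpow(a: Matrix, t: int) -> Matrix:
    """t-th power of a square matrix (t >= 0)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, a)
    return result


b = 0.2                          # assumed spread probability per time step
pi = [[1.0, 0.0]]                # start in the healthy state
A = [[1 - b, b], [0.0, 1.0]]     # assumed: an involved LNL stays involved
B = [[0.9, 0.1], [0.2, 0.8]]     # rows: hidden state, columns: observation
C = [[1.0, 0.0, 1.0],            # columns: "negative", "positive", "unknown"
     [0.0, 1.0, 1.0]]
f = [5, 3, 2]                    # how often each observed pattern occurs

p = matmul(matmul(matmul(pi, matpow(A, 3)), B), C)[0]
log_likelihood = sum(f_i * log(p_i) for f_i, p_i in zip(f, p))
```

The third column of C sums over both observations, which is how a missing diagnosis contributes a factor of 1 to the likelihood in this sketch.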
- obs_prob(diagnoses_dict, log=False)
Computes the probability to see certain diagnoses, given the system’s current state.
- Parameters
diagnoses_dict (Dict[str, List[int]]) – Dictionary of diagnoses (one for each diagnostic modality). A diagnosis must be an array of integers that is as long as the system has LNLs.
log (bool) – If True, the log probability is computed. (default: False)
- Return type
float
- Returns
The probability to see the given diagnoses.
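One way to picture this computation is as a product of per-LNL observation probabilities over all modalities. The sketch below assumes that diagnoses are conditionally independent given the hidden state, and that each modality has a 2x2 table indexed as table[state][observation]; both are assumptions for illustration, and diagnoses_probability is a hypothetical helper, not the method itself.

```python
from math import log
from typing import Dict, List, Sequence

Table = List[List[float]]


def diagnoses_probability(
    state: Sequence[int],
    diagnoses: Dict[str, Sequence[int]],
    tables: Dict[str, Table],
    use_log: bool = False,
) -> float:
    """Probability of all diagnoses, given the hidden state of each LNL."""
    prob = 1.0
    for modality, diagnosis in diagnoses.items():
        table = tables[modality]
        for hidden, observed in zip(state, diagnosis):
            prob *= table[hidden][observed]
    if use_log:
        return log(prob) if prob > 0.0 else float("-inf")
    return prob


tables = {"CT": [[0.8, 0.2], [0.1, 0.9]]}   # hypothetical modality and values
prob = diagnoses_probability([1, 0], {"CT": [1, 0]}, tables)
```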
- print_graph()
Print info about the structure and parameters of the graph.
- risk(inv, obs, time_prior=[], mode='HMM')
Computes the risk for involvement (or no involvement), given some observations and a time distribution for the Markov model (and the Bayesian network).
- Parameters
inv (ndarray) – Pattern of involvement that we want to compute the risk for. Entries can take on the values 0 (negative), 1 (positive) and None if we don’t care whether this LNL is involved or not.
obs (Dict[str, ndarray]) – Holds a diagnosis of the same kind as inv for each diagnostic modality. An incomplete diagnosis can be filled with None.
time_prior (List[float]) – Discrete distribution over the time steps. Must hence sum to 1.
mode (str) – "HMM" for hidden Markov model and "BN" for Bayesian network. (default: "HMM")
- Return type
float
- Returns
The risk for the involvement of interest, given an observation.
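The risk can be read as a posterior probability obtained via Bayes' rule. The following sketch enumerates the hidden states of a two-LNL toy system and uses a single diagnostic modality for brevity. The prior over states (which would come from the model, marginalized over the time prior), the observation table, and the helper names are all assumed for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

State = Tuple[int, ...]
Pattern = Sequence[Optional[int]]


def matches(state: State, pattern: Pattern) -> bool:
    """True if the state agrees with every entry of the pattern that is not None."""
    return all(p is None or p == s for s, p in zip(state, pattern))


def risk(inv: Pattern, obs: Pattern, state_probs: Dict[State, float],
         table: List[List[float]]) -> float:
    """P(inv | obs) by Bayes' rule over the enumerated hidden states."""
    def obs_likelihood(state: State) -> float:
        lik = 1.0
        for s, o in zip(state, obs):
            if o is not None:          # missing observations contribute a factor 1
                lik *= table[s][o]
        return lik

    joint = {state: prior * obs_likelihood(state) for state, prior in state_probs.items()}
    evidence = sum(joint.values())
    return sum(p for state, p in joint.items() if matches(state, inv)) / evidence


# Assumed prior over the hidden states of two LNLs.
state_probs = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.1}
table = [[0.8, 0.2], [0.1, 0.9]]       # table[state][observation], assumed values

# Risk that the first LNL is involved although it was observed as negative.
r = risk(inv=[1, None], obs=[0, None], state_probs=state_probs, table=table)
```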
- set_modalities(spsn_dict={'path': [1.0, 1.0]})
Given some 2x2 matrices for each diagnostic modality based on their specificity and sensitivity, compute observation matrix \(\mathbf{B}\) and store the details of the diagnostic modalities.
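For illustration, a 2x2 table for one modality could be derived from its specificity and sensitivity as sketched below. The row/column orientation (rows are the true state, columns the observation), the helper name observation_table and the "CT" values are assumptions; only the default {"path": [1., 1.]} is taken from the signature above.

```python
from typing import Dict, List


def observation_table(specificity: float, sensitivity: float) -> List[List[float]]:
    """2x2 table with rows = true state (0, 1) and columns = observation (0, 1).

    The orientation is an assumption made for this sketch.
    """
    return [
        [specificity, 1.0 - specificity],   # healthy: true negative, false positive
        [1.0 - sensitivity, sensitivity],   # involved: false negative, true positive
    ]


# "path" is the documented default; the "CT" values are hypothetical.
spsn_dict: Dict[str, List[float]] = {"path": [1.0, 1.0], "CT": [0.81, 0.86]}
tables = {name: observation_table(sp, sn) for name, (sp, sn) in spsn_dict.items()}
```

With specificity and sensitivity both equal to 1, the table reduces to the identity matrix, meaning the observation always equals the true state.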
- set_state(newstate)
Sets the state of the system to newstate.
- set_theta(theta, mode='HMM')
Fills the system with new base and transition probabilities and also computes the transition matrix A again, if one is in mode “HMM”.
- Parameters
theta (ndarray) – The new parameters that should be fed into the system. They all represent the transition probabilities along the edges of the network and will be set in the order they appear in the graph dictionary. As mentioned in the get_theta() function, this includes the spread probabilities from the primary tumour to the LNLs, as well as the spread among the LNLs.
mode (str) – If one is in “BN” mode (Bayesian network), then it is not necessary to compute the transition matrix A again, so it is skipped. (default: "HMM")
- trans_prob(newstate, log=False, acquire=False)
Computes the probability to transition to newstate, given its current state.
- Parameters
newstate (List[int]) – List of new states for each LNL in the lymphatic system. The transition probability \(t\) will be computed from the current states to these states.
log (bool) – If True, the log-probability is computed. (default: False)
acquire (bool) – If True, after computing and returning the probability, the system updates its own state to be newstate. (default: False)
- Return type
float
- Returns
Transition probability \(t\).
- unparametrized_epoch(t_stage=[1, 2, 3, 4], time_prior_dict={}, T=1.0, scale=0.01)
An attempt at unparametrized sampling, where the algorithm samples A from the full solution space of row-stochastic matrices with zeros where transitions are forbidden.
- Parameters
t_stage (List[int]) – List of T-stages that should be included in the learning process. (default: [1, 2, 3, 4])
time_prior_dict (dict) – Dictionary with keys of T-stages in t_stage and values of time priors for each of those T-stages.
T (float) – Temperature of the epoch. Can be reduced from a starting value down to almost zero for an annealing approach to sampling. (default: 1.)
- Return type
float
- Returns
The log-likelihood of the epoch.
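The two ingredients named above, a row-stochastic matrix with forbidden entries fixed at zero and a temperature for annealing, can be sketched as follows. This is not the method's implementation: the proposal used here (normalizing uniform random weights, which is not a uniform draw over the simplex) and the Metropolis acceptance rule are assumptions, the scale parameter is not modelled, and both function names are hypothetical.

```python
import math
import random
from typing import List


def random_row_stochastic(mask: List[List[bool]], rng: random.Random) -> List[List[float]]:
    """Draw a matrix whose rows sum to 1 and that is zero wherever mask is False.

    Assumes every row allows at least one transition.
    """
    matrix = []
    for mask_row in mask:
        weights = [rng.random() if allowed else 0.0 for allowed in mask_row]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def accept(old_llh: float, new_llh: float, T: float, rng: random.Random) -> bool:
    """Metropolis criterion on log-likelihoods at temperature T."""
    if new_llh >= old_llh:
        return True
    return rng.random() < math.exp((new_llh - old_llh) / T)


rng = random.Random(0)
mask = [[True, True], [False, True]]    # the transition 1 -> 0 is forbidden
A = random_row_stochastic(mask, rng)
```

As T approaches zero, a proposal with a lower log-likelihood is almost never accepted, which corresponds to the annealing behaviour described for the T parameter.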
Edge
Represents a lymphatic drainage pathway and therefore a spread probability.
Node
Represents a lymph node level (LNL) or rather a random variable associated with it. It encodes the microscopic involvement of the LNL and - if involved - might spread along outgoing edges.
- class lymph.Node(name, state=0, typ=None)
Class for lymph node levels (LNLs) in a lymphatic system.
- Parameters
name (str) – Name of the node.
state (int) – Current state this LNL is in. Can be in {0, 1}.
typ (Optional[str]) – Can be either “lnl”, “tumor” or None. If it is the latter, the type will be inferred from the name of the node: if the name starts with a t (case-insensitive), it will be a tumor node, and a lymph node level (lnl) otherwise. (default: None)
- bn_prob(log=False)
Computes the conditional probability of a node being in the state it is in, given its parents are in the states they are in.
- Parameters
log (bool) – If True, returns the log-probability. (default: False)
- Return type
float
- Returns
The conditional (log-)probability.
- obs_prob(obs, obstable=array([[1.0, 0.0], [0.0, 1.0]]), log=False)
Computes the probability of observing a certain diagnosis, given the node’s current state.
- Parameters
obs (int) – Diagnosis/observation for the node.
obstable (ndarray) – 2x2 matrix containing info about sensitivity and specificity of the observational/diagnostic modality from which obs was obtained.
log (bool) – If True, the method returns the log-prob.
- Return type
float
- Returns
The probability of observing the given diagnosis.
- report()
Just quickly print infos about the node.
- trans_prob(log=False)
Computes the transition probabilities from the current state to all other possible states.
- Parameters
log (bool) – If True, the method returns the log-probability. (default: False)
- Return type
float
- Returns
The transition probabilities from the current state to both possible states.
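A possible reading of these transition probabilities for a single binary node is sketched below. It assumes that an involved node never reverts to healthy and that each involved parent spreads independently with the probability attached to its edge; neither assumption is stated in this reference, and node_transition_probs is a hypothetical stand-in rather than the method itself.

```python
from typing import List, Sequence


def node_transition_probs(
    state: int,
    parent_states: Sequence[int],
    spread_probs: Sequence[float],
) -> List[float]:
    """[P(next state = 0), P(next state = 1)] for one binary node."""
    if state == 1:
        return [0.0, 1.0]               # assumed: involvement is never reverted
    stay_healthy = 1.0
    for parent, t in zip(parent_states, spread_probs):
        if parent == 1:                 # only involved parents can spread
            stay_healthy *= 1.0 - t
    return [stay_healthy, 1.0 - stay_healthy]


# Healthy node with three incoming edges, two of whose parents are involved.
probs = node_transition_probs(0, parent_states=[1, 1, 0], spread_probs=[0.2, 0.5, 0.9])
```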