kinactive.distances module
Distance matrix computation and io.
- class kinactive.distances.DistanceMatrix(df: DataFrame, pos_sup: list[int])[source]
Bases: object
A symmetric distance matrix encapsulating pairwise RMSD of superposed structures.
- classmethod build(structures: Iterable[ChainStructure], cfg: MatrixConfig = MatrixConfig(dir=PosixPath('clustering'), n_super_pos=30, pk_map_name='PK', n_proc=None, chunksize=5000, df_pos=(141, 142), bb_atoms=('CA',), phe_atoms=('CA', 'CB', 'CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'), asp_atoms=('CA', 'CB', 'CG', 'OD1', 'OD2'))) DistanceMatrix[source]
Build a new distance matrix from provided structures.
The method will obtain a list of positions most covered by the reference. It will use these in a superposition protocol defined in lXtractor.
- Parameters:
structures – A list of chain structures mapped to a single reference.
cfg – A configuration object. The options are explained in kinactive.config.MatrixConfig.
- Returns:
The constructed distance matrix.
- closest_to(id_: str, n: int, col: str = 'RMSD_CA') Generator[str, None, None][source]
Find n structures closest to some structure.
- Parameters:
id_ – An ID of a structure.
n – How many closest structures to find.
col – A distance column in the matrix to rank structures by.
- Returns:
A generator of closest IDs.
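The lookup described above can be sketched without the library. This is a minimal illustration, not the actual implementation: it assumes the pairwise table can be read as rows of (ID, ID, distance), and the name closest_to_sketch and the sample values are hypothetical.

```python
import heapq

# Hypothetical long-format rows: (id_a, id_b, distance). Values are made up.
PAIRS = [
    ("A", "B", 1.5),
    ("A", "C", 0.5),
    ("A", "D", 2.0),
    ("B", "C", 1.0),
    ("B", "D", 0.7),
    ("C", "D", 3.0),
]


def closest_to_sketch(pairs, id_, n):
    """Yield up to n IDs with the smallest distance to id_."""
    # Each pair is stored once, so id_ may appear in either ID column.
    partners = []
    for a, b, dist in pairs:
        if a == id_:
            partners.append((dist, b))
        elif b == id_:
            partners.append((dist, a))
    # Take the n smallest distances and yield the partner IDs in that order.
    for _, other in heapq.nsmallest(n, partners):
        yield other


closest = list(closest_to_sketch(PAIRS, "B", 2))
```

Because each unordered pair appears only once in such a table, the sketch has to check both ID columns for the query structure.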
- classmethod load(base_path: Path)[source]
Load the distance matrix data and initialize a new instance.
- Parameters:
base_path – base dir.
- Returns:
A new instance.
- matrix_ids(n: int) Generator[str, None, None][source]
- Parameters:
n – The number of observations used for constructing the matrix.
- Returns:
An iterator over IDs of original observations.
- save(base_path: Path = PosixPath('clustering')) None[source]
Save the distance matrix data – a dataframe and a list of super-positions. The file names are hardcoded.
- Parameters:
base_path – base dir.
- superpose(structures: Iterable[ChainStructure], choose_ref_by: str = 'RMSD_CA', key: str = 'min', **kwargs) list[SuperposeOutput][source]
Superpose a group of structures to a single reference structure. The latter is a structure having minimum average distance to other structures in the list. Consequently, the method assumes that the distance matrix encompasses the provided structures and will warn a user if it’s not the case.
- Parameters:
structures – An iterable with structures to superpose.
choose_ref_by – A column name in a distance matrix to choose the reference structure by.
key – A selector of averaged distances to choose the reference by; either “min” or “max”.
kwargs – Passed to the superpose_pairwise() protocol.
- Returns:
The original superpose_pairwise() output. The coordinates of the provided structures are transformed in place according to this output.
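The reference selection described above (minimum or maximum average distance to the other structures) might look roughly like the following. This is a sketch under stated assumptions: the table is modeled as plain (ID, ID, distance) rows, choose_reference is a hypothetical helper, and the returned list of missing IDs stands in for the warning the method issues.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pairwise rows: (id_a, id_b, distance). Values are made up.
PAIRS = [
    ("A", "B", 1.5),
    ("A", "C", 0.5),
    ("B", "C", 1.0),
]


def choose_reference(pairs, ids, key="min"):
    """Return (reference ID, IDs absent from the table)."""
    wanted = set(ids)
    per_id = defaultdict(list)
    # Keep only pairs where both members belong to the requested group.
    for a, b, dist in pairs:
        if a in wanted and b in wanted:
            per_id[a].append(dist)
            per_id[b].append(dist)
    # IDs without any recorded distance are not covered by the table.
    missing = sorted(wanted - set(per_id))
    averages = {i: mean(dists) for i, dists in per_id.items()}
    selector = {"min": min, "max": max}[key]
    return selector(averages, key=averages.get), missing


ref_min, missing = choose_reference(PAIRS, ["A", "B", "C", "Z"])
ref_max, _ = choose_reference(PAIRS, ["A", "B", "C"], key="max")
```

Here "C" has the lowest average distance (0.75) and "B" the highest (1.25), while "Z" is reported as missing because the table holds no distances for it.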
- df: DataFrame
A table with four columns: (1-2) object IDs, (3) RMSD of super positions, and (4) RMSD of target positions. It is assumed to be sorted by object IDs and to contain the combinations that itertools.combinations(ids, 2) would output.
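To illustrate the row order this attribute assumes, the snippet below enumerates ID pairs with itertools.combinations. The IDs are made up, and the snippet only shows the expected pairing and row count, not how the table itself is built.

```python
from itertools import combinations

# Hypothetical structure IDs, sorted as the table assumes.
ids = sorted(["s3", "s1", "s2"])

# Each unordered pair appears exactly once, in lexicographic order.
pair_index = list(combinations(ids, 2))

# n observations give n * (n - 1) / 2 rows.
n_rows = len(ids) * (len(ids) - 1) // 2
```

For three IDs this gives three rows: (s1, s2), (s1, s3), and (s2, s3).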
- pos_sup: list[int]
A list of positions used for superposing pairs of structures. A position is “covered” if (1) it was successfully mapped to a reference, and (2) it has a “CA” atom.
- kinactive.distances.ca_pos_per_str(strs: Iterable[ChainStructure], ref_name: str) dict[int, list[str]][source]
Get a mapping from HMM positions to a list of structures covering them.
- Parameters:
strs – An iterable over chain structures.
ref_name – Reference object name structure sequences were mapped to.
- Returns:
A dictionary Pos => [IDS].
- kinactive.distances.covered_pos(s: ChainStructure, ref_name: str) Iterator[int][source]
Get a list of covered positions. A position is “covered” if (1) it was successfully mapped to a reference, and (2) it has a “CA” atom.
- Parameters:
s – A chain structure.
ref_name – Reference object name structure sequences were mapped to.
- Returns:
An iterator over covered positions.
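The two-part definition of a covered position can be shown on a toy data model. This sketch does not use ChainStructure: each residue is a hypothetical dict with a "ref_pos" entry (the mapped reference position, or None if unmapped) and an "atoms" entry (the set of atom names).

```python
# Hypothetical residues of one chain; not the lXtractor data model.
RESIDUES = [
    {"ref_pos": 10, "atoms": {"N", "CA", "C", "O"}},
    {"ref_pos": None, "atoms": {"N", "CA", "C", "O"}},  # not mapped
    {"ref_pos": 12, "atoms": {"N", "C", "O"}},          # no CA atom
    {"ref_pos": 13, "atoms": {"CA"}},
]


def covered_positions(residues):
    """Yield reference positions that are mapped and have a CA atom."""
    for res in residues:
        if res["ref_pos"] is not None and "CA" in res["atoms"]:
            yield res["ref_pos"]


covered = list(covered_positions(RESIDUES))
```

Only positions 10 and 13 pass both checks: the second residue is unmapped and the third lacks a CA atom.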
- kinactive.distances.super_pos(strs: Iterable[ChainStructure], n: int, ref_name: str) tuple[dict[int, list[str]], list[int]][source]
Get coverage data of reference positions and find the most covered positions.
- Parameters:
strs – An iterable over chain structures.
n – The number of positions to get.
ref_name – Reference object name structure sequences were mapped to.
- Returns:
A tuple with a mapping Pos => [IDS] and a list of n most covered positions.
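The coverage counting described above can be sketched as follows. This is an illustration only: the input is a hypothetical dict from structure ID to covered positions, the function names are invented, and both the tie-breaking by position number and the sorted output are assumptions rather than documented behavior.

```python
from collections import defaultdict

# Hypothetical input: structure ID -> reference positions it covers.
COVERED = {
    "s1": [1, 2, 3],
    "s2": [2, 3],
    "s3": [3, 4],
}


def pos_to_ids(covered):
    """Invert the input into a mapping Pos => [IDS]."""
    mapping = defaultdict(list)
    for sid, positions in covered.items():
        for pos in positions:
            mapping[pos].append(sid)
    return dict(mapping)


def most_covered(mapping, n):
    """Return the n positions covered by the most structures."""
    # Rank by coverage (descending); ties go to the lower position (assumption).
    ranked = sorted(mapping, key=lambda p: (-len(mapping[p]), p))
    return sorted(ranked[:n])


mapping = pos_to_ids(COVERED)
top = most_covered(mapping, 2)
```

With this input, position 3 is covered by all three structures and position 2 by two, so they are the two most covered positions.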