kinactive.distances module

Distance matrix computation and I/O.

class kinactive.distances.DistanceMatrix(df: DataFrame, pos_sup: list[int])[source]

Bases: object

A symmetric distance matrix encapsulating pairwise RMSD of superposed structures.

__init__(df: DataFrame, pos_sup: list[int])[source]
classmethod build(structures: Iterable[ChainStructure], cfg: MatrixConfig = MatrixConfig(dir=PosixPath('clustering'), n_super_pos=30, pk_map_name='PK', n_proc=None, chunksize=5000, df_pos=(141, 142), bb_atoms=('CA',), phe_atoms=('CA', 'CB', 'CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'), asp_atoms=('CA', 'CB', 'CG', 'OD1', 'OD2'))) DistanceMatrix[source]

Build a new distance matrix from provided structures.

The method first finds the reference positions covered by the most structures. It then uses these positions in a superposition protocol defined in lXtractor.

Parameters:
  • structures – A list of chain structures mapped to a single reference.

  • cfg – A configuration object. Its options are documented in kinactive.config.MatrixConfig.

Returns:

The constructed distance matrix.

closest_to(id_: str, n: int, col: str = 'RMSD_CA') Generator[str, None, None][source]

Find n structures closest to some structure.

Parameters:
  • id_ – The ID of a structure.

  • n – How many closest structures to find.

  • col – The name of the distance column to rank structures by.

Returns:

A generator of closest IDs.
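The lookup described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name closest_ids, the plain tuples standing in for dataframe rows, and the distance values are all assumptions made for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical rows: (id_a, id_b, distance), one per unordered pair.
rows = [("s1", "s2", 0.5), ("s1", "s3", 1.2), ("s2", "s3", 0.9)]


def closest_ids(
    rows: Iterable[tuple[str, str, float]], id_: str, n: int
) -> Iterator[str]:
    """Yield up to n IDs with the smallest distance to id_."""
    partners = []
    for a, b, dist in rows:
        # A pair is stored once, so id_ may sit in either ID column.
        if a == id_:
            partners.append((dist, b))
        elif b == id_:
            partners.append((dist, a))
    for _, other in sorted(partners)[:n]:
        yield other


nearest = list(closest_ids(rows, "s3", 2))
```

Because each pair appears only once in the table, the sketch checks both ID columns before ranking.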

fetch()[source]
classmethod load(base_path: Path)[source]

Load the distance matrix data and initialize a new instance.

Parameters:

base_path – The base directory to load the data from.

Returns:

A new instance.

matrix_ids(n: int) Generator[str, None, None][source]
Parameters:

n – The number of observations used for constructing the matrix.

Returns:

An iterator over IDs of original observations.

save(base_path: Path = PosixPath('clustering')) None[source]

Save the distance matrix data: a dataframe and a list of positions used for superposition. The file names are hardcoded.

Parameters:

base_path – The base directory to save the data to.

superpose(structures: Iterable[ChainStructure], choose_ref_by: str = 'RMSD_CA', key: str = 'min', **kwargs) list[SuperposeOutput][source]

Superpose a group of structures to a single reference structure. By default, the reference is the structure with the minimum average distance to the other structures in the list. Consequently, the method assumes that the distance matrix encompasses the provided structures and will warn the user if this is not the case.

Parameters:
  • structures – An iterable with structures to superpose.

  • choose_ref_by – A column name in a distance matrix to choose the reference structure by.

  • key – A selector of averaged distances to choose the reference by; either “min” or “max”.

  • kwargs – passed to superpose_pairwise() protocol.

Returns:

The original superpose_pairwise() output. The coordinates of the provided structures are transformed in place according to this output.
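The reference selection controlled by choose_ref_by and key can be sketched as follows. This is an illustration under stated assumptions, not the library's code: choose_reference is a hypothetical helper, the rows are plain tuples with made-up distances, and restricting the average to pairs within the provided IDs is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (id_a, id_b, distance), one per unordered pair.
rows = [("s1", "s2", 0.5), ("s1", "s3", 1.2), ("s2", "s3", 0.9)]


def choose_reference(rows, ids, key="min"):
    """Pick the ID with the min (or max) average distance to the others."""
    wanted = set(ids)
    per_id = defaultdict(list)
    for a, b, dist in rows:
        # Assumption: only pairs fully inside the provided group count.
        if a in wanted and b in wanted:
            per_id[a].append(dist)
            per_id[b].append(dist)
    averages = {i: mean(d) for i, d in per_id.items()}
    selector = {"min": min, "max": max}[key]
    return selector(averages, key=averages.get)


central = choose_reference(rows, ["s1", "s2", "s3"])
outlier = choose_reference(rows, ["s1", "s2", "s3"], key="max")
```

With key="min" the most central structure becomes the reference. With key="max" the most distant one does.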

df: DataFrame

A table with four columns: (1-2) object IDs, (3) RMSD of the superposed positions, and (4) RMSD of the target positions. It is assumed to be sorted by object IDs and to contain exactly the pairs that itertools.combinations(ids, 2) would produce.
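The expected row layout can be illustrated with a small sketch. The IDs and RMSD values below are invented, and plain tuples stand in for dataframe rows.

```python
from itertools import combinations

# Hypothetical IDs and RMSD values, for illustration only.
ids = sorted(["s3", "s1", "s2"])
rmsd_sup = {("s1", "s2"): 0.5, ("s1", "s3"): 1.2, ("s2", "s3"): 0.9}
rmsd_target = {("s1", "s2"): 1.1, ("s1", "s3"): 2.4, ("s2", "s3"): 1.7}

# One row per unordered pair, in the order combinations() yields them:
# (id_a, id_b, RMSD of superposed positions, RMSD of target positions).
rows = [
    (a, b, rmsd_sup[(a, b)], rmsd_target[(a, b)])
    for a, b in combinations(ids, 2)
]
```

For k sorted IDs this gives k * (k - 1) / 2 rows, each pair appearing once with the smaller ID first.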

pos_sup: list[int]

A list of positions used for superposing pairs of structures. A position is “covered” if (1) it was successfully mapped to a reference, and (2) it has a “CA” atom.

kinactive.distances.ca_pos_per_str(strs: Iterable[ChainStructure], ref_name: str) dict[int, list[str]][source]

Get a mapping from HMM positions to the list of structures covering them.

Parameters:
  • strs – An iterable over chain structures.

  • ref_name – The name of the reference object that structure sequences were mapped to.

Returns:

A dictionary Pos => [IDS].
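The shape of the returned mapping can be sketched as follows. The helper positions_to_ids is hypothetical, and its input, a dictionary of structure ID to covered positions, is an assumed stand-in for ChainStructure objects.

```python
from collections import defaultdict


def positions_to_ids(covered: dict[str, list[int]]) -> dict[int, list[str]]:
    """Invert 'structure ID -> covered positions' into 'position -> IDs'."""
    pos2ids: dict[int, list[str]] = defaultdict(list)
    for struct_id, positions in covered.items():
        for pos in positions:
            pos2ids[pos].append(struct_id)
    return dict(pos2ids)


# Hypothetical coverage data, for illustration only.
covered = {"s1": [1, 2, 3], "s2": [2, 3], "s3": [3]}
mapping = positions_to_ids(covered)
```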

kinactive.distances.covered_pos(s: ChainStructure, ref_name: str) Iterator[int][source]

Iterate over covered positions. A position is “covered” if (1) it was successfully mapped to a reference, and (2) it has a “CA” atom.

Parameters:
  • s – A chain structure.

  • ref_name – The name of the reference object that structure sequences were mapped to.

Returns:

An iterator over covered positions.

kinactive.distances.super_pos(strs: Iterable[ChainStructure], n: int, ref_name: str) tuple[dict[int, list[str]], list[int]][source]

Get coverage data of reference positions and find the most covered positions.

Parameters:
  • strs – An iterable over chain structures.

  • n – The number of positions to get.

  • ref_name – The name of the reference object that structure sequences were mapped to.

Returns:

A tuple with mappings Pos => [IDS] and a list of n most covered positions.
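Selecting the n most covered positions from a Pos => [IDS] mapping can be sketched as below. The helper most_covered is hypothetical. Breaking ties by the lower position number is also an assumption, since the documentation does not specify tie handling or output order.

```python
def most_covered(coverage: dict[int, list[str]], n: int) -> list[int]:
    """Return the n positions covered by the most structures."""
    # Rank by descending coverage; ties go to the lower position (assumed).
    ranked = sorted(coverage, key=lambda pos: (-len(coverage[pos]), pos))
    return ranked[:n]


# Hypothetical coverage data, for illustration only.
coverage = {
    10: ["a", "b", "c"],
    11: ["a"],
    12: ["a", "b", "c"],
    13: ["a", "b"],
}
top_two = most_covered(coverage, 2)
```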