Introduction

KinActive is a package for exploring the structural kinome. It lets you prepare, load, and explore the lXt-PK data collection and use classifier models that predict the DFG and active/inactive states of a protein kinase (PK) conformation from structural features.

Installation

It is recommended to first create a conda virtual environment:

conda create -n kinactive python
conda activate kinactive

See the conda docs for further details.

The package is installable via pip. From PyPI:

pip install kinactive

Or directly from GitHub:

pip install git+https://github.com/edikedik/kinactive.git

Depending on how you use the data collection, you may also need to install mafft, either directly (see the mafft docs) or using conda:

conda install -c bioconda mafft
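To confirm that the installation worked, one option is to check whether the mafft executable is visible on PATH. The following is a minimal sketch using only the Python standard library; the helper name find_executable is hypothetical and is not part of KinActive.

```python
import shutil
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the absolute path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


# Hypothetical check; KinActive itself may locate mafft differently.
mafft_path = find_executable("mafft")
if mafft_path is None:
    print("mafft was not found on PATH")
else:
    print(f"mafft found at {mafft_path}")
```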

Using the data

KinActive is not supplied with the raw data. One may fetch the data accompanying the paper (see Fetching the data) or build a new raw collection (see Build a new lXt-PK collection).

Once the data is obtained, you can load the chains as:

from pathlib import Path
from kinactive.db import DB
# Initialize the database interface and load the dumped chains.
db = DB()
chains = db.load(Path("path/to/chains"))

Hint

To speed up loading, increase the number of CPUs (see kinactive.config.DBConfig).

Hint

To load only a subset, one may supply kinactive.db.DB.load() with an iterable over paths to dumped Chain objects, e.g., list(Path("path/to/chains").glob('*'))[:10].
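Because glob does not guarantee any particular order, sorting the paths before slicing makes the selected subset reproducible between runs. The sketch below demonstrates this on a throwaway directory; the helper select_chain_paths and the entry names chain_00, chain_01, and so on are hypothetical, and the actual layout of a dumped collection may differ.

```python
import tempfile
from pathlib import Path
from typing import List


def select_chain_paths(base: Path, limit: int) -> List[Path]:
    """Return up to `limit` entries of `base`, sorted by name."""
    return sorted(base.glob("*"))[:limit]


# Demonstrate on a temporary directory with hypothetical entry names.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for i in range(25):
        (base / f"chain_{i:02d}").mkdir()
    subset = select_chain_paths(base, 10)
    subset_names = [p.name for p in subset]

print(subset_names[:3])  # ['chain_00', 'chain_01', 'chain_02']
```

With a real collection, the list returned by select_chain_paths(Path("path/to/chains"), 10) would take the place of the single path in the loading call.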

This will result in a ChainList of Chain objects, each containing a canonical UniProt ChainSequence and a list of associated ChainStructure objects. See the lXtractor docs for more details on what these objects are and how to use them.

Calculating the variables

Once the chains are loaded, one can use them to calculate new variables.

To calculate the default variables for the loaded chains:

from kinactive.features import DefaultFeatures
fs = DefaultFeatures()
# Get domains mapped to profile positions.
domains = chains.collapse_children()
res = fs.calculate_all_vs(domains)

Hint

Provide base="path/to/dir" to automatically save the default variables.

Hint

Speed up the calculation of structural variables by using multiple CPUs via the num_proc parameter.
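A reasonable value for num_proc depends on the machine. As a rough sketch, one might leave one CPU free for the rest of the system; the helper suggest_num_proc below is hypothetical and only computes a number to pass on.

```python
import os


def suggest_num_proc(reserve: int = 1) -> int:
    """Suggest a worker count, leaving `reserve` CPUs free but never going below 1."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, available - reserve)


num_proc = suggest_num_proc()
print(num_proc)
```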

Note

See Calculate default variables for an example of calculating the variables.

Calculating non-default variables is a bit more involved and is covered in the lXtractor docs.

Using the models

To load the models, use:

from kinactive.io import load_dfg, load_kinactive
ka = load_kinactive()
dfg = load_dfg()

The first call, load_kinactive(), loads the kinactive.model.KinActiveClassifier model. This class provides a general-purpose interface, wrapping the actual model under the kinactive.model.KinActiveClassifier.model attribute. It gives access to the features and parameters and supports training, prediction, and so on.

The second call, load_dfg(), loads the kinactive.model.DFGClassifier model. It comprises three kinactive.model.KinActiveClassifier objects and a logistic regression meta-classifier that outputs the final predictions.

Both models are used in the same manner. They require a dataset with the kinactive.model.KinActiveClassifier.features() and kinactive.model.KinActiveClassifier.targets() columns to predict. Assuming the df variable holds such a dataset (as a pandas DataFrame):

ka_labels = ka.predict(df)
dfg_labels = dfg.predict(df)

Hint

kinactive.model.DFGClassifier.predict_full() and kinactive.model.KinActiveClassifier.predict_full() preserve the individual predictors’ outputs, adding them as columns to the initial pandas DataFrame.
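Since the models expect specific columns, it can help to check a dataset for missing ones before predicting. The following is a minimal sketch that operates on plain column names; the helper missing_columns and the column names used are hypothetical and serve only as an illustration.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names absent from `available`, in their original order."""
    present = set(available)
    return [name for name in required if name not in present]


# Hypothetical column names, for illustration only.
dataset_columns = ["ObjectID", "feature_a", "feature_b"]
required_columns = ["feature_a", "feature_b", "feature_c"]

missing = missing_columns(dataset_columns, required_columns)
if missing:
    print(f"Dataset lacks required columns: {missing}")
```

With a pandas DataFrame, df.columns could serve as the available names, and the names reported by the model's features() as the required ones.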

Building the distance matrix

The “distance matrix” is a symmetric pairwise distance matrix constructed from the extracted domain structures. The distance is the RMSD between the DFG-Asp/DFG-Phe residues of a pair of superposed domain structures. The protocol handles the superpositions and RMSD calculations and outputs a new “long form” distance matrix with four columns: [ID1, ID2, RMSD_CA, RMSD_DFG].

Assuming the chains were loaded as described in Using the data, i.e., at the level of the initial Chain objects, we access the structure domains and supply them to kinactive.distances.DistanceMatrix.build():

from kinactive.distances import DistanceMatrix
domains = chains.collapse_children().structures
dm = DistanceMatrix().build(domains)

Hint

Similar to kinactive.db.DB, there is a config dataclass for customizing the calculation process. See kinactive.config.MatrixConfig.
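Many downstream tools expect a square matrix rather than the long form. The sketch below shows one way to convert between the two, assuming each row is available as a plain (ID1, ID2, RMSD_CA, RMSD_DFG) tuple. The helper long_to_square, the IDs, and the values are hypothetical; the actual object returned by the protocol may need to be converted to such rows first.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, float, float]  # (ID1, ID2, RMSD_CA, RMSD_DFG)


def long_to_square(
    rows: Iterable[Row], value_index: int
) -> Tuple[List[str], List[List[float]]]:
    """Build a symmetric square matrix from long-form rows.

    `value_index` selects the value column (2 for RMSD_CA, 3 for RMSD_DFG).
    The diagonal is set to zero; pairs absent from `rows` are filled with NaN.
    """
    rows = list(rows)
    ids = sorted({r[0] for r in rows} | {r[1] for r in rows})
    pos = {name: i for i, name in enumerate(ids)}
    n = len(ids)
    matrix = [[0.0 if i == j else float("nan") for j in range(n)] for i in range(n)]
    for row in rows:
        i, j = pos[row[0]], pos[row[1]]
        # Fill both triangles so the result is symmetric.
        matrix[i][j] = matrix[j][i] = row[value_index]
    return ids, matrix


# Toy rows with hypothetical IDs and values.
toy_rows = [("A", "B", 1.5, 0.4), ("A", "C", 2.0, 3.1), ("B", "C", 2.5, 2.9)]
ids, rmsd_dfg = long_to_square(toy_rows, value_index=3)
```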

What’s next?

If you are interested in making a similar data collection or annotating your PK domains, check out the tutorial.