Calculating default variables
In this notebook, we’ll calculate four descriptor (variable) sets:
1. Canonical (UniProt) sequence variables.
2. Structure (PDB) sequence variables.
3. Structure variables (angles, distances, etc.).
4. Ligand variables.
[2]:
import logging
import warnings
from random import sample
from pathlib import Path
# Suppress import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    from kinactive import DefaultFeatures, DB, DBConfig
Provide general configuration.
[3]:
logging.basicConfig(level=logging.INFO)
[4]:
N_PROC = 20
N_CHAINS = 20 # Restrict the number of chains for demonstration
BASE = Path('../data/variable_sets')
BASE.mkdir(exist_ok=True)
DB_PATH = Path('../data/db_v3')
[5]:
paths = list(DB_PATH.glob('*'))
if N_CHAINS is not None:
    # Sample random chains to calculate the variables on.
    paths = sample(paths, N_CHAINS)
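The random subset above differs between runs, and `sample` raises `ValueError` if fewer than `N_CHAINS` paths are available. A minimal sketch, assuming a reproducible subset is wanted: it seeds a dedicated `random.Random` instance and caps the sample size. The helper name `sample_paths`, the seed value, and the demo paths are hypothetical and not part of kinactive.

```python
from pathlib import Path
from random import Random


def sample_paths(paths, n=None, seed=0):
    """Return a reproducible random subset of `paths` (hypothetical helper)."""
    # Directory listing order is not guaranteed, so sort before sampling.
    paths = sorted(paths)
    if n is None:
        return paths
    rng = Random(seed)  # dedicated generator; the global random state is untouched
    # Cap the sample size so a small database does not raise ValueError.
    return rng.sample(paths, min(n, len(paths)))


# Demo with made-up paths standing in for DB_PATH.glob('*').
demo = [Path(f"chain_{i}") for i in range(50)]
subset = sample_paths(demo, n=20, seed=42)
```

Calling `sample_paths` again with the same inputs and seed returns the same subset, which makes the demonstration run repeatable.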
[6]:
db = DB(DBConfig(io_cpus=N_PROC))
chains = db.load(paths)
INFO:kinactive.db:Got 20 initial paths to read
INFO:kinactive.db:Parsed 20 `Chain`s
[7]:
vs = DefaultFeatures()
?vs.calculate_all_vs
Signature:
vs.calculate_all_vs(
    chains: collections.abc.Sequence[lXtractor.core.chain.chain.Chain],
    map_name: str = 'PK',
    num_proc: int | None = None,
    verbose: bool = True,
    base: pathlib.Path | None = None,
    overwrite: bool = False,
) -> kinactive.features.Results
Docstring:
Calculate default variables. These include four sets::
    #. A default set of sequence variables for canonical sequences.
    #. A default set of sequence variables for structure sequences.
    #. A default set of structure variables.
    #. A default set of ligand variables.
:param chains: A sequence of chains.
:param map_name: A reference name.
:param num_proc: The number of CPUs to use.
:param verbose: Display progress bar.
:param base: Base path to save the results to. If not provided, the
    results are returned but not saved.
:param overwrite: Overwrite existing files. If False, will skip the
    calculation of existing variables.
:return: A named tuple with calculated variables' tables.
File: ~/Projects/kinactive/kinactive/features.py
Type: method
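The `base` and `overwrite` parameters described in the docstring follow a common "skip if the output already exists" pattern. The sketch below illustrates that pattern in isolation; it is not kinactive's implementation, and `compute_or_skip` is a hypothetical helper written only for this illustration.

```python
import tempfile
from pathlib import Path


def compute_or_skip(path, compute, overwrite=False):
    """Write compute() to `path` unless the file exists and overwrite is False.

    Returns True if the calculation ran, False if it was skipped.
    """
    path = Path(path)
    if path.exists() and not overwrite:
        return False  # existing result is kept, calculation is skipped
    path.write_text(compute())
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "table.csv"
    first = compute_or_skip(target, lambda: "a,b\n1,2\n")  # no file yet: runs
    second = compute_or_skip(target, lambda: "a,b\n3,4\n")  # file exists: skipped
    third = compute_or_skip(target, lambda: "a,b\n5,6\n", overwrite=True)  # forced
    final_text = target.read_text()
```

With `overwrite=False`, an interrupted run can be restarted without repeating finished work, at the cost of silently keeping stale files if the inputs changed.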
[8]:
vs_res = vs.calculate_all_vs(
    chains.collapse_children(), num_proc=N_PROC, base=BASE, overwrite=True
)
INFO:kinactive.features:Calculating sequence variables on canonical seqs
INFO:kinactive.features:Resulting shape: (20, 799)
INFO:kinactive.features:Saved defaults_can_seq_vs.csv to ../data/variable_sets
INFO:kinactive.features:Calculating sequence variables on structure seqs
INFO:kinactive.features:Resulting shape: (186, 799)
INFO:kinactive.features:Saved defaults_str_seq_vs.csv to ../data/variable_sets
INFO:kinactive.features:Calculating ligand variables
INFO:kinactive.features:Resulting shape: (186, 793)
INFO:kinactive.features:Saved default_lig_vs.csv to ../data/variable_sets
INFO:kinactive.features:Calculating structure variables
INFO:kinactive.features:Resulting shape: (186, 1693)
INFO:kinactive.features:Saved default_str_vs.csv to ../data/variable_sets
INFO:kinactive.features:Finished calculations
Calculating all four sets on all domains (that is, without the N_CHAINS restriction) takes about 1 hour on 20 cores.
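The log messages above name four CSV files saved under `BASE`. A small standard-library sketch for checking which of them are present before using them elsewhere; the helper `find_variable_tables` is hypothetical, and the file names are copied from the log output rather than from kinactive's documentation.

```python
from pathlib import Path

# File names exactly as they appear in the log messages above.
EXPECTED = (
    "defaults_can_seq_vs.csv",
    "defaults_str_seq_vs.csv",
    "default_lig_vs.csv",
    "default_str_vs.csv",
)


def find_variable_tables(base):
    """Map each expected file name to True if it exists under `base`."""
    base = Path(base)
    return {name: (base / name).is_file() for name in EXPECTED}


# The directory used in this notebook; a missing directory simply yields all False.
status = find_variable_tables("../data/variable_sets")
missing = [name for name, present in status.items() if not present]
```

An empty `missing` list indicates that all four tables were written.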