kinactive.db module

A DB class for creating the protein kinase (PK) data collection and handling its I/O.

class kinactive.db.DB(cfg: DBConfig = DBConfig(verbose=True, target_dir=PosixPath('db'), pdb_dir=PosixPath('pdb/structures'), pdb_dir_info=PosixPath('pdb/info'), seq_dir=PosixPath('uniprot/fasta'), max_fetch_trials=2, io_cpus=1, init_cpus=1, init_map_numbering_cpus=1, init_add_structure_cpus=1, init_tolerate_failures=True, profile=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/kinactive/checkouts/latest/kinactive/resources/PF00069.hmm'), tk2pk=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/kinactive/checkouts/latest/kinactive/resources/tk2pk.json'), pk_map_name='PK', pk_min_score=50, pk_min_seq_domain_size=150, pk_min_str_domain_size=100, pk_min_cov_hmm=0.5, pk_min_cov_seq=0.5, pk_min_str_seq_match=0.8, min_seq_size=150, max_seq_size=5000, pdb_fmt='mmtf.gz', pdb_num_fetch_threads=10, pdb_str_min_size=100, uniprot_chunk_size=100, uniprot_num_fetch_threads=10))

Bases: object

An object encapsulating methods for building, saving, and loading an lXtractor “database”: a collection of Chain objects.

__init__(cfg: DBConfig = DBConfig(verbose=True, target_dir=PosixPath('db'), pdb_dir=PosixPath('pdb/structures'), pdb_dir_info=PosixPath('pdb/info'), seq_dir=PosixPath('uniprot/fasta'), max_fetch_trials=2, io_cpus=1, init_cpus=1, init_map_numbering_cpus=1, init_add_structure_cpus=1, init_tolerate_failures=True, profile=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/kinactive/checkouts/latest/kinactive/resources/PF00069.hmm'), tk2pk=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/kinactive/checkouts/latest/kinactive/resources/tk2pk.json'), pk_map_name='PK', pk_min_score=50, pk_min_seq_domain_size=150, pk_min_str_domain_size=100, pk_min_cov_hmm=0.5, pk_min_cov_seq=0.5, pk_min_str_seq_match=0.8, min_seq_size=150, max_seq_size=5000, pdb_fmt='mmtf.gz', pdb_num_fetch_threads=10, pdb_str_min_size=100, uniprot_chunk_size=100, uniprot_num_fetch_threads=10))
build(uniprot_ids: Collection[str] | None = None, pdb_chain_ids: Collection[str] | None = None, n_domains: int = 0) → ChainList[Chain]

Build a new lXt-PK data collection.

Parameters:
  • uniprot_ids – An optional list of UniProt IDs to restrict the db to.

  • pdb_chain_ids – An optional collection of PDB chains to restrict the db to. Format: “{PDB_ID}:{ChainID}”.

  • n_domains – Use only n randomly selected sequence domains, which is helpful for testing the pipeline.

Returns:

A ChainList of Chain objects having at least one child PK domain with at least one PK domain structure passing the filtering thresholds.
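The “{PDB_ID}:{ChainID}” format expected by pdb_chain_ids can be illustrated with a small helper. This is a minimal sketch, not part of kinactive: format_chain_id and parse_chain_id are hypothetical names, the two PDB entries are used only as illustrative input, and whether build() expects a particular letter case is not stated here, so the helpers leave case untouched.

```python
def format_chain_id(pdb_id: str, chain_id: str) -> str:
    """Join a PDB ID and a chain ID into the "{PDB_ID}:{ChainID}" form."""
    if not pdb_id or not chain_id:
        raise ValueError("both the PDB ID and the chain ID must be non-empty")
    if ":" in pdb_id or ":" in chain_id:
        raise ValueError("neither part may contain ':'")
    return f"{pdb_id}:{chain_id}"


def parse_chain_id(value: str) -> tuple[str, str]:
    """Split a "{PDB_ID}:{ChainID}" string back into its two parts."""
    pdb_id, sep, chain_id = value.partition(":")
    if not sep or not pdb_id or not chain_id or ":" in chain_id:
        raise ValueError(f"expected '{{PDB_ID}}:{{ChainID}}', got {value!r}")
    return pdb_id, chain_id


# Hypothetical (PDB ID, chain ID) pairs, used here only as sample input.
pairs = [("1ATP", "E"), ("2SRC", "A")]
pdb_chain_ids = [format_chain_id(pdb_id, chain) for pdb_id, chain in pairs]
```

A list built this way could then be passed as the pdb_chain_ids argument of build().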

discover_domains(seqs: ChainList[Chain | ChainSequence | ChainStructure]) → ChainList[Chain | ChainSequence | ChainStructure]
load(dump: Path | Iterable[Path], domains: bool = True, sequences: bool = False, structures: bool = False, structures_sequences: bool = False) → ChainList[Chain] | ChainList[ChainStructure] | ChainList[ChainSequence]

Load prepared db.

Parameters:
  • dump – A path (or several paths) with dumped Chain objects.

  • domains – Load domains without loading parent chains.

  • sequences – Load only canonical sequences.

  • structures – Load structures without loading canonical sequences.

  • structures_sequences – Load structure sequences without loading structures.

Returns:

A chain list with initialized Chain objects.

obtain_sifts_seqs(uniprot_ids: Sequence[str] | None = None) → ChainList[Chain]
save(dest: Path | None = None, chains: Iterable[Chain] | None = None, *, overwrite: bool = False, summary: bool = True) → None

Save the DB chains to the file system.

Parameters:
  • dest – Destination path to write seqs into.

  • chains – Chains to save, supplied manually. If None, the currently stored chains (the chains property) are used.

  • overwrite – Overwrite existing data in dest.

  • summary – Compose and save summaries to dest.

Returns:

An iterator over paths of successfully saved chains. Consume to trigger saving.
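The remark “Consume to trigger saving” describes lazy-iterator behavior. The sketch below shows the general pattern only and is not kinactive's implementation: save_lazily is a hypothetical stand-in that writes plain text files. Since the signature above is annotated as returning None, it is worth checking what save() actually returns before relying on this behavior.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def save_lazily(items: Iterable[tuple[str, str]], dest: Path) -> Iterator[Path]:
    """Write each (name, text) item into dest and yield the written path.

    This is a generator, so nothing runs until the result is iterated.
    """
    dest.mkdir(parents=True, exist_ok=True)
    for name, text in items:
        path = dest / f"{name}.txt"
        path.write_text(text)
        yield path


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / "db"
    pending = save_lazily([("a", "first"), ("b", "second")], dest)
    exists_before = dest.exists()  # the generator body has not started yet
    saved = list(pending)          # consuming the iterator performs the writes
    names_after = sorted(p.name for p in dest.iterdir())
```

With an API of this shape, wrapping the call in list(...) or looping over the result is what performs the writes.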

property chains: ChainList[Chain]
Returns:

Currently stored chains.