OpenStructure
ChainMapper Class Reference

Public Member Functions

def __init__ (self, target, resnum_alignments=False, pep_seqid_thr=95., pep_gap_thr=1.0, nuc_seqid_thr=95., nuc_gap_thr=1.0, pep_subst_mat=seq.alg.BLOSUM62, pep_gap_open=-11, pep_gap_ext=-1, nuc_subst_mat=seq.alg.NUC44, nuc_gap_open=-4, nuc_gap_ext=-4, min_pep_length=6, min_nuc_length=4, n_max_naive=1e8)
 
def target (self)
 
def polypep_seqs (self)
 
def polynuc_seqs (self)
 
def chem_groups (self)
 
def chem_group_alignments (self)
 
def chem_group_ref_seqs (self)
 
def chem_group_types (self)
 
def GetChemMapping (self, model)
 
def GetlDDTMapping (self, model, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], strategy="heuristic", steep_opt_rate=None, block_seed_size=5, block_blocks_per_chem_group=5, chem_mapping_result=None, heuristic_n_max_naive=40320)
 
def GetQSScoreMapping (self, model, contact_d=12.0, strategy="heuristic", block_seed_size=5, block_blocks_per_chem_group=5, heuristic_n_max_naive=40320, steep_opt_rate=None, chem_mapping_result=None, greedy_prune_contact_map=True)
 
def GetRMSDMapping (self, model, strategy="heuristic", subsampling=50, chem_mapping_result=None, heuristic_n_max_naive=120)
 
def GetMapping (self, model, n_max_naive=40320)
 
def GetRepr (self, substructure, model, topn=1, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], bb_only=False, only_interchain=False, chem_mapping_result=None, global_mapping=None)
 
def GetNMappings (self, model)
 
def ProcessStructure (self, ent)
 
def Align (self, s1, s2, stype)
 

Data Fields

 resnum_alignments
 
 pep_seqid_thr
 
 pep_gap_thr
 
 nuc_seqid_thr
 
 nuc_gap_thr
 
 min_pep_length
 
 min_nuc_length
 
 n_max_naive
 
 aligner
 

Detailed Description

 Class to compute chain mappings

All algorithms are performed on processed structures which fulfill the
criteria given in the constructor arguments (*min_pep_length*,
*min_nuc_length*) and only contain residues which have all required backbone
atoms. For peptide residues these are N, CA, C and CB (no CB for GLY); for
nucleotide residues these are O5', C5', C4', C3' and O3'.

Chain mapping is a three step process:

* Group chemically identical chains in *target* using pairwise
  alignments that are either computed with Needleman-Wunsch (NW) or
  simply derived from residue numbers (*resnum_alignments* flag).
  In case of NW, *pep_subst_mat*, *pep_gap_open* and *pep_gap_ext*
  and their nucleotide equivalents are relevant. Two chains are
  considered identical if they fulfill the thresholds given by
  *pep_seqid_thr*, *pep_gap_thr*, their nucleotide equivalents
  respectively. The grouping information is available as
  attributes of this class.

* Map chains in an input model to these groups. Generating alignments
  and the similarity criteria are the same as above. You can either
  get the group mapping with :func:`GetChemMapping` or directly call
  one of the full-fledged one-to-one chain mapping functions which
  execute that step internally.

* Obtain one-to-one mapping for chains in an input model and
  *target* with one of the available mapping functions. To get an
  idea of the complexity: if *target* and *model* are homo-octamers,
  there are ``8! = 40320`` possible chain mappings.
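
The complexity estimate above can be reproduced with a few lines of plain
Python. This is a minimal sketch, not the OpenStructure implementation:
``count_mappings`` is a hypothetical helper, and it assumes that every chem
group holds the same number of chains in *target* and *model*.

```python
from math import factorial


def count_mappings(group_sizes):
    """Number of one-to-one chain mappings for the given chem group sizes.

    Hypothetical helper: assumes each chem group has equally many chains
    in target and model, so each group contributes size! permutations.
    """
    n = 1
    for size in group_sizes:
        n *= factorial(size)
    return n


print(count_mappings([8]))     # homo-octamer: 8! = 40320
print(count_mappings([6, 2]))  # A6B2: 6! * 2! = 1440
```

For an actual structure pair, :func:`GetNMappings` reports this number.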

:param target: Target structure onto which models are mapped.
               Computations happen on a selection only containing
               polypeptides and polynucleotides.
:type target: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param resnum_alignments: Use residue numbers instead of
                          Needleman-Wunsch to compute pairwise
                          alignments. Relevant for :attr:`~chem_groups` 
                          and related attributes.
:type resnum_alignments: :class:`bool`
:param pep_seqid_thr: Threshold used to decide when two chains are
                      identical. 95 percent tolerates the few mutations
                      crystallographers like to do.
:type pep_seqid_thr:  :class:`float`
:param pep_gap_thr: Additional threshold to avoid gappy alignments with
                    high seqid. By default this is disabled (set to 1.0).
                    This threshold checks for a maximum allowed fraction
                    of gaps in any of the two sequences after stripping
                    terminal gaps. The reason for not just normalizing
                    seqid by the longer sequence is that one sequence
                    might be a perfect subsequence of the other but only
                    cover half of it. 
:type pep_gap_thr:  :class:`float`
:param nuc_seqid_thr: Nucleotide equivalent for *pep_seqid_thr*
:type nuc_seqid_thr:  :class:`float`
:param nuc_gap_thr: Nucleotide equivalent for *pep_gap_thr*
:type nuc_gap_thr:  :class:`float`
:param pep_subst_mat: Substitution matrix to align peptide sequences,
                      irrelevant if *resnum_alignments* is True,
                      defaults to seq.alg.BLOSUM62
:type pep_subst_mat: :class:`ost.seq.alg.SubstWeightMatrix`
:param pep_gap_open: Gap open penalty to align peptide sequences,
                     irrelevant if *resnum_alignments* is True
:type pep_gap_open: :class:`int`
:param pep_gap_ext: Gap extension penalty to align peptide sequences,
                    irrelevant if *resnum_alignments* is True
:type pep_gap_ext: :class:`int`
:param nuc_subst_mat: Nucleotide equivalent for *pep_subst_mat*,
                      defaults to seq.alg.NUC44
:type nuc_subst_mat: :class:`ost.seq.alg.SubstWeightMatrix`
:param nuc_gap_open: Nucleotide equivalent for *pep_gap_open*
:type nuc_gap_open: :class:`int`
:param nuc_gap_ext: Nucleotide equivalent for *pep_gap_ext*
:type nuc_gap_ext: :class:`int`
:param min_pep_length: Minimal number of residues for a peptide chain to be
                       considered in target and in models.
:type min_pep_length: :class:`int`
:param min_nuc_length: Minimal number of residues for a nucleotide chain to be
                       considered in target and in models.
:type min_nuc_length: :class:`int` 
:param n_max_naive: Max possible chain mappings that are enumerated in
                    :func:`~GetNaivelDDTMapping` /
                    :func:`~GetDecomposerlDDTMapping`. A
                    :class:`RuntimeError` is raised in case of bigger
                    complexity.
:type n_max_naive: :class:`int`
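
To make the interplay of *pep_seqid_thr* and *pep_gap_thr* concrete, here is
a minimal pure-Python sketch of one plausible reading of the description
above. It is an assumption, not the library code: ``alignment_props`` and
``is_identical`` are hypothetical helpers, sequences are plain strings with
``-`` as gap character, sequence identity is computed over aligned columns
only, and "stripping terminal gaps" is taken to mean restricting the gap
count to the range between the first and last aligned column.

```python
def alignment_props(s1, s2):
    """Return (seqid in percent, gap fraction in s1, gap fraction in s2).

    s1 and s2 are equally long aligned strings using '-' for gaps.
    Gap fractions are computed after stripping terminal gaps, i.e. only
    between the first and last column where both sequences are aligned.
    """
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    # Columns in which both sequences carry a residue.
    aligned = [i for i, (a, b) in enumerate(zip(s1, s2))
               if a != '-' and b != '-']
    if not aligned:
        return 0.0, 0.0, 0.0
    first, last = aligned[0], aligned[-1]
    core1, core2 = s1[first:last + 1], s2[first:last + 1]
    n_identical = sum(1 for i in aligned if s1[i] == s2[i])
    seqid = 100.0 * n_identical / len(aligned)
    return seqid, core1.count('-') / len(core1), core2.count('-') / len(core2)


def is_identical(s1, s2, seqid_thr=95.0, gap_thr=1.0):
    """Apply both thresholds; gap_thr=1.0 effectively disables the gap check."""
    seqid, gaps1, gaps2 = alignment_props(s1, s2)
    return seqid >= seqid_thr and gaps1 <= gap_thr and gaps2 <= gap_thr


# A perfect subsequence that covers only half of the longer sequence
# still counts as identical, because terminal gaps are ignored.
print(is_identical("ACDEFGHIKL", "--DEFGH---"))
# An internal gap is caught once gap_thr is lowered.
print(is_identical("ACDEFG", "AC--FG", gap_thr=0.2))
```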

Definition at line 508 of file chain_mapping.py.

Constructor & Destructor Documentation

◆ __init__()

def __init__ (   self,
  target,
  resnum_alignments = False,
  pep_seqid_thr = 95.,
  pep_gap_thr = 1.0,
  nuc_seqid_thr = 95.,
  nuc_gap_thr = 1.0,
  pep_subst_mat = seq.alg.BLOSUM62,
  pep_gap_open = -11,
  pep_gap_ext = -1,
  nuc_subst_mat = seq.alg.NUC44,
  nuc_gap_open = -4,
  nuc_gap_ext = -4,
  min_pep_length = 6,
  min_nuc_length = 4,
  n_max_naive = 1e8 
)

Definition at line 596 of file chain_mapping.py.

Member Function Documentation

◆ Align()

def Align (   self,
  s1,
  s2,
  stype 
)
 Access to internal sequence alignment functionality

Alignment parameterization is set up at ChainMapper construction

:param s1: First sequence to align - must have view attached in case
           of resnum_alignments
:type s1: :class:`ost.seq.SequenceHandle`
:param s2: Second sequence to align - must have view attached in case
           of resnum_alignments
:type s2: :class:`ost.seq.SequenceHandle`
:param stype: Type of sequences to align, must be in
              [:class:`ost.mol.ChemType.AMINOACIDS`,
              :class:`ost.mol.ChemType.NUCLEOTIDES`]
:returns: Pairwise alignment of s1 and s2

Definition at line 1613 of file chain_mapping.py.

◆ chem_group_alignments()

def chem_group_alignments (   self)
MSA for each group in :attr:`~chem_groups`

Sequences in MSAs exhibit same order as in :attr:`~chem_groups` and
have the respective :class:`ost.mol.EntityView` from *target* attached.

:getter: Computed on first use (cached)
:type: :class:`ost.seq.AlignmentList`

Definition at line 685 of file chain_mapping.py.

◆ chem_group_ref_seqs()

def chem_group_ref_seqs (   self)
Reference (longest) sequence for each group in :attr:`~chem_groups`

Respective :class:`EntityView` from *target* for each sequence s are
available as ``s.GetAttachedView()``

:getter: Computed on first use (cached)
:type: :class:`ost.seq.SequenceList`

Definition at line 706 of file chain_mapping.py.

◆ chem_group_types()

def chem_group_types (   self)
ChemType of each group in :attr:`~chem_groups`

Specifying if groups are poly-peptides/nucleotides, i.e. 
:class:`ost.mol.ChemType.AMINOACIDS` or
:class:`ost.mol.ChemType.NUCLEOTIDES`

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`ost.mol.ChemType`

Definition at line 725 of file chain_mapping.py.

◆ chem_groups()

def chem_groups (   self)
Groups of chemically equivalent chains in :attr:`~target`

First chain in group is the one with longest sequence.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`list` of :class:`str` (chain names)

Definition at line 670 of file chain_mapping.py.

◆ GetChemMapping()

def GetChemMapping (   self,
  model 
)
Maps sequences in *model* to chem_groups of target

:param model: Model from which to extract sequences, a
              selection that only includes peptides and nucleotides
              is performed and returned along other results.
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:returns: Tuple with two lists of length `len(self.chem_groups)` and
          an :class:`ost.mol.EntityView` representing *model*:
          1) Each element is a :class:`list` with mdl chain names that
          map to the chem group at that position.
          2) Each element is a :class:`ost.seq.AlignmentList` aligning
          these mdl chain sequences to the chem group ref sequences.
          3) A selection of *model* that only contains polypeptides and
          polynucleotides whose ATOMSEQ exactly matches the sequence
          info in the returned alignments.

Definition at line 746 of file chain_mapping.py.

◆ GetlDDTMapping()

def GetlDDTMapping (   self,
  model,
  inclusion_radius = 15.0,
  thresholds = [0.5, 1.0, 2.0, 4.0],
  strategy = "heuristic",
  steep_opt_rate = None,
  block_seed_size = 5,
  block_blocks_per_chem_group = 5,
  chem_mapping_result = None,
  heuristic_n_max_naive = 40320 
)
 Identify chain mapping by optimizing lDDT score

Maps *model* chain sequences to :attr:`~chem_groups` and finds a mapping
based on backbone-only lDDT score (CA for amino acids, C3' for
nucleotides).

Either performs a naive search, i.e. enumerate all possible mappings or
executes a greedy strategy that tries to identify a (close to) optimal
mapping in an iterative way by starting from a start mapping (seed). In
each iteration, the one-to-one mapping that leads to highest increase
in number of conserved contacts is added with the additional requirement
that this added mapping must have non-zero interface counts towards the
already mapped chains. So basically we're "growing" the mapped structure
by only adding connected stuff.
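
The greedy extension described above can be sketched in plain Python. This
is an illustration under stated assumptions, not the library
implementation: ``greedy_extend`` is hypothetical, ``gain`` stands in for
the increase in conserved contacts, and ``ref_neighbors`` is an assumed
lookup of target chains that are in contact with each other.

```python
def greedy_extend(seed, ref_groups, mdl_groups, gain, ref_neighbors):
    """Grow *seed* (dict ref_chain -> mdl_chain) one chain pair at a time.

    gain(mapping, ref, mdl) is a hypothetical callback returning the score
    increase obtained by adding the pair to the current mapping.
    ref_neighbors[ref] is the set of target chains in contact with ref.
    """
    mapping = dict(seed)
    while True:
        best = None
        for refs, mdls in zip(ref_groups, mdl_groups):
            free_refs = [r for r in refs if r not in mapping]
            free_mdls = [m for m in mdls if m not in mapping.values()]
            for r in free_refs:
                # Only grow towards chains that touch the mapped part.
                if ref_neighbors[r].isdisjoint(mapping):
                    continue
                for m in free_mdls:
                    g = gain(mapping, r, m)
                    if g > 0 and (best is None or g > best[0]):
                        best = (g, r, m)
        if best is None:
            # Nothing connected with a positive gain is left.
            return mapping
        mapping[best[1]] = best[2]


# Toy example: a linear trimer A-B-C and a made-up gain function.
truth = {"A": "X", "B": "Y", "C": "Z"}
neighbors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
toy_gain = lambda mapping, r, m: 1 if truth[r] == m else 0
result = greedy_extend({"A": "X"}, [["A", "B", "C"]], [["X", "Y", "Z"]],
                       toy_gain, neighbors)
print(result)
```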

The available strategies:

* **naive**: Enumerates all possible mappings and returns best        

* **greedy_fast**: perform all vs. all single chain lDDTs within the
  respective ref/mdl chem groups. The mapping with highest number of
  conserved contacts is selected as seed for greedy extension

* **greedy_full**: try multiple seeds for greedy extension, i.e. try
  all ref/mdl chain combinations within the respective chem groups and
  retain the mapping leading to the best lDDT.

* **greedy_block**: try multiple seeds for greedy extension, i.e. try
  all ref/mdl chain combinations within the respective chem groups and
  extend them to *block_seed_size*. *block_blocks_per_chem_group*
  for each chem group are selected for exhaustive extension.

* **heuristic**: Uses *naive* strategy if number of possible mappings
  is within *heuristic_n_max_naive*. The default of 40320 corresponds
  to an octamer (8!=40320). A structure with stoichiometry A6B2 would be
  6!*2!=1440 etc. If the number of possible mappings is larger,
  *greedy_full* is used.
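
As an illustration of what the *naive* strategy has to enumerate, the
following sketch lists all one-to-one mappings with ``itertools`` and picks
the best one. It is not the library code: ``enumerate_mappings`` and
``naive_search`` are hypothetical, chains are plain strings, each chem group
is assumed to hold equally many target and model chains, and ``score`` is a
placeholder for the actual lDDT computation.

```python
from itertools import permutations, product


def enumerate_mappings(ref_groups, mdl_groups):
    """Yield every one-to-one mapping as a dict {ref_chain: mdl_chain}.

    Chains may only be mapped within their own chem group.
    """
    per_group = [permutations(mdl) for mdl in mdl_groups]
    for combo in product(*per_group):
        mapping = {}
        for refs, perm in zip(ref_groups, combo):
            mapping.update(zip(refs, perm))
        yield mapping


def naive_search(ref_groups, mdl_groups, score):
    """Return the mapping with the highest score (placeholder scorer)."""
    return max(enumerate_mappings(ref_groups, mdl_groups), key=score)


# Toy A3B2 complex: 3! * 2! = 12 candidate mappings.
ref_groups = [["A", "B", "C"], ["D", "E"]]
mdl_groups = [["X", "Y", "Z"], ["U", "V"]]
truth = {"A": "Y", "B": "X", "C": "Z", "D": "V", "E": "U"}
toy_score = lambda m: sum(1 for k, v in m.items() if truth[k] == v)

all_mappings = list(enumerate_mappings(ref_groups, mdl_groups))
best = naive_search(ref_groups, mdl_groups, toy_score)
print(len(all_mappings), best)
```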

Sets :attr:`MappingResult.opt_score` in case of no trivial one-to-one
mapping. 

:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param inclusion_radius: Inclusion radius for lDDT
:type inclusion_radius: :class:`float`
:param thresholds: Thresholds for lDDT
:type thresholds: :class:`list` of :class:`float`
:param strategy: Strategy to find mapping. Must be in ["naive",
                 "greedy_fast", "greedy_full", "greedy_block",
                 "heuristic"]
:type strategy: :class:`str`
:param steep_opt_rate: Only relevant for greedy strategies.
                       If set, every *steep_opt_rate* mappings, a simple
                       optimization is executed with the goal of
                       avoiding local minima. The optimization
                       iteratively checks all possible swaps of mappings
                       within their respective chem groups and accepts
                       swaps that improve lDDT score. Iteration stops as
                       soon as no improvement can be achieved anymore.
:type steep_opt_rate: :class:`int`
:param block_seed_size: Param for *greedy_block* strategy - Initial seeds
                        are extended by that number of chains.
:type block_seed_size: :class:`int`
:param block_blocks_per_chem_group: Param for *greedy_block* strategy -
                                    Number of blocks per chem group that
                                    are extended in an initial search
                                    for high scoring local solutions.
:type block_blocks_per_chem_group: :class:`int`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:returns: A :class:`MappingResult`
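
The swap optimization triggered by *steep_opt_rate* can be pictured with the
following minimal sketch. It is an assumption about the general idea, not
the library code: ``swap_optimize`` is hypothetical, ``score`` is a
placeholder for the lDDT evaluation, and every target chain is assumed to
be mapped already.

```python
from itertools import combinations


def swap_optimize(ref_groups, mapping, score):
    """Accept pairwise swaps within chem groups while they improve score.

    mapping: dict {ref_chain: mdl_chain} covering all chains in ref_groups.
    Returns the optimized mapping and its score.
    """
    mapping = dict(mapping)
    best = score(mapping)
    improved = True
    while improved:
        improved = False
        for group in ref_groups:
            for a, b in combinations(group, 2):
                # Tentatively swap the model chains assigned to a and b.
                mapping[a], mapping[b] = mapping[b], mapping[a]
                new = score(mapping)
                if new > best:
                    best = new
                    improved = True
                else:
                    # No improvement: undo the swap.
                    mapping[a], mapping[b] = mapping[b], mapping[a]
    return mapping, best


truth = {"A": "X", "B": "Y", "C": "Z"}
toy_score = lambda m: sum(1 for k, v in m.items() if truth[k] == v)
optimized, final_score = swap_optimize([["A", "B", "C"]],
                                       {"A": "Y", "B": "X", "C": "Z"},
                                       toy_score)
print(optimized, final_score)
```

The loop terminates because each accepted swap strictly increases the score
and the number of possible mappings is finite.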

Definition at line 788 of file chain_mapping.py.

◆ GetMapping()

def GetMapping (   self,
  model,
  n_max_naive = 40320 
)
 Convenience function to get mapping with currently preferred method

If number of possible chain mappings is <= *n_max_naive*, a naive
QS-score mapping is performed and optimal QS-score is guaranteed.
For anything else, a QS-score mapping with the greedy_full strategy is
performed (greedy_prune_contact_map = True). The default for
*n_max_naive* of 40320 corresponds to an octamer (8!=40320). A
structure with stoichiometry A6B2 would be 6!*2!=1440 etc.
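
The dispatch rule described above boils down to a single comparison. A
minimal sketch, where ``pick_strategy`` is a hypothetical helper and not
part of the class:

```python
from math import factorial


def pick_strategy(n_mappings, n_max_naive=40320):
    """Exhaustive search while affordable, greedy_full otherwise."""
    return "naive" if n_mappings <= n_max_naive else "greedy_full"


print(pick_strategy(factorial(8)))  # homo-octamer, still enumerated
print(pick_strategy(factorial(9)))  # homo-nonamer, switches to greedy
```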

Definition at line 1224 of file chain_mapping.py.

◆ GetNMappings()

def GetNMappings (   self,
  model 
)
 Returns number of possible mappings

:param model: Model with chains that are mapped onto
              :attr:`chem_groups`
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`

Definition at line 1492 of file chain_mapping.py.

◆ GetQSScoreMapping()

def GetQSScoreMapping (   self,
  model,
  contact_d = 12.0,
  strategy = "heuristic",
  block_seed_size = 5,
  block_blocks_per_chem_group = 5,
  heuristic_n_max_naive = 40320,
  steep_opt_rate = None,
  chem_mapping_result = None,
  greedy_prune_contact_map = True 
)
 Identify chain mapping based on QSScore

Scoring is based on CA/C3' positions which are present in all chains of
a :attr:`chem_groups` as well as the *model* chains which are mapped to
that respective chem group.

The following strategies are available:

* **naive**: Naively iterate all possible mappings and return best based
  on QS score.

* **greedy_fast**: perform all vs. all single chain lDDTs within the
  respective ref/mdl chem groups. The mapping with highest number of
  conserved contacts is selected as seed for greedy extension.
  Extension is based on QS-score.

* **greedy_full**: try multiple seeds for greedy extension, i.e. try
  all ref/mdl chain combinations within the respective chem groups and
  retain the mapping leading to the best QS-score. 

* **greedy_block**: try multiple seeds for greedy extension, i.e. try
  all ref/mdl chain combinations within the respective chem groups and
  extend them to *block_seed_size*. *block_blocks_per_chem_group*
  for each chem group are selected for exhaustive extension.

* **heuristic**: Uses *naive* strategy if number of possible mappings
  is within *heuristic_n_max_naive*. The default of 40320 corresponds
  to an octamer (8!=40320). A structure with stoichiometry A6B2 would be
  6!*2!=1440 etc. If the number of possible mappings is larger,
  *greedy_full* is used.

Sets :attr:`MappingResult.opt_score` in case of no trivial one-to-one
mapping.

:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param contact_d: Max distance between two residues to be considered as 
                  contact in qs scoring
:type contact_d: :class:`float` 
:param strategy: Strategy for sampling, must be in ["naive",
                 "greedy_fast", "greedy_full", "greedy_block",
                 "heuristic"]
:type strategy: :class:`str`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:param greedy_prune_contact_map: Relevant for all strategies that use
                                 greedy extensions. If True, only chains
                                 with at least 3 contacts (8A CB
                                 distance) towards already mapped chains
                                 in trg/mdl are considered for
                                 extension. All chains that give a
                                 potential non-zero QS-score increase
                                 are used otherwise (at least one
                                 contact within 12A). The consequence
                                 is reduced runtime and usually no
                                 real reduction in accuracy.
:returns: A :class:`MappingResult`

Definition at line 943 of file chain_mapping.py.

◆ GetRepr()

def GetRepr (   self,
  substructure,
  model,
  topn = 1,
  inclusion_radius = 15.0,
  thresholds = [0.5, 1.0, 2.0, 4.0],
  bb_only = False,
  only_interchain = False,
  chem_mapping_result = None,
  global_mapping = None 
)
 Identify *topn* representations of *substructure* in *model*

*substructure* defines a subset of :attr:`~target` for which one
wants the *topn* representations in *model*. Representations are scored
and sorted by lDDT.

:param substructure: A :class:`ost.mol.EntityView` which is a subset of
                     :attr:`~target`. Should be selected with the
                     OpenStructure query language. Example: if you're
                     interested in residues with number 42,43 and 85 in
                     chain A:
                     ``substructure=mapper.target.Select("cname=A and rnum=42,43,85")``
                     A :class:`RuntimeError` is raised if *substructure*
                     does not refer to the same underlying
                     :class:`ost.mol.EntityHandle` as :attr:`~target`.
:type substructure: :class:`ost.mol.EntityView`
:param model: Structure in which one wants to find representations for
              *substructure*
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param topn: Max number of representations that are returned
:type topn: :class:`int`
:param inclusion_radius: Inclusion radius for lDDT
:type inclusion_radius: :class:`float`
:param thresholds: Thresholds for lDDT
:type thresholds: :class:`list` of :class:`float`
:param bb_only: Only consider backbone atoms in lDDT computation
:type bb_only: :class:`bool`
:param only_interchain: Only score interchain contacts in lDDT. Useful
                        if you want to identify interface patches.
:type only_interchain: :class:`bool`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:param global_mapping: Pro param. Specify a global mapping result. This
                       fully defines the desired representation in the
                       model but extracts it and enriches it with all
                       the nice attributes of :class:`ReprResult`.
                       The target attribute in *global_mapping* must be
                       of the same entity as self.target and the model
                       attribute of *global_mapping* must be of the same
                       entity as *model*.
:type global_mapping: :class:`MappingResult`
:returns: :class:`list` of :class:`ReprResult`

Definition at line 1238 of file chain_mapping.py.

◆ GetRMSDMapping()

def GetRMSDMapping (   self,
  model,
  strategy = "heuristic",
  subsampling = 50,
  chem_mapping_result = None,
  heuristic_n_max_naive = 120 
)
Identify chain mapping based on minimal RMSD superposition

Superposition and scoring is based on CA/C3' positions which are present
in all chains of a :attr:`chem_groups` as well as the *model*
chains which are mapped to that respective chem group.

The following strategies are available:

* **naive**: Naively iterate all possible mappings and return the one
  with lowest RMSD.

* **greedy_single**: perform all vs. all single chain superpositions
  within the respective ref/mdl chem groups to use as starting points.
  For each starting point, iteratively add the model/target chain pair
  with lowest RMSD until a full mapping is achieved. The mapping with
  lowest RMSD is returned.

* **greedy_iterative**: Same as *greedy_single* except that the
  transformation gets updated with each added chain pair.

* **heuristic**: Uses *naive* strategy if number of possible mappings
  is within *heuristic_n_max_naive*. The default of 120 corresponds
  to a homo-pentamer (5!=120). If the number of possible mappings is
  larger, *greedy_iterative* is used.

:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param strategy: Strategy for sampling. Must be in ["naive",
                 "greedy_single", "greedy_iterative", "heuristic"]
:type strategy: :class:`str`
:param subsampling: If given, only an equally distributed subset
                    of CA/C3' positions in each chain is used for
                    superposition/scoring.
:type subsampling: :class:`int`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:returns: A :class:`MappingResult`
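
One way to picture the *subsampling* parameter is to pick evenly spaced
indices along each chain. The sketch below is an assumption about the
general idea only; ``subsample_indices`` is hypothetical and the library may
select positions differently.

```python
def subsample_indices(n_positions, subsampling):
    """Return at most *subsampling* evenly spaced indices in
    range(n_positions). None or a too large value keeps everything."""
    if subsampling is None or subsampling >= n_positions:
        return list(range(n_positions))
    if subsampling <= 1:
        return [0] if n_positions > 0 and subsampling == 1 else []
    # Spread the picks from the first to the last position.
    step = (n_positions - 1) / (subsampling - 1)
    return [round(i * step) for i in range(subsampling)]


print(subsample_indices(10, 4))
print(subsample_indices(5, 50))
```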

Definition at line 1080 of file chain_mapping.py.

◆ polynuc_seqs()

def polynuc_seqs (   self)
Sequences of nucleotide chains in :attr:`~target`

Respective :class:`EntityView` from *target* for each sequence s are
available as ``s.GetAttachedView()``

:type: :class:`ost.seq.SequenceList`

Definition at line 659 of file chain_mapping.py.

◆ polypep_seqs()

def polypep_seqs (   self)
Sequences of peptide chains in :attr:`~target`

Respective :class:`EntityView` from *target* for each sequence s are
available as ``s.GetAttachedView()``

:type: :class:`ost.seq.SequenceList`

Definition at line 648 of file chain_mapping.py.

◆ ProcessStructure()

def ProcessStructure (   self,
  ent 
)
 Entity processing for chain mapping

* Selects view containing peptide and nucleotide residues which have 
  required backbone atoms present - for peptide residues that's
  N, CA, C and CB (no CB for GLY), for nucleotide residues that's
  O5', C5', C4', C3' and O3'.
* filters view by chain lengths, see *min_pep_length* and
  *min_nuc_length* in constructor
* Extracts atom sequences for each chain in that view
* Attaches corresponding :class:`ost.mol.EntityView` to each sequence
* If residue number alignments are used, strictly increasing residue
  numbers without insertion codes are ensured in each chain

:param ent: Entity to process
:type ent: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:returns: Tuple with 3 elements: 1) :class:`ost.mol.EntityView`
          containing peptide and nucleotide residues 2)
          :class:`ost.seq.SequenceList` containing ATOMSEQ sequences
          for each polypeptide chain in returned view, sequences have
          :class:`ost.mol.EntityView` of according chains attached
          3) same for polynucleotide chains

Definition at line 1502 of file chain_mapping.py.

◆ target()

def target (   self)
Target structure that only contains peptides/nucleotides

Contains only residues that have the backbone representatives
(CA for peptide and C3' for nucleotides) to avoid ATOMSEQ alignment
inconsistencies when switching between all atom and backbone only
representations.

:type: :class:`ost.mol.EntityView`

Definition at line 635 of file chain_mapping.py.

Field Documentation

◆ aligner

aligner

Definition at line 622 of file chain_mapping.py.

◆ min_nuc_length

min_nuc_length

Definition at line 612 of file chain_mapping.py.

◆ min_pep_length

min_pep_length

Definition at line 611 of file chain_mapping.py.

◆ n_max_naive

n_max_naive

Definition at line 613 of file chain_mapping.py.

◆ nuc_gap_thr

nuc_gap_thr

Definition at line 610 of file chain_mapping.py.

◆ nuc_seqid_thr

nuc_seqid_thr

Definition at line 609 of file chain_mapping.py.

◆ pep_gap_thr

pep_gap_thr

Definition at line 608 of file chain_mapping.py.

◆ pep_seqid_thr

pep_seqid_thr

Definition at line 607 of file chain_mapping.py.

◆ resnum_alignments

resnum_alignments

Definition at line 606 of file chain_mapping.py.


The documentation for this class was generated from the following file: