OpenStructure
QSscorer Class Reference

Public Member Functions

def __init__
 
def chem_mapping
 
def ent_to_cm_1
 
def ent_to_cm_2
 
def symm_1
 
def symm_2
 
def SetSymmetries
 
def chain_mapping
 
def chain_mapping_scheme
 
def alignments
 
def mapped_residues
 
def global_score
 
def best_score
 
def superposition
 
def clustalw_bin
 
def GetOligoLDDTScorer
 

Data Fields

 qs_ent_1
 
 qs_ent_2
 
 res_num_alignment
 
 calpha_only
 
 max_ca_per_chain_for_cm
 

Detailed Description

Object to compute QS scores.

Simple usage without any precomputed contacts, symmetries and mappings:

.. code-block:: python

  import ost
  from ost.mol.alg import qsscoring

  # load two biounits to compare
  ent_full = ost.io.LoadPDB('3ia3', remote=True)
  ent_1 = ent_full.Select('cname=A,D')
  ent_2 = ent_full.Select('cname=B,C')
  # get score
  ost.PushVerbosityLevel(3)
  try:
    qs_scorer = qsscoring.QSscorer(ent_1, ent_2)
    ost.LogScript('QSscore:', str(qs_scorer.global_score))
    ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping))
    # commonly you want the QS global score as output
    qs_score = qs_scorer.global_score
  except qsscoring.QSscoreError as ex:
    # default handling: report failure and set score to 0
    ost.LogError('QSscore failed:', str(ex))
    qs_score = 0

For maximal performance when computing QS scores of the same entity with many
others, it is advisable to construct and reuse :class:`QSscoreEntity` objects.

Any known / precomputed information can be filled into the appropriate
attribute here (no checks done!). Otherwise most quantities are computed on
first access and cached (lazy evaluation). Setters are provided to set values
with extra checks (e.g. :func:`SetSymmetries`).
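
The lazy evaluation described above can be pictured with a minimal sketch. The
class ``LazyScorer`` and its ``_compute_chain_mapping`` helper are hypothetical
and are not part of ``qsscoring``; they only illustrate the pattern of
computing a value on first access, caching it, and accepting a user-provided
value without checks.

```python
class LazyScorer:
    """Hypothetical stand-in illustrating compute-on-first-access caching."""

    def __init__(self):
        self._chain_mapping = None   # nothing computed yet
        self.compute_calls = 0       # counts how often the expensive step ran

    def _compute_chain_mapping(self):
        # placeholder for an expensive computation (made-up result)
        self.compute_calls += 1
        return {'A': 'B', 'D': 'C'}

    @property
    def chain_mapping(self):
        # compute only once, then reuse the cached value
        if self._chain_mapping is None:
            self._chain_mapping = self._compute_chain_mapping()
        return self._chain_mapping

    @chain_mapping.setter
    def chain_mapping(self, value):
        # a user-provided value is stored as is (no checks done)
        self._chain_mapping = value


scorer = LazyScorer()
first = scorer.chain_mapping     # triggers the computation
second = scorer.chain_mapping    # served from the cache

preset = LazyScorer()
preset.chain_mapping = {'A': 'C', 'D': 'B'}  # skips the computation entirely
```

Filling in a value before first access is what allows known mappings or
symmetries to bypass the corresponding computation.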

All necessary seq. alignments are done by global BLOSUM62-based alignment. A
multiple sequence alignment is performed with ClustalW unless
:attr:`chain_mapping` is provided manually. You will need to have an
executable ``clustalw`` or ``clustalw2`` in your ``PATH`` or you must set
:attr:`clustalw_bin` accordingly. Otherwise an exception
(:class:`ost.settings.FileNotFound`) is thrown.

Formulas for QS scores:

::

  - QS_best = weighted_scores / (weight_sum + weight_extra_mapped)
  - QS_global = weighted_scores / (weight_sum + weight_extra_all)
  -> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared
  -> weight_sum = sum(w(min(d1,d2))) for shared
  -> weight_extra_mapped = sum(w(d)) for all mapped but non-shared
  -> weight_extra_all = sum(w(d)) for all non-shared
  -> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else

In the formulas above:

* "d": CA/CB-CA/CB distance of an "inter-chain contact" ("d1", "d2" for
  "shared" contacts).
* "mapped": we could map chains of two structures and align residues in
  :attr:`alignments`.
* "shared": pairs of residues which are "mapped" and have
  "inter-chain contact" in both structures.
* "inter-chain contact": CB-CB pairs (CA for GLY) with distance <= 12 A
  (fallback to CA-CA if :attr:`calpha_only` is True).
* "w(d)": weighting function (probability of two residues interacting given
  their CB distance)
  from `Xu et al. 2009 <https://dx.doi.org/10.1016%2Fj.jmb.2008.06.002>`_.
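
The formulas above can be written out as a short, self-contained sketch. The
function names and the input representation (plain lists of distances, with
non-shared contacts split into "mapped" and "unmapped" ones) are assumptions
made for illustration and do not reflect how ``qsscoring`` stores contacts
internally. Returning 0 for an empty denominator is likewise an assumption.

```python
import math


def w(d):
    """Weight of a contact at distance d (Angstrom), as in the formula above."""
    if d <= 5.0:
        return 1.0
    return math.exp(-2.0 * ((d - 5.0) / 4.28) ** 2)


def qs_scores(shared, extra_mapped, extra_unmapped):
    """Return (QS_best, QS_global).

    shared:         list of (d1, d2) for contacts present in both structures
    extra_mapped:   distances of non-shared contacts between mapped residues
    extra_unmapped: distances of non-shared contacts involving unmapped
                    residues or chains
    """
    weighted_scores = sum(w(min(d1, d2)) * (1.0 - abs(d1 - d2) / 12.0)
                          for d1, d2 in shared)
    weight_sum = sum(w(min(d1, d2)) for d1, d2 in shared)
    weight_extra_mapped = sum(w(d) for d in extra_mapped)
    # "all non-shared" = mapped but non-shared + everything unmapped
    weight_extra_all = weight_extra_mapped + sum(w(d) for d in extra_unmapped)
    denom_best = weight_sum + weight_extra_mapped
    denom_global = weight_sum + weight_extra_all
    # assumption: report 0 if there is nothing to score
    qs_best = weighted_scores / denom_best if denom_best > 0 else 0.0
    qs_global = weighted_scores / denom_global if denom_global > 0 else 0.0
    return qs_best, qs_global


# made-up distances for illustration
qs_best, qs_global = qs_scores(shared=[(4.0, 4.0), (6.0, 8.0)],
                               extra_mapped=[5.0],
                               extra_unmapped=[5.0])
```

Because ``weight_extra_all`` contains ``weight_extra_mapped``, the global score
can never exceed the best score for the same set of contacts.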

:param ent_1: First structure to be scored.
:type ent_1:  :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
              :class:`~ost.mol.EntityView`
:param ent_2: Second structure to be scored.
:type ent_2:  :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
              :class:`~ost.mol.EntityView`
:param res_num_alignment: Sets :attr:`res_num_alignment`

:raises: :class:`QSscoreError` if input structures are invalid or are monomers
         or have issues that make it impossible for a QS score to be computed.

.. attribute:: qs_ent_1

  :class:`QSscoreEntity` object for *ent_1* given at construction.
  If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
  set it to 'pdb_1' using :func:`~QSscoreEntity.SetName`.

.. attribute:: qs_ent_2

  :class:`QSscoreEntity` object for *ent_2* given at construction.
  If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
  set it to 'pdb_2' using :func:`~QSscoreEntity.SetName`.

.. attribute:: calpha_only

  True if either of the two structures is CA-only (after cleanup).

  :type: :class:`bool`

.. attribute:: max_ca_per_chain_for_cm

  Maximal number of CA atoms to use in each chain to determine chain mappings.
  Setting this to -1 disables the limit. Limiting it speeds up determination
  of symmetries and chain mappings. By default it is set to 100.

  :type: :class:`int`

.. attribute:: res_num_alignment

  Forces each alignment in :attr:`alignments` to be based on residue numbers
  instead of using a global BLOSUM62-based alignment.

  :type: :class:`bool`

Definition at line 42 of file qsscoring.py.

Constructor & Destructor Documentation

def __init__ (   self,
  ent_1,
  ent_2,
  res_num_alignment = False 
)

Definition at line 153 of file qsscoring.py.

Member Function Documentation

def alignments (   self)
List of successful sequence alignments using :attr:`chain_mapping`.

There will be one alignment for each mapped chain and they are ordered by
their chain names in :attr:`qs_ent_1`.

The first sequence of each alignment belongs to :attr:`qs_ent_1` and the
second one to :attr:`qs_ent_2`. The sequences are named according to the
mapped chain names and have views attached into :attr:`QSscoreEntity.ent`
of :attr:`qs_ent_1` and :attr:`qs_ent_2`.

If :attr:`res_num_alignment` is False, each alignment is performed using a
global BLOSUM62-based alignment. Otherwise, the positions in the alignment
sequences are simply given by the residue number so that residues with
matching numbers are aligned.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`~ost.seq.AlignmentHandle`
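
A minimal sketch of the residue-number-based variant, assuming each chain is
given as a plain dictionary from residue number to one-letter code. The helper
``align_by_residue_number`` is hypothetical. It also omits columns in which
neither chain has a residue, which the actual implementation may handle
differently.

```python
def align_by_residue_number(chain_1, chain_2):
    """Align two chains so that residues with equal numbers share a column.

    chain_1, chain_2: dict mapping residue number -> one-letter code
    Returns two gapped strings of equal length ('-' marks a gap).
    """
    numbers = sorted(set(chain_1) | set(chain_2))
    seq_1 = ''.join(chain_1.get(n, '-') for n in numbers)
    seq_2 = ''.join(chain_2.get(n, '-') for n in numbers)
    return seq_1, seq_2


# made-up chains: residue 4 is missing in the first, residue 1 in the second
chain_a = {1: 'M', 2: 'K', 3: 'V', 5: 'L'}
chain_b = {2: 'K', 3: 'I', 4: 'G', 5: 'L'}
aln_1, aln_2 = align_by_residue_number(chain_a, chain_b)
```

No substitution matrix is involved here, so residues are paired even if their
types differ, as long as their numbers match.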

Definition at line 403 of file qsscoring.py.

def best_score (   self)
QS-score without penalties.

Like :attr:`global_score`, but neglecting additional residues or chains in
one of the biounits (i.e. the score is calculated considering only mapped
chains and residues).

:getter: Computed on first use (cached)
:type: :class:`float`
:raises: :class:`QSscoreError` if only one chain is mapped

Definition at line 463 of file qsscoring.py.

def chain_mapping (   self)
Mapping from :attr:`ent_to_cm_1` to :attr:`ent_to_cm_2`.

Properties:

- Mapping is between chains of same chem. group (see :attr:`chem_mapping`)
- Each chain can appear only once in mapping
- All chains of the complex with fewer chains are mapped
- Symmetry (:attr:`symm_1`, :attr:`symm_2`) is taken into account

Details on algorithms used to find mapping:

- We try all pairs of chem. mapped chains within symmetry group and get
  superpose-transformation for them
- First option: check for "sufficient overlap" of other chain-pairs

  - For each chain-pair defined above: apply superposition to full oligomer
    and map chains based on structural overlap
  - Structural overlap = X% of residues in second oligomer covered within Y
    Angstrom of a (chem. mapped) chain in first oligomer. We successively
    try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in
    mapping (warning shown for most permissive one).
  - If multiple possible mappings are found, we choose the one which leads
    to the lowest multi-chain-RMSD given the superposition

- Fallback option: try all mappings to find minimal multi-chain-RMSD
  (warning shown)

  - For each chain-pair defined above: apply superposition, try all (!)
    possible chain mappings (within symmetry group) and keep mapping with
    lowest multi-chain-RMSD
  - Repeat procedure above to resolve symmetry. Within the symmetry group we
    can use the chain mapping computed before and we just need to find which
    symmetry group in first oligomer maps to which in the second one. We
    again try all possible combinations...
  - Limitations:

    - Trying all possible mappings is a combinatorial nightmare (factorial).
      We throw an exception if there are too many combinations (e.g. octamer
      vs octamer with no usable symmetry)
    - The mapping is forced: the "best" mapping will be chosen independently
      of how badly the chains fit in terms of multi-chain-RMSD
    - As a result, such a forced mapping can lead to a large range of
      resulting QS scores. An extreme example was observed between 1on3.1
      and 3u9r.1, where :attr:`global_score` can range from 0.12 to 0.43
      for mappings with very similar multi-chain-RMSD.

:getter: Computed on first use (cached)
:type: :class:`dict` with key / value = :class:`str` (chain names, key
   for :attr:`ent_to_cm_1`, value for :attr:`ent_to_cm_2`)
:raises: :class:`QSscoreError` if there are too many combinations to check
     to find a chain mapping.
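
The fallback option and its combinatorial limit can be illustrated with a
simplified sketch. It assumes the two complexes are already superposed and
that every chain is given as a list of equally many 3D points. The function
names, the ``max_combinations`` threshold and the use of ``RuntimeError`` are
hypothetical; the actual implementation superposes per chain-pair, uses
symmetry groups and raises :class:`QSscoreError`.

```python
import itertools
import math


def multi_chain_rmsd(coords_1, coords_2, mapping):
    """RMSD over all atoms of all mapped chain pairs (no re-superposition)."""
    sq_sum, n = 0.0, 0
    for c1, c2 in mapping.items():
        for p, q in zip(coords_1[c1], coords_2[c2]):
            sq_sum += sum((a - b) ** 2 for a, b in zip(p, q))
            n += 1
    return math.sqrt(sq_sum / n)


def best_mapping_exhaustive(coords_1, coords_2, max_combinations=1000):
    """Try every one-to-one chain mapping and keep the lowest-RMSD one."""
    names_1, names_2 = sorted(coords_1), sorted(coords_2)
    # assumption: the first complex has at most as many chains as the second
    n_comb = math.perm(len(names_2), len(names_1))
    if n_comb > max_combinations:
        raise RuntimeError('too many combinations: %d' % n_comb)
    best = None
    for perm in itertools.permutations(names_2, len(names_1)):
        mapping = dict(zip(names_1, perm))
        rmsd = multi_chain_rmsd(coords_1, coords_2, mapping)
        if best is None or rmsd < best[0]:
            best = (rmsd, mapping)   # forced: kept however large the RMSD is
    return best


# made-up dimers in which the chain names are swapped
dimer_1 = {'A': [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           'B': [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]}
dimer_2 = {'X': [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
           'Y': [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]}
best_rmsd, best_map = best_mapping_exhaustive(dimer_1, dimer_2)
```

For two octamers without usable symmetry the loop would have to visit
8! = 40320 mappings, which is why a limit on the number of combinations is
needed.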

Definition at line 313 of file qsscoring.py.

def chain_mapping_scheme (   self)
Mapping scheme used to get :attr:`chain_mapping`.

Possible values:

- 'strict': 80% overlap needed within 4 Angstrom (overlap based mapping).
- 'tolerant': 40% overlap needed within 6 Angstrom (overlap based mapping).
- 'permissive': 20% overlap needed within 8 Angstrom (overlap based
  mapping). It's best if you check mapping manually!
- 'extensive': Extensive search used for mapping detection (fallback). This
  approach has known limitations and may be removed in future versions.
  Mapping should be checked manually!
- 'user': :attr:`chain_mapping` was set by user before first use of this
  attribute.

:getter: Computed with :attr:`chain_mapping` on first use (cached)
:type: :class:`str`
:raises: :class:`QSscoreError` as in :attr:`chain_mapping`.
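
How the overlap thresholds relate to the scheme names can be sketched as
follows. The sketch looks at a single pair of point sets rather than at whole
oligomers, and the names ``SCHEMES``, ``coverage`` and ``pick_scheme`` are
hypothetical. It only mirrors the successive (X, Y) trials listed above and is
not the library's implementation.

```python
import math

# (scheme name, required coverage fraction, distance cutoff in Angstrom)
SCHEMES = [('strict', 0.8, 4.0),
           ('tolerant', 0.4, 6.0),
           ('permissive', 0.2, 8.0)]


def coverage(ref_coords, mdl_coords, cutoff):
    """Fraction of mdl_coords within `cutoff` of any point in ref_coords."""
    covered = sum(1 for q in mdl_coords
                  if any(math.dist(p, q) <= cutoff for p in ref_coords))
    return covered / len(mdl_coords)


def pick_scheme(ref_coords, mdl_coords):
    """Return the first scheme whose overlap criterion holds.

    Falls back to 'extensive' if none of the overlap criteria is met.
    """
    for name, min_fraction, cutoff in SCHEMES:
        if coverage(ref_coords, mdl_coords, cutoff) >= min_fraction:
            return name
    return 'extensive'


# made-up coordinates: a straight line of 10 points and shifted copies of it
ref = [(float(i), 0.0, 0.0) for i in range(10)]
close = [(x, 1.0, z) for x, _, z in ref]   # 1 A away from the reference
mid = [(x, 5.0, z) for x, _, z in ref]     # 5 A away from the reference
far = [(x, 20.0, z) for x, _, z in ref]    # 20 A away from the reference
schemes = [pick_scheme(ref, m) for m in (close, mid, far)]
```

Trying the strictest criterion first means a looser scheme is only reported
when the stricter ones have failed.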

Definition at line 374 of file qsscoring.py.

def chem_mapping (   self)
Inter-complex mapping of chemical groups.

Each group (see :attr:`QSscoreEntity.chem_groups`) is mapped according to
highest sequence identity. Alignment is between longest sequences in groups.

Limitations:

- If the numbers of groups differ, we map only the groups of the complex
  with fewer groups (the rest is considered unmapped and shown as warning)
- The mapping is forced: the "best" mapping will be chosen independently of
  how low the seq. identity may be

:getter: Computed on first use (cached)
:type: :class:`dict` with key = :class:`tuple` of chain names in
   :attr:`qs_ent_1` and value = :class:`tuple` of chain names in
   :attr:`qs_ent_2`.

:raises: :class:`QSscoreError` if we end up having no chains for either
     entity in the mapping (can happen if chains do not have CA atoms).
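
A rough sketch of such a forced group mapping is given below. It makes several
assumptions that are not confirmed by the text above: groups are paired
greedily in order of decreasing score, and the similarity ratio of
``difflib.SequenceMatcher`` stands in for the sequence identity of a real
alignment. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def seq_similarity(seq_a, seq_b):
    # stand-in for the sequence identity of a real alignment (assumption)
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def map_chem_groups(groups_1, groups_2):
    """Greedily pair groups by decreasing similarity of their longest sequences.

    groups_1, groups_2: dict mapping tuple of chain names -> list of sequences
    Returns (mapping, list of groups left unmapped).
    """
    pairs = []
    for g1, seqs_1 in groups_1.items():
        for g2, seqs_2 in groups_2.items():
            score = seq_similarity(max(seqs_1, key=len), max(seqs_2, key=len))
            pairs.append((score, g1, g2))
    pairs.sort(key=lambda item: item[0], reverse=True)
    mapping, used_2 = {}, set()
    for _, g1, g2 in pairs:
        if g1 not in mapping and g2 not in used_2:
            mapping[g1] = g2       # forced: taken however low the score is
            used_2.add(g2)
    unmapped = [g for g in groups_1 if g not in mapping]
    unmapped += [g for g in groups_2 if g not in used_2]
    return mapping, unmapped


# made-up sequences: two groups in the first complex, three in the second
groups_1 = {('A', 'B'): ['MKVLAAGIT', 'MKVLAAGI'], ('C',): ['GSHWWPQRS']}
groups_2 = {('X',): ['GSHWWPQRT'], ('Y', 'Z'): ['MKVLAAGIT'],
            ('W',): ['DDDDEEEE']}
chem_map, unmapped_groups = map_chem_groups(groups_1, groups_2)
```

The returned dictionary has the same shape as the attribute described here:
tuples of chain names in the first complex as keys and tuples of chain names
in the second complex as values.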

Definition at line 193 of file qsscoring.py.

def clustalw_bin (   self)
Full path to ``clustalw`` or ``clustalw2`` executable to use for multiple
sequence alignments (unless :attr:`chain_mapping` is provided manually).

:getter: Located in path on first use (cached)
:type: :class:`str`

Definition at line 499 of file qsscoring.py.

def ent_to_cm_1 (   self)
Subset of :attr:`qs_ent_1` used to compute chain mapping and symmetries.

Properties:

- Includes only residues aligned according to :attr:`chem_mapping`
- Includes only 1 CA atom per residue
- Has at least 5 and at most :attr:`max_ca_per_chain_for_cm` atoms per chain
- All chains of the same chemical group have the same number of atoms
  (also in :attr:`ent_to_cm_2` according to :attr:`chem_mapping`)
- All chains appearing in :attr:`chem_mapping` appear in this entity
  (so the two can be safely used together)

This entity might be transformed (i.e. all positions rotated/translated by
same transformation matrix) if this can speed up computations. So do not
assume fixed global positions (but relative distances will remain fixed).

:getter: Computed on first use (cached)
:type: :class:`~ost.mol.EntityHandle`

:raises: :class:`QSscoreError` if any chain ends up having fewer than 5
         residues.

Definition at line 219 of file qsscoring.py.

def ent_to_cm_2 (   self)
Subset of :attr:`qs_ent_2` used to compute chain mapping and symmetries
(see :attr:`ent_to_cm_1` for details).

Definition at line 246 of file qsscoring.py.

def GetOligoLDDTScorer (   self,
  settings,
  penalize_extra_chains = True 
)
:return: :class:`OligoLDDTScorer` object, setup for this QS scoring problem.
:param settings: Passed to :class:`OligoLDDTScorer` constructor.
:param penalize_extra_chains: Passed to :class:`OligoLDDTScorer` constructor.

Definition at line 511 of file qsscoring.py.

def global_score (   self)
QS-score with penalties.

The range of the score is between 0 (i.e. no interface residues are shared
between biounits) and 1 (i.e. the interfaces are identical).

The global QS-score is computed applying penalties when interface residues
or entire chains are missing (i.e. anything that is not mapped in
:attr:`mapped_residues` / :attr:`chain_mapping`) in one of the biounits.

:getter: Computed on first use (cached)
:type: :class:`float`
:raises: :class:`QSscoreError` if only one chain is mapped

Definition at line 444 of file qsscoring.py.

def mapped_residues (   self)
Mapping of shared residues in :attr:`alignments`.

:getter: Computed on first use (cached)
:type: :class:`dict` *mapped_residues[c1][r1] = r2* with:
   *c1* = Chain name in first entity (= first sequence in aln),
   *r1* = Residue number in first entity,
   *r2* = Residue number in second entity
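
The nested dictionary can be derived from a pairwise alignment as in the
following sketch. It assumes that every column without a gap counts as a
mapped residue pair and that residue numbers are supplied as plain lists; the
helper name is hypothetical and the code does not use the actual alignment
objects.

```python
def mapped_residues_from_alignment(aln_1, aln_2, numbers_1, numbers_2):
    """Map residue numbers of sequence 1 to those of sequence 2.

    aln_1, aln_2:         gapped sequences of equal length ('-' marks a gap)
    numbers_1, numbers_2: residue numbers of the ungapped sequences, in order
    """
    mapping = {}
    idx_1 = idx_2 = 0
    for col_1, col_2 in zip(aln_1, aln_2):
        if col_1 != '-' and col_2 != '-':
            mapping[numbers_1[idx_1]] = numbers_2[idx_2]
        # advance each residue counter past its non-gap characters
        if col_1 != '-':
            idx_1 += 1
        if col_2 != '-':
            idx_2 += 1
    return mapping


# made-up alignment of chain 'A' with differing residue numbering
mapped = {'A': mapped_residues_from_alignment('MKV-L', '-KIGL',
                                              [1, 2, 3, 5],
                                              [12, 13, 14, 15])}
r2 = mapped['A'][3]   # residue number in the second entity for A.3
```

Residues facing a gap (here residue 1 of the first chain) simply have no entry.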

Definition at line 430 of file qsscoring.py.

def SetSymmetries (   self,
  symm_1,
  symm_2 
)
Set user-provided symmetry groups.

These groups are restricted to chain names appearing in :attr:`ent_to_cm_1`
and :attr:`ent_to_cm_2` respectively. They are only valid if they cover all
chains and both *symm_1* and *symm_2* have same lengths of symmetry group
tuples. Otherwise the trivial symmetry group is used (see :attr:`symm_1`).

:param symm_1: Value to set for :attr:`symm_1`.
:param symm_2: Value to set for :attr:`symm_2`.
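
The validity rules can be summarized in a small sketch. The function
``validated_symmetries`` is hypothetical and takes the chain names as explicit
arguments; unlike the setter it does not first restrict the groups to chains
present in :attr:`ent_to_cm_1` and :attr:`ent_to_cm_2`.

```python
def validated_symmetries(symm_1, symm_2, chains_1, chains_2):
    """Return (symm_1, symm_2) if consistent, else trivial symmetry groups."""

    def covers_all(symm, chains):
        listed = [c for group in symm for c in group]
        # every chain must be listed exactly once
        return sorted(listed) == sorted(chains)

    def group_lengths(symm):
        return {len(group) for group in symm}

    valid = (covers_all(symm_1, chains_1)
             and covers_all(symm_2, chains_2)
             and len(group_lengths(symm_1)) == 1
             and group_lengths(symm_1) == group_lengths(symm_2))
    if valid:
        return symm_1, symm_2
    # trivial symmetry group: one tuple with all chains
    return [tuple(chains_1)], [tuple(chains_2)]


# made-up tetramers with two symmetry groups of two chains each
ok = validated_symmetries([('A', 'B'), ('C', 'D')],
                          [('E', 'F'), ('G', 'H')],
                          ['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'])
# chain 'D' is missing from the first argument, so the input is rejected
fallback = validated_symmetries([('A', 'B'), ('C',)],
                                [('E', 'F'), ('G', 'H')],
                                ['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'])
```

Falling back to the trivial group keeps the scorer usable with inconsistent
input, at the cost of losing the speed-up that symmetry provides.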

Definition at line 293 of file qsscoring.py.

def superposition (   self)
Superposition result based on shared CA atoms in :attr:`alignments`.

The superposition can be used to map :attr:`QSscoreEntity.ent` of
:attr:`qs_ent_1` onto the one of :attr:`qs_ent_2`. Use
:func:`ost.geom.Invert` if you need the opposite transformation.

:getter: Computed on first use (cached)
:type: :class:`ost.mol.alg.SuperpositionResult`

Definition at line 479 of file qsscoring.py.

def symm_1 (   self)
Symmetry groups for :attr:`qs_ent_1` used to speed up chain mapping.

This is a list of chain-lists where each chain-list can be used to reconstruct
the others via cyclic C or dihedral D symmetry. The first chain-list is used
as a representative symmetry group. For heteromers, the group-members must
contain all different seqres in oligomer.

Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are
symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).

Properties:

- All symmetry group tuples have the same length (num. of chains)
- All chains in :attr:`ent_to_cm_1` appear (w/o duplicates)
- For heteromers: symmetry group tuples have all different chem. groups
- Trivial symmetry group = one tuple with all chains (used if inconsistent
  data provided or if no symmetry is found)
- Either compatible with :attr:`symm_2` or trivial symmetry groups are used.
  Compatibility requires same lengths of symmetry group tuples and it must
  be possible to get an overlap (80% of residues covered within 6 A of a
  (chem. mapped) chain) of all chains in representative symmetry groups by
  superposing one pair of chains.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`tuple` of :class:`str` (chain names)

Definition at line 255 of file qsscoring.py.

def symm_2 (   self)
Symmetry groups for :attr:`qs_ent_2` (see :attr:`symm_1` for details).

Definition at line 287 of file qsscoring.py.

Field Documentation

calpha_only

Definition at line 175 of file qsscoring.py.

max_ca_per_chain_for_cm

Definition at line 176 of file qsscoring.py.

qs_ent_1

Definition at line 156 of file qsscoring.py.

qs_ent_2

Definition at line 160 of file qsscoring.py.

res_num_alignment

Definition at line 174 of file qsscoring.py.


The documentation for this class was generated from the following file: