QSscorer Class Reference

Public Member Functions
def	__init__
def	chem_mapping
def	ent_to_cm_1
def	ent_to_cm_2
def	symm_1
def	symm_2
def	SetSymmetries
def	chain_mapping
def	alignments
def	mapped_residues
def	global_score
def	best_score
def	superposition
def	lddt_score
def	lddt_mdl
def	lddt_ref
def	clustalw_bin
Data Fields
	qs_ent_1
	qs_ent_2
	calpha_only
	max_ca_per_chain_for_cm

Detailed Description

Object to compute QS scores.

Simple usage without any precomputed contacts, symmetries and mappings:

.. code-block:: python

  import ost
  from ost.mol.alg import qsscoring

  # load two biounits to compare
  ent_full = ost.io.LoadPDB('3ia3', remote=True)
  ent_1 = ent_full.Select('cname=A,D')
  ent_2 = ent_full.Select('cname=B,C')
  # get score
  ost.PushVerbosityLevel(3)
  try:
    qs_scorer = qsscoring.QSscorer(ent_1, ent_2)
    ost.LogScript('QSscore:', str(qs_scorer.global_score))
    ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping))
    # commonly you want the QS global score as output
    qs_score = qs_scorer.global_score
  except qsscoring.QSscoreError as ex:
    # default handling: report failure and set score to 0
    ost.LogError('QSscore failed:', str(ex))
    qs_score = 0

For maximal performance when computing QS scores of the same entity with many
others, it is advisable to construct and reuse :class:`QSscoreEntity` objects.

Any known / precomputed information can be filled into the appropriate
attribute here (no checks done!). Otherwise most quantities are computed on
first access and cached (lazy evaluation). Setters are provided to set values
with extra checks (e.g. :func:`SetSymmetries`).
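
The "computed on first access and cached" behaviour can be pictured with a minimal sketch. The class ``LazyScorer`` and its placeholder computation are hypothetical and only illustrate the pattern; this is not the actual :class:`QSscorer` implementation.

```python
class LazyScorer:
    """Hypothetical stand-in illustrating compute-on-first-access caching."""

    def __init__(self):
        self._global_score = None   # nothing computed yet
        self.compute_calls = 0      # counts how often the computation ran

    @property
    def global_score(self):
        # Compute only if no value was cached or filled in beforehand.
        if self._global_score is None:
            self.compute_calls += 1
            self._global_score = self._compute_global_score()
        return self._global_score

    @global_score.setter
    def global_score(self, value):
        # Precomputed values are stored as given (no checks in this sketch).
        self._global_score = value

    def _compute_global_score(self):
        # Placeholder for an expensive computation (assumed value).
        return 0.5


scorer = LazyScorer()
first = scorer.global_score    # triggers the computation
second = scorer.global_score   # served from the cache
print(first, second, scorer.compute_calls)
```

Filling in a known value before the first access skips the computation entirely, which is why precomputed information should be set early.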

All necessary seq. alignments are done by global BLOSUM62-based alignment. A
multiple sequence alignment is performed with ClustalW unless
:attr:`chain_mapping` is provided manually. You will need to have an
executable ``clustalw`` or ``clustalw2`` in your ``PATH`` or you must set
:attr:`clustalw_bin` accordingly. Otherwise an exception
(:class:`ost.settings.FileNotFound`) is thrown.
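
To check up front whether an executable is available, a small standard-library sketch along these lines may help. It assumes Python 3 (``shutil.which``); the helper name ``find_clustalw`` is hypothetical and this is not how OST itself locates the binary.

```python
import shutil


def find_clustalw(candidates=("clustalw", "clustalw2")):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


clustalw_path = find_clustalw()
if clustalw_path is None:
    print("No clustalw executable on PATH; set clustalw_bin manually.")
else:
    print("Found ClustalW at:", clustalw_path)
```

The resulting path is the kind of value one would assign to :attr:`clustalw_bin`.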

Formulas for QS scores:

::

  - QS_best = weighted_scores / (weight_sum + weight_extra_mapped)
  - QS_global = weighted_scores / (weight_sum + weight_extra_all)
  -> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared
  -> weight_sum = sum(w(min(d1,d2))) for shared
  -> weight_extra_mapped = sum(w(d)) for all mapped but non-shared
  -> weight_extra_all = sum(w(d)) for all non-shared
  -> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else
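
The formulas above can be written out as a short, self-contained sketch. The function names and the way contacts are passed in (plain lists of distances) are assumptions made for illustration. The sketch also assumes that ``weight_extra_all`` covers the mapped non-shared contacts plus all remaining non-shared ones, and that an empty denominator yields a score of 0.

```python
import math


def weight(d):
    """w(d): 1 up to 5 A, then a Gaussian-like decay."""
    if d <= 5.0:
        return 1.0
    return math.exp(-2.0 * ((d - 5.0) / 4.28) ** 2)


def qs_scores(shared, extra_mapped, extra_unmapped):
    """Return (QS_best, QS_global).

    shared:         list of (d1, d2) distances for contacts present in both
    extra_mapped:   distances of non-shared contacts between mapped residues
    extra_unmapped: distances of all other non-shared contacts
    """
    weighted_scores = sum(
        weight(min(d1, d2)) * (1.0 - abs(d1 - d2) / 12.0) for d1, d2 in shared
    )
    weight_sum = sum(weight(min(d1, d2)) for d1, d2 in shared)
    weight_extra_mapped = sum(weight(d) for d in extra_mapped)
    weight_extra_all = weight_extra_mapped + sum(weight(d) for d in extra_unmapped)

    def ratio(denominator):
        # Assumption of this sketch: no contacts at all gives a score of 0.
        return weighted_scores / denominator if denominator > 0 else 0.0

    return ratio(weight_sum + weight_extra_mapped), ratio(weight_sum + weight_extra_all)


qs_best, qs_global = qs_scores(
    shared=[(4.0, 4.0), (5.0, 8.0)], extra_mapped=[3.0], extra_unmapped=[4.5]
)
print("QS_best:", qs_best, "QS_global:", qs_global)
```

Because ``weight_extra_all`` is never smaller than ``weight_extra_mapped``, the global score cannot exceed the best score.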

:param ent_1: First structure to be scored.
:type ent_1:  :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
              :class:`~ost.mol.EntityView`
:param ent_2: Second structure to be scored.
:type ent_2:  :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
              :class:`~ost.mol.EntityView`

:raises: :class:`QSscoreError` if input structures are invalid or are monomers
         or have issues that make it impossible for a QS score to be computed.

.. attribute:: qs_ent_1

  :class:`QSscoreEntity` object for *ent_1* given at construction.
  If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
  set it to 'pdb_1' using :func:`~QSscoreEntity.SetName`.

.. attribute:: qs_ent_2

  :class:`QSscoreEntity` object for *ent_2* given at construction.
  If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
  set it to 'pdb_2' using :func:`~QSscoreEntity.SetName`.

.. attribute:: calpha_only

  True if either of the two structures is CA-only (after cleanup).

  :type: :class:`bool`

.. attribute:: max_ca_per_chain_for_cm

  Maximal number of CA atoms to use in each chain to determine chain mappings.
  Setting this to -1 disables the limit. Limiting it speeds up determination
  of symmetries and chain mappings. By default it is set to 100.

  :type: :class:`int`

Definition at line 39 of file qsscoring.py.

Member Function Documentation

def __init__	(	self,
		ent_1,
		ent_2
	)

Definition at line 129 of file qsscoring.py.

def alignments ( self )

List of successful sequence alignments using :attr:`chain_mapping`.

There will be one alignment for each mapped chain and they are ordered by
their chain names in :attr:`qs_ent_1`.

The sequences of the alignments have views attached into
:attr:`QSscoreEntity.ent` of :attr:`qs_ent_1` and :attr:`qs_ent_2`.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`~ost.seq.AlignmentHandle`

Definition at line 352 of file qsscoring.py.

def best_score ( self )

QS-score without penalties.

Like :attr:`global_score`, but neglecting additional residues or chains in
one of the biounits (i.e. the score is calculated considering only mapped
chains and residues).

:getter: Computed on first use (cached)
:type: :class:`float`

Definition at line 403 of file qsscoring.py.

def chain_mapping ( self )

Mapping from :attr:`ent_to_cm_1` to :attr:`ent_to_cm_2`.

Properties:

- Mapping is between chains of same chem. group (see :attr:`chem_mapping`)
- Each chain can appear only once in mapping
- All chains of the complex with fewer chains are mapped
- Symmetry (:attr:`symm_1`, :attr:`symm_2`) is taken into account

Details on algorithms used to find mapping:

- We try all pairs of chem. mapped chains within symmetry group and get
  superpose-transformation for them
- First option: check for "sufficient overlap" of other chain-pairs

  - For each chain-pair defined above: apply superposition to full oligomer
    and map chains based on structural overlap
  - Structural overlap = X% of residues in second oligomer covered within Y
    Angstrom of a (chem. mapped) chain in first oligomer. We successively
    try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in
    mapping (warning shown for most permissive one).
  - If multiple possible mappings are found, we choose the one which leads
    to the lowest multi-chain-RMSD given the superposition

- Fallback option: try all mappings to find minimal multi-chain-RMSD
  (warning shown)

  - For each chain-pair defined above: apply superposition, try all (!)
    possible chain mappings (within symmetry group) and keep mapping with
    lowest multi-chain-RMSD
  - Repeat procedure above to resolve symmetry. Within the symmetry group we
    can use the chain mapping computed before and we just need to find which
    symmetry group in first oligomer maps to which in the second one. We
    again try all possible combinations...
  - Limitations:

    - Trying all possible mappings is a combinatorial nightmare (factorial).
      We throw an exception if there are too many combinations (e.g. octamer
      vs octamer with no usable symmetry)
    - The mapping is forced: the "best" mapping will be chosen independently
      of how badly they fit in terms of multi-chain-RMSD
    - As a result, such a forced mapping can lead to a large range of
      resulting QS scores. An extreme example was observed between 1on3.1
      and 3u9r.1, where :attr:`global_score` can range from 0.12 to 0.43
      for mappings with very similar multi-chain-RMSD.

:getter: Computed on first use (cached)
:type: :class:`dict` with key / value = :class:`str` (chain names, key
   for :attr:`ent_to_cm_1`, value for :attr:`ent_to_cm_2`)
:raises: :class:`QSscoreError` if there are too many combinations to check
     to find a chain mapping.
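
To get a feeling for the factorial growth mentioned under the fallback option, the following sketch enumerates every one-to-one chain mapping between two lists of chain names. It deliberately ignores chemical groups, symmetry and superposition, so it is only an illustration of the combinatorics, not the search performed by :class:`QSscorer`. The function names are hypothetical.

```python
import itertools
import math


def count_mappings(n_1, n_2):
    """Number of one-to-one mappings that cover all chains of the smaller complex."""
    small, large = sorted((n_1, n_2))
    return math.factorial(large) // math.factorial(large - small)


def enumerate_mappings(chains_1, chains_2):
    """Yield dicts mapping chain names of the first complex to the second."""
    if len(chains_1) <= len(chains_2):
        # Every chain of the first complex gets a distinct partner.
        for perm in itertools.permutations(chains_2, len(chains_1)):
            yield dict(zip(chains_1, perm))
    else:
        # Every chain of the second complex gets a distinct partner.
        for perm in itertools.permutations(chains_1, len(chains_2)):
            yield dict(zip(perm, chains_2))


all_mappings = list(enumerate_mappings(["A", "B"], ["X", "Y", "Z"]))
print(len(all_mappings), "mappings for 2 vs 3 chains")
print(count_mappings(8, 8), "mappings for an octamer vs an octamer")
```

Restricting candidates to chains of the same chemical group and to one representative symmetry group shrinks this search space, which is the purpose of :attr:`chem_mapping`, :attr:`symm_1` and :attr:`symm_2`.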

Definition at line 291 of file qsscoring.py.

def chem_mapping ( self )

Inter-complex mapping of chemical groups.

Each group (see :attr:`QSscoreEntity.chem_groups`) is mapped according to
highest sequence identity. Alignment is between longest sequences in groups.

Limitations:

- If the numbers of groups differ, we map only the groups for the complex
  with fewer groups (rest considered unmapped and shown as warning)
- The mapping is forced: the "best" mapping will be chosen independently of
  how low the seq. identity may be

:getter: Computed on first use (cached)
:type: :class:`dict` with key = :class:`tuple` of chain names in
   :attr:`qs_ent_1` and value = :class:`tuple` of chain names in
   :attr:`qs_ent_2`.

:raises: :class:`QSscoreError` if we end up having fewer than 2 chains for
     either entity in the mapping (can happen if chains do not have CA
     atoms).

Definition at line 170 of file qsscoring.py.

def clustalw_bin ( self )

Full path to ``clustalw`` or ``clustalw2`` executable to use for multiple
sequence alignments (unless :attr:`chain_mapping` is provided manually).

:getter: Located in path on first use (cached)
:type: :class:`str`

Definition at line 497 of file qsscoring.py.

def ent_to_cm_1 ( self )

Subset of :attr:`qs_ent_1` used to compute chain mapping and symmetries.

Properties:

- Includes only residues aligned according to :attr:`chem_mapping`
- Includes only 1 CA atom per residue
- Has at least 5 and at most :attr:`max_ca_per_chain_for_cm` atoms per chain
- All chains of the same chemical group have the same number of atoms
  (also in :attr:`ent_to_cm_2` according to :attr:`chem_mapping`)
- All chains appearing in :attr:`chem_mapping` appear in this entity
  (so the two can be safely used together)

This entity might be transformed (i.e. all positions rotated/translated by
same transformation matrix) if this can speed up computations. So do not
assume fixed global positions (but relative distances will remain fixed).

:getter: Computed on first use (cached)
:type: :class:`~ost.mol.EntityHandle`

:raises: :class:`QSscoreError` if any chain ends up having fewer than 5
         residues.

Definition at line 197 of file qsscoring.py.

def ent_to_cm_2 ( self )

Subset of :attr:`qs_ent_2` used to compute chain mapping and symmetries
(see :attr:`ent_to_cm_1` for details).

Definition at line 224 of file qsscoring.py.

def global_score ( self )

QS-score with penalties.

The range of the score is between 0 (i.e. no interface residues are shared
between biounits) and 1 (i.e. the interfaces are identical).

The global QS-score is computed applying penalties when interface residues
or entire chains are missing (i.e. anything that is not mapped in
:attr:`mapped_residues` / :attr:`chain_mapping`) in one of the biounits.

:getter: Computed on first use (cached)
:type: :class:`float`

Definition at line 385 of file qsscoring.py.

def lddt_mdl ( self )

The model entity used for lDDT scoring (:attr:`lddt_score`) and annotated
with local scores.

Local scores are available as residue properties named 'lddt' and on each
atom as a B-factor. Only CA atoms are considered if :attr:`calpha_only` is
True, otherwise this is an all-atom score.

Since the lDDT computation requires a single chain with mapped residue
numbering, all chains are appended into a single chain X with unique residue
numbers according to the column-index in the alignment. The alignments are
in the same order as they appear in :attr:`alignments`. Additional residues
are appended at the end of the chain with unique residue numbers.

:getter: Computed on first use (cached)
:type: :class:`~ost.mol.EntityHandle`

Definition at line 461 of file qsscoring.py.

def lddt_ref ( self )

The reference entity used for lDDT scoring (:attr:`lddt_score`).

This is a single chain X with residue numbers matching ones in
:attr:`lddt_mdl` where aligned and unique numbers for additional residues.

:getter: Computed on first use (cached)
:type: :class:`~ost.mol.EntityHandle`

Definition at line 483 of file qsscoring.py.

def lddt_score ( self )

The multi-chain lDDT score.

.. note::

  lDDT does not consider over-prediction (i.e. extra chains) and hence is
  not symmetric. Here, we consider :attr:`qs_ent_1` as the reference and
  :attr:`qs_ent_2` as the model. The alignments from :attr:`alignments` are
  used to map residue numbers and chains.

The score is computed with OST's :func:`~ost.mol.alg.LocalDistDiffTest`
function with a single distance threshold of 2 A and an inclusion radius of
8 A. You can use :attr:`lddt_mdl` and :attr:`lddt_ref` to get entities on
which you can call any other lDDT function with any other set of parameters.

:getter: Computed on first use (cached)
:type: :class:`float`

Definition at line 438 of file qsscoring.py.

def mapped_residues ( self )

Mapping of shared residues in :attr:`alignments`.

:getter: Computed on first use (cached)
:type: :class:`dict` *mapped_residues[c1][r1] = r2* with:
   *c1* = Chain name in first entity (= first sequence in aln),
   *r1* = Residue number in first entity,
   *r2* = Residue number in second entity
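
The shape of this nested dictionary can be illustrated with a sketch that derives it from gapped sequence pairs. The helper ``map_residues`` is hypothetical and assumes consecutive residue numbering starting at 1, whereas the real property takes residue numbers from the views attached to :attr:`alignments`.

```python
def map_residues(seq_1, seq_2, start_1=1, start_2=1):
    """Map residue numbers for alignment columns without a gap in either row."""
    mapping = {}
    num_1, num_2 = start_1, start_2
    for c1, c2 in zip(seq_1, seq_2):
        if c1 != "-" and c2 != "-":
            mapping[num_1] = num_2
        # Advance each counter only where that sequence has a residue.
        if c1 != "-":
            num_1 += 1
        if c2 != "-":
            num_2 += 1
    return mapping


# Toy alignment for chain "A" of the first entity (made-up sequences).
toy_alignments = {"A": ("MKT-LV", "MKTAL-")}
mapped = {c1: map_residues(s1, s2) for c1, (s1, s2) in toy_alignments.items()}
print(mapped)  # {'A': {1: 1, 2: 2, 3: 3, 4: 5}}
```

Residue 5 of the first sequence and residue 4 of the second are absent from the mapping because each faces a gap.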

Definition at line 371 of file qsscoring.py.

def SetSymmetries	(	self,
		symm_1,
		symm_2
	)

Set user-provided symmetry groups.

These groups are restricted to chain names appearing in :attr:`ent_to_cm_1`
and :attr:`ent_to_cm_2` respectively. They are only valid if they cover all
chains and both *symm_1* and *symm_2* have symmetry group tuples of the same
length. Otherwise, the trivial symmetry group is used (see :attr:`symm_1`).

:param symm_1: Value to set for :attr:`symm_1`.
:param symm_2: Value to set for :attr:`symm_2`.

Definition at line 271 of file qsscoring.py.

def superposition ( self )

Superposition result based on shared CA atoms in :attr:`alignments`.

The superposition can be used to map :attr:`QSscoreEntity.ent` of
:attr:`qs_ent_1` onto the one of :attr:`qs_ent_2`. Use
:func:`ost.geom.Invert` if you need the opposite transformation.

:getter: Computed on first use (cached)
:type: :class:`ost.mol.alg.SuperpositionResult`

Definition at line 418 of file qsscoring.py.

def symm_1 ( self )

Symmetry groups for :attr:`qs_ent_1` used to speed up chain mapping.

This is a list of chain-lists where each chain-list can be used to reconstruct
the others via cyclic C or dihedral D symmetry. The first chain-list is used
as a representative symmetry group. For heteromers, the group-members must
contain all different seqres in oligomer.

Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are
symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).

Properties:

- All symmetry group tuples have the same length (num. of chains)
- All chains in :attr:`ent_to_cm_1` appear (w/o duplicates)
- For heteros: symmetry group tuples have all different chem. groups
- Trivial symmetry group = one tuple with all chains (used if inconsistent
  data provided or if no symmetry is found)
- Either compatible with :attr:`symm_2` or trivial symmetry groups used.
  Compatibility requires same lengths of symmetry group tuples and it must
  be possible to get an overlap (80% of residues covered within 6 A of a
  (chem. mapped) chain) of all chains in representative symmetry groups by
  superposing one pair of chains.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`tuple` of :class:`str` (chain names)
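
The purely structural properties listed above (equal tuple lengths, every chain present exactly once, trivial group as fallback) can be checked with a sketch like the following. The function names are hypothetical, and the sketch does not test chemical groups or compatibility with :attr:`symm_2`, which require the structures themselves.

```python
def is_consistent_symmetry(symm, chain_names):
    """Check equal tuple lengths and that each chain appears exactly once."""
    if not symm:
        return False
    group_len = len(symm[0])
    if any(len(group) != group_len for group in symm):
        return False
    flat = [chain for group in symm for chain in group]
    return len(flat) == len(set(flat)) and set(flat) == set(chain_names)


def trivial_symmetry(chain_names):
    """Trivial symmetry group: a single tuple holding all chains."""
    return [tuple(chain_names)]


chains = list("ABCDEFGHI")
symm = [("A", "B", "C"), ("D", "E", "F"), ("G", "H", "I")]
if not is_consistent_symmetry(symm, chains):
    symm = trivial_symmetry(chains)
print(symm)
```

A check of this kind is useful before passing user-provided groups to :func:`SetSymmetries`.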

Definition at line 233 of file qsscoring.py.

def symm_2 ( self )

Symmetry groups for :attr:`qs_ent_2` (see :attr:`symm_1` for details).

Definition at line 265 of file qsscoring.py.

Field Documentation

calpha_only

Definition at line 150 of file qsscoring.py.

max_ca_per_chain_for_cm

Definition at line 151 of file qsscoring.py.

qs_ent_1

Definition at line 132 of file qsscoring.py.

qs_ent_2

Definition at line 136 of file qsscoring.py.

The documentation for this class was generated from the following file:

stage/lib64/python2.7/site-packages/ost/mol/alg/qsscoring.py
