Object to compute QS scores.
Simple usage without any precomputed contacts, symmetries and mappings:
.. code-block:: python

  import ost
  from ost.mol.alg import qsscoring

  # load two biounits to compare
  ent_full = ost.io.LoadPDB('3ia3', remote=True)
  ent_1 = ent_full.Select('cname=A,D')
  ent_2 = ent_full.Select('cname=B,C')
  # get score
  ost.PushVerbosityLevel(3)
  try:
    qs_scorer = qsscoring.QSscorer(ent_1, ent_2)
    ost.LogScript('QSscore:', str(qs_scorer.global_score))
    ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping))
    # commonly you want the QS global score as output
    qs_score = qs_scorer.global_score
  except qsscoring.QSscoreError as ex:
    # default handling: report failure and set score to 0
    ost.LogError('QSscore failed:', str(ex))
    qs_score = 0

For maximal performance when computing QS scores of the same entity with many
others, it is advisable to construct and reuse :class:`QSscoreEntity` objects.
Any known / precomputed information can be filled into the appropriate
attribute here (no checks done!). Otherwise most quantities are computed on
first access and cached (lazy evaluation). Setters are provided to set values
with extra checks (e.g. :func:`SetSymmetries`).
All necessary seq. alignments are done by global BLOSUM62-based alignment. A
multiple sequence alignment is performed with ClustalW unless
:attr:`chain_mapping` is provided manually. You will need to have an
executable ``clustalw`` or ``clustalw2`` in your ``PATH`` or you must set
:attr:`clustalw_bin` accordingly. Otherwise an exception
(:class:`ost.settings.FileNotFound`) is thrown.
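Whether a ClustalW executable is visible can be checked before any scoring is attempted. The following is a minimal sketch using only the Python standard library; the helper name ``find_clustalw`` is hypothetical and not part of OpenStructure.

```python
import shutil


def find_clustalw(candidates=("clustalw", "clustalw2")):
    """Return the path of the first candidate found in PATH, else None.

    Hypothetical helper for illustration, not part of OpenStructure.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


clustalw_path = find_clustalw()
if clustalw_path is None:
    print("No ClustalW executable found in PATH")
else:
    print("ClustalW found at", clustalw_path)
```

A path found this way is presumably the kind of value that :attr:`clustalw_bin` expects when the executable is not picked up automatically.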
Formulas for QS scores:
::

  - QS_best = weighted_scores / (weight_sum + weight_extra_mapped)
  - QS_global = weighted_scores / (weight_sum + weight_extra_all)
  -> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared
  -> weight_sum = sum(w(min(d1,d2))) for shared
  -> weight_extra_mapped = sum(w(d)) for all mapped but non-shared
  -> weight_extra_all = sum(w(d)) for all non-shared
  -> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else

In the formulas above:
* "d": CA/CB-CA/CB distance of an "inter-chain contact" ("d1", "d2" for
"shared" contacts).
* "mapped": we could map chains of two structures and align residues in
:attr:`alignments`.
* "shared": pairs of residues which are "mapped" and have
"inter-chain contact" in both structures.
* "inter-chain contact": CB-CB pairs (CA for GLY) with distance <= 12 A
(fallback to CA-CA if :attr:`calpha_only` is True).
* "w(d)": weighting function (prob. of 2 res. to interact given CB distance)
from `Xu et al. 2009 <https://dx.doi.org/10.1016%2Fj.jmb.2008.06.002>`_.
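The formulas above can be evaluated on plain lists of contact distances. The following is a toy sketch, not the OpenStructure implementation: the function names and the split of non-shared contacts into two input lists are assumptions made for illustration, as is returning 0 when a denominator is zero.

```python
import math


def weight(d):
    """w(d): 1 up to 5 A, then a Gaussian-like decay with distance."""
    if d <= 5.0:
        return 1.0
    return math.exp(-2.0 * ((d - 5.0) / 4.28) ** 2)


def qs_scores(shared, extra_mapped, extra_unmapped):
    """Return (QS_best, QS_global) for toy contact lists.

    shared:         (d1, d2) pairs for contacts present in both structures
    extra_mapped:   distances of non-shared contacts between mapped residues
    extra_unmapped: distances of all remaining non-shared contacts
    """
    weighted_scores = sum(
        weight(min(d1, d2)) * (1.0 - abs(d1 - d2) / 12.0) for d1, d2 in shared
    )
    weight_sum = sum(weight(min(d1, d2)) for d1, d2 in shared)
    weight_extra_mapped = sum(weight(d) for d in extra_mapped)
    # "all non-shared" = mapped-but-non-shared plus everything else
    weight_extra_all = weight_extra_mapped + sum(weight(d) for d in extra_unmapped)

    def ratio(numerator, denominator):
        # assumption: report 0 if there are no contacts at all
        return numerator / denominator if denominator > 0 else 0.0

    qs_best = ratio(weighted_scores, weight_sum + weight_extra_mapped)
    qs_global = ratio(weighted_scores, weight_sum + weight_extra_all)
    return qs_best, qs_global


best, glob = qs_scores(shared=[(4.0, 4.0)], extra_mapped=[3.0], extra_unmapped=[2.0])
print("QS_best = %.3f, QS_global = %.3f" % (best, glob))
```

Because ``weight_extra_all`` includes everything in ``weight_extra_mapped``, QS_global can never exceed QS_best in this sketch.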
:param ent_1: First structure to be scored.
:type ent_1: :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
:class:`~ost.mol.EntityView`
:param ent_2: Second structure to be scored.
:type ent_2: :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
:class:`~ost.mol.EntityView`
:param res_num_alignment: Sets :attr:`res_num_alignment`
:raises: :class:`QSscoreError` if input structures are invalid or are monomers
or have issues that make it impossible for a QS score to be computed.
.. attribute:: qs_ent_1
:class:`QSscoreEntity` object for *ent_1* given at construction.
If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
set it to 'pdb_1' using :func:`~QSscoreEntity.SetName`.
.. attribute:: qs_ent_2
:class:`QSscoreEntity` object for *ent_2* given at construction.
If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
set it to 'pdb_2' using :func:`~QSscoreEntity.SetName`.
.. attribute:: calpha_only
True if any of the two structures is CA-only (after cleanup).
:type: :class:`bool`
.. attribute:: max_ca_per_chain_for_cm
Maximal number of CA atoms to use in each chain to determine chain mappings.
Setting this to -1 disables the limit. Limiting it speeds up determination
of symmetries and chain mappings. By default it is set to 100.
:type: :class:`int`
.. attribute:: max_mappings_extensive
Maximal number of chain mappings to test for 'extensive'
:attr:`chain_mapping_scheme`. The extensive chain mapping search must in the
worst case check O(N^2) * O(N!) possible mappings for complexes with N
chains. Two octamers without symmetry would require 322560 mappings to be
checked. To limit computations, a :class:`QSscoreError` is thrown if we try
more than the maximal number of chain mappings.
The value must be set before the first use of :attr:`chain_mapping`.
By default it is set to 100000.
:type: :class:`int`
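The figure of 322560 quoted for two octamers equals 8 * 8!. The short sketch below reproduces that arithmetic; reading the count as N candidate superpositions, each followed by N! chain permutations, is an assumption made here and is not stated by the source. The function name is hypothetical.

```python
import math


def worst_case_mappings(n_chains):
    """Assumed worst case: n_chains superpositions times n_chains! permutations."""
    return n_chains * math.factorial(n_chains)


for n in (2, 4, 6, 8):
    print(n, worst_case_mappings(n))
```

Under this reading, N = 8 already exceeds the default limit of 100000, which is why two octamers without usable symmetry trigger the error.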
.. attribute:: res_num_alignment
Forces each alignment in :attr:`alignments` to be based on residue numbers
instead of using a global BLOSUM62-based alignment.
:type: :class:`bool`
Definition at line 41 of file qsscoring.py.
def chain_mapping(self)
Mapping from :attr:`ent_to_cm_1` to :attr:`ent_to_cm_2`.
Properties:
- Mapping is between chains of same chem. group (see :attr:`chem_mapping`)
- Each chain can appear only once in mapping
- All chains of the complex with fewer chains are mapped
- Symmetry (:attr:`symm_1`, :attr:`symm_2`) is taken into account
Details on algorithms used to find mapping:

- We try all pairs of chem. mapped chains within symmetry group and get
  superpose-transformation for them
- First option: check for "sufficient overlap" of other chain-pairs

  - For each chain-pair defined above: apply superposition to full oligomer
    and map chains based on structural overlap
  - Structural overlap = X% of residues in second oligomer covered within Y
    Angstrom of a (chem. mapped) chain in first oligomer. We successively
    try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in
    mapping (warning shown for most permissive one).
  - If multiple possible mappings are found, we choose the one which leads
    to the lowest multi-chain-RMSD given the superposition

- Fallback option: try all mappings to find minimal multi-chain-RMSD
  (warning shown)

  - For each chain-pair defined above: apply superposition, try all (!)
    possible chain mappings (within symmetry group) and keep mapping with
    lowest multi-chain-RMSD
  - Repeat procedure above to resolve symmetry. Within the symmetry group we
    can use the chain mapping computed before and we just need to find which
    symmetry group in first oligomer maps to which in the second one. We
    again try all possible combinations...

- Limitations:

  - Trying all possible mappings is a combinatorial nightmare (factorial).
    We throw an exception if there are too many combinations (e.g. octamer
    vs octamer with no usable symmetry)
  - The mapping is forced: the "best" mapping will be chosen regardless of
    how badly it fits in terms of multi-chain-RMSD
  - As a result, such a forced mapping can lead to a large range of
    resulting QS scores. An extreme example was observed between 1on3.1
    and 3u9r.1, where :attr:`global_score` can range from 0.12 to 0.43
    for mappings with very similar multi-chain-RMSD.

:getter: Computed on first use (cached)
:type: :class:`dict` with key / value = :class:`str` (chain names, key
for :attr:`ent_to_cm_1`, value for :attr:`ent_to_cm_2`)
:raises: :class:`QSscoreError` if there are too many combinations to check
to find a chain mapping (see :attr:`max_mappings_extensive`).
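The fallback option (try all mappings, keep the one with the lowest RMSD, give up if there are too many) can be illustrated with a toy sketch. This is not the OpenStructure code: it assumes the two complexes are already superposed, represents each chain by a single centroid instead of computing a true multi-chain-RMSD, and ignores chem. groups and symmetry. All names, including ``TooManyMappingsError`` as a stand-in for :class:`QSscoreError`, are hypothetical.

```python
import itertools
import math


class TooManyMappingsError(RuntimeError):
    """Stand-in for QSscoreError in this toy example."""


def best_mapping(centroids_1, centroids_2, max_mappings=100000):
    """Exhaustively map chains and return (mapping, centroid RMSD).

    centroids_*: dict of chain name -> (x, y, z), assumed already superposed.
    All chains of the complex with fewer chains are mapped.
    """
    if not centroids_1 or not centroids_2:
        raise ValueError("both complexes need at least one chain")
    swapped = len(centroids_1) > len(centroids_2)
    small, large = (centroids_2, centroids_1) if swapped else (centroids_1, centroids_2)
    names_small = sorted(small)
    # number of ordered selections of len(small) chains out of len(large)
    n_mappings = math.perm(len(large), len(small))
    if n_mappings > max_mappings:
        raise TooManyMappingsError(
            "%d mappings exceed the limit of %d" % (n_mappings, max_mappings)
        )
    best = None
    for perm in itertools.permutations(sorted(large), len(small)):
        sq_sum = sum(
            math.dist(small[a], large[b]) ** 2 for a, b in zip(names_small, perm)
        )
        rmsd = math.sqrt(sq_sum / len(small))
        if best is None or rmsd < best[1]:
            best = (dict(zip(names_small, perm)), rmsd)
    mapping, rmsd = best
    if swapped:
        # report keys from the first complex, values from the second
        mapping = {v: k for k, v in mapping.items()}
    return mapping, rmsd


mapping, rmsd = best_mapping(
    {"A": (0.0, 0.0, 0.0), "B": (10.0, 0.0, 0.0)},
    {"X": (10.5, 0.0, 0.0), "Y": (0.5, 0.0, 0.0)},
)
print(mapping, round(rmsd, 3))
```

As in the limitations described above, this search always returns some mapping, however poor the fit; the RMSD should be inspected before trusting the result.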
Definition at line 326 of file qsscoring.py.
Symmetry groups for :attr:`qs_ent_1` used to speed up chain mapping.
This is a list of chain-lists where each chain-list can be used to reconstruct
the others via cyclic C or dihedral D symmetry. The first chain-list is used
as a representative symmetry group. For heteromers, the group-members must
contain all different seqres in oligomer.
Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are
symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).
Properties:
- All symmetry group tuples have the same length (num. of chains)
- All chains in :attr:`ent_to_cm_1` appear (w/o duplicates)
- For heteros: symmetry group tuples have all different chem. groups
- Trivial symmetry group = one tuple with all chains (used if inconsistent
data provided or if no symmetry is found)
- Either compatible with :attr:`symm_2` or trivial symmetry groups used.
Compatibility requires same lengths of symmetry group tuples and it must
be possible to get an overlap (80% of residues covered within 6 A of a
(chem. mapped) chain) of all chains in representative symmetry groups by
superposing one pair of chains.
:getter: Computed on first use (cached)
:type: :class:`list` of :class:`tuple` of :class:`str` (chain names)
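The purely structural properties listed above (equal tuple lengths, every chain present exactly once, trivial fallback) are easy to check on plain Python data. The following is a minimal sketch with hypothetical helper names; it does not test chem. groups or compatibility with the second structure, and it is not part of OpenStructure.

```python
def check_symmetry_groups(symm, chain_names):
    """Return True if symm is a list of equally long tuples covering
    every chain in chain_names exactly once."""
    if not symm:
        return False
    # all symmetry group tuples must have the same length
    if len({len(group) for group in symm}) != 1:
        return False
    # all chains must appear, without duplicates
    flat = [chain for group in symm for chain in group]
    return len(flat) == len(set(flat)) and set(flat) == set(chain_names)


def trivial_symmetry(chain_names):
    """One tuple with all chains, as used when no symmetry is found."""
    return [tuple(chain_names)]


symm = [("A", "B", "C"), ("D", "E", "F"), ("G", "H", "I")]
print(check_symmetry_groups(symm, "ABCDEFGHI"))
print(check_symmetry_groups(trivial_symmetry("ABCD"), "ABCD"))
```

A check of this kind may be useful before filling a precomputed value into the attribute directly, since that path performs no checks.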
Definition at line 268 of file qsscoring.py.