Connectivity
Motivation
Traditionally, the connectivity between atoms has not been reliably described in
a PDB file. Different programs adopted various ways of finding out whether two
atoms are connected. One way is to rely on proper naming of the atoms. For
example, the backbone atoms of the standard amino acids are named N, CA, C
and O, and if atoms with these names appear in the same residue they are shown
connected. Another way is to apply additional heuristics to find out whether a
peptide bond is formed between two consecutive residues. Breaks in the backbone
are indicated, e.g., by introducing a discontinuity in the residue numbering.
Loader heuristics are great if you are the one who implemented them, but they
are problematic if you are just a user of the software that contains them. As
time goes on, these heuristics become buried in thousands of lines of code and
are often hard, if not impossible, to trace back.
Different clients of the framework have different requirements. Visualisation
software wants to read in a PDB file as is, without making any changes. A
script in an automated pipeline, however, may want to either strictly reject
files that are incomplete or fill in missing structural features. All these
aspects are implemented in the conop module, separated from the loading of the
PDB file, giving clients fine-grained control over the loading process.
The conop module defines a Builder interface for running connectivity
algorithms, that is, for connecting the atoms with bonds and performing basic
clean-up of erroneous structures. Clients of the conop module can specify how
the Builder should treat unknown amino acids, missing atoms and chemically
infeasible bonds.
The high-level interface
-
ConnectAll()
Uses the current default builder to connect the atoms of the entity, assign
torsions, and fill in missing or correct erroneous information such as the
chemical class of the residues and the element of the atoms.
A call to ConnectAll() is sufficient to assign residue and atom
properties as well as to connect atoms with bonds.
from ost import conop
# Suppose that BuildRawModel is a function that returns a protein structure
# with no atom properties assigned and no bonds formed.
ent = BuildRawModel(...)
print(ent.bonds)  # will print an empty list
# Call ConnectAll() to assign properties and connect atoms
conop.ConnectAll(ent)
print(ent.bonds)  # will print a list containing many bonds
For a more fine-grained control, consider using the Builder interface.
The builder interface
The exact behaviour of a builder is implementation-specific. So far, two
classes implement the Builder interface: a heuristic and a rule-based builder.
The builders mainly differ in the source of their connectivity information. The
HeuristicBuilder uses a hard-coded heuristic connectivity table for the 20
standard amino acids as well as nucleotides. For other compounds such as ligands,
the HeuristicBuilder runs a distance-based connectivity algorithm that connects
two atoms if they are closer than a certain threshold. The RuleBasedBuilder
uses a connectivity library containing all molecular components present in the
PDB files on PDB.org. The library can easily be extended with custom
connectivity information, if required. If a compound library is present, the
RuleBasedBuilder is enabled by default; otherwise the HeuristicBuilder is used
as a fallback.
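The two sources of connectivity information described above can be sketched in plain Python. The snippet does not use OpenStructure: the CONNECTIVITY table, the connect() helper and the 1.9 Angstrom cutoff are assumptions made for illustration, not the values or names used by either builder.

```python
import math

# Hypothetical connectivity table: residue name -> bonded atom-name pairs.
# Only glycine is listed to keep the example short.
CONNECTIVITY = {
    "GLY": [("N", "CA"), ("CA", "C"), ("C", "O")],
}

# Assumed distance cutoff (Angstrom) for compounds missing from the table.
CUTOFF = 1.9


def distance(a, b):
    """Euclidean distance between two (x, y, z) tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def connect(res_name, atoms):
    """Return a sorted list of bonded atom-name pairs.

    atoms maps atom names to (x, y, z) coordinates.
    """
    if res_name in CONNECTIVITY:
        # Known compound: use the table, skip pairs with a missing atom.
        pairs = [(a, b) for a, b in CONNECTIVITY[res_name]
                 if a in atoms and b in atoms]
    else:
        # Unknown compound: connect every pair closer than the cutoff.
        names = sorted(atoms)
        pairs = [(a, b)
                 for i, a in enumerate(names)
                 for b in names[i + 1:]
                 if distance(atoms[a], atoms[b]) < CUTOFF]
    return sorted(pairs)


glycine = {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0)}
ligand = {"C1": (0.0, 0.0, 0.0), "O1": (1.2, 0.0, 0.0), "C2": (4.0, 0.0, 0.0)}
print(connect("GLY", glycine))
print(connect("LIG", ligand))
```

For the known residue the coordinates play no role; for the unknown ligand they are the only source of information, which is why the distance-based route is more sensitive to distorted geometry.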
The following 3 functions give you access to builders known to OpenStructure,
and allow you to set the default builder:
-
GetBuilder()
Get registered builder by name
Parameters: | name – The name of the builder |
Returns: | The builder or None, if the builder doesn’t exist |
-
RegisterBuilder()
Register builder to OpenStructure
Parameters: |
- builder – An instance of Builder
- name – The name of the builder
|
-
SetDefaultBuilder()
Set the builder with the given name as the default. You will have to register
a builder with RegisterBuilder() before you will be able to set it as
the default.
Raises : | RuntimeError when trying to set a builder as the default that
has not been registered yet. |
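The contract of the three functions can be modelled with a small registry. This is a toy model written for this page, not OpenStructure code: the BuilderRegistry class and its method names are hypothetical, and only the documented behaviour (None for an unknown name, RuntimeError when setting an unregistered default) is reproduced.

```python
class BuilderRegistry(object):
    """Toy stand-in for the name -> builder bookkeeping."""

    def __init__(self):
        self._builders = {}
        self._default = None

    def register(self, builder, name):
        # Mirrors RegisterBuilder(builder, name).
        self._builders[name] = builder

    def get(self, name):
        # Mirrors GetBuilder(name): None if the name is unknown.
        return self._builders.get(name)

    def set_default(self, name):
        # Mirrors SetDefaultBuilder(name): the builder must be registered.
        if name not in self._builders:
            raise RuntimeError("builder '%s' has not been registered" % name)
        self._default = name

    def default(self):
        # Hypothetical helper returning the current default builder.
        return self._builders.get(self._default)


registry = BuilderRegistry()
heuristic = object()  # placeholder for a real builder instance
registry.register(heuristic, "HEURISTIC")
registry.set_default("HEURISTIC")
```

Registering first and selecting second keeps a misspelt name from silently installing no builder at all.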
The Builder baseclass
-
class Builder
-
CompleteAtoms(residue)
Adds any missing atoms to the residue based on its key, with coordinates set
to zero.
Parameters: | residue (mol.ResidueHandle) – must be a valid residue |
-
CheckResidueCompleteness(residue)
Verifies that the given residue has all atoms it is supposed to have based on
its key.
Parameters: | residue (mol.ResidueHandle) – must be a valid residue |
-
IsResidueComplete(residue)
Check whether the residue has all atoms it is supposed to have. Hydrogen
atoms are not required for a residue to be complete.
Parameters: | residue (mol.ResidueHandle) – must be a valid residue |
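A completeness check that ignores hydrogens can be sketched as follows. The EXPECTED_ATOMS table and both helper functions are hypothetical and operate on plain atom-name lists rather than on mol.ResidueHandle objects; the atom list for alanine is abbreviated for the example.

```python
# Hypothetical description of alanine: atom name -> element.
# The hydrogen list is deliberately incomplete.
EXPECTED_ATOMS = {
    "ALA": {"N": "N", "CA": "C", "C": "C", "O": "O", "CB": "C",
            "H": "H", "HA": "H"},
}


def missing_atoms(res_name, atom_names, require_hydrogens=False):
    """Return the sorted names of expected atoms that are absent."""
    expected = EXPECTED_ATOMS[res_name]
    return sorted(name for name, element in expected.items()
                  if name not in atom_names
                  and (require_hydrogens or element != "H"))


def is_residue_complete(res_name, atom_names):
    """True if no heavy atom is missing; hydrogens are not required."""
    return not missing_atoms(res_name, atom_names)


heavy_only = ["N", "CA", "C", "O", "CB"]
print(is_residue_complete("ALA", heavy_only))
print(missing_atoms("ALA", ["N", "CA", "C", "O"]))
```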
-
IdentifyResidue(residue)
Attempts to identify the residue based on its atoms, and returns a suggestion
for the proper residue key.
Parameters: | residue (mol.ResidueHandle) – must be a valid residue |
-
ConnectAtomsOfResidue(residue)
Connects atoms of residue based on residue and atom name. This method does
not establish inter-residue bonds. To connect atoms that belong to
different residues, use ConnectResidueToPrev(), or
ConnectResidueToNext().
Parameters: | residue (mol.ResidueHandle) – must be a valid residue |
-
ConnectResidueToPrev(residue, prev)
Connects the atoms of residue to the previous residue. The order of the
parameters is important. In case of a polypeptide chain, the residues are
thought to be ordered from N- to C-terminus.
Parameters: |
- residue (mol.ResidueHandle) – must be a valid residue
- prev (mol.ResidueHandle) – valid or invalid residue
|
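Why the parameter order matters can be seen in a small sketch. With residues ordered from N- to C-terminus, the peptide bond runs from the C atom of the previous residue to the N atom of the current one. The dictionary layout and both functions are assumptions for illustration; None stands in for an invalid previous residue, and the feasibility check a real builder would perform is left out.

```python
def connect_residue_to_prev(residue, prev, bonds):
    """Append a (prev C, residue N) bond to bonds if both atoms exist.

    Returns True if a bond was added.
    """
    if prev is None:
        # Stands in for an invalid previous residue (start of chain).
        return False
    if "C" not in prev["atoms"] or "N" not in residue["atoms"]:
        return False
    bonds.append(((prev["number"], "C"), (residue["number"], "N")))
    return True


def connect_chain(residues):
    """Connect each residue of an N-to-C ordered chain to its predecessor."""
    bonds = []
    prev = None
    for residue in residues:
        connect_residue_to_prev(residue, prev, bonds)
        prev = residue
    return bonds


chain = [
    {"number": 1, "atoms": {"N", "CA", "C", "O"}},
    {"number": 2, "atoms": {"N", "CA", "C", "O"}},
    {"number": 3, "atoms": {"CA", "C", "O"}},  # backbone N is missing
]
print(connect_chain(chain))
```

Swapping the two arguments would pair the C atom of the later residue with the N atom of the earlier one, which is not a peptide bond.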
-
DoesPeptideBondExist(n, c)
Check if a peptide bond should be formed between the n and c atoms. This
method is called by ConnectResidueToNext() after making sure that
both residues participating in the peptide bond are peptide-linking
components.
By default, IsBondFeasible() is used to check whether the two atoms
form a peptide bond.
Parameters: |
- n (mol.AtomHandle) – backbone nitrogen atom (IUPAC name N). Must be valid.
- c (mol.AtomHandle) – backbone C-atom (IUPAC name C). Must be valid.
|
-
IsBondFeasible(atom_a, atom_b)
Overloadable hook to check if a bond between two atoms is feasible. The
default implementation uses a distance-based check to decide whether the
two atoms should be connected. The atoms are connected if they are in
the range of 0.8 to 1.2 times their van der Waals radius.
Parameters: |
- atom_a – a valid atom
- atom_b – a valid atom
|
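A distance window of this kind can be sketched as below. The sketch rests on several assumptions: the reference distance is taken to be the sum of the two per-element radii, the RADII values are placeholders picked for the example rather than the radii OpenStructure uses, and the functions take element symbols and coordinates instead of mol.AtomHandle objects.

```python
import math

# Placeholder per-element radii in Angstrom, chosen for this example only.
RADII = {"C": 0.76, "N": 0.71, "O": 0.66}


def distance(a, b):
    """Euclidean distance between two (x, y, z) tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_bond_feasible(elem_a, pos_a, elem_b, pos_b, lower=0.8, upper=1.2):
    """True if the distance lies within [lower, upper] times the reference."""
    reference = RADII[elem_a] + RADII[elem_b]
    return lower * reference <= distance(pos_a, pos_b) <= upper * reference


def does_peptide_bond_exist(n_pos, c_pos):
    """Delegate to the generic feasibility check, as the default does."""
    return is_bond_feasible("N", n_pos, "C", c_pos)


print(does_peptide_bond_exist((0.0, 0.0, 0.0), (1.33, 0.0, 0.0)))
print(does_peptide_bond_exist((0.0, 0.0, 0.0), (3.8, 0.0, 0.0)))
```

The lower bound rejects overlapping atoms, the upper bound rejects chain breaks; a subclass that needs a stricter or looser window would replace only the feasibility check.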
-
GuessAtomElement(atom_name, hetatm)
Guesses the element of the atom based on its name and hetatm flag.
Parameters: |
- atom_name (string) – IUPAC atom name, e.g. CA, CB or N.
- hetatm (bool) – Whether the atom is a hetatm or not
|
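The hetatm flag is needed because some atom names are ambiguous: CA is an alpha carbon in an amino acid but calcium in an ion record. The following guess is a hypothetical heuristic written for illustration; the element sets and the rules are assumptions and do not reproduce the builder's actual logic.

```python
# Assumed, deliberately short element lists.
TWO_LETTER = {"CA", "CL", "CU", "FE", "MG", "MN", "NA", "ZN"}
ONE_LETTER = {"C", "H", "N", "O", "P", "S"}


def guess_atom_element(atom_name, hetatm):
    """Guess an element symbol (upper case) or return '' if undecided."""
    # Drop digits and other non-letters, e.g. '1HB' -> 'HB'.
    letters = "".join(ch for ch in atom_name.upper() if ch.isalpha())
    if not letters:
        return ""
    # Only hetero atoms are allowed to match a two-letter element.
    if hetatm and letters in TWO_LETTER:
        return letters
    if letters[0] in ONE_LETTER:
        return letters[0]
    return ""


print(guess_atom_element("CA", False))
print(guess_atom_element("CA", True))
```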
-
AssignBackboneTorsionsToResidue(residue)
For peptide-linking residues, assigns phi, psi and omega torsions to the
amino acid.
Parameters: | residue (mol.ResidueHandle) – must be a valid residue |
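The three torsions are dihedral angles over four consecutive backbone atoms: phi over C(i-1), N, CA, C; psi over N, CA, C, N(i+1); omega over CA, C, N(i+1), CA(i+1). A minimal sketch of the geometry, independent of OpenStructure, is given below; the function names and the dictionary-based residue representation are assumptions.

```python
import math


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees around the p2-p3 axis."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    length = math.sqrt(_dot(b2, b2))
    b2_unit = tuple(c / length for c in b2)
    return math.degrees(math.atan2(_dot(b2_unit, _cross(n1, n2)),
                                   _dot(n1, n2)))


def backbone_torsions(prev, res, nxt):
    """Return the torsions that can be computed for res.

    Each argument maps atom names to coordinates, or is None at a terminus.
    """
    torsions = {}
    if prev is not None:
        torsions["phi"] = dihedral(prev["C"], res["N"], res["CA"], res["C"])
    if nxt is not None:
        torsions["psi"] = dihedral(res["N"], res["CA"], res["C"], nxt["N"])
        torsions["omega"] = dihedral(res["CA"], res["C"], nxt["N"], nxt["CA"])
    return torsions
```

A terminal residue lacks one neighbour, so only a subset of the torsions is defined for it; the sketch simply omits the missing entries.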
-
GuessChemClass(residue)
Guesses the chemical class of the residue based on its atoms and
connectivity.
So far, the method only guesses whether the residue is a peptide. A residue
is a peptide if all the backbone atoms N,CA,C,O are present, have the right
element and are in a suitable orientation to form bonds.
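The peptide test described above can be approximated as follows. This is a simplified sketch: the 1.9 Angstrom bonding limit is an assumed value, "suitable orientation" is reduced to a plain distance check along the backbone, and looks_like_peptide() is a hypothetical name.

```python
import math

# Expected element for each backbone atom name.
BACKBONE = {"N": "N", "CA": "C", "C": "C", "O": "O"}
# Backbone atom pairs that must be within bonding distance.
BACKBONE_BONDS = [("N", "CA"), ("CA", "C"), ("C", "O")]
# Assumed upper limit for a covalent bond in Angstrom.
MAX_BOND = 1.9


def distance(a, b):
    """Euclidean distance between two (x, y, z) tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def looks_like_peptide(atoms):
    """atoms maps atom names to (element, (x, y, z)) tuples."""
    for name, element in BACKBONE.items():
        if name not in atoms or atoms[name][0] != element:
            return False
    return all(distance(atoms[a][1], atoms[b][1]) < MAX_BOND
               for a, b in BACKBONE_BONDS)


residue = {
    "N": ("N", (0.0, 0.0, 0.0)),
    "CA": ("C", (1.46, 0.0, 0.0)),
    "C": ("C", (2.0, 1.4, 0.0)),
    "O": ("O", (3.2, 1.5, 0.0)),
}
print(looks_like_peptide(residue))
```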
The RuleBasedBuilder class
-
class RuleBasedBuilder(compound_lib)
Parameters: | compound_lib (CompoundLib) – The compound library |
The RuleBasedBuilder implements the Builder interface.
Refer to the documentation of Builder for a basic description of the methods.
-
CheckResidueCompleteness(residue)
The completeness of the residue is verified using the description of the
chemical compound. The method distinguishes between required atoms
and atoms that are optional, like OXT, which is only present if no
peptide bond is formed. Whenever an unknown atom is encountered,
OnUnknownAtom() is invoked. Subclasses of the
RuleBasedBuilder may implement some additional logic to deal with
unknown atoms. Likewise, whenever a required atom is missing,
OnMissingAtom() is invoked. Hydrogen atoms are not considered as
required by default.
Parameters: | residue (mol.ResidueHandle) – must be a valid residue |
-
IdentifyResidue(residue)
Looks up the residue in the database of chemical compounds and returns
the name of the residue, or “UNK” if the residue has not been found in the
library.
Parameters: | residue (mol.ResidueHandle) – must be a valid residue |
-
OnUnknownAtom(atom)
Invoked whenever an unknown atom has been encountered during a residue
completeness check.
The default implementation guesses the atom properties based on the name
and returns false, meaning that it should be treated as an unknown atom.
Custom implementations of this method may delete the atom, or modify it.
Parameters: | atom (mol.AtomHandle) – the unknown atom |
-
OnMissingAtom(atom)
Invoked whenever an atom is missing. It is up to the overloaded method
to deal with the missing atom, either by ignoring it or by inserting a
dummy atom.
Parameters: | atom (string) – The missing atom’s name |
-
GetUnknownAtoms(residue)
Returns the unknown atoms of this residue, that is all atoms that
are not part of the compound lib definition.
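How the completeness check and the two hooks interact can be shown with a toy model. The classes below are written for this page and are not the OpenStructure implementation: the compound library is reduced to a dictionary of required atom names, residues are plain name lists, and all class and method names are hypothetical.

```python
class ToyRuleBasedBuilder(object):
    """Toy model of the hook mechanism described above."""

    def __init__(self, compound_lib):
        # compound_lib: residue name -> set of required atom names.
        self.compound_lib = compound_lib

    def get_unknown_atoms(self, res_name, atom_names):
        required = self.compound_lib.get(res_name, set())
        return [name for name in atom_names if name not in required]

    def check_residue_completeness(self, res_name, atom_names):
        required = self.compound_lib.get(res_name, set())
        for name in self.get_unknown_atoms(res_name, atom_names):
            self.on_unknown_atom(name)
        for name in sorted(required):
            if name not in atom_names:
                self.on_missing_atom(name)

    def on_unknown_atom(self, atom_name):
        # Default: leave the atom alone and report it as unknown.
        return False

    def on_missing_atom(self, atom_name):
        # Default: ignore the missing atom.
        pass


class RecordingBuilder(ToyRuleBasedBuilder):
    """Subclass that records what the hooks were called with."""

    def __init__(self, compound_lib):
        ToyRuleBasedBuilder.__init__(self, compound_lib)
        self.unknown = []
        self.missing = []

    def on_unknown_atom(self, atom_name):
        self.unknown.append(atom_name)
        return False

    def on_missing_atom(self, atom_name):
        self.missing.append(atom_name)


builder = RecordingBuilder({"GLY": {"N", "CA", "C", "O"}})
builder.check_residue_completeness("GLY", ["N", "CA", "C", "XX"])
```

A pipeline that must reject incomplete files could raise an exception from the missing-atom hook instead of recording the name, while a viewer could keep the default and ignore it.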
Changing the default builder
The default builder can be specified with SetDefaultBuilder(). Before a
builder can be set as the default, it needs to be registered with
RegisterBuilder(). By default, there is always a builder called “HEURISTIC”
registered. If, for some reason, you are currently using the RuleBasedBuilder
and you would like to switch to that builder, call
conop.SetDefaultBuilder("HEURISTIC")