This document is for OpenStructure version 1.2; the latest version is 2.7.

Connectivity

Motivation

Traditionally, the connectivity between atoms has not been reliably described in PDB files, and different programs have adopted various ways of finding out whether two atoms are connected. One way is to rely on proper naming of the atoms. For example, the backbone atoms of the standard amino acids are named N, CA, C and O, and if atoms with these names appear in the same residue, they are shown as connected. Another way is to apply additional heuristics to find out whether a peptide bond is formed between two consecutive residues. Breaks in the backbone are indicated, for example, by a discontinuity in the residue numbering.

Loader heuristics are great if you are the one who implemented them, but they are problematic if you are just a user of the software that contains them. As time goes on, these heuristics become buried in thousands of lines of code and are often hard, if not impossible, to trace back.

Different clients of the framework have different requirements. Visualisation software wants to read in a PDB file as is, without making any changes. A script in an automated pipeline, however, wants to either strictly reject incomplete files or fill in missing structural features. All these aspects are implemented in the conop module, separated from the loading of the PDB file, giving clients fine-grained control over the loading process.

The conop module defines a Builder interface for running connectivity algorithms, that is, for connecting the atoms with bonds and performing basic clean-up of erroneous structures. Clients of the conop module can specify how the Builder should treat unknown amino acids, missing atoms and chemically infeasible bonds.

The high-level interface

ConnectAll()

Uses the current default builder to connect the atoms of the entity, assign torsions, and fill in missing or correct erroneous information such as the chemical class of the residues and the atom’s element.

Parameters: ent (EntityHandle) – A valid entity

A call to ConnectAll() is sufficient to assign residue and atom properties as well as to connect atoms with bonds.

from ost import conop

# Suppose that BuildRawModel is a function that returns a protein structure
# with no atom properties assigned and no bonds formed.
ent = BuildRawModel(...)
print(ent.bonds)  # prints an empty list
# Call ConnectAll() to assign properties/connect atoms
conop.ConnectAll(ent)
print(ent.bonds)  # prints a list containing many bonds

For more fine-grained control, consider using the Builder interface.

The builder interface

The exact behaviour of a builder is implementation-specific. So far, two classes implement the Builder interface: a heuristic and a rule-based builder. The builders mainly differ in the source of their connectivity information. The HeuristicBuilder uses a hard-coded heuristic connectivity table for the 20 standard amino acids as well as nucleotides. For other compounds such as ligands, the HeuristicBuilder runs a distance-based connectivity algorithm that connects two atoms if they are closer than a certain threshold. The RuleBasedBuilder uses a connectivity library containing all molecular components present in the PDB files on PDB.org. The library can easily be extended with custom connectivity information, if required. If a compound library is present, the RuleBasedBuilder is enabled by default; otherwise, the HeuristicBuilder is used as a fallback.
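
The difference between table-based and distance-based connectivity can be illustrated with a minimal, self-contained sketch. It does not use OpenStructure: the table CONNECTIVITY, the function connect_residue and the 1.9 Angstrom cutoff are hypothetical choices made for this illustration only, not the values or data structures of the HeuristicBuilder.

```python
import math

# Hypothetical bond table for a single residue type (glycine heavy atoms).
CONNECTIVITY = {
    "GLY": [("N", "CA"), ("CA", "C"), ("C", "O")],
}


def connect_residue(res_name, atoms, cutoff=1.9):
    """Return a list of bonded atom-name pairs.

    atoms maps atom names to (x, y, z) coordinates. Residues found in the
    table are connected by name; all others fall back to a plain distance
    cutoff (an assumption of this sketch, not OpenStructure's threshold).
    """
    if res_name in CONNECTIVITY:
        # Table-based: only bonds whose two atoms are actually present.
        return [pair for pair in CONNECTIVITY[res_name]
                if pair[0] in atoms and pair[1] in atoms]
    # Distance-based fallback: test every pair of atoms once.
    names = sorted(atoms)
    bonds = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(atoms[a], atoms[b]) < cutoff:
                bonds.append((a, b))
    return bonds


gly = {"N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0),
       "C": (2.0, 1.42, 0.0), "O": (3.22, 1.6, 0.0)}
ligand = {"O1": (0.0, 0.0, 0.0), "C1": (1.4, 0.0, 0.0), "C2": (5.0, 0.0, 0.0)}

print(connect_residue("GLY", gly))
print(connect_residue("LIG", ligand))
```

The table lookup ignores the coordinates entirely, which is why it is robust for standard residues, while the fallback depends only on geometry and therefore works for compounds nobody has described in advance.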

The following three functions give you access to the builders known to OpenStructure and allow you to set the default builder:

GetBuilder()

Gets a registered builder by name.

Parameters: name – The name of the builder
Returns: The builder or None, if the builder doesn’t exist

RegisterBuilder()

Registers a builder with OpenStructure.

Parameters:
  • builder – An instance of Builder
  • name – The name of the builder

SetDefaultBuilder()

Sets the builder with the given name as the default. You will have to register a builder with RegisterBuilder() before you will be able to set it as the default.

Raises: RuntimeError when trying to set a builder as the default that has not been registered yet.
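
The contract of these three functions (lookup returns None for unknown names, a builder must be registered before it can become the default) can be mimicked in a few lines of plain Python. The class BuilderRegistry and the name "MY_BUILDER" below are hypothetical and exist only for this sketch; they are not part of the conop module.

```python
class BuilderRegistry:
    """Toy stand-in for the name-to-builder bookkeeping described above."""

    def __init__(self):
        self._builders = {}
        self._default = None

    def register(self, builder, name):
        # Later registrations under the same name replace earlier ones
        # (a choice of this sketch).
        self._builders[name] = builder

    def get(self, name):
        # Unknown names yield None instead of raising.
        return self._builders.get(name)

    def set_default(self, name):
        # Only registered builders may become the default.
        if name not in self._builders:
            raise RuntimeError("builder '%s' has not been registered" % name)
        self._default = name

    def default(self):
        return self.get(self._default)


registry = BuilderRegistry()
registry.register(object(), "HEURISTIC")
registry.set_default("HEURISTIC")

try:
    registry.set_default("MY_BUILDER")  # never registered
    failed = False
except RuntimeError:
    failed = True
print(failed)
```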

The Builder baseclass

class Builder

CompleteAtoms(residue)

Adds any missing atoms to the residue based on its key, with coordinates set to zero.

Parameters: residue (mol.ResidueHandle) – must be a valid residue

CheckResidueCompleteness(residue)

Verifies that the given residue has all atoms it is supposed to have based on its key.

Parameters: residue (mol.ResidueHandle) – must be a valid residue

IsResidueComplete(residue)

Checks whether the residue has all atoms it is supposed to have. Hydrogen atoms are not required for a residue to be complete.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
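
A completeness check that ignores hydrogens can be sketched as follows. This is a plain-Python illustration, not the Builder implementation: the dictionary EXPECTED_ATOMS and both function names are hypothetical, and the glycine atom list is deliberately minimal.

```python
# Hypothetical description of the atoms a residue key is expected to have,
# mapping atom name to element. Hydrogens are listed but not required.
EXPECTED_ATOMS = {
    "GLY": {"N": "N", "CA": "C", "C": "C", "O": "O", "HA2": "H", "HA3": "H"},
}


def missing_heavy_atoms(res_key, present_names):
    """Return the sorted names of expected non-hydrogen atoms that are absent."""
    expected = EXPECTED_ATOMS[res_key]
    return sorted(name for name, element in expected.items()
                  if element != "H" and name not in present_names)


def is_residue_complete(res_key, present_names):
    """A residue is complete when no heavy atom is missing."""
    return not missing_heavy_atoms(res_key, present_names)


print(is_residue_complete("GLY", {"N", "CA", "C", "O"}))
print(missing_heavy_atoms("GLY", {"N", "CA"}))
```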

IdentifyResidue(residue)

Attempts to identify the residue based on its atoms and returns a suggestion for the proper residue key.

Parameters: residue (mol.ResidueHandle) – must be a valid residue

ConnectAtomsOfResidue(residue)

Connects the atoms of the residue based on residue and atom names. This method does not establish inter-residue bonds. To connect atoms that belong to different residues, use ConnectResidueToPrev() or ConnectResidueToNext().

Parameters: residue (mol.ResidueHandle) – must be a valid residue

ConnectResidueToPrev(residue, prev)

Connects the atoms of residue to those of the previous residue. The order of the parameters is important. In the case of a polypeptide chain, the residues are thought to be ordered from N- to C-terminus.

Parameters:
  • residue (mol.ResidueHandle) – must be a valid residue
  • prev (mol.ResidueHandle) – valid or invalid residue

DoesPeptideBondExist(n, c)

Checks whether a peptide bond should be formed between the n and c atoms. This method is called by ConnectResidueToNext() after making sure that both residues participating in the peptide bond are peptide-linking components.

By default, IsBondFeasible() is used to check whether the two atoms form a peptide bond.

Parameters:
  • n (mol.AtomHandle) – backbone nitrogen atom (IUPAC name N). Must be valid.
  • c (mol.AtomHandle) – backbone C-atom (IUPAC name C). Must be valid.

IsBondFeasible(atom_a, atom_b)

Overloadable hook to check whether a bond between two atoms is feasible. The default implementation uses a distance-based check to decide whether the two atoms should be connected. The atoms are connected if the distance between them lies within 0.8 to 1.2 times their van der Waals radius.

Parameters:
  • atom_a – a valid atom
  • atom_b – a valid atom
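
The distance window described for IsBondFeasible() can be sketched in plain Python. Several things here are assumptions of the sketch rather than documented behaviour: the reference distance is taken as the sum of the two atoms' radii, the radii in RADII are illustrative placeholders chosen so that typical covalent bond lengths fall inside the window, and the function name is_bond_feasible is hypothetical.

```python
import math

# Placeholder per-element radii in Angstrom (illustrative values only).
RADII = {"C": 0.76, "N": 0.71, "O": 0.66}


def is_bond_feasible(pos_a, elem_a, pos_b, elem_b, lower=0.8, upper=1.2):
    """Return True if the inter-atomic distance lies inside the window.

    The window is [lower, upper] times a reference distance, which this
    sketch assumes to be the sum of the two radii.
    """
    reference = RADII[elem_a] + RADII[elem_b]
    distance = math.dist(pos_a, pos_b)
    return lower * reference <= distance <= upper * reference


# A C-N pair at a peptide-bond-like distance, and one far too distant.
print(is_bond_feasible((0.0, 0.0, 0.0), "C", (1.33, 0.0, 0.0), "N"))
print(is_bond_feasible((0.0, 0.0, 0.0), "C", (3.5, 0.0, 0.0), "N"))
```

The lower bound matters as much as the upper one: it rejects atom pairs that sit implausibly close together, which usually signals overlapping or duplicated atoms rather than a bond.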

GuessAtomElement(atom_name, hetatm)

Guesses the element of an atom based on its name and the hetatm flag.

Parameters:
  • atom_name (string) – IUPAC atom name, e.g. CA, CB or N.
  • hetatm (bool) – Whether the atom is a hetatm or not
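
Why the hetatm flag is needed becomes clear with the atom name CA, which denotes the alpha carbon in an amino acid but a calcium ion in a hetero compound. The sketch below shows one possible rule set; the set TWO_LETTER, the function guess_element and the rules themselves are assumptions made for illustration and are not taken from the OpenStructure implementation.

```python
# Hypothetical selection of two-letter element symbols considered for hetatms.
TWO_LETTER = {"CA", "FE", "ZN", "MG", "CL", "NA", "MN"}


def guess_element(atom_name, hetatm):
    """Guess an element symbol from an atom name.

    Digits are ignored. For hetatms a matching two-letter symbol wins;
    otherwise the first letter of the name is used.
    """
    letters = "".join(ch for ch in atom_name if ch.isalpha()).upper()
    if not letters:
        return ""
    if hetatm and letters[:2] in TWO_LETTER:
        return letters[:2].capitalize()
    return letters[0]


print(guess_element("CA", False))  # alpha carbon in a standard residue
print(guess_element("CA", True))   # calcium ion in a hetero compound
```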

AssignBackboneTorsionsToResidue(residue)

For peptide-linking residues, assigns the phi, psi and omega torsions to the residue.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
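
A backbone torsion is the dihedral angle defined by four consecutive atoms; phi, for example, is defined by the C atom of the previous residue followed by N, CA and C of the residue itself. The following sketch computes such an angle from four coordinates using the standard atan2 formulation. It is independent of OpenStructure, and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle in degrees around the p2-p3 axis."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Planar arrangements: p4 on the opposite side (trans) or same side (cis) of p1.
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
print(abs(trans), abs(cis))
```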

GuessChemClass(residue)

Guesses the chemical class of the residue based on its atoms and connectivity.

So far, the method only guesses whether the residue is a peptide. A residue is a peptide if all the backbone atoms N, CA, C and O are present, have the right element and are in a suitable orientation to form bonds.
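
The first two of these criteria (presence and element of the backbone atoms) are easy to express in code. The sketch below is a simplified, hypothetical illustration that leaves out the orientation check entirely; the names BACKBONE and looks_like_peptide are not part of OpenStructure.

```python
# Backbone atom names and the element each one is expected to have.
BACKBONE = {"N": "N", "CA": "C", "C": "C", "O": "O"}


def looks_like_peptide(atoms):
    """atoms maps atom name to element symbol.

    Returns True if every backbone atom is present with the expected
    element. The geometric orientation test is omitted in this sketch.
    """
    return all(atoms.get(name) == element
               for name, element in BACKBONE.items())


glycine = {"N": "N", "CA": "C", "C": "C", "O": "O"}
calcium_site = {"CA": "Ca", "O": "O"}
print(looks_like_peptide(glycine), looks_like_peptide(calcium_site))
```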

The RuleBasedBuilder class

class RuleBasedBuilder(compound_lib)

Parameters: compound_lib (CompoundLib) – The compound library

The RuleBasedBuilder implements the Builder interface. Refer to its documentation for a basic description of the methods.

CheckResidueCompleteness(residue)

The completeness of the residue is verified using the description of the chemical compound. The method distinguishes between required atoms and optional atoms, such as OXT, which is only present if no peptide bond is formed. Whenever an unknown atom is encountered, OnUnknownAtom() is invoked. Subclasses of the RuleBasedBuilder may implement additional logic to deal with unknown atoms. Likewise, whenever a required atom is missing, OnMissingAtom() is invoked. Hydrogen atoms are not considered required by default.

Parameters: residue (mol.ResidueHandle) – must be a valid residue

IdentifyResidue(residue)

Looks up the residue in the database of chemical compounds and returns the name of the residue, or “UNK” if the residue has not been found in the library.

Parameters: residue (mol.ResidueHandle) – must be a valid residue

OnUnknownAtom(atom)

Invoked whenever an unknown atom has been encountered during a residue completeness check.

The default implementation guesses the atom properties based on the name and returns false, meaning that it should be treated as an unknown atom.

Custom implementations of this method may delete the atom, or modify it.

Parameters: atom (mol.AtomHandle) – the unknown atom

OnMissingAtom(atom)

Invoked whenever an atom is missing. It is up to the overloaded method to deal with the missing atom, either by ignoring it or by inserting a dummy atom.

Parameters: atom (string) – The missing atom’s name

GetUnknownAtoms(residue)

Returns the unknown atoms of this residue, that is, all atoms that are not part of the compound lib definition.

Return type: list of AtomHandle instances
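
The interplay between the completeness check and its two hooks follows a common pattern: a base class drives the check and calls overridable methods, and subclasses decide what to do. The following plain-Python sketch mirrors that pattern only. The classes CompletenessChecker and StrictChecker are hypothetical analogues, not subclasses of the RuleBasedBuilder, and they work on atom names rather than AtomHandle objects.

```python
class CompletenessChecker:
    """Toy analogue of the hook mechanism described above."""

    def __init__(self, required, optional=()):
        self.required = set(required)
        self.optional = set(optional)

    def check(self, atom_names):
        known = self.required | self.optional
        # Report atoms that the compound description does not know about.
        for name in atom_names:
            if name not in known:
                self.on_unknown_atom(name)
        # Report required atoms that are absent; optional ones are skipped.
        for name in sorted(self.required - set(atom_names)):
            self.on_missing_atom(name)

    def on_unknown_atom(self, name):
        pass  # default of this sketch: ignore

    def on_missing_atom(self, name):
        pass  # default of this sketch: ignore


class StrictChecker(CompletenessChecker):
    """Records every problem so that a pipeline could reject the structure."""

    def __init__(self, required, optional=()):
        super().__init__(required, optional)
        self.problems = []

    def on_unknown_atom(self, name):
        self.problems.append(("unknown", name))

    def on_missing_atom(self, name):
        self.problems.append(("missing", name))


checker = StrictChecker(required=["N", "CA", "C", "O"], optional=["OXT"])
checker.check(["N", "CA", "C", "OXT", "XX"])
print(checker.problems)
```

Keeping the policy in the hooks means the same check can serve both a tolerant viewer, which ignores problems, and a strict pipeline, which collects them and rejects the input.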

Changing the default builder

The default builder can be specified with SetDefaultBuilder(). Before a builder can be set as the default, it needs to be registered with RegisterBuilder(). By default, there is always a builder called “HEURISTIC” registered. If, for some reason, you are currently using the RuleBasedBuilder and would like to switch to the heuristic builder, call

conop.SetDefaultBuilder("HEURISTIC")
