This document is for OpenStructure version 1.2; the latest version is 2.7.

mmCIF File Format

The mmCIF file format is an alternative container for structural entities, also provided by the PDB. Here we describe how to load those files and how to deal with information that goes beyond the common PDB format (MMCifInfo, MMCifInfoCitation, MMCifInfoTransOp, MMCifInfoBioUnit, MMCifInfoStructDetails).

Loading mmCIF Files

LoadMMCIF(filename, restrict_chains='', fault_tolerant=None, calpha_only=None, profile='DEFAULT', remote=False, strict_hydrogens=None, seqres=False, info=False)

Load an mmCIF file from disk and return one or more entities. Several options allow you to customize the exact behaviour of the mmCIF import. For more information on these options, see IO Profiles for entity importer.

Residues are flagged as ligand if they are mentioned in a HET record.

Parameters:
  • restrict_chains – If not an empty string, only chains listed in the string will be imported.
  • fault_tolerant – Enable/disable fault-tolerant import. If set, overrides the value of IOProfile.fault_tolerant.
  • remote – If set to True, the method tries to load the PDB entry from the remote PDB repository www.pdb.org. The filename is then interpreted as the PDB ID.
  • strict_hydrogens – If set, overrides the value of IOProfile.strict_hydrogens.
  • seqres – Whether to read SEQRES records. If set to True, the SEQRES entry is returned as second item, after the loaded entity.
  • info – Whether to return an info container with the other output. Returns a MMCifInfo object as last item.
Return type: EntityHandle.
Raises: IOException if the import fails due to an erroneous or nonexistent file.
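
A minimal usage sketch of the seqres and info options described above. It assumes OpenStructure is installed and that LoadMMCIF is reachable through the ost.io module, which this section does not state explicitly. The helper name load_with_annotations and its loader argument are hypothetical; the argument exists only so the call pattern can be tried without OpenStructure or a real file.

```python
def load_with_annotations(filename, loader=None):
    """Load an mmCIF file together with its SEQRES data and MMCifInfo.

    `loader` is a hypothetical hook for experimentation; when it is left
    at None the real LoadMMCIF function is looked up.
    """
    if loader is None:
        # Assumption: LoadMMCIF is exposed by the ost.io module.
        from ost import io
        loader = io.LoadMMCIF
    # With seqres=True the SEQRES entry is the second item, and with
    # info=True the MMCifInfo object is the last item.
    entity, seqres, info = loader(filename, seqres=True, info=True)
    return entity, seqres, info
```

With OpenStructure available, calling load_with_annotations with the path of a local mmCIF file should give the entity, the SEQRES entry and the info container in that order.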

Categories Available

The following categories of a mmCIF file are considered by the reader:

Info Classes

Information from mmCIF files that goes beyond structural data is kept in a special container, the MMCifInfo class. Here is a detailed description of the annotations available.

class MMCifInfo

This is the container for all bits of non-molecular data pulled from a mmCIF file.

citations

Stores a list of citations (MMCifInfoCitation).

Also available as GetCitations().

biounits

Stores a list of biounits (MMCifInfoBioUnit).

Also available as GetBioUnits().

method

Stores the experimental method used to create the structure.

Also available as GetMethod(). May also be modified by SetMethod().

resolution

Stores the resolution of the crystal structure.

Also available as GetResolution(). May also be modified by SetResolution().

operations

Stores the operations needed to transform a crystal structure into a bio unit.

Also available as GetOperations(). May also be modified by AddOperation().

struct_details

Stores details about the structure in a MMCifInfoStructDetails object.

Also available as GetStructDetails(). May also be modified by SetStructDetails().

struct_refs

Lists all links to external databases in the mmCIF file.
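
The attributes above can be read like ordinary Python properties. The following sketch collects a few of them into a dictionary. The function name summarize_info is hypothetical, the demo object is a stand-in with made-up values rather than a real MMCifInfo, and the sketch assumes that the citations and biounits lists support len().

```python
from types import SimpleNamespace


def summarize_info(info):
    """Collect a few headline values from an MMCifInfo-like object."""
    return {
        "method": info.method,
        "resolution": info.resolution,
        "n_citations": len(info.citations),
        "n_biounits": len(info.biounits),
    }


# Stand-in object with made-up values; with OpenStructure this would be
# the MMCifInfo returned by LoadMMCIF(..., info=True).
demo_info = SimpleNamespace(method="X-RAY DIFFRACTION", resolution=2.0,
                            citations=["c1"], biounits=["b1", "b2"])
summary = summarize_info(demo_info)
```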

AddCitation(citation)

Add a citation to the citation list of an info object.

Parameters:citation (MMCifInfoCitation) – Citation to be added.
AddAuthorsToCitation(id, authors)

Adds a list of authors to a specific citation.

Parameters:
  • id (str) – identifier of the citation
  • authors (StringList) – List of authors.
GetCitations()

See citations

AddBioUnit(biounit)

Add a bio unit to the bio unit list of an info object.

Parameters:biounit (MMCifInfoBioUnit) – Bio unit to be added.
GetBioUnits()

See biounits

SetMethod(method)

See method

GetMethod()

See method

SetResolution(resolution)

See resolution

GetResolution()

See resolution

AddOperation(operation)

See operations

GetOperations()

See operations

SetStructDetails(details)

See struct_details

GetStructDetails()

See struct_details

AddMMCifPDBChainTr(cif_chain_id, pdb_chain_id)

Set up a translation for a certain mmCIF chain name to the traditional PDB chain name.

Parameters:
  • cif_chain_id (str) – atom_site.label_asym_id
  • pdb_chain_id (str) – atom_site.auth_asym_id
GetMMCifPDBChainTr(cif_chain_id)

Get the translation of a certain mmCIF chain name to the traditional PDB chain name.

Parameters:cif_chain_id (str) – atom_site.label_asym_id
Returns:atom_site.auth_asym_id as str
AddPDBMMCifChainTr(pdb_chain_id, cif_chain_id)

Set up a translation for a certain PDB chain name to the mmCIF chain name.

Parameters:
  • pdb_chain_id (str) – atom_site.auth_asym_id
  • cif_chain_id (str) – atom_site.label_asym_id
GetPDBMMCifChainTr(pdb_chain_id)

Get the translation of a certain PDB chain name to the mmCIF chain name.

Parameters:pdb_chain_id (str) – atom_site.auth_asym_id
Returns:atom_site.label_asym_id as str
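
The two translation directions are stored independently, so a chain pair has to be registered once per direction. The sketch below shows that pattern. ChainNameTable is a hypothetical stand-in so the example runs without OpenStructure; it is not how MMCifInfo is implemented. The setter for the PDB-to-mmCIF direction is assumed to be spelled AddPDBMMCifChainTr, matching its getter.

```python
class ChainNameTable:
    """Stand-in exposing the four translation methods described above."""

    def __init__(self):
        self._cif_to_pdb = {}
        self._pdb_to_cif = {}

    def AddMMCifPDBChainTr(self, cif_chain_id, pdb_chain_id):
        self._cif_to_pdb[cif_chain_id] = pdb_chain_id

    def GetMMCifPDBChainTr(self, cif_chain_id):
        return self._cif_to_pdb[cif_chain_id]

    def AddPDBMMCifChainTr(self, pdb_chain_id, cif_chain_id):
        self._pdb_to_cif[pdb_chain_id] = cif_chain_id

    def GetPDBMMCifChainTr(self, pdb_chain_id):
        return self._pdb_to_cif[pdb_chain_id]


def register_chain_pair(info, cif_chain_id, pdb_chain_id):
    """Record a chain name pair in both directions."""
    info.AddMMCifPDBChainTr(cif_chain_id, pdb_chain_id)
    info.AddPDBMMCifChainTr(pdb_chain_id, cif_chain_id)


# mmCIF chain "B" (label_asym_id) corresponds to PDB chain "A" (auth_asym_id).
table = ChainNameTable()
register_chain_pair(table, "B", "A")
```
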
class MMCifInfoCitation

This stores citation information from an input file.

id

Stores an internal identifier for a citation. If not provided, is set to an empty string.

Also available as GetID(). May also be modified by SetID().

cas

Stores a Chemical Abstracts Service identifier, if available. If not provided, is set to an empty string.

Also available as GetCAS(). May also be modified by SetCAS().

isbn

Stores the ISBN code, presumably for cited books. If not provided, is set to an empty string.

Also available as GetISBN(). May also be modified by SetISBN().

published_in

Stores the book or journal title of a publication. Should hold the full title, without abbreviations. If not provided, is set to an empty string.

Also available as GetPublishedIn(). May also be modified by SetPublishedIn().

volume

Stores volume information for journals. Since the volume number is not always a simple integer, it is stored as a string. If not provided, is set to an empty string.

Also available as GetVolume(). May also be modified by SetVolume().

page_first

Stores the first page of a publication. Since page numbers are not always simple integers, they are stored as strings. If not provided, is set to an empty string.

Also available as GetPageFirst(). May also be modified by SetPageFirst().

page_last

Stores the last page of a publication. Since page numbers are not always simple integers, they are stored as strings. If not provided, is set to an empty string.

Also available as GetPageLast(). May also be modified by SetPageLast().

doi

Stores the Digital Object Identifier as used by doi.org for a cited document. If not provided, is set to an empty string.

Also available as GetDOI(). May also be modified by SetDOI().

pubmed

Stores the PubMed accession number. If not provided, is set to 0.

Also available as GetPubMed(). May also be modified by SetPubMed().

year

Stores the publication year. If not provided, is set to 0.

Also available as GetYear(). May also be modified by SetYear().

title

Stores a title. If not provided, is set to an empty string.

Also available as GetTitle(). May also be modified by SetTitle().

authors

Stores a StringList of authors.

Also available as GetAuthorList(). May also be modified by SetAuthorList().

GetCAS()

See cas

SetCAS(cas)

See cas

GetISBN()

See isbn

SetISBN(isbn)

See isbn

GetPublishedIn()

See published_in

SetPublishedIn(title)

See published_in

GetVolume()

See volume

SetVolume(volume)

See volume

GetPageFirst()

See page_first

SetPageFirst(first)

See page_first

GetPageLast()

See page_last

SetPageLast(last)

See page_last

GetDOI()

See doi

SetDOI(doi)

See doi

GetPubMed()

See pubmed

SetPubMed(no)

See pubmed

GetYear()

See year

SetYear(year)

See year

GetTitle()

See title

SetTitle(title)

See title

GetAuthorList()

See authors

SetAuthorList(list)

See authors
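
As an illustration of how the citation attributes fit together, the sketch below assembles a one-line reference. The function name format_citation and the output layout are hypothetical, and the demo object is a stand-in with invented values rather than a real MMCifInfoCitation. The sketch relies on the defaults described above: unset string fields are empty and an unset year is 0.

```python
from types import SimpleNamespace


def format_citation(citation):
    """Build a one-line reference from MMCifInfoCitation-style attributes."""
    authors = ", ".join(citation.authors)
    pages = citation.page_first
    if citation.page_last:
        pages += "-" + citation.page_last
    parts = [authors, citation.title, citation.published_in,
             citation.volume, pages]
    # Unset string fields are empty, so they are skipped here.
    text = ". ".join(part for part in parts if part)
    # An unset year is 0 and is left out as well.
    if citation.year:
        text += " (%d)" % citation.year
    return text


# Invented values for demonstration only.
demo = SimpleNamespace(authors=["Doe J", "Roe R"], title="An example title",
                       published_in="Example Journal", volume="12",
                       page_first="100", page_last="110", year=2010)
line = format_citation(demo)
```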

class MMCifInfoTransOp

This stores operations needed to transform an entity into a bio unit.

id

A unique identifier. If not provided, is set to an empty string.

Also available as GetID(). May also be modified by SetID().

type

Describes the operation. If not provided, is set to an empty string.

Also available as GetType(). May also be modified by SetType().

translation

The translational vector.

Also available as GetVector(). May also be modified by SetVector().

rotation

The rotational matrix.

Also available as GetMatrix(). May also be modified by SetMatrix().

GetID()

See id

SetID(id)

See id

GetType()

See type

SetType(type)

See type

GetVector()

See translation

SetVector(x, y, z)

See translation

GetMatrix()

See rotation

SetMatrix(i00, i01, i02, i10, i11, i12, i20, i21, i22)

See rotation
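
To make the roles of rotation and translation concrete, the sketch below applies one operation to a single point in plain Python. It assumes the usual convention that the rotation is applied first and the translation is added afterwards, and that the SetMatrix arguments i00 to i22 are given row by row; neither is stated in this section. The function name apply_operation is hypothetical and does not use any OpenStructure types.

```python
def apply_operation(rotation, translation, point):
    """Return rotation * point + translation for a 3D point.

    rotation is a 3x3 nested sequence given row by row (assumed to match
    the SetMatrix argument order); translation and point are 3-tuples.
    """
    return tuple(
        sum(rotation[row][col] * point[col] for col in range(3))
        + translation[row]
        for row in range(3)
    )


# 90 degree rotation about the z axis, followed by a shift along x.
rot_z90 = ((0.0, -1.0, 0.0),
           (1.0, 0.0, 0.0),
           (0.0, 0.0, 1.0))
moved = apply_operation(rot_z90, (5.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```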

class MMCifInfoBioUnit

This stores information on how a structure is to be assembled to form the bio unit.

id

The id of a bio unit as given by the original mmCIF file.

Also available as GetID(). May also be modified by SetID().

Type :str
details

Special aspects of the biological assembly. If not provided, is set to an empty string.

Also available as GetDetails(). May also be modified by SetDetails().

chains

Chains involved in this bio unit. If not provided, is an empty list.

Also available as GetChainList(). May also be modified by AddChain().

operations

Translations and rotations needed to create the bio unit. Filled with objects of class MMCifInfoTransOp.

Also available as GetOperations(). May also be modified by AddOperations().

GetID()

See id

SetID(id)

See id

GetDetails()

See details

SetDetails(details)

See details

GetChainList()

See chains

AddChain(chain name)

See chains

GetOperations()

See operations

AddOperations(list of operations)

See operations

PDBize(asu, seqres=None, min_polymer_size=10)

Returns the biological assembly (bio unit) for an entity. The new entity is well suited to be saved as a PDB file, so the function tries to meet the requirement of single-character chain names. The following measures are taken:

  • All ligands are put into one chain (_)
  • Water is put into one chain (-)
  • Each polymer gets its own chain, named A-Z 0-9 a-z.
  • The description of non-polymer chains will be put into a generic string property called description on the residue level.
  • Ligands which resemble a polymer but have fewer than min_polymer_size residues are assigned the same numeric residue number. The residues are distinguished by insertion code.

Since this function is at the moment mainly used to create biounits from mmCIF files to be saved as PDBs, the function assumes that the ChainType properties are set correctly. ost.conop.ConnectAll() is used to derive connectivity.

Parameters:
  • asu (EntityHandle) – Asymmetric unit to work on. Should be created from a mmCIF file.
  • seqres (ost.seq.SequenceList) – If set to a valid sequence list, the length of the seqres records will be used to determine if a certain chain has the minimally required length.
  • min_polymer_size (int) – The minimal number of residues a polymer needs to get its own chain. Everything below that number will be sorted into the ligand chain.
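
The chain naming rules listed above can be sketched in plain Python as follows. This is an illustration of the described behaviour, not the code used by PDBize: the function name assign_pdb_chain_names and the 'polymer', 'ligand' and 'water' labels are hypothetical, and residue numbering, insertion codes and the seqres option are left out.

```python
import string

# Naming order taken from the list above: A-Z, then 0-9, then a-z.
POLYMER_CHAIN_NAMES = (string.ascii_uppercase + string.digits
                       + string.ascii_lowercase)
LIGAND_CHAIN = "_"
WATER_CHAIN = "-"


def assign_pdb_chain_names(chains, min_polymer_size=10):
    """Map (kind, n_residues) tuples to single-character chain names.

    kind is one of 'polymer', 'ligand' or 'water' (hypothetical labels).
    """
    names = []
    next_polymer = 0
    for kind, n_residues in chains:
        if kind == "water":
            names.append(WATER_CHAIN)
        elif kind == "polymer" and n_residues >= min_polymer_size:
            names.append(POLYMER_CHAIN_NAMES[next_polymer])
            next_polymer += 1
        else:
            # Ligands and polymers that are too short share one chain.
            names.append(LIGAND_CHAIN)
    return names


demo_names = assign_pdb_chain_names(
    [("polymer", 120), ("polymer", 4), ("ligand", 1), ("water", 50),
     ("polymer", 80)])
```

The naming alphabet holds 62 characters, so this sketch raises an IndexError for assemblies with more polymer chains than that; how PDBize itself handles that case is not described here.
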
class MMCifInfoStructDetails

Holds details about the structure.

entry_id

Identifier for a certain data block. If not provided, is set to an empty string.

Also available as GetEntryID(). May also be modified by SetEntryID().

title

Stores a title for the structure.

Also available as GetTitle(). May also be modified by SetTitle().

casp_flag

Tells whether this structure was a target in some competition.

Also available as GetCASPFlag(). May also be modified by SetCASPFlag().

descriptor

Descriptor for an NDB structure or the unstructured content of a PDB COMPND record.

Also available as GetDescriptor(). May also be modified by SetDescriptor().

mass

Molecular mass of a molecule.

Also available as GetMass(). May also be modified by SetMass().

mass_method

Method used to determine the molecular weight.

Also available as GetMassMethod(). May also be modified by SetMassMethod().

model_details

Details about how the structure was determined.

Also available as GetModelDetails(). May also be modified by SetModelDetails().

model_type_details

Details about how the type of the structure was determined.

Also available as GetModelTypeDetails(). May also be modified by SetModelTypeDetails().

GetEntryID()

See entry_id

SetEntryID(id)

See entry_id

GetTitle()

See title

SetTitle(title)

See title

GetCASPFlag()

See casp_flag

SetCASPFlag(flag)

See casp_flag

GetDescriptor()

See descriptor

SetDescriptor(descriptor)

See descriptor

GetMass()

See mass

SetMass(mass)

See mass

GetMassMethod()

See mass_method

SetMassMethod(method)

See mass_method

GetModelDetails()

See model_details

SetModelDetails(details)

See model_details

GetModelTypeDetails()

See model_type_details

SetModelTypeDetails(details)

See model_type_details

class MMCifInfoObsolete

Holds details on obsolete/superseded structures.

date

When was the entry replaced?

Also available as GetDate(). May also be modified by SetDate().

id

Type of change, either Obsolete or Supersede. The getter returns a capitalized string; the value has to be set via OBSLTE or SPRSDE.

Also available as GetID(). May also be modified by SetID().

pdb_id

ID of the replacing entry.

Also available as GetPDBID(). May also be modified by SetPDBID().

replace_pdb_id

ID of the replaced entry.

Also available as GetReplacedPDBID(). May also be modified by SetReplacedPDBID().

GetDate()

See date

SetDate(date)

See date

GetID()

See id

SetID(id)

See id

GetPDBID()

See pdb_id

SetPDBID(flag)

See pdb_id

GetReplacedPDBID()

See replace_pdb_id

SetReplacedPDBID(descriptor)

See replace_pdb_id

class MMCifInfoStructRef

Holds the information of the struct_ref category. The category describes the link of polymers in the mmCIF file to sequences stored in external databases such as UniProt. The related categories struct_ref_seq and struct_ref_seq_dif also list differences between the sequences of the deposited structure and the sequences in the database. Two prominent examples of such differences are point mutations and expression tags.

db_name

Name of the external database, for example UNP for uniprot.

Type :str
db_id

Name of the reference sequence in the database pointed to by db_name.

Type :str
db_access

Alternative accession code for the sequence in the database pointed to by db_name.

Type :str
GetAlignedSeq(name)

Returns the aligned sequence for the given name, None if the sequence does not exist.

aligned_seqs

List of aligned sequences (all entries of the struct_ref_seq category mapping to this struct_ref).

class MMCifInfoStructRefSeq

An aligned range of residues between a sequence in a reference database and the deposited sequence.

align_id

Uniquely identifies every struct_ref_seq item in the mmCIF file.

Type :str
seq_begin
seq_end
The starting point (1-based) and end point of the aligned range in the deposited sequence, respectively.
Type :int
db_begin
db_end
The starting point (1-based) and end point of the aligned range in the database sequence, respectively.
Type :int
difs

List of differences between the deposited sequence and the sequence in the database.

chain_name

Chain name of the polymer in the mmCIF file.
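
The begin and end attributes can be used to convert between the two numbering schemes. The sketch below maps a position in the deposited sequence to the database sequence. It assumes that an aligned range is gapless, that both ranges have the same length and that the end points are inclusive; none of this is stated in this section. The function name to_db_position is hypothetical, and the demo object is a stand-in with invented values rather than a real MMCifInfoStructRefSeq.

```python
from types import SimpleNamespace


def to_db_position(ref_seq, rnum):
    """Map a 1-based position in the deposited sequence to the database
    sequence, or return None if it lies outside the aligned range."""
    if not ref_seq.seq_begin <= rnum <= ref_seq.seq_end:
        return None
    # Within a gapless range both sequences advance in lockstep.
    return ref_seq.db_begin + (rnum - ref_seq.seq_begin)


# Invented range: deposited residues 1-100 align to database residues 25-124.
demo_range = SimpleNamespace(seq_begin=1, seq_end=100,
                             db_begin=25, db_end=124)
```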

class MMCifInfoStructRefSeqDif

A particular difference between the deposited sequence and the sequence in the database.

rnum

The residue number (1-based) of the residue in the deposited sequence.

Type :int
details

A textual description of the difference, e.g. point mutation, expression tag, purification artifact.

Type :str
