You are reading the documentation for the development version of OpenStructure.

# mmCIF File Format

The mmCIF file format is a container for structural entities provided by the PDB. Here we describe how to load those files and how to deal with the information they provide beyond the legacy PDB format (MMCifInfo, MMCifInfoCitation, MMCifInfoTransOp, MMCifInfoBioUnit, MMCifInfoStructDetails, MMCifInfoObsolete, MMCifInfoStructRef, MMCifInfoStructRefSeq, MMCifInfoStructRefSeqDif, MMCifInfoRevisions, MMCifInfoEntityBranchLink).

LoadMMCIF(filename, fault_tolerant=None, calpha_only=None, profile='DEFAULT', remote=False, seqres=False, info=False)

Load a mmCIF file and return one or more entities. Several options allow customizing the exact behaviour of the mmCIF import. For more information on these options, see IO Profiles for entity importer.

Residues are flagged as ligand if they are mentioned in a HET record.

Parameters:

• fault_tolerant – Enable/disable fault-tolerant import. If set, overrides the value of IOProfile.fault_tolerant.
• remote – If set to True, the method tries to load the PDB entry from the remote PDB repository www.pdb.org. The filename is then interpreted as the PDB ID.
• seqres – Whether to read SEQRES records. If True, a SequenceList object is returned as the second item. The sequences in the list are named according to the mmCIF chain name. This feature requires a default compound library to be defined and accessible via GetDefaultLib(); otherwise an empty list is returned.
• info – Whether to return an info container with the other output. If True, a MMCifInfo object is returned as last item.

Return type: EntityHandle (or tuple if seqres or info are True).
Raises: IOException if the import fails due to an erroneous or non-existent file.
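The return-value rules above can be sketched as a small helper that always yields a fixed triple. This is only an illustration: unpack_load_result and load_mmcif are hypothetical names, and only LoadMMCIF() with its seqres and info flags comes from this page. The wrapper assumes an OpenStructure installation that provides ost.io.

```python
def unpack_load_result(result, seqres=False, info=False):
    """Normalise the return value of LoadMMCIF to (entity, seqres, info).

    Per the parameter description, the sequence list is the second item
    when seqres=True and the MMCifInfo object is the last item when info=True.
    """
    if not (seqres or info):
        return result, None, None  # a bare EntityHandle was returned
    items = tuple(result)
    sequences = items[1] if seqres else None
    info_obj = items[-1] if info else None
    return items[0], sequences, info_obj


def load_mmcif(path, seqres=False, info=False):
    """Hypothetical wrapper; needs OpenStructure to be installed."""
    from ost import io  # imported lazily so the helper above works without OST
    result = io.LoadMMCIF(path, seqres=seqres, info=info)
    return unpack_load_result(result, seqres=seqres, info=info)
```

With such a wrapper, calling code can unpack three values regardless of which flags were set, at the cost of carrying None placeholders.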

## Categories Available

The following categories of a mmCIF file are considered by the reader:

Notes:

## Info Classes

Information from mmCIF files that goes beyond structural data is kept in a special container, the MMCifInfo class. Here is a detailed description of the annotations available.

class MMCifInfo

This is the container for all bits of non-molecular data pulled from a mmCIF file.

citations

Stores a list of citations (MMCifInfoCitation).

Also available as GetCitations().

biounits

Stores a list of biounits (MMCifInfoBioUnit).

Also available as GetBioUnits().

method

Stores the experimental method used to create the structure.

Also available as GetMethod(). May also be modified by SetMethod().

resolution

Stores the resolution of the crystal structure. Set to 0 if no value in loaded mmCIF file.

Also available as GetResolution(). May also be modified by SetResolution().

r_free

Stores the R-free value of the crystal structure. Set to 0 if no value in loaded mmCIF file.

Also available as GetRFree(). May also be modified by SetRFree().

r_work

Stores the R-work value of the crystal structure. Set to 0 if no value in loaded mmCIF file.

Also available as GetRWork(). May also be modified by SetRWork().
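Since resolution, r_free and r_work are set to 0 when the file holds no value, code reading them may want to turn that sentinel into None. A minimal sketch, assuming only the attribute names listed above; quality_summary is a hypothetical helper and the demo object is a stand-in with made-up values, not a real MMCifInfo.

```python
from types import SimpleNamespace


def quality_summary(info):
    """Collect method, resolution and R-values; 0 means 'no value in file'."""
    def value_or_none(value):
        return None if value == 0 else value

    return {
        "method": info.method,
        "resolution": value_or_none(info.resolution),
        "r_free": value_or_none(info.r_free),
        "r_work": value_or_none(info.r_work),
    }


# Stand-in with invented values; a real object would come from
# LoadMMCIF(..., info=True).
demo = SimpleNamespace(method="X-RAY DIFFRACTION", resolution=1.8,
                       r_free=0, r_work=0.19)
print(quality_summary(demo))
```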

operations

Stores the operations needed to transform a crystal structure into a bio unit.

Also available as GetOperations(). May also be modified by AddOperation().

struct_details

Stores details about the structure in a MMCifInfoStructDetails object.

Also available as GetStructDetails(). May also be modified by SetStructDetails().

struct_refs

Lists all links to external databases in the mmCIF file.

revisions

Stores a simple history of a PDB entry.

Also available as GetRevisions(). May be extended by AddRevision().

obsolete

Stores information about obsoleted / superseded entries.

Also available as GetObsoleteInfo(). May also be modified by SetObsoleteInfo().

AddCitation(citation)

Add a citation to the citation list of an info object.

Parameters: citation (MMCifInfoCitation) – Citation to be added.
AddAuthorsToCitation(id, authors)

Adds a list of authors to a specific citation.

Parameters:

• id (str) – Identifier of the citation.
• authors (StringList) – List of authors.
GetCitations()
AddBioUnit(biounit)

Add a bio unit to the bio unit list of an info object. If the id of biounit already exists in the set of assemblies, both will be merged. This means that chain and operations lists will be concatenated and the interval lists (operationsintervalls, chainintervalls) will be updated.

Parameters: biounit (MMCifInfoBioUnit) – Bio unit to be added.
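The merge described for AddBioUnit() can be pictured with plain Python lists. This sketch is not OpenStructure code. It assumes that intervals are (start, right border) tuples with an exclusive right border, and that the intervals of the added unit are shifted by the length of the existing list. merge_with_intervals is a hypothetical name.

```python
def merge_with_intervals(items_a, intervals_a, items_b, intervals_b):
    """Concatenate two lists and keep their interval lists consistent."""
    offset = len(items_a)  # indices of the second list move by this much
    merged_items = list(items_a) + list(items_b)
    shifted = [(start + offset, end + offset) for start, end in intervals_b]
    return merged_items, list(intervals_a) + shifted


# Chains A and B already belong to the assembly, chain C is merged in.
chains, chain_intervals = merge_with_intervals(["A", "B"], [(0, 2)],
                                               ["C"], [(0, 1)])
print(chains, chain_intervals)
```

The same bookkeeping would apply to the operations list and its interval list.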
GetBioUnits()
SetMethod(method)
GetMethod()
SetResolution(resolution)
GetResolution()
AddOperation(operation)
GetOperations()
SetStructDetails(details)
GetStructDetails()
AddMMCifPDBChainTr(cif_chain_id, pdb_chain_id)

Set up a translation for a certain mmCIF chain name to the traditional PDB chain name.

Parameters:

• cif_chain_id (str) – atom_site.label_asym_id
• pdb_chain_id (str) – atom_site.auth_asym_id
GetMMCifPDBChainTr(cif_chain_id)

Get the translation of a certain mmCIF chain name to the traditional PDB chain name. Only works if SEQRES records are read in LoadMMCIF() and a compound library is available (see GetDefaultLib()).

Parameters: cif_chain_id (str) – atom_site.label_asym_id
Returns: atom_site.auth_asym_id as str (empty if no mapping)
AddPDBMMCifChainTr(pdb_chain_id, cif_chain_id)

Set up a translation for a certain PDB chain name to the mmCIF chain name.

Parameters:

• pdb_chain_id (str) – atom_site.auth_asym_id
• cif_chain_id (str) – atom_site.label_asym_id
GetPDBMMCifChainTr(pdb_chain_id)

Get the translation of a certain PDB chain name to the mmCIF chain name.

Parameters: pdb_chain_id (str) – atom_site.auth_asym_id
Returns: atom_site.label_asym_id as str (empty if no mapping)
AddMMCifEntityIdTr(cif_chain_id, entity_id)

Set up a translation for a certain mmCIF chain name to the mmCIF entity ID.

Parameters:

• cif_chain_id (str) – atom_site.label_asym_id
• entity_id (str) – atom_site.label_entity_id
GetMMCifEntityIdTr(cif_chain_id)

Get the translation of a certain mmCIF chain name to the mmCIF entity ID.

Parameters: cif_chain_id (str) – atom_site.label_asym_id
Returns: atom_site.label_entity_id as str (empty if no mapping)
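The chain translation getters can be combined into a lookup table. The sketch below assumes only GetMMCifPDBChainTr() and GetMMCifEntityIdTr() as documented above. chain_name_table and the FakeInfo stand-in are hypothetical and exist so the example runs without OpenStructure.

```python
def chain_name_table(info, cif_chain_ids):
    """Map each mmCIF chain name to (PDB chain name, entity ID)."""
    table = {}
    for cif_id in cif_chain_ids:
        pdb_id = info.GetMMCifPDBChainTr(cif_id)
        entity_id = info.GetMMCifEntityIdTr(cif_id)
        # Both getters return an empty string if there is no mapping.
        table[cif_id] = (pdb_id or None, entity_id or None)
    return table


class FakeInfo:
    """Hypothetical stand-in mimicking the two getters used above."""

    def __init__(self):
        self._pdb = {"A": "H"}
        self._entity = {"A": "1", "B": "2"}

    def GetMMCifPDBChainTr(self, cif_id):
        return self._pdb.get(cif_id, "")

    def GetMMCifEntityIdTr(self, cif_id):
        return self._entity.get(cif_id, "")


table = chain_name_table(FakeInfo(), ["A", "B"])
print(table)
```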
AddRevision(num, date, status, major=-1, minor=-1)

Add a new iteration to the revision history. See MMCifInfoRevisions.AddRevision().

GetRevisions()
SetRevisionsDateOriginal(date)

Set the date when this entry first entered the PDB. Ignored if it was set in the past. See MMCifInfoRevisions.SetDateOriginal().

GetObsoleteInfo()
SetObsoleteInfo()

Get bond information for branched entities. Returns all MMCifInfoEntityBranchLink objects in one list. Chain and residue information is available via the AtomHandles stored in each entry.

Returns: list of MMCifInfoEntityBranchLink
GetEntityBranchByChain(chain_name)

Get bond information for chains with branched entities. Returns all MMCifInfoEntityBranchLink objects in one list if chain is a branched entity, an empty list otherwise.

Parameters: chain_name (str) – Chain name to check for branch links
Returns: list of MMCifInfoEntityBranchLink

Add bond information for a branched entity.

Parameters:

• chain_name (str) – Chain the bond belongs to
• atom1 (AtomHandle) – First atom of the bond
• atom2 (AtomHandle) – Second atom of the bond
• bond_order (int) – Bond order (e.g. 1=single, 2=double, 3=triple)

Returns: Nothing
GetEntityBranchChainNames()

Get a list of chain names which contain branched entities.

Returns: list of str
GetEntityBranchChains()

Get a list of chains which contain branched entities.

Returns: list of ChainHandle
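The branch link getters can be combined to group bonds by chain. This is a sketch under the assumption that GetEntityBranchChainNames() and GetEntityBranchByChain() behave as described above. branch_links_by_chain and FakeBranchInfo are hypothetical, and the atoms in the demo are plain strings instead of AtomHandles.

```python
from types import SimpleNamespace


def branch_links_by_chain(info):
    """Return {chain name: [(atom1, atom2, bond_order), ...]}."""
    links = {}
    for chain_name in info.GetEntityBranchChainNames():
        links[chain_name] = [
            (link.atom1, link.atom2, link.bond_order)
            for link in info.GetEntityBranchByChain(chain_name)
        ]
    return links


class FakeBranchInfo:
    """Hypothetical stand-in for the two MMCifInfo getters used above."""

    def __init__(self, links):
        self._links = links

    def GetEntityBranchChainNames(self):
        return list(self._links)

    def GetEntityBranchByChain(self, chain_name):
        return self._links.get(chain_name, [])


# Invented example data: one single bond in chain "C".
demo_info = FakeBranchInfo(
    {"C": [SimpleNamespace(atom1="O4", atom2="C1", bond_order=1)]}
)
print(branch_links_by_chain(demo_info))
```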

Establish all bonds stored for branched entities.

class MMCifInfoCitation

This stores citation information from an input file.

id

Stores an internal identifier for a citation. If not provided, defaults to an empty string.

Also available as GetID(). May also be modified by SetID().

cas

Stores a Chemical Abstract Service identifier if available. If not provided, defaults to an empty string.

Also available as GetCAS(). May also be modified by SetCAS().

isbn

Stores the ISBN code, presumably for cited books. If not provided, defaults to an empty string.

Also available as GetISBN(). May also be modified by SetISBN().

published_in

Stores the book or journal title of a publication. Should take the full title, no abbreviations. If not provided, defaults to an empty string.

Also available as GetPublishedIn(). May also be modified by SetPublishedIn().

volume

Stores volume information for journals. Since the volume number is not always a simple integer, it is stored as a string. If not provided, defaults to an empty string.

Also available as GetVolume(). May also be modified by SetVolume().

page_first

Stores the first page of a publication. Since page numbers are not always simple integers, they are stored as strings. If not provided, defaults to an empty string.

Also available as GetPageFirst(). May also be modified by SetPageFirst().

page_last

Stores the last page of a publication. Since page numbers are not always simple integers, they are stored as strings. If not provided, defaults to an empty string.

Also available as GetPageLast(). May also be modified by SetPageLast().

doi

Stores the Document Object Identifier as used by doi.org for a cited document. If not provided, defaults to an empty string.

Also available as GetDOI(). May also be modified by SetDOI().

pubmed

Stores the PubMed accession number. If not provided, is set to 0.

Also available as GetPubMed(). May also be modified by SetPubMed().

year

Stores the publication year. If not provided, is set to 0.

Also available as GetYear(). May also be modified by SetYear().

title

Stores a title. If not provided, is set to an empty string.

Also available as GetTitle(). May also be modified by SetTitle().

book_publisher

Name of publisher of the citation, relevant for books and book chapters.

Also available as GetBookPublisher() and SetBookPublisher().

book_publisher_city

City of the publisher of the citation, relevant for books and book chapters.

Also available as GetBookPublisherCity() and SetBookPublisherCity().

citation_type

Defines where a citation was published. Either journal, book or unknown.

Also available as GetCitationType(). May also be modified by SetCitationType() with values from MMCifInfoCType. For convenience, the setters SetCitationTypeJournal(), SetCitationTypeBook() and SetCitationTypeUnknown() exist.

For checking the type of a citation, IsCitationTypeJournal(), IsCitationTypeBook() and IsCitationTypeUnknown() can be used.

authors

Stores a StringList of authors.

Also available as GetAuthorList(). May also be modified by SetAuthorList().

GetCAS()
SetCAS(cas)
GetISBN()
SetISBN(isbn)
GetPublishedIn()
SetPublishedIn(title)
GetVolume()
SetVolume(volume)
GetPageFirst()
SetPageFirst(first)
GetPageLast()
SetPageLast(last)
GetDOI()
SetDOI(doi)
GetPubMed()
SetPubMed(no)
GetYear()
SetYear(year)
GetTitle()
SetTitle(title)
GetBookPublisher()
SetBookPublisher()
GetBookPublisherCity()
SetBookPublisherCity()
GetCitationType()
SetCitationType(publication_type)
SetCitationTypeJournal()
SetCitationTypeBook()
SetCitationTypeUnknown()
IsCitationTypeJournal()
IsCitationTypeBook()
IsCitationTypeUnknown()
GetAuthorList()
SetAuthorList(list)
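As an illustration of the defaults described above (empty strings, or 0 for year and pubmed), a citation can be turned into a one-line string by skipping unset fields. format_citation is a hypothetical helper that relies only on the attribute names of MMCifInfoCitation. The demo object is a stand-in with invented values.

```python
from types import SimpleNamespace


def format_citation(cit):
    """Build a one-line citation, skipping fields left at their defaults."""
    parts = []
    if len(cit.authors) > 0:
        parts.append(", ".join(cit.authors))
    if cit.year:  # 0 means "not provided"
        parts.append("(%d)" % cit.year)
    if cit.title:
        parts.append(cit.title)
    if cit.published_in:
        source = cit.published_in
        if cit.volume:
            source += " " + cit.volume
        if cit.page_first:
            pages = cit.page_first
            if cit.page_last:
                pages += "-" + cit.page_last
            source += ": " + pages
        parts.append(source)
    if cit.doi:
        parts.append("doi:" + cit.doi)
    return ". ".join(parts)


# Invented values, for illustration only.
demo_citation = SimpleNamespace(
    authors=["Doe J", "Roe R"], year=2020, title="An example title",
    published_in="Journal of Examples", volume="12",
    page_first="100", page_last="110", doi="",
)
print(format_citation(demo_citation))
```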
class MMCifInfoTransOp

This stores operations needed to transform an EntityHandle into a bio unit.

id

A unique identifier. If not provided, defaults to an empty string.

Also available as GetID(). May also be modified by SetID().

type

Describes the operation. If not provided, defaults to an empty string.

Also available as GetType(). May also be modified by SetType().

translation

The translational vector. Also available as GetVector(). May also be modified by SetVector().

rotation

The rotational matrix. Also available as GetMatrix(). May also be modified by SetMatrix().

GetID()

See id

SetID(id)

See id

GetType()
SetType(type)
GetVector()
SetVector(x, y, z)
GetMatrix()
SetMatrix(i00, i01, i02, i10, i11, i12, i20, i21, i22)
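To make the roles of rotation and translation concrete, the following sketch applies one operation to a single coordinate. It assumes the usual convention that the rotation is applied first and the translation second, and it uses plain tuples rather than OpenStructure's own matrix and vector types. apply_operation is a hypothetical helper.

```python
def apply_operation(rotation, translation, point):
    """Return rotation * point + translation for 3x3 rows and 3-vectors."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Rotation by 90 degrees about the z axis, given row by row in the same
# order as the SetMatrix() arguments (i00, i01, i02, i10, ...).
rot_z_90 = ((0, -1, 0),
            (1, 0, 0),
            (0, 0, 1))
moved = apply_operation(rot_z_90, (0, 0, 1), (1, 0, 0))
print(moved)
```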
class MMCifInfoBioUnit

This stores information on how a structure is to be assembled to form the bio unit.

id

The id of a bio unit as given by the original mmCIF file.

Also available as GetID(). May also be modified by SetID().

Type: str
details

Special aspects of the biological assembly. If not provided, defaults to an empty string.

Also available as GetDetails(). May also be modified by SetDetails().

method_details

Details about the method used to determine this biological assembly.

Also available as GetMethodDetails(). May also be modified by SetMethodDetails().

chains

Chains involved in this bio unit. If not provided, defaults to an empty list.

Also available as GetChainList(). May also be modified by AddChain() or SetChainList().

chainintervals

List of intervals on the chain list. Needed if there are several sets of chains and transformations to create the bio unit. Comes as a list of tuples. The first component is the start, the second is the right border of the interval.

Also available as GetChainIntervalList(). Is automatically modified by AddChain(), SetChainList() and MMCifInfo.AddBioUnit().

operations

Translations and rotations needed to create the bio unit. Filled with objects of class MMCifInfoTransOp.

Also available as GetOperations(). May be modified by AddOperations().

operationsintervalls

List of intervals on the operations list. Needed if there are several sets of chains and transformations to create the bio unit. Comes as a list of tuples. The first component is the start, the second is the right border of the interval.

Also available as GetOperationsIntervalList(). Is automatically modified by AddOperations() and MMCifInfo.AddBioUnit().

GetID()

See id

SetID(id)

See id

GetDetails()
SetDetails(details)
GetMethodDetails()
SetMethodDetails(details)
GetChainList()
SetChainList(chains)

See chains. Also resets chainintervalls to contain only one interval enclosing the whole chain list.

Parameters: chains (StringList) – List of chain names.
AddChain(chain name)

See chains. Also extends the right border of the last entry in chainintervalls.
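The interval bookkeeping of SetChainList() and AddChain() can be mimicked in a few lines of plain Python. This is a stand-in, not the OpenStructure implementation. It assumes that the right border is exclusive and that adding a chain to an empty list opens a first interval. ChainListSketch and its method names are hypothetical.

```python
class ChainListSketch:
    """Pure-Python stand-in for the chain list and its intervals."""

    def __init__(self):
        self.chains = []
        self.intervals = []  # list of (start, right border) tuples

    def set_chain_list(self, chains):
        # One interval enclosing the whole chain list.
        self.chains = list(chains)
        self.intervals = [(0, len(self.chains))]

    def add_chain(self, name):
        self.chains.append(name)
        if not self.intervals:
            self.intervals.append((0, 1))  # assumption for the empty case
        else:
            start, end = self.intervals[-1]
            self.intervals[-1] = (start, end + 1)  # extend right border


sketch = ChainListSketch()
sketch.set_chain_list(["A", "B"])
sketch.add_chain("C")
print(sketch.chains, sketch.intervals)
```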

GetChainIntervalList()
GetOperations()
AddOperations(list of operations)

See operations. Also extends the right border of the last entry in operationsintervalls.

GetOperationsIntervalList()
PDBize(asu, seqres=None, min_polymer_size=None, transformation=False, peptide_min_size=10, nucleicacid_min_size=10, saccharide_min_size=10)

Returns the biological assembly (bio unit) for an entity. The new entity created is well suited to be saved as a PDB file. Therefore the function tries to meet the requirements of single-character chain names. The following measures are taken:

• All ligands are put into one chain (_)
• Water is put into one chain (-)
• Each polymer gets its own chain, named A-Z 0-9 a-z.
• The description of non-polymer chains will be put into a generic string property called description on the residue level.
• Ligands that resemble a polymer but have fewer than min_polymer_size / peptide_min_size / nucleicacid_min_size / saccharide_min_size residues are assigned the same numeric residue number. The residues are distinguished by insertion code.
• Sometimes bio units exceed the coordinate system storable in a PDB file. In that case, the box around the entity will be aligned to the lower left corner of the coordinate system.

Since this function is at the moment mainly used to create biounits from mmCIF files to be saved as PDBs, the function assumes that the ChainType properties are set correctly.

Parameters:

• asu (EntityHandle) – Asymmetric unit to work on. Should be created from a mmCIF file.
• seqres (SequenceList) – If set to a valid sequence list, the length of the seqres records will be used to determine if a certain chain has the minimally required length.
• min_polymer_size (int) – The minimal number of residues a polymer needs to get its own chain. Everything below that number will be sorted into the ligand chain. Overrides peptide_min_size, nucleicacid_min_size and saccharide_min_size if set to a value different than None.
• transformation (bool) – If set, return the transformation matrix used to move the bounding box of the bio unit to the lower left corner.
• peptide_min_size (int) – Minimal size to get an individual chain for a polypeptide. Is overridden by min_polymer_size.
• nucleicacid_min_size (int) – Minimal size to get an individual chain for a polynucleotide. Is overridden by min_polymer_size.
• saccharide_min_size (int) – Minimal size to get an individual chain for an oligosaccharide or polysaccharide. Is overridden by min_polymer_size.
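Two of the rules above lend themselves to a short sketch: the order of polymer chain names (A-Z, 0-9, a-z) and the way min_polymer_size overrides the type-specific thresholds. The code is illustrative only and does not call PDBize(). The names POLYMER_CHAIN_NAMES, effective_min_size and gets_own_chain, as well as the kind labels, are hypothetical.

```python
import string

# Order documented for polymer chains: A-Z, then 0-9, then a-z.
POLYMER_CHAIN_NAMES = (string.ascii_uppercase + string.digits
                       + string.ascii_lowercase)


def effective_min_size(kind, min_polymer_size=None, peptide_min_size=10,
                       nucleicacid_min_size=10, saccharide_min_size=10):
    """Threshold for a polymer kind; min_polymer_size wins if not None."""
    if min_polymer_size is not None:
        return min_polymer_size
    return {
        "peptide": peptide_min_size,
        "nucleicacid": nucleicacid_min_size,
        "saccharide": saccharide_min_size,
    }[kind]


def gets_own_chain(kind, n_residues, **size_options):
    """Everything below the threshold goes into the ligand chain."""
    return n_residues >= effective_min_size(kind, **size_options)


print(len(POLYMER_CHAIN_NAMES), gets_own_chain("peptide", 9))
```

The 62 available names also show why very large assemblies cannot always be represented with single-character chain names.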
class MMCifInfoStructDetails

entry_id

Identifier for a certain data block. If not provided, defaults to an empty string.

Also available as GetEntryID(). May also be modified by SetEntryID().

title

The title of the structure.

Also available as GetTitle(). May also be modified by SetTitle().

casp_flag

Tells whether this structure was a target in some competition.

Also available as GetCASPFlag(). May also be modified by SetCASPFlag().

descriptor

Descriptor for an NDB structure or the unstructured content of a PDB COMPND record.

Also available as GetDescriptor(). May also be modified by SetDescriptor().

mass

Molecular mass of a molecule.

Also available as GetMass(). May also be modified by SetMass().

mass_method

Method used to determine the molecular weight.

Also available as GetMassMethod(). May also be modified by SetMassMethod().

model_details

Details about how the structure was determined.

Also available as GetModelDetails(). May also be modified by SetModelDetails().

model_type_details

Details about how the type of the structure was determined.

Also available as GetModelTypeDetails(). May also be modified by SetModelTypeDetails().

GetEntryID()
SetEntryID(id)
GetTitle()
SetTitle(title)
GetCASPFlag()
SetCASPFlag(flag)
GetDescriptor()
SetDescriptor(descriptor)
GetMass()
SetMass(mass)
GetMassMethod()
SetMassMethod(method)
GetModelDetails()
SetModelDetails(details)
GetModelTypeDetails()
SetModelTypeDetails(details)
class MMCifInfoObsolete

Holds details on obsolete / superseded structures. The data is available both in the obsolete and in the replacement entries.

date

The date when the entry was replaced.

Also available as GetDate(). May also be modified by SetDate().

id

Type of change, either Obsolete or Supersede. The getter returns a capitalized string, while the value has to be set via OBSLTE or SPRSDE.

Also available as GetID(). May also be modified by SetID().

pdb_id

ID of the replacing entry.

Also available as GetPDBID(). May also be modified by SetPDBID().

replace_pdb_id

ID of the replaced entry.

Also available as GetReplacedPDBID(). May also be modified by SetReplacedPDBID().

GetDate()
SetDate(date)
GetID()

See id

SetID(id)

See id

GetPDBID()
SetPDBID(flag)
GetReplacedPDBID()
SetReplacedPDBID(descriptor)
class MMCifInfoStructRef

Holds the information of the struct_ref category. The category describes the link of polymers in the mmCIF file to sequences stored in external databases such as UniProt. The related categories struct_ref_seq and struct_ref_seq_dif also list differences between the sequences of the deposited structure and the sequences in the database. Two prominent examples of such differences are point mutations and/or expression tags.

db_name

Name of the external database, for example UNP for UniProt.

Type: str
db_id

Name of the reference sequence in the database pointed to by db_name.

Type: str
db_access

Alternative accession code for the sequence in the database pointed to by db_name.

Type: str
GetAlignedSeq(name)

Returns the aligned sequence for the given name, None if the sequence does not exist.

aligned_seqs

List of aligned sequences (all entries of the struct_ref_seq category mapping to this struct_ref).

class MMCifInfoStructRefSeq

An aligned range of residues between a sequence in a reference database and the deposited sequence.

align_id

Uniquely identifies every struct_ref_seq item in the mmCIF file.

Type: str
seq_begin
seq_end

The starting point (1-based) and end point of the aligned range in the deposited sequence, respectively.

Type: int
db_begin
db_end

The starting point (1-based) and end point of the aligned range in the database sequence, respectively.

Type: int
difs

List of differences between the deposited sequence and the sequence in the database.

chain_name

Chain name of the polymer in the mmCIF file.

class MMCifInfoStructRefSeqDif

A particular difference between the deposited sequence and the sequence in the database.

rnum

The residue number (1-based) of the residue in the deposited sequence.

Type: int
details

A textual description of the difference, e.g. point mutation, expression tag, purification artifact.

Type: str
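The three classes above nest: MMCifInfo.struct_refs holds references, each reference holds aligned_seqs, and each aligned range holds difs. A minimal sketch of walking that hierarchy, using only the attribute names documented here. sequence_differences is a hypothetical helper, and the demo objects are stand-ins with invented values.

```python
from types import SimpleNamespace


def sequence_differences(info):
    """Flatten struct_refs -> aligned_seqs -> difs into one list of rows."""
    rows = []
    for ref in info.struct_refs:
        for aligned in ref.aligned_seqs:
            for dif in aligned.difs:
                rows.append((ref.db_name, ref.db_id, aligned.chain_name,
                             dif.rnum, dif.details))
    return rows


# Invented example data standing in for a parsed mmCIF file.
demo_dif = SimpleNamespace(rnum=5, details="expression tag")
demo_aligned = SimpleNamespace(chain_name="A", difs=[demo_dif])
demo_ref = SimpleNamespace(db_name="UNP", db_id="EXAMPLE_ID",
                           aligned_seqs=[demo_aligned])
demo_struct_info = SimpleNamespace(struct_refs=[demo_ref])
print(sequence_differences(demo_struct_info))
```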
class MMCifInfoRevisions

Revision history of a PDB entry. If you find a ‘?’ somewhere, this means ‘not set’.

date_original

The date when this entry was seen in PDB for the very first time. This is not necessarily the release date. Expected format ‘yyyy-mm-dd’.

Type: str
first_release

Index + 1 of the revision releasing this entry. A value of 0 means it has not been set yet. It is set the first time a GetStatus() value of “full release” (mmCIF versions < 5) or “Initial release” (current mmCIF) is encountered.

Type: int
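The rule for first_release can be restated in plain Python. This sketch only mirrors the description above and is not the library's code. It assumes an exact, case-sensitive match of the two status strings. first_release_index is a hypothetical name, and the non-release status in the demo is invented.

```python
# Status strings named in the description of first_release.
RELEASE_STATUSES = ("full release", "Initial release")


def first_release_index(statuses):
    """Return index + 1 of the first releasing revision, 0 if there is none."""
    for index, status in enumerate(statuses):
        if status in RELEASE_STATUSES:
            return index + 1
    return 0


print(first_release_index(["some earlier status", "full release"]))
```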
AddRevision(num, date, status, major=-1, minor=-1)

Add a new iteration to the history.

Parameters:

• num (int) – See GetNum()
• date (str) – See GetDate()
• status (str) – See GetStatus()
• major (int) – See GetMajor()
• minor (int) – See GetMinor()

Raises: Exception if num is <= the last added iteration.
GetSize()
Returns: Number of revisions (valid revision indices are in [0, number-1]).
Return type: int
GetDate(i)
Parameters: i (int) – Index of revision
Returns: Date the PDB revision took place. Expected format ‘yyyy-mm-dd’.
Return type: str
Raises: Exception if i out of bounds.
GetNum(i)
Parameters: i (int) – Index of revision
Returns: Unique identifier of revision (assigned in increasing order)
Return type: int
Raises: Exception if i out of bounds.
GetStatus(i)
Parameters: i (int) – Index of revision
Returns: The status of this revision.
Return type: str
Raises: Exception if i out of bounds.
GetMajor(i)
Parameters: i (int) – Index of revision
Returns: The major version of this revision (-1 if not set).
Return type: int
Raises: Exception if i out of bounds.
GetMinor(i)
Parameters: i (int) – Index of revision
Returns: The minor version of this revision (-1 if not set).
Return type: int
Raises: Exception if i out of bounds.
GetLastDate()
Returns: Date of the latest revision (‘?’ if no revision set).
Return type: str
GetLastMajor()
Returns: Major version of the latest revision (-1 if not set).
Return type: int
GetLastMinor()
Returns: Minor version of the latest revision (-1 if not set).
Return type: int
SetDateOriginal(date)
GetDateOriginal()
GetFirstRelease()
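Because the history is accessed by index, listing it means looping from 0 to GetSize() - 1. A short sketch using only the getters documented above. revision_rows is a hypothetical helper and FakeRevisions is a stand-in with invented entries so the example runs without OpenStructure.

```python
def revision_rows(revisions):
    """Return (num, date, status) for every revision in the history."""
    return [
        (revisions.GetNum(i), revisions.GetDate(i), revisions.GetStatus(i))
        for i in range(revisions.GetSize())
    ]


class FakeRevisions:
    """Hypothetical stand-in exposing the getters used above."""

    def __init__(self, entries):
        self._entries = entries  # list of (num, date, status)

    def GetSize(self):
        return len(self._entries)

    def GetNum(self, i):
        return self._entries[i][0]

    def GetDate(self, i):
        return self._entries[i][1]

    def GetStatus(self, i):
        return self._entries[i][2]


# Invented history, for illustration only.
demo_revisions = FakeRevisions([(1, "2010-01-01", "Initial release"),
                                (2, "2011-06-15", "?")])
for row in revision_rows(demo_revisions):
    print(row)
```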

class MMCifInfoEntityBranchLink

Data from pdbx_entity_branch, most specifically pdbx_entity_branch_link. This is connectivity information for branched entities, e.g. carbohydrates/oligosaccharides. Conop Processors cannot easily connect them, so LoadMMCIF() uses this information to do that.

atom1

The first atom of the bond. Corresponds to entity_branch_link.atom_id_1, entity_branch_link.comp_id_1 and entity_branch_link.entity_branch_list_num_1. Also available via GetAtom1() and SetAtom1().

Type: AtomHandle
atom2

The second atom of the bond. Corresponds to entity_branch_link.atom_id_2, entity_branch_link.comp_id_2 and entity_branch_link.entity_branch_list_num_2. Also available via GetAtom2() and SetAtom2().

Type: AtomHandle
bond_order

Order of a bond (e.g. 1=single, 2=double, 3=triple). Corresponds to entity_branch_link.value_order. Also available via GetBondOrder() and SetBondOrder().

Type: int

Establish a bond between atom1 and atom2 of a MMCifInfoEntityBranchLink.

Parameters: editor (XCSEditor) – The editor instance to call for connecting the atoms.
Returns: Nothing
GetAtom1()
GetAtom2()
GetBondOrder()
SetAtom1()
SetAtom2()
SetBondOrder()
