OpenStructure
|
Scorer class to compute various small molecule ligand (non polymer) scores. .. note :: Extra requirements: - Python modules `numpy` and `networkx` must be available (e.g. use ``pip install numpy networkx``) At the moment, two scores are available: * lDDT-PLI, that looks at the conservation of protein-ligand contacts with :class:`lDDT <ost.mol.alg.lddt.lDDTScorer>`. * Binding-site superposed, symmetry-corrected RMSD that assesses the accuracy of the ligand pose (BiSyRMSD, hereinafter referred to as RMSD). Both scores involve local chain mapping of the reference binding site onto the model, symmetry-correction, and finally assignment (mapping) of model and target ligands, as described in (Manuscript in preparation). The binding site is defined based on a radius around the target ligand in the reference structure only. It only contains protein and nucleic acid chains that pass the criteria for the :class:`chain mapping <ost.mol.alg.chain_mapping>`. This means ignoring other ligands, waters, short polymers as well as any incorrectly connected chains that may be in proximity. Results are available as matrices (`(lddt_pli|rmsd)_matrix`), where every target-model score is reported in a matrix; as `(lddt_pli|rmsd)` where a model-target assignment has been determined (see below) and reported in a dictionary; and as (`(lddt_pli|rmsd)_details`) methods, which report additional details about different aspects of the scoring such as chain mapping. The behavior of chain mapping and ligand assignment can be controlled with the `global_chain_mapping` and `rmsd_assignment` arguments. By default, chain mapping is performed locally, ie. only within the binding site. As a result, different ligand scores can correspond to different chain mappings. This tends to produce more favorable scores, especially in large, partially regular oligomeric complexes. Setting `global_chain_mapping=True` enforces a single global chain mapping, as per :meth:`ost.mol.alg.chain_mapping.ChainMapper.GetMapping`. Note that this global chain mapping currently ignores non polymer entities such as small ligands, and may result in overly pessimistic scores. By default, target-model ligand assignments are computed identically for the RMSD and lDDT-PLI scores. Each model ligand is uniquely assigned to a target ligand, starting from the lowest RMSD and using each target and model ligand in a single assignment. If `rmsd_assignment` is set to False, RMSD and lDDT-PLI are assigned separately to optimize each score, and the other score is used as a tiebreaker. By default, only exact matches between target and model ligands are considered. This is a problem when the target only contains a subset of the expected atoms (for instance if atoms are missing in an experimental structure, which often happens in the PDB). With `substructure_match=True`, complete model ligands can be scored against partial target ligands. One problem with this approach is that it is very easy to find good matches to small, irrelevant ligands like EDO, CO2 or GOL. To counter that, the assignment algorithm considers the coverage, expressed as the fraction of atoms of the model ligand atoms covered in the target. Higher coverage matches are prioritized, but a match with a better score will be preferred if it falls within a window of `coverage_delta` (by default 0.2) of a worse-scoring match. As a result, for instance, with a delta of 0.2, a low-score match with coverage 0.96 would be preferred to a high-score match with coverage 0.90. Assumptions: The class generally assumes that the :attr:`~ost.mol.ResidueHandle.is_ligand` property is properly set on all the ligand atoms, and only ligand atoms. This is typically the case for entities loaded from mmCIF (tested with mmCIF files from the PDB and SWISS-MODEL). Legacy PDB files must contain `HET` headers (which is usually the case for files downloaded from the PDB but not elsewhere). The class doesn't perform any cleanup of the provided structures. It is up to the caller to ensure that the data is clean and suitable for scoring. :ref:`Molck <molck>` should be used with extra care, as many of the options (such as `rm_non_std` or `map_nonstd_res`) can cause ligands to be removed from the structure. If cleanup with Molck is needed, ligands should be kept aside and passed separately. Non-ligand residues should be valid compounds with atom names following the naming conventions of the component dictionary. Non-standard residues are acceptable, and if the model contains a standard residue at that position, only atoms with matching names will be considered. Unlike most of OpenStructure, this class does not assume that the ligands (either in the model or the target) are part of the PDB component dictionary. They may have arbitrary residue names. Residue names do not have to match between the model and the target. Matching is based on the calculation of isomorphisms which depend on the atom element name and atom connectivity (bond order is ignored). It is up to the caller to ensure that the connectivity of atoms is properly set before passing any ligands to this class. Ligands with improper connectivity will lead to bogus results. Note, however, that atom names should be unique within a residue (ie two distinct atoms cannot have the same atom name). This only applies to the ligand. The rest of the model and target structures (protein, nucleic acids) must still follow the usual rules and contain only residues from the compound library. Although it isn't a requirement, hydrogen atoms should be removed from the structures. Here is an example code snippet that will perform a reasonable cleanup. Keep in mind that this is most likely not going to work as expected with entities loaded from PDB files, as the `is_ligand` flag is probably not set properly. Here is a snippet example of how to use this code:: from ost.mol.alg.ligand_scoring import LigandScorer from ost.mol.alg import Molck, MolckSettings # Load data # Structure model in PDB format, containing the receptor only model = io.LoadPDB("path_to_model.pdb") # Ligand model as SDF file model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf") # Target loaded from mmCIF, containing the ligand target = io.LoadMMCIF("path_to_target.cif") # Cleanup a copy of the structures cleaned_model = model.Copy() cleaned_target = target.Copy() molck_settings = MolckSettings(rm_unk_atoms=True, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False, map_nonstd_res=False, assign_elem=True) Molck(cleaned_model, conop.GetDefaultLib(), molck_settings) Molck(cleaned_target, conop.GetDefaultLib(), molck_settings) # Setup scorer object and compute lDDT-PLI model_ligands = [model_ligand.Select("ele != H")] ls = LigandScorer(model=cleaned_model, target=cleaned_target, model_ligands=model_ligands) print("lDDT-PLI:", ls.lddt_pli) print("RMSD:", ls.rmsd) :param model: Model structure - a deep copy is available as :attr:`model`. No additional processing (ie. Molck), checks, stereochemistry checks or sanitization is performed on the input. Hydrogen atoms are kept. :type model: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView` :param target: Target structure - a deep copy is available as :attr:`target`. No additional processing (ie. Molck), checks or sanitization is performed on the input. Hydrogen atoms are kept. :type target: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView` :param model_ligands: Model ligands, as a list of :class:`~ost.mol.ResidueHandle` belonging to the model entity. Can be instantiated with either a :class:list of :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView` or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`. If `None`, ligands will be extracted based on the :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is normally set properly in entities loaded from mmCIF). :type model_ligands: :class:`list` :param target_ligands: Target ligands, as a list of :class:`~ost.mol.ResidueHandle` belonging to the target entity. Can be instantiated either a :class:list of :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView` or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView` containing a single residue each. If `None`, ligands will be extracted based on the :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is normally set properly in entities loaded from mmCIF). :type target_ligands: :class:`list` :param resnum_alignments: Whether alignments between chemically equivalent chains in *model* and *target* can be computed based on residue numbers. This can be assumed in benchmarking setups such as CAMEO/CASP. :type resnum_alignments: :class:`bool` :param check_resnames: On by default. Enforces residue name matches between mapped model and target residues. :type check_resnames: :class:`bool` :param rename_ligand_chain: If a residue with the same chain name and residue number than an explicitly passed model or target ligand exits in the structure, and `rename_ligand_chain` is False, a RuntimeError will be raised. If `rename_ligand_chain` is True, the ligand will be moved to a new chain instead, and the move will be logged to the console with SCRIPT level. :type rename_ligand_chain: :class:`bool` :param chain_mapper: a chain mapper initialized for the target structure. If None (default), a chain mapper will be initialized lazily as required. :type chain_mapper: :class:`ost.mol.alg.chain_mapping.ChainMapper` :param substructure_match: Set this to True to allow incomplete (ie partially resolved) target ligands. :type substructure_match: :class:`bool` :param coverage_delta: the coverage delta for partial ligand assignment. :type coverage_delta: :class:`float` :param radius: Inclusion radius for the binding site. Residues with atoms within this distance of the ligand will be considered for inclusion in the binding site. :type radius: :class:`float` :param lddt_pli_radius: lDDT inclusion radius for lDDT-PLI. Should be at least equal to or larger than `radius`. :type lddt_pli_radius: :class:`float` :param lddt_lp_radius: lDDT inclusion radius for lDDT-LP. :type lddt_lp_radius: :class:`float` :param model_bs_radius: inclusion radius for model binding sites. Only used when full_bs_search=False, otherwise the radius is effectively infinite. Only chains with atoms within this distance of a model ligand will be considered in the chain mapping. :type model_bs_radius: :class:`float` :param binding_sites_topn: maximum number of target binding site representations to assess, per target ligand. Ignored if `global_chain_mapping` is True. :type binding_sites_topn: :class:`int` :param global_chain_mapping: set to True to use a global chain mapping for the polymer (protein, nucleotide) chains. Defaults to False, in which case only local chain mappings are allowed (where different ligand may be scored against different chain mappings). :type global_chain_mapping: :class:`bool` :param custom_mapping: Provide custom chain mapping between *model* and *target* that is used as global chain mapping. Dictionary with target chain names as key and model chain names as value. Only has an effect if *global_chain_mapping* is True. :type custom_mapping: :class:`dict` :param rmsd_assignment: set to False to assign lDDT-PLI and RMSD separately using a combination of these two scores to optimize the assignment. By default (True), only RMSD is considered for the ligand assignment. :type rmsd_assignment: :class:`bool` :param n_max_naive: Parameter for global chain mapping. If *model* and *target* have less or equal that number of chains, the full mapping solution space is enumerated to find the the optimum. A heuristic is used otherwise. :type n_max_naive: :class:`int` :param max_symmetries: If more than that many isomorphisms exist for a target-ligand pair, it will be ignored and reported as unassigned. :type max_symmetries: :class:`int` :param unassigned: If True, unassigned model ligands are reported in the output together with assigned ligands, with a score of None, and reason for not being assigned in the \\*_details matrix. Defaults to False. :type unassigned: :class:`bool` :param full_bs_search: If True, all potential binding sites in the model are searched for each target binding site. If False, the search space in the model is reduced to chains around (`model_bs_radius` Å) model ligands. This speeds up computations, but may result in ligands not being scored if the predicted ligand pose is too far from the actual binding site. When that's the case, the value in the `unassigned_*_ligands` property will be `model_representation` and is indistinguishable from cases where the binding site was not modeled at all. Ignored when `global_chain_mapping` is True. :type full_bs_search: :class:`bool`
Definition at line 13 of file ligand_scoring.py.
def __init__ | ( | self, | |
model, | |||
target, | |||
model_ligands = None , |
|||
target_ligands = None , |
|||
resnum_alignments = False , |
|||
check_resnames = True , |
|||
rename_ligand_chain = False , |
|||
chain_mapper = None , |
|||
substructure_match = False , |
|||
coverage_delta = 0.2 , |
|||
radius = 4.0 , |
|||
lddt_pli_radius = 6.0 , |
|||
lddt_lp_radius = 10.0 , |
|||
model_bs_radius = 20 , |
|||
binding_sites_topn = 100000 , |
|||
global_chain_mapping = False , |
|||
rmsd_assignment = False , |
|||
n_max_naive = 12 , |
|||
max_symmetries = 1e5 , |
|||
custom_mapping = None , |
|||
unassigned = False , |
|||
full_bs_search = False |
|||
) |
Definition at line 278 of file ligand_scoring.py.
def chain_mapper | ( | self | ) |
Chain mapper object for the given :attr:`target`. :type: :class:`ost.mol.alg.chain_mapping.ChainMapper`
Definition at line 402 of file ligand_scoring.py.
def chain_mapping_mdl | ( | self | ) |
Definition at line 1771 of file ligand_scoring.py.
def chem_group_alns | ( | self | ) |
Definition at line 1763 of file ligand_scoring.py.
def chem_mapping | ( | self | ) |
Definition at line 1755 of file ligand_scoring.py.
def coverage_matrix | ( | self | ) |
Get the matrix of model ligand atom coverage in the target. Target ligands are in rows, model ligands in columns. A value of 0 indicates that there was no isomorphism between the model and target ligands. If `substructure_match=False`, only full match isomorphisms are considered, and therefore only values of 1.0 and 0.0 are reported. :rtype: :class:`~numpy.ndarray`
Definition at line 1218 of file ligand_scoring.py.
def get_get_repr_input | ( | self, | |
mdl_ligand | |||
) |
Definition at line 1778 of file ligand_scoring.py.
def get_repr | ( | self, | |
target_ligand, | |||
model_ligand | |||
) |
Definition at line 1817 of file ligand_scoring.py.
def get_target_binding_site | ( | self, | |
target_ligand | |||
) |
Definition at line 413 of file ligand_scoring.py.
def lddt_pli | ( | self | ) |
Get a dictionary of lDDT-PLI score values, keyed by model ligand (chain name, :class:`~ost.mol.ResNum`). If the scoring object was instantiated with `unassigned=True`, some scores may be `None`. :rtype: :class:`dict`
Definition at line 1314 of file ligand_scoring.py.
def lddt_pli_details | ( | self | ) |
Get a dictionary of lDDT-PLI score details (dictionaries), keyed by model ligand (chain name, :class:`~ost.mol.ResNum`). Each sub-dictionary contains the following information: * `lddt_pli`: the lDDT-PLI score value. * `rmsd`: the RMSD score value corresponding to the lDDT-PLI chain mapping and assignment. This may differ from the RMSD-based assignment. Note that a different isomorphism than `lddt_pli` may be used. * `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP). * `lddt_pli_n_contacts`: number of total contacts used in lDDT-PLI, summed over all thresholds. Can be divided by 8 to obtain the number of atomic contacts. * `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`) that define the binding site in the reference. * `bs_ref_res_mapped`: a list of residues (:class:`~ost.mol.ResidueHandle`) in the reference binding site that could be mapped to the model. * `bs_mdl_res_mapped`: a list of residues (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to the reference binding site. The residues are in the same order as `bs_ref_res_mapped`. * `bb_rmsd`: the RMSD of the binding site backbone after superposition. Note: not used for lDDT-PLI computation. * `target_ligand`: residue handle of the target ligand. * `model_ligand`: residue handle of the model ligand. * `chain_mapping`: local chain mapping as a dictionary, with target chain name as key and model chain name as value. * `transform`: transformation to superpose the model onto the target (for RMSD only). * `substructure_match`: whether the score is the result of a partial (substructure) match. A value of `True` indicates that the target ligand covers only part of the model, while `False` indicates a perfect match. * `inconsistent_residues`: a list of tuples of mapped residues views (:class:`~ost.mol.ResidueView`) with residue names that differ between the reference and the model, respectively. The list is empty if all residue names match, which is guaranteed if `check_resnames=True`. Note: more binding site mappings may be explored during scoring, but only inconsistencies in the selected mapping are reported. * `unassigned`: only if the scorer was instantiated with `unassigned=True`: `False` If the scoring object was instantiated with `unassigned=True`, in addition the unmapped ligands will be reported with a score of `None` and the following information: * `unassigned`: `True`, * `reason_short`: a short token of the reason, see :attr:`unassigned_model_ligands` for details and meaning. * `reason_long`: a human-readable text of the reason, see :attr:`unassigned_model_ligands` for details and meaning. * `lddt_pli`: `None` :rtype: :class:`dict`
Definition at line 1331 of file ligand_scoring.py.
def lddt_pli_matrix | ( | self | ) |
Get the matrix of lDDT-PLI values. Target ligands are in rows, model ligands in columns. NaN values indicate that no lDDT-PLI could be computed (i.e. different ligands). :rtype: :class:`~numpy.ndarray`
Definition at line 1195 of file ligand_scoring.py.
def rmsd | ( | self | ) |
Get a dictionary of RMSD score values, keyed by model ligand (chain name, :class:`~ost.mol.ResNum`). If the scoring object was instantiated with `unassigned=True`, some scores may be `None`. :rtype: :class:`dict`
Definition at line 1235 of file ligand_scoring.py.
def rmsd_details | ( | self | ) |
Get a dictionary of RMSD score details (dictionaries), keyed by model ligand (chain name, :class:`~ost.mol.ResNum`). The value is a dictionary. For ligands that were assigned (mapped) to the target, the dictionary contain the following information: * `rmsd`: the RMSD score value. * `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP). * `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`) that define the binding site in the reference. * `bs_ref_res_mapped`: a list of residues (:class:`~ost.mol.ResidueHandle`) in the reference binding site that could be mapped to the model. * `bs_mdl_res_mapped`: a list of residues (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to the reference binding site. The residues are in the same order as `bs_ref_res_mapped`. * `bb_rmsd`: the RMSD of the binding site backbone after superposition * `target_ligand`: residue handle of the target ligand. * `model_ligand`: residue handle of the model ligand. * `chain_mapping`: local chain mapping as a dictionary, with target chain name as key and model chain name as value. * `transform`: transformation to superpose the model onto the target. * `substructure_match`: whether the score is the result of a partial (substructure) match. A value of `True` indicates that the target ligand covers only part of the model, while `False` indicates a perfect match. * `coverage`: the fraction of model atoms covered by the assigned target ligand, in the interval (0, 1]. If `substructure_match` is `False`, this will always be 1. * `inconsistent_residues`: a list of tuples of mapped residues views (:class:`~ost.mol.ResidueView`) with residue names that differ between the reference and the model, respectively. The list is empty if all residue names match, which is guaranteed if `check_resnames=True`. Note: more binding site mappings may be explored during scoring, but only inconsistencies in the selected mapping are reported. * `unassigned`: only if the scorer was instantiated with `unassigned=True`: `False` If the scoring object was instantiated with `unassigned=True`, in addition the unassigned ligands will be reported with a score of `None` and the following information: * `unassigned`: `True`, * `reason_short`: a short token of the reason, see :attr:`unassigned_model_ligands` for details and meaning. * `reason_long`: a human-readable text of the reason, see :attr:`unassigned_model_ligands` for details and meaning. * `rmsd`: `None` :rtype: :class:`dict`
Definition at line 1252 of file ligand_scoring.py.
def rmsd_matrix | ( | self | ) |
Get the matrix of RMSD values. Target ligands are in rows, model ligands in columns. NaN values indicate that no RMSD could be computed (i.e. different ligands). :rtype: :class:`~numpy.ndarray`
Definition at line 1172 of file ligand_scoring.py.
def unassigned_model_ligand_descriptions | ( | self | ) |
Get a human-readable description of why model ligands were unassigned, as a dictionary keyed by the controlled dictionary from :attr:`unassigned_model_ligands`.
Definition at line 1516 of file ligand_scoring.py.
def unassigned_model_ligands | ( | self | ) |
Get a dictionary of model ligands not assigned to any target ligand, keyed by model ligand (chain name, :class:`~ost.mol.ResNum`). The assignment for the lDDT-PLI score is used (and is controlled by the `rmsd_assignment` argument). Each item contains a string from a controlled dictionary about the reason for the absence of assignment. A human-readable description can be obtained from the :attr:`unassigned_model_ligand_descriptions` property. Currently, the following reasons are reported: * `no_ligand`: there was no ligand in the target. * `disconnected`: the ligand graph is disconnected. * `binding_site`: a potential assignment was found in the target, but there were no polymer residues in proximity of the ligand in the target. * `model_representation`: a potential assignment was found in the target, but no representation of the binding site was found in the model. (I.e. the binding site was not modeled, or the model ligand was positioned too far in combination with `full_bs_search=False`) * `identity`: the ligand was not found in the target (by graph isomorphism). Check your ligand connectivity, and enable the `substructure_match` option if the target ligand is incomplete. * `stoichiometry`: there was a possible assignment in the target, but the model target was already assigned to a different model ligand. This indicates different stoichiometries. * `symmetries`: too many symmetries were found (by graph isomorphisms). Increase `max_symmetries`. * `no_contact`: there were no lDDT contacts between the binding site and the ligand in the target structure, and lDDT-PLI is undefined. Increase the value of `lddt_pli_radius` to at least the value of the binding site `radius`. Some of these reasons can be overlapping, but a single reason will be reported. :rtype: :class:`dict`
Definition at line 1461 of file ligand_scoring.py.
def unassigned_target_ligand_descriptions | ( | self | ) |
Get a human-readable description of why target ligands were unassigned, as a dictionary keyed by the controlled dictionary from :attr:`unassigned_target_ligands`.
Definition at line 1451 of file ligand_scoring.py.
def unassigned_target_ligands | ( | self | ) |
Get a dictionary of target ligands not assigned to any model ligand, keyed by target ligand (chain name, :class:`~ost.mol.ResNum`). The assignment for the lDDT-PLI score is used (and is controlled by the `rmsd_assignment` argument). Each item contains a string from a controlled dictionary about the reason for the absence of assignment. A human-readable description can be obtained from the :attr:`unassigned_target_ligand_descriptions` property. Currently, the following reasons are reported: * `no_ligand`: there was no ligand in the model. * `disconnected`: the ligand graph was disconnected. * `binding_site`: no residues were in proximity of the ligand. * `model_representation`: no representation of the reference binding site was found in the model. (I.e. the binding site was not modeled, or the model ligand was positioned too far in combination with `full_bs_search=False`) * `identity`: the ligand was not found in the model (by graph isomorphism). Check your ligand connectivity, and enable the `substructure_match` option if the target ligand is incomplete. * `stoichiometry`: there was a possible assignment in the model, but the model ligand was already assigned to a different target ligand. This indicates different stoichiometries. * `symmetries`: too many symmetries were found (by graph isomorphisms). Increase `max_symmetries`. * `no_contact`: there were no lDDT contacts between the binding site and the ligand, and lDDT-PLI is undefined. Increase the value of `lddt_pli_radius` to at least the value of the binding site `radius`. Some of these reasons can be overlapping, but a single reason will be reported. :rtype: :class:`dict`
Definition at line 1398 of file ligand_scoring.py.
binding_sites_topn |
Definition at line 332 of file ligand_scoring.py.
check_resnames |
Definition at line 325 of file ligand_scoring.py.
coverage_delta |
Definition at line 338 of file ligand_scoring.py.
full_bs_search |
Definition at line 339 of file ligand_scoring.py.
global_chain_mapping |
Definition at line 333 of file ligand_scoring.py.
lddt_lp_radius |
Definition at line 331 of file ligand_scoring.py.
lddt_pli_radius |
Definition at line 330 of file ligand_scoring.py.
max_symmetries |
Definition at line 336 of file ligand_scoring.py.
model |
Definition at line 288 of file ligand_scoring.py.
model_bs_radius |
Definition at line 329 of file ligand_scoring.py.
model_ligands |
Definition at line 313 of file ligand_scoring.py.
n_max_naive |
Definition at line 335 of file ligand_scoring.py.
radius |
Definition at line 328 of file ligand_scoring.py.
rename_ligand_chain |
Definition at line 326 of file ligand_scoring.py.
resnum_alignments |
Definition at line 324 of file ligand_scoring.py.
rmsd_assignment |
Definition at line 334 of file ligand_scoring.py.
substructure_match |
Definition at line 327 of file ligand_scoring.py.
target |
Definition at line 295 of file ligand_scoring.py.
target_ligands |
Definition at line 303 of file ligand_scoring.py.
unassigned |
Definition at line 337 of file ligand_scoring.py.