PSI-MI Conversion

Mapping from PSI-MI Level 2.5 to BioPAX Level 2

PSI-MI (Proteomics Standards Initiative - Molecular Interactions) Level 2.5, which describes molecular interactions, including protein-protein interactions, was completed in Sep 2005 and will be supported by all major interaction databases.


 * PSI Homepage
 * PSI-MI Level 2.5 XML Schema Documentation
 * PSI-MI Level 2.5 Controlled Vocabularies in OBO format (View/Edit with DAG-Edit)
 * PSI-MI Level 2.5 XML Schema (View/Edit with XML Spy or Oxygen)

Overview
All PSI-MI interactions are mapped to the BioPAX physicalInteraction class, which is a subclass of 'interaction' and has 'conversion' and 'control' classes as children. physicalInteraction stores PSI-MI interactors in the PARTICIPANTS property. PSI-MI experimental evidence maps to the BioPAX evidence class. The mapping is not perfect and is potentially lossy, though all non-optional PSI-MI fields and most commonly used optional PSI-MI fields can be mapped.

Interaction
Summary: All PSI-MI interactions map to BioPAX physicalInteraction

PSI-MI interactions store the participants of complex assembly, biochemical reaction and other interactions, but do not store the result of the interaction. A PSI-MI interaction may contain an 'interaction type' from the PSI-MI controlled vocabulary (e.g. acetylation, phosphorylation, dephosphorylation). Even though these interaction types describe biochemical reactions, the actual substrates and products are not listed and must therefore be inferred. Performing this inference in a general way, in order to map PSI-MI interactions to more specific BioPAX classes such as biochemicalReaction, is out of the scope of this mapping procedure (it is more suitable as a research project).

Participants
Like BioPAX, PSI-MI interactions contain a set of interactors linked through the participants field. In PSI-MI Level 2.5, interactor types are described using controlled vocabulary terms. PSI-MI has a more detailed set of interactor types:

 * interactor type (parent term)
 * biopolymer
 * nucleic acid
 * dna (maps to BioPAX DNA)
 * rna (maps to BioPAX RNA)
 * protein (maps to BioPAX protein)
 * peptide
 * complex (maps to BioPAX complex)
 * gene (does not map to BioPAX)
 * interaction (does not map to BioPAX)
 * small molecule (maps to BioPAX smallMolecule)
 * unknown participant (does not map to BioPAX)

Note: This is a major change in PSI-MI from Level 1, where interactor and participant types each had their own defined XML type. Level 2.5 uses controlled vocabulary terms instead because the separate XML types were difficult to develop and extend.
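
The interactor type mapping listed above can be written as a simple lookup table. The following is a minimal Python sketch, not a reference implementation. The names `INTERACTOR_TYPE_TO_BIOPAX` and `biopax_class_for` are illustrative. Terms for which the list states no mapping (interactor type, biopolymer, nucleic acid, peptide) are treated as unmapped here, which is an assumption.

```python
from typing import Optional

# PSI-MI 'interactor type' CV term -> BioPAX class name, spelled as in the
# list above. Only mappings that the list states explicitly are included.
INTERACTOR_TYPE_TO_BIOPAX = {
    "dna": "DNA",
    "rna": "RNA",
    "protein": "protein",
    "complex": "complex",
    "small molecule": "smallMolecule",
}


def biopax_class_for(term: str) -> Optional[str]:
    """Return the BioPAX class for a PSI-MI interactor type, or None if unmapped."""
    return INTERACTOR_TYPE_TO_BIOPAX.get(term.strip().lower())
```

A caller would typically log or skip participants for which `biopax_class_for` returns `None` (for example gene, interaction and unknown participant).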

Evidence
Evidence describes the experimental evidence used to support an interaction. The experimental data itself is not stored, only a description of the experiment. BioPAX evidence is copied directly from PSI-MI evidence with minor changes.

Evidence is attached to interactions and participants in PSI-MI. It is attached to pathways and interactions in BioPAX.

PSI-MI evidence fields
 * interaction->experimentList->experimentDescription
 * interaction->confidenceList->confidence
 * participant->experimentalRoleList (uses the 'experimental role' CV - doesn't map to BioPAX)
 * participant->experimentalFormList (uses the 'experimental feature' CV - maps to BioPAX EXPERIMENTAL-FORM)
 * participant->experimentalPreparationList (uses the 'experimental preparation' CV - doesn't map to BioPAX)
 * participant->experimentalInteractorList (stores actual interactor used in modeled interactions - doesn't map to BioPAX)
 * participant->hostOrganismList (the organism used for the experiment - doesn't map to BioPAX)

BioPAX evidence fields (evidence is a property of interaction and pathway)
 * evidence->EVIDENCE-CODE
 * evidence->CONFIDENCE
 * evidence->EXPERIMENTAL-FORM
 * evidence->XREF

PSI-MI experimentDescription contains many fields describing an experiment. Each one uses a set of PSI-MI controlled vocabulary terms. All of these map to evidence->EVIDENCE-CODE (openControlledVocabulary) in BioPAX.

PSI-MI confidence corresponds to BioPAX confidence, but the two describe the confidence method differently. BioPAX requires a publicationXref that describes the confidence method, while PSI-MI uses a controlled vocabulary term or free text string, and that controlled vocabulary is not yet defined. Thus confidence is not easily mapped.

PSI-MI experimental form is stored as a controlled vocabulary term attached to each participant. In BioPAX, the evidence->EXPERIMENTAL-FORM stores a map from participants to their experimental form, also as a controlled vocabulary term. Thus only a small data rearrangement is required in the mapping.
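
The rearrangement described above can be sketched as follows. This is a hedged illustration only: participants are modeled as plain dictionaries, and the keys `id`, `experimentalFormList`, `participant` and `form` are hypothetical stand-ins, not the actual PSI-MI or BioPAX data structures.

```python
from typing import Dict, List


def collect_experimental_forms(participants: List[dict]) -> List[Dict[str, str]]:
    """Move per-participant experimental form terms into one evidence-level list.

    Each returned entry pairs a participant identifier with one of its
    experimental form CV terms, mirroring the participant-to-form map that
    BioPAX keeps on evidence->EXPERIMENTAL-FORM.
    """
    entries = []
    for participant in participants:
        # Participants without an experimental form contribute nothing.
        for term in participant.get("experimentalFormList", []):
            entries.append({"participant": participant["id"], "form": term})
    return entries
```

For example, a participant `{"id": "p1", "experimentalFormList": ["tagged"]}` yields a single entry that links `p1` to `tagged`.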

PSI-MI experimentDescription has an xref, which maps directly to BioPAX evidence->XREF.

Sequence features
Sequence features are modifications to a biopolymer at specific sites. BioPAX sequence features are copied directly from PSI-MI sequence features with minor changes.

PSI-MI stores sequence features in the participant->featureList field.

A PSI-MI feature contains
 * names
 * xref
 * featureType (controlled vocabulary - CV)
 * featureDetectionMethod (CV) (The experimental method type used to detect the feature)
 * experimentRefList (the experiment used to determine the feature)
 * featureRangeList (the only required field here)

A BioPAX sequenceFeature contains
 * names (and NAME, SHORT-NAME, SYNONYMS)
 * XREF
 * FEATURE-TYPE
 * FEATURE-LOCATION

featureRangeList maps directly to FEATURE-LOCATION, though the data structure is organized slightly differently in BioPAX.

Note: featureDetectionMethod and experimentRefList are not present in BioPAX, so these optional PSI-MI fields can't be mapped.
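
The field-level mapping above can be sketched as a renaming step that also reports what is lost. This is a minimal sketch under stated assumptions: features are modeled as plain dictionaries keyed by the PSI-MI field names, values are copied unchanged (the reorganization of featureRangeList into the BioPAX FEATURE-LOCATION structure and the split of names into NAME, SHORT-NAME and SYNONYMS are not shown), and `map_feature` is an illustrative name.

```python
from typing import Dict, List, Tuple

# PSI-MI feature field -> BioPAX sequenceFeature property, as listed above.
FEATURE_FIELD_MAP = {
    "names": "names",
    "xref": "XREF",
    "featureType": "FEATURE-TYPE",
    "featureRangeList": "FEATURE-LOCATION",
}


def map_feature(feature: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Return (mapped BioPAX properties, names of PSI-MI fields that were dropped)."""
    if "featureRangeList" not in feature:
        # featureRangeList is the only required PSI-MI feature field.
        raise ValueError("PSI-MI feature is missing featureRangeList")
    mapped = {FEATURE_FIELD_MAP[k]: v for k, v in feature.items() if k in FEATURE_FIELD_MAP}
    dropped = sorted(k for k in feature if k not in FEATURE_FIELD_MAP)
    return mapped, dropped
```

Returning the dropped field names alongside the result makes the lossy parts of the conversion (featureDetectionMethod and experimentRefList) visible to the caller.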

Cross-references (xrefs)
PSI-MI stores publication references as a bibref, which maps to BioPAX publicationXref. All other PSI-MI xrefs are stored in the xref type, which maps to BioPAX unificationXref if the xref type is 'identity' and to relationshipXref otherwise. Since the PSI-MI xref type is optional, a mapper must rely on knowledge of which xrefs are unification xrefs, which requires a database ID mapping service, in order to produce a reliable mapping. This step is likely to cause many errors unless great care is taken.
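
The decision rule above can be sketched as a small function. This is an illustration under stated assumptions: an xref is modeled as a dictionary with hypothetical keys `kind` ('bibref' or 'xref') and `type`, and `looks_like_identity` is a hypothetical callback standing in for the database ID mapping service. Falling back to relationshipXref when the type is missing and no callback is given is a conservative choice made for this sketch, not something the mapping prescribes.

```python
from typing import Callable, Optional


def classify_xref(
    xref: dict,
    looks_like_identity: Optional[Callable[[dict], bool]] = None,
) -> str:
    """Choose the BioPAX xref class for a PSI-MI cross-reference."""
    if xref.get("kind") == "bibref":
        return "publicationXref"
    xref_type = xref.get("type")
    if xref_type is None:
        # The PSI-MI xref type is optional; without it, only external
        # knowledge (here a caller-supplied callback) can identify a
        # unification xref.
        if looks_like_identity is not None and looks_like_identity(xref):
            return "unificationXref"
        return "relationshipXref"
    return "unificationXref" if xref_type == "identity" else "relationshipXref"
```

For a relationshipXref, the PSI-MI xref type term (for example see-also) would additionally be carried over as the relationship type.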

Controlled vocabularies (CVs)
PSI-MI makes heavy use of controlled vocabularies. Most of these can be mapped to BioPAX, but some of them can't be.

Mapped CVs

 * alias type - stores types of names e.g. gene name, orf name, gene name synonym
 * PSI-MI names->shortLabel and names->fullName map to BioPAX SHORT-NAME and NAME, respectively. PSI-MI names->alias uses the 'alias type' CV, and all aliases describing physicalEntity names map to SYNONYMS. Other aliases, for instance those of type GO synonym, may not map to BioPAX.
 * database citation e.g. experiment xref, feature xref, interaction xref
 * This maps to BioPAX xref->DB
 * feature range status e.g. less-than, range, c-terminal, certain
 * This partially maps to BioPAX sequenceSite->POSITION-STATUS. Only the following terms map; all other terms in this CV are not mappable.
 * certain -> EQUAL
 * less-than -> LESS-THAN
 * greater-than -> GREATER-THAN
 * feature type e.g. post-translational modification, mutation, binding site, experimental feature (e.g. tag)
 * This maps to BioPAX sequenceFeature->FEATURE-TYPE
 * interaction detection method e.g. two-hybrid, etc.
 * This maps to BioPAX evidence->EVIDENCE-CODE
 * interaction type
 * This maps to BioPAX physicalInteraction->INTERACTION-TYPE
 * interactor type
 * This was covered above in the Participants section
 * xref type: cross-reference type e.g. identity, method reference, see-also
 * PSI-MI xrefs of type 'identity' map to BioPAX unificationXref. All other xref types map to BioPAX relationshipXref, in which case the PSI-MI CV term maps to the RELATIONSHIP-TYPE property.

Unmapped CVs

 * attribute name
 * PSI-MI allows free text name-value pairs to be attached to many objects. These are non-standard, but a CV of attribute names does exist, e.g. figure legend, experimental description (more free text information than already captured). BioPAX does not allow arbitrary attributes, so these can't be reliably mapped to BioPAX.
 * biological role
 * PSI-MI allows participants to be tagged with a role e.g. electron acceptor, inhibitor, enzyme, enzyme target. BioPAX has specific properties for some of these concepts, but PSI-MI does not have enough information to reliably map these to BioPAX. As mentioned above, PSI-MI does not have a 'result' role term, so this needs to be inferred.
 * experimental preparation - describes how participants were experimentally prepared e.g. delivery method, expression level
 * BioPAX does not include this concept
 * experimental role e.g. bait, prey
 * This is attached to a PSI-MI participant. The BioPAX experimentalForm class does not include this concept.
 * feature detection method e.g. alanine scanning, mobility shift
 * This is attached to a PSI-MI feature. The BioPAX sequenceFeature class does not include this concept.
 * participant identification method e.g. mass spectrometry
 * BioPAX does not include this concept

Not mapped
The following PSI-MI fields can't be mapped to BioPAX Level 2.

Interaction
 * inferredInteractionList
 * modelled flag
 * intramolecular flag
 * negative flag

General
 * attributeList
 * parameterList
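
Because these fields are lost in conversion, a converter may want to report which of them were present on each interaction. The following is a minimal sketch of such a check. It assumes interactions are available as plain dictionaries, and the key spellings (in particular for the three flags) are hypothetical and may differ from the actual PSI-MI element names.

```python
from typing import List

# Fields listed above as not mappable to BioPAX Level 2.
# Key spellings are assumptions made for this sketch.
UNMAPPABLE_FIELDS = frozenset({
    "inferredInteractionList",
    "modelled",
    "intramolecular",
    "negative",
    "attributeList",
    "parameterList",
})


def lost_fields(interaction: dict) -> List[str]:
    """Return the unmappable fields present on an interaction, sorted by name."""
    return sorted(field for field in interaction if field in UNMAPPABLE_FIELDS)
```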

Tools
FrankGibbons has created a PSI-MI Level 1 to BioPAX XSLT converter.