Level4ToDo

Level 4 Proposals
(Some of these could also be considered for future Level 3 revisions, if we plan any, e.g., L3 v2, v3, etc.)


 * Entity BioSource: propose subclasses organism, cell, tissue (check for usage)
 * We could convert BioSource to CV - BioSource superclass, organism, cell, tissue subclasses.
 * It would be nice to be able to say 'we require using an NCBI taxonomy reference', or to query for all pathways in mammals.
 * Need to support strains and synthetic organisms, cells and tissues.
 * To be consistent: tissue -> tissueType (or cellType -> cell)
 * Priority: low. The current BioPAX Level 3 implementation is OK.
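
The "all pathways in mammals" query above can be sketched as follows. This is only an illustration, assuming each pathway's BioSource carries an NCBI taxonomy ID. The `Pathway` class, the function names, the example pathway names and the heavily abbreviated `PARENT` lineage table are hypothetical and not part of BioPAX or PaxTools.

```python
from dataclasses import dataclass

# Toy excerpt of NCBI Taxonomy parent links (child taxon id -> ancestor taxon id).
# ASSUMPTION: intermediate ranks are collapsed; a real lineage has many more levels.
PARENT = {
    9606: 40674,   # Homo sapiens -> Mammalia
    10090: 40674,  # Mus musculus -> Mammalia
    9031: 8782,    # Gallus gallus -> Aves
}
MAMMALIA = 40674


@dataclass
class Pathway:
    """Hypothetical stand-in for a BioPAX Pathway with a taxonomy-backed BioSource."""
    name: str
    organism_taxon: int


def is_descendant(taxon, ancestor):
    """Walk up the parent links until the ancestor is found or the table runs out."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


def pathways_in_clade(pathways, ancestor):
    """Return the names of pathways whose organism falls under the given taxon."""
    return [p.name for p in pathways if is_descendant(p.organism_taxon, ancestor)]


pathways = [
    Pathway("example pathway (human)", 9606),
    Pathway("example pathway (mouse)", 10090),
    Pathway("example pathway (chicken)", 9031),
]
mammal_pathways = pathways_in_clade(pathways, MAMMALIA)
```

A query of this kind only works if every BioSource uses the same taxonomy, which is the motivation for requiring the NCBI reference.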


 * Entity ExperimentalFormVocabulary: add property "Participant Identification Method" (check for usage)
 * We don't remember what this means, so are ignoring it (it may have been


 * PhysicalEntity Property: Evidence - rationale needed from Gary for documentation
 * Small issue - to be done later.


 * Reorganization of the top-level ontology: make physical entity reference and entity feature children of OWL:Thing (the rationale is that these, strictly speaking, are not metadata)
 * Make EntityReference and EntityFeature siblings of Entity (top level)
 * Priority: low


 * Try to get rid of the utility class?
 * The major disadvantage is that the presentation of the ontology in, e.g., Protégé gets ugly, and it becomes more difficult to know which objects are primary curation targets of pathway databases vs. other things, like chemical structures.
 * Priority: low


 * Entity Pathway: Study the semantics of "black box" pathways. NCI PID, Panther Db and PharmGKB use these. Reactome has black box reactions.
 * A black box pathway is a pathway where the internal steps are not specified. We can handle a basic black box pathway, but we can't specify its inputs and outputs. If we think of adding inputs and outputs on pathways, then we need to clarify their relationship with reactions (conversions). We need to write up all of the examples of how various databases use this concept and evaluate how these can be handled in BioPAX.
 * Revisit the overall design of the level of granularity for biological processes. We need some way of relating pathway to interaction (both are processes) that deals with input-output of processes, black box pathways, and subprocesses (part-of for processes). (Very long term project, related to the above)
 * Review the NIST PSL process specification language to see if we can use aspects of it. (Consider assigning this to an engineering student? Emek to review the language)
 * Model composition and layering: how do you refer to one BioPAX model from another model (for example, a pathway that does not contain its participants, which are described elsewhere)?
 * Action: Andrea Splendiani will study this issue and write up a white paper.


 * Entity RelationshipTypeVocabulary: check this vocabulary. It contains too many terms that would be inappropriate in BioPAX, so it would require careful documentation and use (for example sameAs, IsA). Perhaps none of these terms are relevant now that we have implemented "generics". Perhaps a choose-from list? There are also terms that we need (e.g. from BioCyc) that are not in the PSI-MI system.
 * In general, relationship type is dangerous, because you are free to represent relationships already in BioPAX in RelationshipTypeVocabulary
 * Best practice: never create a relationship xref type for a relationship that is already captured in BioPAX (e.g. protein is part of a complex)
 * Frequently used to reference genes from proteins, structures from proteins, one pathway is a part of another - these are used to create 'link outs' on websites.
 * We need to catalog all of these frequently used reference types (idea to visualize the frequency as a tag cloud), model them as specific types of relationships and eventually not include RelationshipTypeVocabulary
 * TODO: tabulate these in existing databases
 * Use of the MIRIAM CV for relationshipXref (suggestion by Augustin, MIM) http://www.ebi.ac.uk/miriam/main/mdb?section=qualifiers (Bring this to the attention of the semantics group)
 * Priority: medium. Important, but needs long term attention.


 * Entity SequenceRegionVocabulary: points to disjoint classes Protein Domains, Promoters, UTRs
 * We may want to restrict use to specific terms in SO and add this to the BioPAX documentation.
 * Priority: medium. Emek interested.


 * Should Evidence have the object property BioSource? Example : Evidence for a human protein from mouse by homology.
 * There is a bug in Level 3: the experimental form doesn't point back to the participant in the interaction that it is the form for. This is a problem if there is more than one participant with an experimental form in the interaction.
 * Reactome has a good solution for capturing modeling by homology: they create a reaction in species X (what was done in the experiment) and can reference that as evidence in species Y (what we are trying to model). We should use this in BioPAX. This doesn't capture, e.g., chemical modifications of participants such as a GFP-tagged protein (Reactome currently doesn't annotate this information). Here's a link to a page in the Reactome curator guide that describes the procedure and its logic: http://wiki.reactome.org/index.php/New_Reactome_Curator_Guide#Creating_an_inferred_event Here's a finished example: http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Gallus%20gallus&ID=994169&
 * We could subclass evidence into evidence with experimental form (e.g. tagged proteins) and evidence without experimental form (e.g. evidence by homology as in Reactome).
 * Priority: low, because not many people are using this.

Priority: the semantic web group is working on this (Michel has written up a Google document)
 * Use of Dublin Core to store attribution? (suggestion by Augustin, MIM) Do we support all of Dublin Core, or only a subset? PaxTools and the validator will have to be updated. Best practices for use?


 * Do we want to subclass pathway? – metabolism, signaling, MI, regulatory (regulation of one operon) – request from Peter Karp (long ago) and also RegulonDB. (There is no behavioral difference, check for usage - we can potentially use GO biological process)
 * They are non-disjoint
 * Priority: low


 * Extend physicalEntity to other molecules.
 * You could conceive of an entire subclassing structure for all molecules (Related to the linking external CV discussion)
 * Could add: photon, heat, electrical stimulus, osmotic pressure (SBGN calls these 'perturbations') - check PATO for possible easy fixes to this perturbation issue.
 * ChEBI has a photon class
 * Priority: low


 * Do we need an entity reference for complexes to tie together complex states (e.g. complexA and complexA-phosphorylated)? ( No )
 * We can't do this due to a problem with representing stoichiometry for participants of hierarchical complexes - this is related to polymerization.


 * Complex 1A and 1B have 3 different forms which are phosphorylated in 3 different places, but you don’t know where. You need to create 3 different As, but they would look the same. There is no way to specify 2 unknown PTMs at different sites. (Done in Level 3 generic features)
 * Action: Emek to document examples


 * Update the small molecule EntityReference with an object property pointing to SmallMoleculeClass (single value from SmallMoleculeVocabulary), and add subclass SmallMoleculeVocabulary (ChEBI) to ControlledVocabulary. E.g. reference a generic alcohol class in BioPAX.
 * Bring this to the attention of the semantic web group, especially Nadia - related to the linking external CV discussion.


 * Representing negative observations.
 * A simple way to do this is to create a BioPAX file and label it as full of negative information.
 * Priority: low. Probably not relevant for BioPAX, because pathway databases are not very interested in capturing these.


 * Modulation class: the controlled property can only be a Catalysis. Ashok Reddy, Molecule Pages example of a Transport/Modulation ion channel (relatively straightforward)
 * Action: Emek will write up a document describing the problem and send it to the list. It may be possible to use existing BioPAX for this.


 * Add cellularLocation to EntityReference class. This would be a list of all possible locations, in the same way that entityFeatures are listed in EntityReferences. Tim Jewison example ( Relatively straightforward - no data source?)
 * Action: add this in the next BioPAX maintenance release (e.g. 3.1). This will better synchronize PhysicalEntity and EntityReference.
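
The proposed property can be illustrated with a minimal sketch, assuming the list of possible locations on an EntityReference is simply the union of the locations of the PhysicalEntity states that point to it. The class and attribute names below are simplified, hypothetical stand-ins and not the PaxTools API.

```python
from dataclasses import dataclass, field


@dataclass
class EntityReference:
    name: str
    # Proposed property: every location in which this molecule has been observed.
    cellular_locations: set = field(default_factory=set)


@dataclass
class PhysicalEntity:
    entity_reference: EntityReference
    cellular_location: str  # a single location term, as in Level 3


def collect_locations(entities):
    """Copy each physical entity's location onto its entity reference."""
    for entity in entities:
        entity.entity_reference.cellular_locations.add(entity.cellular_location)


# Hypothetical example: one molecule seen in two compartments.
atp_ref = EntityReference("ATP")
states = [
    PhysicalEntity(atp_ref, "cytosol"),
    PhysicalEntity(atp_ref, "mitochondrial matrix"),
    PhysicalEntity(atp_ref, "cytosol"),
]
collect_locations(states)
```

The set mirrors the way entityFeatures are listed on an EntityReference: the reference enumerates what is possible, while each PhysicalEntity picks one value.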


 * Degradation is not stoichiometric
 * Emek will look into this further. Suzanne brought this up.


 * Move physiological direction from catalysis to control
 * Action: promote this to Control in the next maintenance release.


 * ControlType controlled vocabulary details (long term, talk to SBML)


 * There is no easy way to reference GO molecular function annotation in Level 3. You cannot store a controlled vocabulary object on a physical entity or physical entity reference. (Ask for data sources - INOH, Molecular Role)


 * Limit the nextStep property semantics to mean "next step(s) in the same pathway" (i.e., all pathway step processes must also be pathway components), and create a subclass (or new independent class) of the nextStep property, e.g., outerStep, meaning "next step(s) in another (neighbor) pathway" (i.e., the outer step’s processes should not be components of this pathway). (Reactome uses cross-pathway steps; this can be inferred by looking at the participants; ask the data providers about this)
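
The distinction can be checked mechanically. The following is a minimal sketch, assuming a pathway exposes its components and each step exposes its step processes. The classes are simplified, hypothetical stand-ins, `outerStep` is only the name proposed above, and the "inconsistent" label for mixed or empty steps is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class PathwayStep:
    name: str
    step_processes: list = field(default_factory=list)  # process identifiers
    next_steps: list = field(default_factory=list)      # PathwayStep objects


@dataclass
class Pathway:
    name: str
    components: set = field(default_factory=set)        # process identifiers


def classify_link(pathway, target_step):
    """Label a link to target_step under the proposed semantics."""
    inside = [p in pathway.components for p in target_step.step_processes]
    if not inside:
        return "inconsistent"   # a step without processes cannot be classified
    if all(inside):
        return "nextStep"       # every process is a component of this pathway
    if not any(inside):
        return "outerStep"      # every process belongs to some other pathway
    return "inconsistent"       # mixed case, which the proposal would disallow


# Hypothetical example: step s1 leads to s2 (same pathway) and s9 (neighbor pathway).
pathway_a = Pathway("pathway A", {"r1", "r2"})
s2 = PathwayStep("s2", ["r2"])
s9 = PathwayStep("s9", ["r9"])
s1 = PathwayStep("s1", ["r1"], [s2, s9])
labels = {nxt.name: classify_link(pathway_a, nxt) for nxt in s1.next_steps}
```

A validator rule along these lines could flag existing data before the stricter semantics are adopted.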


 * Shall we have a new isGeneric (boolean) property to clearly distinguish generics from normal PhysicalEntity/EntityReference? (The member* and xref properties, which could help to differentiate, unfortunately aren't always used.) (Ease of use, flagging, labeling)
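
A minimal sketch of how such a flag could interact with the existing member* properties, assuming an explicit flag takes precedence and the presence of members is used only as a fallback. The class is a simplified, hypothetical stand-in, and the fallback rule is an assumption of this sketch, not an agreed convention.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityReference:
    name: str
    member_entity_references: list = field(default_factory=list)
    is_generic: Optional[bool] = None  # proposed flag; None means "not stated"


def looks_generic(reference):
    """Prefer the explicit flag; otherwise fall back to the member* heuristic."""
    if reference.is_generic is not None:
        return reference.is_generic
    return bool(reference.member_entity_references)


# Hypothetical examples.
alcohol_dh_1 = EntityReference("ADH1")
family_with_members = EntityReference("ADH family", [alcohol_dh_1])
family_without_members = EntityReference("some kinase family", is_generic=True)
```

The last example is the case the heuristic misses: a generic that lists no members can only be recognized through an explicit flag.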


 * Convert E.C. numbers to controlled vocabulary (long term)
 * Action: make this change in Level 4 - it is not backwards compatible with Level 3.
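
One way to picture the conversion is to turn each E.C. number string into a chain of hierarchical vocabulary terms. This is a minimal sketch, assuming the usual four-field dotted notation with "-" marking unspecified levels. The function name and the idea of emitting one term per level are assumptions of this sketch.

```python
def ec_term_chain(ec_number):
    """Split an E.C. number into its hierarchy of terms, most general first.

    Fields after the first "-" are ignored, so "2.7.-.-" yields two terms.
    """
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    levels = []
    for part in parts:
        if part == "-":
            break
        levels.append(part)
    return [".".join(levels[:i]) for i in range(1, len(levels) + 1)]


full_chain = ec_term_chain("1.1.1.1")
partial_chain = ec_term_chain("2.7.-.-")
```

Each element of the chain would correspond to one controlled vocabulary term, which lets a query match an enzyme at any level of the hierarchy instead of comparing free-text strings.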


 * Agree on one chemical structure data type


 * Discuss influence on influence and related


 * Move datatype properties to classes whose value can be specified by a single datatype value 'has value' (refactoring of the ontology)


 * Remove organism from Pathway - it is redundant, as the members can also have organisms, and in the case of host/pathogen pathways it leads to semantic inaccuracies.
 * Priority: low