Host-Pathogen Interactions

Host-Pathogen Interactions are curated by the PathPort project (http://pathport.vbi.vt.edu/) using MINET (Molecular Interaction NETwork), an XML format defined by a DTD. For details on the MINET format, refer to http://pathport.vbi.vt.edu/minet/.

Thank you all for the comments on these requirements. We are actually identifying shortcomings in our DTD as we brainstorm requirements to map MINET to BioPAX. In this email, I summarize the requirements and comments received so far. I also provide my responses to many of your comments. Sorry it took me this long to put together this email.

Requirement 1: We have physical entities, such as cell membrane and spore, that do not fit any of the physical entity types BioPAX supports. To solve this, we could either (1) support a comprehensive list of physical entities, or (2) provide a general physical entity called "Other" into which everything else could fit.

Andrea: How many physical entities are there, and are they defined in or mapped to some ontology?

Ken: I think this should be represented as instances of other ontologies (hierarchical controlled vocabularies), such as the OBO libraries. What I heard from Barry Smith is that OBO will introduce an upper ontology, and the OBO Foundry ontologies (a subset of the OBO libraries) will be placed under this upper ontology. My very rough idea is to rename the controlledvocabulary class to something like "referenceentity", under which we import other ontologies. That would make it easier for us to coordinate with the OBO effort in the future.

Mathias: I am also developing a foundational ontology for biology, based on DOLCE and SKOS (I have already posted about it on the HCLSIG mailing list). The current version of this ontology is (among other things) a 'translation' of BioPAX to the design principles of DOLCE. A development version can be downloaded from http://neuroscientific.net/index.php?id=download. You can also download extensions that already include the vocabularies from many OBO ontologies. Here, these ontologies are expressed with SKOS [1], a basic ontology for describing taxonomies and concept hierarchies. The concepts from these hierarchies can be used to annotate individuals (e.g. with terms from the Gene Ontology). In the mapping from BioPAX, SKOS replaced the 'controlled vocabulary' class of BioPAX. This has the advantage that hierarchies and class-subclass relations (like those of the Gene Ontology) can be represented with ease. I think it would make sense to integrate SKOS into the BioPAX ontology, too. I might add that importing the ontology into Protege will not work at the moment, because the ontology owl:imports the DC vocabulary from protege.stanford.edu, which seems to be down at the moment. Bad luck. Hopefully it is fixed soon.

[1] http://www.w3.org/2004/02/skos/
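A minimal Python sketch of the advantage Mathias describes, assuming a concept hierarchy reduced to SKOS-style "broader" links held in a plain dict (no RDF library, and not a real SKOS API). The concept labels are illustrative, not copied from a specific GO or OBO release. An individual annotated with a specific concept is then found by a query for any broader concept.

```python
from typing import Dict, Set

# Illustrative skos:broader links: concept -> set of broader concepts.
# Labels are examples only, not taken from a specific ontology release.
BROADER: Dict[str, Set[str]] = {
    "plasma membrane": {"membrane"},
    "membrane": {"cellular_component"},
    "cellular_component": set(),
}


def broader_closure(concept: str) -> Set[str]:
    """All concepts reachable from `concept` by repeatedly following broader links."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def matches(annotation: str, query: str) -> bool:
    """True if an individual annotated with `annotation` answers a query for `query`."""
    return annotation == query or query in broader_closure(annotation)
```

With this, an individual annotated as "plasma membrane" is returned by a query for "membrane", but not the other way round; a flat controlled-vocabulary term would need every ancestor listed explicitly to get the same behaviour.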

Emek: Can you please provide us with concrete examples of reactions in which such entities take part? They sound more like compartments than entities to me. For example, I cannot see how a cell membrane can take part in a reaction (although a particular lipid in a cell membrane can).

Gary: Yes, more examples would be useful. Listing more physical entities in BioPAX has been on the table since the start, but other things took center stage.

Dan: For 2.1, I'd generalize this to say: "physicalEntity" must be replaced by a class tree more indicative of all subclasses 2.x will actually need - molecules, pools, structures - whatever. Proposed: Add "object" types for "pools", "spores" & "membranes"; plus "substance", subclassed into "compound", "element" & "mixture". Matthias and I have both recently cited upper ontology terms like these as a way to improve 2.0, which today tries to express such distinctions using English names and comments - and fails.
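To make Dan's proposal concrete, here is a Python sketch of the class tree. It assumes both "object" and "substance" attach directly under physicalEntity (the proposal does not say where they attach). `ObjectEntity` is a hypothetical name chosen to avoid shadowing Python's built-in `object`, and none of these classes exist in BioPAX 2.0.

```python
from dataclasses import dataclass
from typing import List, Type


@dataclass
class PhysicalEntity:
    name: str


# Dan's "object" types; attaching them directly under PhysicalEntity is an assumption.
class ObjectEntity(PhysicalEntity):
    pass


class Pool(ObjectEntity):
    pass


class Spore(ObjectEntity):
    pass


class Membrane(ObjectEntity):
    pass


# "substance", subclassed into compound, element and mixture.
class Substance(PhysicalEntity):
    pass


class Compound(Substance):
    pass


class Element(Substance):
    pass


class Mixture(Substance):
    pass


def names_of_kind(entities: List[PhysicalEntity], kind: Type[PhysicalEntity]) -> List[str]:
    """Names of entities that are instances of `kind` or of any of its subclasses."""
    return [e.name for e in entities if isinstance(e, kind)]


entities = [Membrane("cell membrane"), Spore("spore"), Compound("ATP")]
print(names_of_kind(entities, ObjectEntity))  # ['cell membrane', 'spore']
```

The point of the tree is that queries can be posed at a general level ("all objects", "all substances") instead of relying on English names and comments to carry the distinction.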

Harsha (Response): We are working on getting a complete list of all physical entities curated into MINET documents, and I will send it once I have it. Currently these physical entities are not mapped to any ontology, but it looks like that is something useful for us to add or correct. We are familiar with GO and are checking out DOLCE and SKOS for this purpose; thanks for the links. Once we have the list of all the MINET physical entities, we can try to find whether they map to one of the ontologies, such as GO cellular component. We do have example interactions using these physical entities (such as cell membrane, spores, etc.), but they are rather loosely/broadly defined. We are thinking these should actually be defined as a process (or sub-pathway) rather than as an interaction. For example, sporulation is defined by an interaction as:

Vegetative cell --(sporulation)--> spores

But maybe we should define sporulation as a pathway and define the various steps involved in sporulation as interactions. Since, as Gary says, adding new physical entities has been on the table from the beginning, it may take priority at some point. Do we have a rough idea of which version/level we might revisit this in?

Harsha: Requirement 2: XREF currently supports publication, relationship, and unification xrefs. We would like to specify curation as well. If at some point we start curating pathways directly into BioPAX, we will need to support curation/curator information.

Andrea: Curation sounds more like evidence information on a pathway description (hence attached to pathways or their subparts) than like a reference.

Ken: I agree. INOH has several evidence codes for curation information. But I think Harsha is requesting slots like "author", "affiliation". I'm not sure if that kind of information should be included in a "public release" version of your data. We have those slots in INOH but ...

Emek: This is a long-standing requirement and I believe it is very important, especially once we start extending the data we get from other sources. One option is what Ken suggested: we just keep a reference to the original database, and the original database takes care of tracking from that point on. This has two drawbacks, though. First, pathway databases unfortunately tend to be quite volatile. Second, what happens if I do not want to build and maintain such a service, and instead want to provide just a model?

On the other hand, this is very much an authorship-rights and legal issue, and as a scientist I do not feel really comfortable with it. Maybe we can use the Science Commons framework for that?

Gary: This also ties into provenance, which can get quite complex. For instance, who is the author of a record that was taken from one source and modified by a third party? The last time I thought about this, I felt that this was more up to the databases, as Ken said. There are many information types that are more useful locally than globally, e.g. internal IDs to track who has edited what record and when. However, if many people need to exchange this information, I would favor a very simple approach. One potential way to do this would be to use Dublin Core (http://dublincore.org/documents/dcmi-terms/), though I find that overly complicated. Maybe a simple author field, holding a list of author objects, would do. Harsha, could you give us more information on how you use author and what you need? For example, what happens when multiple people edit the record over time? As another example, BIND uses the NCBI Author object, which is very detailed. Part of the NCBI author object description is here: http://www.ncbi.nlm.nih.gov/dtd/NCBI_Biblio.mod.dtd. Some information about its use in BIND is available in the BIND curation manual: http://www.blueprint.org/bind/bind_curation.html

Dan: For 2.1, I'd rephrase this as: BioPAX must clearly distinguish "natural physical" classes from "abstract model" classes, and insist that all the source(s) be defined for any valid model. (If, and only if, curators affect a model's content, they are a source.) Rationale: Gary makes good points about the difficulties, but demanding good provenance seems central to BioPAX's use cases. If a model's source is unclear, science-wise it is garbage.

Gary: Hi Dan, all of BioPAX is an 'abstract model'; there are no 'natural physical' classes there. Many of the abstract model classes have names that sound like the natural physical classes they abstract, like "protein", but that is just for lack of better names, purely a naming issue. This matches the way basically all of the pathway databases store information, so BioPAX follows it in order to be effective at data exchange. There are clear limitations to this representation, e.g. it can't model dynamic systems. That is what the SBML and CellML languages are for, and we aim to minimize overlap there.

Harsha (Response): I think there is an important distinction that some of the comments seemed to mix up: the distinction between a publicationXref and the curator of a BioPAX document. A curator or editor of a document may reference several publications (which are static and never change). This is handled well by publicationXref, I believe. However, we need to include curation information for a document. We currently provide curation information using the following entities and attributes:

's children
Name         Cardinality  Required
Curators     1            Yes
Date         1            Yes
Version      1            Yes
Note         1            No
Revision     M            No
ContactInfo  1            Yes

http://pathport.vbi.vt.edu/xml/molecules/molecules.dtd.html

Gary, to answer your question about multiple people editing a document over time: we currently just add each new editor to the list of curators. This has not been an issue for us, since we curate internally only. However, as you said, we may have to think of a standard solution when documents are being modified and redistributed. We need to look more closely at DC and Science Commons (that Ken suggested).
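The curation elements listed above can be sketched as a Python dataclass. This is an illustration only, not the MINET DTD: it assumes string values and treats the single Curators element as a container for a list of names. `add_editor` is a hypothetical helper mirroring the current practice of appending each new editor to the curator list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CurationInfo:
    # Required elements (cardinality 1). Values are assumed to be strings;
    # the single Curators element is assumed to hold a list of names.
    curators: List[str]
    date: str
    version: str
    contact_info: str
    # Optional elements.
    note: Optional[str] = None                           # Note: 1, not required
    revisions: List[str] = field(default_factory=list)  # Revision: M (many), not required

    def __post_init__(self) -> None:
        # Enforce the "Required = Yes" rows.
        if not self.curators:
            raise ValueError("at least one curator is required")
        for name in ("date", "version", "contact_info"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")

    def add_editor(self, name: str) -> None:
        """Record a new editor by appending to the curator list, skipping duplicates."""
        if name not in self.curators:
            self.curators.append(name)
```

As Gary's question suggests, this flat list loses who changed what and when; the optional `revisions` list is the only place such history could go in this shape.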

Harsha: Requirement 3: In a conversion reaction, does BioPAX currently provide a way to distinguish reactants from cofactors in LEFT, and products from released factors in RIGHT?

Emek: I cannot see why cofactors should ever be listed on the Left or Right of a reaction. Do you have particular reasons/examples?

Gary: Cofactors can be distinguished from reactants by using the COFACTOR property in the catalysis class. Is this insufficient for your needs?

Dan: For 2.1, I'd say BioPAX *must* do so. Reactants/products are physical (cause/effect) roles; RIGHT/LEFT express only sides of somebody's 2D diagram. They need untangling. Opinion: 2.0 conflates several physical and abstract things, especially in "utilityClasses". Such fictions need replacing in 2.x with abstract or concrete classes that actually exist. Otherwise BioPAX itself will end up (like these hybrids) as figments of imagination and nothing more.

Mathias: That's a bit problematic, as I think BioPAX uses LEFT/RIGHT for good reasons. The direction of a reaction can sometimes be unknown, or it can change depending on other parameters. When the direction of the reaction is unknown, you need two properties to distinguish 'side 1 of the reaction' from 'side 2 of the reaction', and honestly I cannot come up with a property name that is significantly less ambiguous than 'LEFT' and 'RIGHT'. I would also like to point out that reactions in the real world are mostly stochastic processes involving large pools of molecules, which blurs the distinction between reactants and products even further. If you want to move away from the abstraction of reaction diagrams on paper, I think you also need to move away from thinking about reactions as non-stochastic, singular events. Some nitpicking: I would not put reactants/products in a cause/effect relation, as both are endurants (continuants), not perdurants (occurrents). In the DOLCE foundational ontology, at least, only perdurants can take part in a causal relation.

Harsha (Response): Actually, the current BioPAX design with respect to LEFT and RIGHT may be just fine. Our curators are now revisiting the definitions of cofactors and released factors. We do not have examples of cases where we need to define cofactors for interactions of any type other than Catalysis. However, we need to consider whether it is important to distinguish released factors from the products of a reaction (e.g., a cofactor ATP resulting in released factors AMP + PPi in a main reaction). When we computationally manipulate BioPAX for reactants or products, we may want to avoid the many spurious hits on ATP, AMP, PPi, etc. This is possible if we distinguish cofactors and released factors from reactants and products, respectively. This is definitely not critical, but it may be something to think about?
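A Python sketch of the query Harsha has in mind, assuming hypothetical `cofactors` and `released_factors` role annotations on a conversion. BioPAX 2.0 has no such properties on conversions; cofactors of catalyzed reactions go on the catalysis COFACTOR property instead. The participants are illustrative: one reaction uses ATP as a cofactor that is released as AMP + PPi, the other (adenylate cyclase, ATP -> cAMP + PPi) consumes ATP as a main reactant.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Conversion:
    left: Set[str]
    right: Set[str]
    # Hypothetical role annotations; not properties of BioPAX 2.0 conversions.
    cofactors: Set[str] = field(default_factory=set)
    released_factors: Set[str] = field(default_factory=set)

    def main_reactants(self) -> Set[str]:
        """LEFT participants that are not marked as cofactors."""
        return self.left - self.cofactors

    def main_products(self) -> Set[str]:
        """RIGHT participants that are not marked as released factors."""
        return self.right - self.released_factors


def consumers_of(reactions: List[Conversion], molecule: str) -> List[Conversion]:
    """Reactions in which `molecule` is a main reactant rather than a cofactor."""
    return [r for r in reactions if molecule in r.main_reactants()]


coupled = Conversion({"substrate A", "ATP"}, {"product B", "AMP", "PPi"},
                     cofactors={"ATP"}, released_factors={"AMP", "PPi"})
cyclase = Conversion({"ATP"}, {"cAMP", "PPi"})
# Only the adenylate cyclase reaction is returned; the cofactor use of ATP is skipped.
atp_consumers = consumers_of([coupled, cyclase], "ATP")
```

Without the role annotations, the same query would return both reactions, which is the kind of spurious hit on ATP described above.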

Harsha: Requirement 4: We want to encode host-pathogen interactions in BioPAX format; that is, interactions occurring within one species or between two species, grouped together into a single pathway. I believe we can encode the biological source of each participant in an interaction using the BioSource object. Is that right? However, I do not see BioSource as an attribute of any other entity (e.g., dna, rna, protein). Let me know what I am missing here.

Emek: This is an important requirement; to generalize, it is about addressing multicellular phenomena. It has been brought to the table several times, on the list and at meetings, and always left for a future release. The problem is that it often requires adding quite a bit of extra structure to the ontology, which is a major task.

Gary: BioSource is attached to all physical entity types except for small molecule - under the (probably poorly named) 'organism' property. This was basically copied from PSI-MI a long time ago. Do you need more information stored?

Dan: I'd re-express this by saying 2.x must define ways to encode biological interactions and pathways *more generally*, onto which extensions and instances are more readily mapped. Concern: Our mission is to define a good exchange format for user-defined models of "pathways". That doesn't mean BioPAX can define all legal PARTICIPANTS and interactions, merely good mapping target(s) and annotation requirements. Like Paul, I sense "mission creep" is a problem, central to recent delays and tensions within the working group.

Harsha (Response): Gary, the ‘organism’ property may be just what we need to encode host-pathogen interactions. I will let you know after we actually try encoding our pathways that way. Should we change the property name to something more accurate, like bioSource?
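A Python sketch of how interactions in a host-pathogen pathway might be told apart once every participant carries the 'organism' property. It assumes the organism is stored as a plain species name, and that small molecules, which have no organism property, are simply ignored. The participants (Bacillus anthracis lethal factor acting on human MAP2K1) are illustrative, and the category labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Participant:
    name: str
    # Value of the 'organism' (BioSource) property, assumed to be a species name;
    # None for small molecules, which carry no organism property.
    organism: Optional[str] = None


def classify(participants: Iterable[Participant], host: str, pathogen: str) -> str:
    """Label an interaction by the organisms of its participants (hypothetical labels)."""
    organisms = {p.organism for p in participants if p.organism is not None}
    if organisms == {host, pathogen}:
        return "host-pathogen"
    if organisms == {host}:
        return "host"
    if organisms == {pathogen}:
        return "pathogen"
    return "other"  # no organism information, or a third species involved


lethal_factor = Participant("lethal factor", "Bacillus anthracis")
map2k1 = Participant("MAP2K1", "Homo sapiens")
label = classify([lethal_factor, map2k1], "Homo sapiens", "Bacillus anthracis")
```

Here `label` is "host-pathogen"; the same per-participant property also covers the single-species interactions grouped into the same pathway.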

Harsha: If some of these requirements are worth discussing further, would we have time at the CSHL meeting?

Emek: I am hoping to concentrate on tasks at hand. These probably would not make it into this release anyway. But that is just my vote. Others?

Dan: In a F2F at CSHL, we could use your requirements to revisit and rethink BioPAX design goals, governance principles, etc. Indeed, your needs may be a useful catalyst for changes that boost design freedom and release speed, by adding modularity. Would everyone not win, and sooner, I wonder, if 2.x targeted more rigorous OWL standards, under which ANY "well described" model of ANY biological pathway might be exchanged, but under simpler PARTICIPANTS and interactions (fewer constraints and types) than we might otherwise specify? Both you and DX folk might then publish BioPAX 2.x extension types, in distinct name spaces created soon and in *parallel*. Rough versions of such sub-ontologies might come out quickly, yet competing ones could emerge more easily later, as science models, reasoners, and use cases all independently evolve.

Response: It looks like the agenda is shaping up to accommodate a brief discussion of this, as well as of Gramene. I will wait until then for further discussion.

General Comments:

Paul: I looked at http://sciencecommons.org/licensing/ and found that the site does not seem to be very actively updated or to host many new discussions. The last post on the user list was in May 2005. I would hope that Science Commons would be more active. The key words in so many processes nowadays are "standards" and "open standards".

Intellectual property is supposed to serve a limited role: allowing a proper return on investment for a limited period of time. At times, however, the rewards for ownership do not reflect the efforts from which the knowledge was produced. This has become a problem for science and for universities, reflected in the shift from the old university and teaching models to "universities as businesses". Some of us feel that there is a real loss to everyone from this shift. Basic science may need to be entirely unencumbered by IP issues. There is an argument that science operates best when IP rights are fully protected, but, as we all know, most scientists like to think that science has a community value, and this community value does not always have protection. Do royalty rights compensate your undergraduate instructors?

Having opened a brief discussion of "ownership", I would like to raise the question of "what is a compartment". If I might ask informally, would each of you who would like to respond give a very general definition of compartment, reaction, participant, and physical entity, with our eye on an abstract ontological specification? I notice that Barry Smith's model has been discussed from time to time, and up to a point his model is useful. However, the abstract ontology (the concepts one might have about the world) for these four concepts seems to require some "staging".

I remember Alan reminding us several months ago that the purpose of BioPAX was to support data integration across many of the cell and gene expression databases. Part of my recollection of this discussion is that certain hard questions of science also did not get resolved easily in the BioPAX model for this data integration. I discussed Gerald Edelman's notion of response degeneracy, and I have also discussed the role that non-locality (such as field gradients) seems to play in the course of a pathway's expression. These are very general concepts, which some feel are too theoretical for discussion in this forum. However, it is the only contribution I can make at this time.

Much of the recent work is being done in your face-to-face meetings, which I have not been able to attend. So I guess I am asking for a moment to pause and reflect on the original mission, which was to integrate existing databases, and on possible mission creep. My sense was that the core ontology would be recognized as 1) having solved certain modeling problems, such as physical entities that participate in certain well-understood reactions, and 2) allowing some disclosure of where open problems remain unsolved.

The issue of IP is important in this context, since we (or our employers) often come to own new IP for solving, or claiming to have solved, specific problems in specifying process or expression properties; yet less and less reward goes to those who are able to stand back and see which problems are not solved, which may not be solvable using OWL models, and which are still open scientific issues, such as those related to non-locality.

Paul_1: Would a subset of Dublin Core be sufficient for the curation information? Perhaps, of all "common ontologies", Dublin Core is the most successful, most stable, and most complete. One does not have to use all parts of the Dublin Core information model. I wonder whether BioPAX might have a single concept that points to a completely separate information store holding Dublin Core records (this store does not have to be OWL, as there is little "inferencing", i.e. production of information that is not already explicit, using just the Dublin Core ontology). Reuse is the key, and developing curation information as part of BioPAX seems inconsistent with a desire to focus on the scientific value of the biological data. How do others feel about this? Does anyone have experience in separating off curation information, with only a pointer between a Dublin Core record set and individual data in an OWL ontology?
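A Python sketch of the separation Paul suggests. It assumes a plain dict as the external curation store, a hypothetical `curationRecord` pointer as the single BioPAX-side property, and a hypothetical mapping of the MINET curation fields onto a small subset of DCMI terms. Fields without an obvious DCMI equivalent in this sketch (Version, ContactInfo, Revision) are returned unmapped rather than forced into a term.

```python
# Namespace URI of the DCMI Metadata Terms.
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping of MINET curation fields onto a subset of DCMI terms.
MINET_TO_DC = {
    "Curators": DCTERMS + "creator",
    "Date": DCTERMS + "date",
    "Note": DCTERMS + "description",
}


def split_curation(minet_fields: dict) -> tuple:
    """Split MINET curation fields into a DC-keyed record and the unmapped rest."""
    dc: dict = {}
    unmapped: dict = {}
    for key, value in minet_fields.items():
        if key in MINET_TO_DC:
            dc[MINET_TO_DC[key]] = value
        else:
            unmapped[key] = value
    return dc, unmapped


def attach_curation(entity: dict, record_uri: str, minet_fields: dict, store: dict) -> dict:
    """Store the DC record externally and leave only a pointer on the entity.

    Returns the fields that had no DC mapping, so the caller can decide what to do with them.
    """
    dc, unmapped = split_curation(minet_fields)
    store[record_uri] = dc                 # separate curation store, outside the OWL data
    entity["curationRecord"] = record_uri  # hypothetical single pointer property
    return unmapped
```

Whatever comes back unmapped is a rough measure of whether "a subset of Dublin Core" would be sufficient for a given curation scheme.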