URI-CV Discussion

Definition and intended usage

 * All entities in a BioPAX file have an associated URI.
 * BioPAX codifies the URIs for metadata elements (i.e., the ontology). BioPAX itself places no specific requirement on which URIs are used for instance-level pathway information, but RDF does.

Current Practice

 * Because BioPAX has mostly been used (as its name suggests) for data exchange, individual URIs are not necessarily stable or even meaningful, and may be regenerated with every data import/export iteration. The current trend in BioPAX development, however, seems to be towards "linked data" (the Semantic Web).
 * URI style varies across databases. Some (e.g., Reactome) adopt URIs whose strings include the name of the entity they refer to. Others (e.g., BioCyc) adopt URIs based on a convenience code (i.e., they are semantically opaque).
 * URIs of entities change as information is transferred from one database to another (for instance, when Reactome is imported into Pathway Commons, all URIs end up in the PC namespace).
 * The carrier of identity is not the URI itself, but the UnificationXref associated with the object.
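
The last point can be sketched in Python: two records with different URIs are treated as the same entity when they share a unification xref. This is only an illustration. The record layout, the example.org URIs, and the function name group_by_unification_xref are all hypothetical; real BioPAX data would be read from RDF/OWL rather than from tuples.

```python
from collections import defaultdict

# Hypothetical, simplified records: (uri, unification xref as (db, id)).
# The URIs are made up for illustration and do not come from any database.
records = [
    ("http://example.org/reactome#Protein_1", ("UniProt", "P04637")),
    ("http://example.org/pc#entity_8812", ("uniprot", "P04637")),
    ("http://example.org/biocyc#G-4421", ("UniProt", "P38398")),
]


def group_by_unification_xref(records):
    """Group entity URIs by their (db, id) unification xref."""
    groups = defaultdict(list)
    for uri, (db, identifier) in records:
        # Normalise the database name so "UniProt" and "uniprot" match.
        groups[(db.lower(), identifier)].append(uri)
    return dict(groups)


groups = group_by_unification_xref(records)
for xref, uris in groups.items():
    print(xref, uris)
```

Here the first two records end up in the same group even though their URIs live in different namespaces, which is the behaviour current practice relies on.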

Observations and Problems

 * From Requirements for BioPAX URIs: URIs from the same source should be different for different entities. This does not always seem to be the case (e.g., in Reactome RDF files, the "same" entity from different organisms shares the same URI). Guaranteeing such basic requirements could be part of the BioPAX specification.
 * There are different classes of URIs in BioPAX: some relate to biological entities, some to biological "concepts" (e.g., ontologies), and some to features of BioPAX itself (Utility Classes).
   * URIs related to "Utility Classes" are essentially blank nodes. They do not need to conform to any particular restrictions.
   * URIs related to biological entities may be required to be the same across different databases (see Open questions).
   * URIs related to ontologies could be imported directly from the ontology providers.
 * The trailing ".owl" in BioPAX namespace URLs such as http://www.biopax.org/release/biopax-level2.owl is unusual for a namespace and conflates format with namespace.
 * In general, the namespace in BioPAX doesn't need to be the location of the document on the web: the former explains "what", the latter "where".
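
The uniqueness requirement mentioned in these observations (URIs from the same source should differ for different entities) could be checked with something like the sketch below. The input tuples, the example.org URIs, and the function name find_uri_collisions are hypothetical, and the check assumes that name plus organism is enough to tell two entities apart, which may not hold for real data.

```python
from collections import defaultdict

# Hypothetical (uri, entity name, organism) tuples from one source's export.
entities = [
    ("http://example.org/src#Complex_12", "Complex A", "Homo sapiens"),
    ("http://example.org/src#Complex_12", "Complex A", "Mus musculus"),
    ("http://example.org/src#Protein_7", "Protein B", "Homo sapiens"),
]


def find_uri_collisions(entities):
    """Return URIs that are attached to more than one distinct entity."""
    seen = defaultdict(set)
    for uri, name, organism in entities:
        # Assumption: (name, organism) identifies an entity within a source.
        seen[uri].add((name, organism))
    return {
        uri: sorted(descriptions)
        for uri, descriptions in seen.items()
        if len(descriptions) > 1
    }


collisions = find_uri_collisions(entities)
for uri, descriptions in collisions.items():
    print(uri, descriptions)
```

A check of this kind could run at export time, before the file is handed to another database.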

Open questions

 * Should URIs that refer to biological entities be kept across databases, or should they change? It is probably better to consider them different and unify them at a later stage.
 * Can we just use URIs for CV terms? In a sense, these may correspond to classes, which would break OWL when merged with BioPAX (where, as the model stands now, they would instead be instances of terms). Do we really need to consider this? Or is this something that can be assessed at an interpretation stage, working from a (generically inconsistent) RDF?
 * Can we remove the 'xref' property from ControlledVocabulary (and thus from all its sub-classes) and make it a sub-class of Xref?

Proposals

 * URIs could follow a set of best practices. As a preliminary example:
   * If an information provider uses persistent and maintained URIs, and these are used in a pathway database, use the URIs of the original provider (for instance, if proteins are identified by UniProt identifiers, use the UniProt URIs). As a side discussion, it is true that these URIs refer to potentially different things. But the reference is the same and the context is different (this can be discussed more).
   * If a new URI is produced, follow a stated convention (optional).
   * URIs should be stable and resolvable (optional).
 * We cannot "force" providers to adopt specific URIs or deployment technologies, but we could adopt the 5-star model that has been proposed in the Open Data context [1]. There would be no strict requirements on how data providers manage URIs, but exported BioPAX files could be labeled with 1 to x stars, reflecting different levels of integrability. For instance:
   * 1 star: No commitment to URIs
   * 2 stars: Persistent URIs
   * 3 stars: Persistent URIs which are resolvable (URLs)
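
As a rough illustration of the star levels in these proposals, the sketch below maps what a provider declares about its URIs to a rating. The function name star_rating and its two boolean parameters are hypothetical. Persistence cannot be detected from a URI string, so the sketch assumes the provider states it explicitly. It also assumes that resolvable but non-persistent URIs earn only 1 star, which the proposal does not spell out.

```python
def star_rating(persistent, resolvable):
    """Map a provider's declared URI commitments to the 1-3 star scale."""
    if persistent and resolvable:
        return 3  # persistent URIs which are resolvable (URLs)
    if persistent:
        return 2  # persistent URIs
    # No commitment; resolvable-but-unstable URIs also land here (assumption).
    return 1


# Hypothetical providers and what they declare about their URIs.
declared = {
    "provider_a": {"persistent": False, "resolvable": False},
    "provider_b": {"persistent": True, "resolvable": False},
    "provider_c": {"persistent": True, "resolvable": True},
}

ratings = {name: star_rating(**flags) for name, flags in declared.items()}
for name, stars in ratings.items():
    print(name, "*" * stars)
```

The rating would label the exported file as a whole, not individual URIs, in line with the idea of marking files by their level of integrability.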

Notes and references

 * OBO ID policy (versions and URIs): http://obofoundry.org/id-policy.shtml
 * [1] http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/
 * OKKAM: http://www.okkam.org/
 * MIRIAM identifiers may soon support resolvable URLs, in support of Linked Data.