Semantic web/linking/CVs

Wishlist for data integration
The goal of the Semantic Web/Linking/CV workgroup is to improve BioPAX to facilitate data integration. Specifically, the goal is to make it simple to integrate and query knowledge bases that include pathways represented in BioPAX together with related information (for instance GO and other ontologies). Initially, the group intends to focus on simple improvements to the specification that can yield a significant benefit for common use cases.

Meetings
First conference call: SemWeb January 2011, Friday 28th January 2011, 15:00 GMT

Use cases

 * Query across integrated resources for annotation of particular entities (e.g.: all pathways a given protein participates in)
 * Qualitative pathway analysis

People involved

 * Andrea Splendiani (BioPAX)
 * Sorin Draghici (Pathway analysis)
 * Andrew Gibson (Data integration, knowledge representation, ontologies, see peroxisomekb)
 * Michel Dumontier (Bio2RDF)
 * Alan Ruttenberg (OBO Foundry)
 * Nick Juty (MIRIAM, SBO)
 * Camille Laibe (BioModels.net)
 * Nadia Anwar (BioPAX)
 * Marco Antoniotti (RetroNet)
 * Igor Rodchenkov (BioPAX, PathwayCommons)
 * Sujatha Mohan (Rapid)

Shared relations
BioPAX is currently missing high-level relations that can be shared with other information resources, for instance part-of, is-a, participates-in (possibly the relations defined in the OBO ontology). Having shared relations is possibly the most important improvement for simplifying data integration, and could perhaps be achieved via a relatively simple layer of inference on top of BioPAX, without modifying the current specification. Adopting shared relations requires an analysis of the semantics of relations in BioPAX versus, for instance, GO.

Objectives:
 * explanation of how this differs from using RelationshipXref and RelationshipTypeVocabulary (and what the problems with those are)
 * definition of a set of high-level relations to be adopted in BioPAX
 * analysis of the implications of integrating OBO ontologies and BioPAX
 * definition of "rules" to produce these relations from the current BioPAX representation.
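
As a rough illustration of what such "rules" could look like, the sketch below derives shared relations from BioPAX statements held as plain (subject, predicate, object) tuples, and then uses them to answer the "all pathways a given protein participates in" use case. The property names pathwayComponent, component and participant are taken from BioPAX Level 3; the mapping onto part_of and participates_in, the prefixes, and the function names are assumptions made for this example, not an agreed proposal.

```python
# Hypothetical rule table: BioPAX predicate -> (shared relation, swap subject/object).
# The choice of target relations is an assumption for illustration only.
RULES = {
    # pathway pathwayComponent interaction  =>  interaction part_of pathway
    "bp:pathwayComponent": ("part_of", True),
    # complex component member              =>  member part_of complex
    "bp:component": ("part_of", True),
    # interaction participant entity        =>  entity participates_in interaction
    "bp:participant": ("participates_in", True),
}


def infer_shared_relations(triples):
    """Return the set of shared-relation triples implied by RULES."""
    inferred = set()
    for subject, predicate, obj in triples:
        rule = RULES.get(predicate)
        if rule is None:
            continue  # no shared relation defined for this predicate
        relation, swap = rule
        inferred.add((obj, relation, subject) if swap else (subject, relation, obj))
    return inferred


def pathways_of(entity, triples):
    """Pathways reached via: entity participates_in X, X part_of pathway."""
    shared = infer_shared_relations(triples)
    interactions = {o for s, p, o in shared if s == entity and p == "participates_in"}
    return {o for s, p, o in shared if s in interactions and p == "part_of"}
```

Because the rules only add triples on top of existing data, a layer like this would leave the BioPAX files themselves untouched.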

Status:

Versioning
The current versioning policy in BioPAX has several problems which, as adoption increases, are going to create confusion in integrated knowledge bases. In particular, it should be possible to refer to specific revisions of the ontology and to track changes. This task aims at specifying how versioning should be handled in BioPAX. A few ideas are listed here:
 * Each subrelease of BioPAX should have its own URI, with a default pointing to the latest version. Example:
   * BioPAX/ -> latest version
   * BioPAX/3 -> latest version of level 3
   * BioPAX/3/1 -> a specific version
 * Versioning information could be in the ontology header
 * Namespaces should change only when concepts change: a query for  should work on level1-level2 (and perhaps level3 as well).
 * naming convention for the ontology without the trailing ".owl": http://www.biopax.org/release/level3/biopax.owl#protein -> http://www.biopax.org/release/level3/biopax#protein

Objectives:
 * policy for versioning to be included in the main BioPAX specifications

Status: Draft RFC ready (available as a Google Doc to interested parties). See: Discussion on Versioning

URIs&CVs
While resources that represent pathways in BioPAX make de facto use of URIs, there are no conventions or requirements for these URIs. Having conventions on URIs would simplify the integration of different BioPAX files with each other and with other resources (e.g.: GO). Conventions on URIs should possibly be framed within larger efforts (sharednames? linkeddata?). In particular, conventions should address a range of issues such as:
 * how to compose a URI
 * which URIs to re-use
 * should URIs be resolvable, and if so, to what?

Controlled Vocabularies, which are usually ontologies, pose an additional layer of complexity because, together with shared relations, they have implications for the consistency of pathway descriptions.

Objectives:
 * policy for URIs to be included in the main BioPAX specifications
 * further action points on the integration of (OBO) ontologies and BioPAX.

Status: Draft RFC ready (available as a Google Doc to interested parties). See: URI-CV Discussion

Documents:
 * Requirements for BioPAX URIs

Standardization
BioPAX uses its own predicates even when standard predicates are widely used, for instance for labels (biopax:NAME as opposed to rdfs:label). At the same time it doesn't follow common practice in the annotation of artifacts (e.g.: Dublin Core). The scope of this task is to adopt standard representations where they overlap with BioPAX or where it's easy to extend the specification. Examples:
 * use of rdfs:label instead of (or in addition to) biopax:NAME
 * introduction of "standard" metadata: e.g.: Dublin Core.
 * N-triples support
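
To make the "in addition to" option concrete, the sketch below mirrors every biopax:NAME statement as an rdfs:label statement, so that generic tools which only look for rdfs:label can still find a name. Triples are plain tuples with prefixed names; the function name and parameters are made up for this example.

```python
def add_standard_labels(triples, source="biopax:NAME", target="rdfs:label"):
    """Return the triples plus an rdfs:label copy of every biopax:NAME statement."""
    result = list(triples)
    seen = set(triples)
    for subject, predicate, obj in triples:
        mirrored = (subject, target, obj)
        # Only add the label if it is not already present.
        if predicate == source and mirrored not in seen:
            result.append(mirrored)
            seen.add(mirrored)
    return result
```

The original biopax:NAME statements are kept, so existing BioPAX tools would be unaffected by the extra triples.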

Objectives: Proposed amendment to the BioPAX specification

Status: Started. See: Standardization discussion

Other action points
Other minor action points this workgroup may be interested in are:
 * Modularity: different namespaces could be used for different aspects of BioPAX, thus facilitating extensions (for instance, to add visualization-related data). The use of named graphs could also be explored.
 * Deployment: SPARQL endpoints / Linked Data

Resources

 * temporary list of resources relevant to Semantic Web integration of BioPAX files

Known Issues

 * PathwayCommons BioPAX files are inconsistent in Protege and give errors when parsed into a triplestore.
 * This is most likely caused by incorrect use of rdf:ID instead of rdf:about in PathwayCommons files (an rdf:ID must be unique within an XML base namespace, but is defined more than once in some BioPAX files). While this will be fixed in the next release, if you want to use BioPAX level2 files from PathwayCommons just use this simple script.
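
The linked script is not reproduced on this page. As a rough idea of the kind of rewrite involved, the sketch below replaces each rdf:ID="x" with the equivalent rdf:about="#x", which refers to the same resource but does not carry the uniqueness constraint. It assumes the RDF namespace is bound to the prefix "rdf" and works on the raw text rather than parsing the XML; the function name is hypothetical and this is not necessarily what the original script does.

```python
import re

# Matches rdf:ID="value" or rdf:ID='value' in the raw RDF/XML text.
RDF_ID = re.compile(r"""\brdf:ID\s*=\s*(["'])(.*?)\1""")


def ids_to_about(rdfxml):
    """Rewrite every rdf:ID="x" as rdf:about="#x", keeping the original quote style."""
    def replace(match):
        quote, value = match.group(1), match.group(2)
        return f"rdf:about={quote}#{value}{quote}"

    return RDF_ID.sub(replace, rdfxml)
```

Existing rdf:about and rdf:resource attributes are left as they are, so references to the rewritten nodes continue to resolve against the same base.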