GenericParticipants

This page attempts to provide a comprehensive set of examples to be used as use cases for modeling generic participants in BioPAX. Generic participants are groupings of molecules that are often grouped and referred to as a single actor in the literature. Please try to select examples that together cover the use-case space.

Truly Generic Participants:

These are groupings of participants that are often formed through polymerization or random aberrations. Their instances cannot feasibly be enumerated.

Name: Glycogen

Source: Generic Textbooks (e.g. online Stryer at http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=stryer.chapter.2911)

Notes: Although “complete” glycogen has a single structure, a virtually infinite number of sub-structures occur during its metabolism. It is an example of generic participants by polymerization, also applicable to other polysaccharides, cytoskeleton proteins (e.g. tubulin) and fatty acids.

It is worth noting that sometimes the entities of the participants are also generic. For example, when a glucose is removed from glycogen, the left and right sides of the reaction contain the same entity, although they are clearly different molecules. Although we can perceive this semantically, it is nowhere in the ontology, so we cannot claim that we are modeling glycogen metabolism unambiguously. But are these two molecules different participants? The same comments apply to the other examples below where the entity itself is generic.

Name: MDO, Membrane Derived Oligosaccharide D-glucose

Source: BioCyc (http://biocyc.org/ECOLI/NEW-IMAGE?type=COMPOUND&object=MDO-D-GLUCOSE )

Notes: From the BioCyc site: “MDOs represent a class of periplasmatic oligosaccharides occuring in gram-negative bacteria. The detailed structures are still under investigation, and cyclic, branched, and open-chain variants have been found, usually containing on the order of a dozen D-glucose units in β-D-glucosidic linkages. They are decorated to varying degrees with phospho-sn-glycerol, phosphoethanolamine, phosphocholine, succinate, and methylmalonic residues that impart negative charges. MDOs are implicated in osmoregulation.” This is very similar to glycogen; however, it is worth noting that it has two (generic) participants in BioCyc, an unmodified one and a glycerophosphorylated one.

Name: Ubiquitinated protein

Source: See Cell. 2004 Jan 23;116(2):181-90 for a review, although there are already many.

Notes: A protein gets ubiquitinated to be marked for degradation. Essentially almost all proteins get ubiquitinated and degraded at some point, so enumerating all ub-participants is possible but not feasible. This is an example of generic participants by templates, and is also applicable to many participants in RNA processing or protein trafficking. Although a protein with an unknown post-translational modification can currently be represented in BioPAX Level 2, it is not possible to model degradation of the protein by the proteasome unless we explicitly create a separate interaction for each protein.

Name: damaged DNA with 3' incision

Source: Reactome ( http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=109942& )

Notes: This is a classic example used in DNA repair pathways. It is worth noting that one cannot really hope to model DNA repair pathways without somehow modeling damaged generic DNA.

Name: immunoglobulins

Source: Various text books, a nice online resource at http://www.med.sc.edu:85/mayer/IgGenetics2000.htm

Notes: Classic example of combinatorial recombination for a diverse immune response. This is a very special example and might be excluded from this document as a special case. However, one also needs to think about extensions of this generic behavior, such as MHCs, epitopes, etc.

Homologies:

These are groupings of similar molecules, often belonging to different but evolutionarily homologous entities.

Name: WNT-Frizzled (paralog case)

Source: A nice but outdated review at: http://genomebiology.com/2001/3/1/reviews/3001 (PATIKA deals with this case explicitly)

Notes: 17 different Wnts bind to 10 different Fzd receptors during development. Wnt signals are transduced through at least three distinct intracellular signaling pathways including the canonical 'Wnt/β-catenin' pathway, the 'Wnt/Ca2+' pathway, and the 'Wnt/polarity' pathway. Distinct sets of Wnt and Fzd ligand-receptor pairs can activate each of these pathways and lead to unique cellular responses. In general, tissue-specific control is performed by homologous proteins/molecules that activate the same downstream pathways. Growth hormones and their receptors, which activate JAK/STAT, are another example. It is also possible to extend the examples to non-developmental cases such as various families of annexins or proteoglycans.

Here instead of a generic entity, what we have is generic groupings of concrete participants of concrete entities. It is often unknown which Wnt is being assayed in an experiment, since almost all experiments in this field use cell extracts as a source of Wnt (it was impossible to purify specific active Wnts for a long time). These extracts are known to contain many different forms of Wnt, thus generic Wnt is an important aspect of Wnt pathway models.
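To illustrate why a generic grouping helps here, the following minimal sketch counts the concrete ligand-receptor pairs implied by the numbers above. The member labels (`WNT_1`, `FZD_1`, and so on) are placeholders invented for illustration, not real gene symbols, and the dictionary layout is an assumption rather than anything defined by BioPAX.

```python
from itertools import product

# Placeholder labels, assumed for illustration only (not real gene symbols).
wnts = [f"WNT_{i}" for i in range(1, 18)]   # 17 Wnt ligands
fzds = [f"FZD_{i}" for i in range(1, 11)]   # 10 Fzd receptors

# Explicit enumeration: one binding interaction per ligand-receptor pair.
concrete_pairs = list(product(wnts, fzds))

# Generic grouping: a single interaction between two generic participants,
# each of which simply lists its members.
generic_interaction = {"ligand": wnts, "receptor": fzds}

print(len(concrete_pairs))  # 170
```

Not every pair is necessarily observed in practice, so 170 is only an upper bound on what an explicit enumeration would need, whereas the generic form stays at one interaction.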

Name: Hexokinase (ortholog case)

Source: KEGG, WIT

Notes: KEGG and WIT maintain generic pathways, which are composed of generic enzymes, represented by EC numbers (which define a generic class of reactions without any assumption about how they are implemented). An enzyme in one of these pathways represents all enzymes in all genomes that implement the reaction type specified by the EC number.

Analogies:

Name: dNTP

Source: Reactome

Notes: Members of these participants are consumed by a common generic reaction (DNA replication in this case). Although this is a common abstraction in the literature and in the lab, its semantics are quite different from what we have considered so far. Similar examples are amino acids, amino acid specific tRNAs, alcohols, etc. These molecule classes are captured by new chemical species ontologies, such as the one in ChEBI.

(Also, there is the R-group representation for chemicals, which defines a class of small molecules)

Semi-quantitative Modifications

These participants have multiple phosphorylation sites, often found as repeats, to provide a quantitative measure.

Name: NEFH

Source: http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=162230

Notes: The tail of the neurofilament heavy subunit is composed of a repeating amino acid motif, usually X-lysine-serine-proline-Y-lysine (XKSPYK), where X is a single amino acid and Y is 1 to 3 amino acids. There are 2 common polymorphic variants of 44 and 45 repeats. The tail probably regulates axonal caliber, with interfilament spacing determined by phosphorylation of the KSP motifs. (Thanks to Elif Erson for pointing this out).
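The repeat motif described above can be sketched as a regular expression. This is only an illustration: the function name `find_ksp_repeats` and the short `toy_tail` sequence are made up for the example (it is not the real NEFH sequence), and `[A-Z]` is used as a loose stand-in for "any amino acid".

```python
import re

# X-K-S-P-Y-K, with X = one residue and Y = 1 to 3 residues.
# The lazy quantifier {1,3}? keeps Y as short as possible so that a match
# does not run into the next repeat.
KSP_REPEAT = re.compile(r"[A-Z]KSP[A-Z]{1,3}?K")


def find_ksp_repeats(sequence):
    """Return the non-overlapping XKSPYK repeats found in a protein sequence."""
    return KSP_REPEAT.findall(sequence)


# Made-up tail fragment with three repeats (Y of length 1, 2 and 3).
toy_tail = "AKSPEKVKSPAEKTKSPPEAK"
repeats = find_ksp_repeats(toy_tail)
print(repeats)       # ['AKSPEK', 'VKSPAEK', 'TKSPPEAK']
print(len(repeats))  # 3
```

The number of repeats bounds the number of phosphorylatable KSP serines, so a semi-quantitative state could be recorded as a count between 0 and `len(repeats)` rather than as a set of site-specific participants.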

Sufficient Modifications

For these participants to take part in a reaction, fixing a certain number of variables is sufficient; any combination of the remaining variables can therefore take part in the reaction. Note that, in strict terms, all of these molecules take part in separate chemical reactions; however, representing all of them separately is not good complexity management. This is an example of a generic participant of a concrete entity. This entry is here because it was extensively discussed in the BioPAX forum and meetings. There is actually a post claiming that 100 phosphorylation sites on the insulin receptor are possible and biologically relevant. However, specific, well-studied examples exist (e.g. Sic1, PMID: 11734846, or see http://www.mshri.on.ca/pawson/cellswitches.html for some more recent work). Also note that in the case of Sic1 we still have no evidence of combinatorial behaviour; the paper even uses wording like “at least 6 (p)”, which sounds like the semi-quantitative example above.

Another example is BioNetGen’s EGFR model (http://cellsignaling.lanl.gov/bionetgen/egfr_sos_plcg.in), but there most of the complexity arises from complex formation, which is a separate issue in this document.

Another example is histone, which is modified in multiple places for chromatin regulation. If chromatin is considered a generic entity, then the histone code represents an incredibly complex set of PTMs. DNA methylation could fall under this category as well. Another example is cyclin B1 (PMID: 12612056). Four different serines on this protein get phosphorylated by different pathways/contexts. Cyclin B1 can be in two different logical participants, active and inactive (the details are not really relevant, but it has to do with the nuclear export rate of the molecule). Although the paper does not explicitly describe it, it looks like different combinations might lead to different logical participants, thus forcing us to consider each combination separately.
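The size of the problem can be sketched with a little counting. The helper names below are made up for illustration. The four sites come from the cyclin B1 example; the total of 9 sites used with the "at least 6" threshold is an assumed value chosen only to make the arithmetic concrete.

```python
from math import comb


def all_states(n_sites):
    """Number of distinct on/off patterns over n binary modification sites."""
    return 2 ** n_sites


def sufficient_states(n_sites, k_min):
    """Number of patterns with at least k_min of the n sites modified."""
    return sum(comb(n_sites, k) for k in range(k_min, n_sites + 1))


# Four serines, as in the cyclin B1 example.
print(all_states(4))  # 16

# "At least 6" modified sites; the total of 9 sites is an assumed value.
print(sufficient_states(9, 6))  # 130
```

A wild-card grouping would replace each of these sets of concrete participants with a single generic one, which is the complexity management argument made above.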

Emek Demir>> I am not really satisfied with this part. I am still looking for a good example, where there is a significant complexity introduced by multiple CMs, and a context where a wild-card grouping of these participants is meaningful. Maybe anyone might help?

Generic Complexes

Generic participants can form complexes in such a manner that they create combinatorially many species, even though species of participants themselves can feasibly be enumerated.

Name: Initial events in EGFR signaling, accounting for Sos and PLCg activation.

Source: BioNetGen, (http://cellsignaling.lanl.gov/bionetgen/egfr_sos_plcg.in)

Notes: The generated enumeration contains ~5000 participants. (This, as I understand it, is the number of different complex members, not complexes.) Generic complexes are abundant in phase/tissue control mechanisms, mostly because participants of type 2.2 are also abundant there. Do we want to describe ‘generic complexes’, or do we only want to create generic complexes automatically by constructing them out of generic components? This leads into a discussion of generic interactions, since complexes and interactions are closely related (a complex is the product of interactions).

Name: SCF Complex

Source: http://biop.ox.ac.uk/www/lj2000/endicott/endicott_04.html

Notes: SCFSKP2 is a complex of at least five subunits that is required for S-phase entry [3]. It is a member of a large family of SCF complexes which all contain the proteins Rbx/Roc1, Cul-1, SKP1 and Cdc34 (the E2 or ubiquitin-conjugating enzyme). The function of SCF complexes is to select and ubiquitinate specific proteins, a modification that targets them for destruction by the proteasome. Different SCF complexes are distinguished by the F-box protein present in the complex (in this study SKP2) that is responsible for substrate recognition.

Polymerization

Polymerization reactions from the BioCyc DBs can't be represented. The BioCyc DBs represent some reactions with coefficients of N, N+1, etc. These are typically polymerization-type reactions. An example is EC# 2.3.1.85: acetyl-CoA + N malonyl-CoA + 2N NADPH + 2N H+ = a long-chain fatty acid + (N+1) CoA + N CO2 + 2N NADP+. Currently, these must be omitted from the BioPAX file (fortunately none of them are in pathways, so far). BioPAX STOICHIOMETRIC-COEFFICIENT is currently limited to type 'double', so a symbolic coefficient such as N cannot be stored.
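One way to picture the gap is a coefficient that is a linear function of N rather than a plain number. The sketch below is a hypothetical representation, not part of BioPAX or BioCyc: the `LinearCoeff` class and the `instantiate` helper are invented for illustration, and N = 7 is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCoeff:
    """Hypothetical coefficient of the form slope * N + intercept."""
    slope: int
    intercept: int

    def at(self, n):
        # Return a float, since the coefficient slot only accepts a double.
        return float(self.slope * n + self.intercept)


# EC 2.3.1.85 as written above.
LEFT = {
    "acetyl-CoA": LinearCoeff(0, 1),
    "malonyl-CoA": LinearCoeff(1, 0),
    "NADPH": LinearCoeff(2, 0),
    "H+": LinearCoeff(2, 0),
}
RIGHT = {
    "a long-chain fatty acid": LinearCoeff(0, 1),
    "CoA": LinearCoeff(1, 1),
    "CO2": LinearCoeff(1, 0),
    "NADP+": LinearCoeff(2, 0),
}


def instantiate(side, n):
    """Replace every symbolic coefficient by its numeric value for a given N."""
    return {name: coeff.at(n) for name, coeff in side.items()}


print(instantiate(LEFT, 7))
print(instantiate(RIGHT, 7))
```

Each instantiated reaction is concrete and fits a numeric coefficient, but the generic reaction itself, valid for every N, still has no representation.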