Gene Regulation Proposal/Discussion

Comments from Peter Karp and response from Gary Bader - May 31 2006

Peter Karp wrote:

> Here are some comments on the gene regulation proposal.
>
> Overall it looks reasonable...
>
> P
>
> 1. Gene expression is regulated in many ways. Although the document at present is oriented mostly toward regulation of transcription initiation and regulation by small RNAs, I think the document should state that many other mechanisms are known, including attenuation, mRNA degradation, etc., and that these others are left for future work.

Good point, I added this to the future directions section.

> 2. Not clear if class Gene is intended to represent the prokaryotic or eukaryotic notion of gene, or a generalization of the two.

It is meant to be a generalization of the two. I clarified this and added a note on how we may want to further specify gene in the future.

> 3. In the Notes section, re: "In cases where multiple regulators work in conjunction" -- this is unclear. I would like to see examples of cases this is and is not meant to handle.

I clarified the text by separating the cases of multiple regulators acting together but not known to form a complex and multiple regulators acting together as part of a complex. Examples of each are now given.

> In Proposed Implementation:
>
> 4. In class geneRegulation, it states "A DNA or RNA sequence recognition site involved in the complex binding site must be known" -- but what about indirect regulation cases? Presumably it is not required then?

Correct, this has been clarified.

geneRegulation class is now defined as "A control interaction in which a physical entity increases (activates) or decreases (inhibits) the expression of a gene. This can be direct, e.g. transcription factor activates DNA coded gene, or indirect, e.g. kinase activates gene. If the interaction is direct, a binding site must be defined, even if it is unknown. If the interaction is indirect, a binding site must not be defined."
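The direct/indirect rule in this definition can be written as a small consistency check. The following is only a minimal sketch in Python, not part of the proposal: the names `BindingSite`, `GeneRegulation` and `validate` are hypothetical, and an "unknown" binding site is assumed to be a site object with no sequence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BindingSite:
    # Hypothetical: sequence is None when the site is defined but unknown.
    sequence: Optional[str] = None


@dataclass
class GeneRegulation:
    # Hypothetical stand-in for the proposed geneRegulation class.
    controller: str
    gene: str
    effect: str  # "activates" or "inhibits"
    direct: bool
    binding_site: Optional[BindingSite] = None


def validate(reg: GeneRegulation) -> List[str]:
    """Return the list of rule violations (empty if the record is consistent)."""
    errors = []
    if reg.effect not in ("activates", "inhibits"):
        errors.append("effect must be 'activates' or 'inhibits'")
    if reg.direct and reg.binding_site is None:
        errors.append("direct regulation requires a binding site (it may be unknown)")
    if not reg.direct and reg.binding_site is not None:
        errors.append("indirect regulation must not define a binding site")
    return errors


# Direct regulation with an unknown binding site: consistent.
direct_ok = GeneRegulation("TF", "geneA", "activates", True, BindingSite())
# Indirect regulation (e.g. a kinase) that defines a binding site: inconsistent.
indirect_bad = GeneRegulation("kinase", "geneA", "activates", False, BindingSite())
print(validate(direct_ok))
print(validate(indirect_bad))
```

With this reading, the presence or absence of a binding site is what marks an interaction as direct or indirect, so no separate flag would strictly be needed; the `direct` field is kept here only to make the check explicit.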

> Also, Mediating-Interaction has range complexAssembly. Why specify a mediating interaction rather than simply the controller (the complex) itself? Are both meant to be specified, or only the mediating interaction? I'd argue that only the complex should be specified, since presumably for every complex there is at most one complexAssembly that produces it. Further, this strikes me as too specific in the case of indirect interactions, when the controller might be a monomer or an RNA and not a complex.

We decided to move the definition of binding site to the molecular state proposal (still being worked on) as a more general definition of binding site. The same information can be described, but it is now part of the physical entity definition. This is simpler, though we are still testing the approach using worked examples.

> 5. Regarding sequenceBindingSite and bindingPair, are these meant to be optional or required? I'd bet that most of the time, the sequence binding site location on the transcription factor is unknown. If they're meant to be optional, make that clearer, and I think it's fine.

Yes, binding information is optional. There is an open issue related to this though:

* Should direct interactions require the definition of a binding site, even if it is unknown?
 * Requiring one would help differentiate direct from indirect cases, though it would incur the cost of forcing users to define many unknown binding sites. However, if the interaction is known to be direct, the binding site is usually known (because that is often the proof that the interaction is direct). The latter statement needs to be verified by checking actual databases.

> 6. There are probably some aspects of gene regulation as defined in EcoCyc that this ontology can't capture. Examples:
>
> a) Which sigma factor is involved in a given regulation event, and where is the promoter (start of transcription)? We capture these aspects by (a) defining complexes between sigma factors and RNA polymerase, and defining complexAssembly events between the preceding complexes and the promoters. It is actually that event that we specify transcription factors as regulating.

We are defining gene regulation as a shorthand, so it assumes that the 'normal' transcription machinery (i.e. machinery that is not specific to a gene or operon) is present. RNA polymerase would therefore be described automatically. Do you think this is sufficient, or are there cases where you want to describe different RNA pol complexes for different genes of the same class? In eukaryotes, at least, there are 3 RNA polymerase complexes for different gene classes, e.g. mRNA coding vs. ribosomal RNA coding. But these would be considered normal given the class (implicit information).

> b) In bacteria, TFs themselves are usually regulated by small molecules that bind to them, for example, we write reactions in which TF + small-molecule = TFcomplex, and TFcomplex is what binds to the DNA to activate or inhibit transcription. Although this representation is probably compatible with the ontology you've defined here, it's not crystal clear that this is the case, and I think that this point should be clarified via comments.

Yes, we can deal with this. Would you be able to send me an example like this from EcoCyc? I will use it in one of the worked examples.

> c) This ontology also doesn't explicitly define operons (although I did see the note about multicistronic regulation), nor locations of transcription terminator sites, which EcoCyc does.

Correct, we thought it would be simpler not to define these now because of the complications generalizing these across prokaryotic and eukaryotic genes. I think it would be very useful to look into in the future. I added more uses of the word operon in the proposal and beefed up the future directions section on this point.

I will know more about how well this translates from EcoCyc once I create some worked examples using EcoCyc records.

> General BioPAX comments:
>
> o I suggest that comments on complexAssembly should clarify the following items: Can assembly reactions always be written in either direction, or should the RIGHT side always have one member, that is should they always be written in the direction of assembly? If either direction, then for every complexAssembly, either LEFT or RIGHT should have one value that is an instance of class complex (clearer than the current text about class complex), and the other should have two or more values (or exactly two? clarify that too).

This is the current comment for complexAssembly in BioPAX - is this clear enough or do you think we need more?

Definition: A conversion interaction in which a set of physical entities, at least one being a macromolecule (e.g. protein, RNA, or DNA), aggregate via non-covalent interactions. One of the participants of a complexAssembly must be an instance of the class complex (via a physicalEntityParticipant instance).

Comment: This class is also used to represent complex disassembly. The assembly or disassembly of a complex is often a spontaneous process, in which case the direction of the complexAssembly (toward either assembly or disassembly) should be specified via the SPONTANEOUS property.
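To make Peter's suggested constraint concrete, here is a minimal Python sketch of it: one side (LEFT or RIGHT) holds exactly one complex and the other side holds two or more participants. This is an illustration of the suggestion, not the current BioPAX rule; `Participant`, `ComplexAssembly` and `written_direction` are hypothetical names, and "two or more" rather than "exactly two" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    # Hypothetical stand-in for a physicalEntityParticipant.
    name: str
    is_complex: bool = False


@dataclass
class ComplexAssembly:
    left: List[Participant] = field(default_factory=list)
    right: List[Participant] = field(default_factory=list)


def _single_complex(side: List[Participant]) -> bool:
    return len(side) == 1 and side[0].is_complex


def written_direction(ca: ComplexAssembly) -> Optional[str]:
    """Return 'assembly' or 'disassembly' as written, or None if the
    suggested constraint is violated."""
    if _single_complex(ca.right) and len(ca.left) >= 2:
        return "assembly"
    if _single_complex(ca.left) and len(ca.right) >= 2:
        return "disassembly"
    return None


tf = Participant("TF")
ligand = Participant("small-molecule")
tf_complex = Participant("TFcomplex", is_complex=True)

print(written_direction(ComplexAssembly(left=[tf, ligand], right=[tf_complex])))
print(written_direction(ComplexAssembly(left=[tf_complex], right=[tf, ligand])))
print(written_direction(ComplexAssembly(left=[tf], right=[ligand])))
```

Note that this only reports the direction in which the reaction is written; under the current comment, the actual direction of a spontaneous process would still be given by the SPONTANEOUS property.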

Alan: It needs to be unambiguous which regulations are known to be direct (for example, via transcription factors) and which are indirect (for example, a signal that eventually leads to increased transcription).

Alan: The definition of a gene, as written, is one that describes information (something that can be encoded), which cannot properly be called a physical entity. I suggest either rephrasing the definition or moving the class somewhere appropriate. (Dan Corwin caught this one) [2006-06-12, Alan: Noticed that Gary changed the definition of gene in the genetic interaction proposal to address this concern. Given the current definition, Gene should be defined as a subclass of the union of DNA and RNA: SubClassOf(bp2:Gene unionOf(bp2:DNA bp2:RNA))]

Alan: In many cases we know that the gene is, in fact, made of DNA, such as when referring to deletion strains. How is this information represented?

Alan: The geneRegulation class is unsatisfiable, because it says that all CONTROLLED are genes. However, from the definition of CONTROLLED, its range is interaction or pathway, and interactions and pathways are disjoint from physicalEntities, so there is a contradiction. There should probably be a geneExpression reaction which is the object of the CONTROLLED. (Recommendation to designers of proposals: run the reasoner to check them, and use the species validator to verify that they are OWL-DL.)
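The contradiction can be replayed on a toy class hierarchy. The Python sketch below is not an OWL reasoner and not the BioPAX ontology itself: the `PARENT` table is a hypothetical simplification that assumes gene is a subclass of physicalEntity, and `geneExpression` is the interaction suggested above, added only for comparison.

```python
# Toy subclass table (child -> parent); a hypothetical simplification.
PARENT = {
    "physicalEntity": "entity",
    "interaction": "entity",
    "pathway": "entity",
    "gene": "physicalEntity",         # assumption taken from the proposal
    "control": "interaction",
    "geneRegulation": "control",
    "geneExpression": "interaction",  # hypothetical class suggested above
}

# Pairs of classes declared disjoint.
DISJOINT = [("physicalEntity", "interaction"), ("physicalEntity", "pathway")]

# Range of the CONTROLLED property.
CONTROLLED_RANGE = ["interaction", "pathway"]


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    out = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        out.add(cls)
    return out


def compatible(a, b):
    """True if one individual could belong to both class a and class b."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return not any(
        (x in anc_a and y in anc_b) or (y in anc_a and x in anc_b)
        for x, y in DISJOINT
    )


def controlled_value_possible(required_class):
    """Can any CONTROLLED value be both in range and of required_class?"""
    return any(compatible(required_class, r) for r in CONTROLLED_RANGE)


print(controlled_value_possible("gene"))            # False
print(controlled_value_possible("geneExpression"))  # True
```

Restricting CONTROLLED to gene leaves no admissible value, whereas restricting it to an interaction such as geneExpression does not conflict with the range.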

Andrea: Why should a Gene be a subclass of physicalEntity anyway? A Gene is better characterized as something that can characterize a set of physical entities. Is there a case in which, when talking about a Gene, we refer to some specific molecule? There may be, but no more than you refer to a CD when talking about a piece of software.

Besides, what do we gain by considering a gene a physicalEntity? We also risk several levels of inconsistency... can a gene encode itself?