A. Genome-wide analysis of rice and Arabidopsis cDNAs for translational signals
Eukaryotes possess complex mechanisms for regulating mRNA translation, which modulate gene expression in a wide range of biological situations. Such elaborate translational regulation is possible because translation is separated from transcription, spatially by the nuclear membrane, and because transcription and translation use different start and stop sites. A consequence of the latter organization is the existence of additional gene structures, the untranslated regions (UTRs), at both ends of the messenger RNA.
UTRs contain elements that, in combination with proteins or small RNAs, modulate the translation of individual mRNAs without affecting global protein synthesis. Several features of the 5’-leader sequence can influence the translational efficiency of an mRNA. The most important are the nucleotide sequence, or context, surrounding the AUG start codon and the presence of AUGs and open reading frames (ORFs) upstream of the main translation initiation site, termed upstream AUGs (uAUGs) and upstream ORFs (uORFs).
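The notion of AUG context can be made concrete with a short Python sketch. It assumes the widely used Kozak-type rule, with the A of AUG numbered +1: a purine (A or G) at -3 together with a G at +4 gives a strong context, exactly one of the two an adequate context, and neither a weak one. This three-level rule and the function name `aug_context_strength` are illustrative assumptions, not necessarily the consensus used in the rice analysis.

```python
def aug_context_strength(seq: str, pos: int) -> str:
    """Grade the context of the AUG that starts at 0-based index `pos`.

    Assumed Kozak-type rule (A of AUG = +1): purine at -3 and G at +4 is
    "strong", exactly one of the two is "adequate", neither is "weak".
    DNA input is accepted; T is converted to U.
    """
    seq = seq.upper().replace("T", "U")
    if seq[pos:pos + 3] != "AUG":
        raise ValueError(f"no AUG at position {pos}")
    # Position -3 lies three nucleotides 5' of the A; +4 is the base right after the G.
    purine_at_minus3 = pos >= 3 and seq[pos - 3] in ("A", "G")
    g_at_plus4 = pos + 3 < len(seq) and seq[pos + 3] == "G"
    return ("weak", "adequate", "strong")[int(purine_at_minus3) + int(g_at_plus4)]


if __name__ == "__main__":
    print(aug_context_strength("GCCACCAUGG", 6))  # strong
    print(aug_context_strength("ACCAUGA", 3))     # adequate
    print(aug_context_strength("CCCUCCATGC", 6))  # weak (DNA input)
```

Positions that fall outside the sequence are treated as not matching, so an AUG very close to the 5’ end can at best reach an adequate grade under this rule.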
To assess the extent of 5’-UTR-mediated translational regulation in rice, a genome-wide computational analysis of rice 5’-UTRs was carried out. A combinatorial analysis of the start-codon context, the uAUG context and the context of the uAUGs that open upstream open reading frames in individual genes indicates that about 34% of rice genes are likely to be regulated at the translational level by signals in their 5’-UTRs, since they possess uAUGs/uORFs whose sequence context conforms to the consensus sequence.
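The combinatorial screen described above can be sketched in Python as follows. The sketch scans a cDNA for every AUG upstream of the annotated main start codon, follows each one codon by codon to its first in-frame stop codon, and flags the gene when at least one uAUG or uORF start lies in a strong context. It assumes that "conforming to the consensus" means the Kozak-type rule (purine at -3, G at +4), and that a uAUG in frame with the main AUG and with no intervening stop codon is an N-terminal extension rather than a uORF; all function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def context_strength(seq: str, pos: int) -> str:
    # Assumed Kozak-type rule: purine at -3 and G at +4 (A of AUG = +1).
    hits = int(pos >= 3 and seq[pos - 3] in "AG") + int(pos + 3 < len(seq) and seq[pos + 3] == "G")
    return ("weak", "adequate", "strong")[hits]


@dataclass
class UpstreamStart:
    start: int           # 0-based index of the uAUG in the cDNA
    kind: str            # "uORF", "uAUG" (no in-frame stop) or "extension"
    stop: Optional[int]  # 0-based index of the first in-frame stop codon, if any
    strength: str        # "strong", "adequate" or "weak"


def scan_5utr(cdna: str, main_start: int) -> List[UpstreamStart]:
    """List every AUG upstream of the main AUG at 0-based index `main_start`."""
    seq = cdna.upper().replace("T", "U")
    found = []
    pos = seq.find("AUG")
    while 0 <= pos < main_start:
        # Walk codon by codon from the uAUG until the first in-frame stop codon.
        stop = next((i for i in range(pos + 3, len(seq) - 2, 3)
                     if seq[i:i + 3] in STOP_CODONS), None)
        in_frame_with_main = (main_start - pos) % 3 == 0
        if in_frame_with_main and (stop is None or stop > main_start):
            kind = "extension"  # reads straight into the main ORF
        elif stop is None:
            kind = "uAUG"       # no in-frame stop codon anywhere in the cDNA
        else:
            kind = "uORF"       # may end before or overlap the main AUG
        found.append(UpstreamStart(pos, kind, stop, context_strength(seq, pos)))
        pos = seq.find("AUG", pos + 1)
    return found


def likely_translationally_regulated(cdna: str, main_start: int) -> bool:
    """True if any uAUG or uORF start has a strong (consensus-like) context."""
    return any(s.kind in ("uAUG", "uORF") and s.strength == "strong"
               for s in scan_5utr(cdna, main_start))


if __name__ == "__main__":
    cdna = "GAAAUGGCCUAACCAAUGGCCGCCUAA"  # toy cDNA, main AUG at index 15
    print(scan_5utr(cdna, 15))
    print(likely_translationally_regulated(cdna, 15))  # True
```

In the toy cDNA the uAUG at index 3 opens the short uORF AUG-GCC-UAA and has a G at both -3 and +4, so the gene is flagged. Applying the same test to every full-length cDNA and counting flagged genes gives a genome-wide proportion of the kind reported above.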
B. Development of Translatebase: the database of translational signals in plants
The data generated in the genome-wide analyses, together with the rice genome sequence, were used to construct a relational database with the MySQL relational database management system and the Bio::DB::GFF schema. The database is linked to the GBrowse visualization tool, which consists of several components. At the top level is a CGI (Common Gateway Interface) script named gbrowse, which manages the user interface: it generates the HTML forms that end users interact with, accepts and processes requests, manages the cookies that preserve users' preferences from session to session, and displays the rendered images of annotated regions.
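Bio::DB::GFF databases are normally populated from GFF files, so signal annotations have to be serialized as GFF lines before loading. The Python sketch below writes one GFF3 line per signal. Whether Translatebase was loaded from GFF2 or GFF3 is not stated; the GFF3 format, the source name `translatebase`, the feature types `AUG`/`uAUG`/`uORF`, the lowercase attribute tags and the example coordinates are all assumptions for illustration.

```python
# Characters that GFF3 reserves in column 9. "%" must be encoded first so that
# the escapes produced for the other characters are not themselves re-encoded.
RESERVED = ("%", ";", "=", "&", ",", "\t", "\n")


def gff3_escape(value: str) -> str:
    for ch in RESERVED:
        value = value.replace(ch, f"%{ord(ch):02X}")
    return value


def signal_to_gff3(seqid: str, kind: str, accession: str, start: int, end: int,
                   strand: str, attrs: dict) -> str:
    """Return one tab-separated GFF3 line for a translational signal.

    `start`/`end` are 1-based inclusive genomic coordinates. The Name follows
    the Translatebase labeling scheme: cDNA accession plus -AUG, -uAUG or -uORF.
    """
    if kind not in ("AUG", "uAUG", "uORF"):
        raise ValueError(f"unknown signal type: {kind}")
    if strand not in ("+", "-"):
        raise ValueError(f"bad strand: {strand}")
    if not 1 <= start <= end:
        raise ValueError("coordinates must be 1-based with start <= end")
    # No ID attribute: one cDNA may carry several uAUGs sharing the same Name,
    # whereas GFF3 IDs must be unique within a file.
    fields = {"Name": f"{accession}-{kind}", **attrs}
    column9 = ";".join(f"{key}={gff3_escape(str(val))}" for key, val in fields.items())
    return "\t".join([seqid, "translatebase", kind, str(start), str(end),
                      ".", strand, ".", column9])


if __name__ == "__main__":
    # Hypothetical coordinates and context, for illustration only.
    print(signal_to_gff3("chr01", "uAUG", "AK069738", 1000, 1002, "+",
                         {"context": "GCCAUGG", "strength": "strong", "frame": 0}))
```

Column 8 (phase) is left as "." because the reading frame relative to the cDNA is a different quantity and is stored as its own attribute here.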
Currently the Translatebase database contains 5’-UTR-based translational signal annotations for japonica rice genes. The translational signal annotations are shown in the context of the general genome annotations provided by the Rice Annotation Project Database (RAP-DB). For each signal, the database records the genomic location, strand, AUG sequence context, context strength and reading frame with respect to the cDNA. The database represents 180,376 full-length cDNAs; all redundant and splice-variant cDNAs are included.
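Recording a genomic location for a signal found on a cDNA requires mapping cDNA offsets through the exon structure, which on the minus strand runs backwards through the genome. A minimal Python sketch follows. It assumes exons are given as 1-based inclusive genomic intervals and reads "reading frame" as the offset of a signal's AUG relative to the main start codon (0 = in frame); both conventions, and the function names, are assumptions rather than the documented Translatebase definitions.

```python
from typing import List, Tuple


def cdna_to_genome(offset: int, exons: List[Tuple[int, int]], strand: str) -> int:
    """Map a 0-based cDNA offset to a 1-based genomic coordinate.

    `exons` holds (start, end) genomic intervals, 1-based inclusive and sorted
    by start. On the minus strand the cDNA begins at the end of the last exon.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    ordered = exons if strand == "+" else list(reversed(exons))
    remaining = offset
    for start, end in ordered:
        length = end - start + 1
        if remaining < length:
            return start + remaining if strand == "+" else end - remaining
        remaining -= length
    raise ValueError("offset lies beyond the end of the transcript")


def frame_relative_to_main(signal_offset: int, main_offset: int) -> int:
    """0 if the signal's AUG is in frame with the main AUG, otherwise 1 or 2."""
    return (main_offset - signal_offset) % 3


if __name__ == "__main__":
    exons = [(100, 109), (200, 219)]  # hypothetical two-exon cDNA of 30 nt
    print(cdna_to_genome(12, exons, "+"))  # 202: third base of exon 2
    print(cdna_to_genome(12, exons, "-"))  # 207: counted back from 219
    print(frame_relative_to_main(3, 15))   # 0: in frame
```

Because Python's `%` returns a non-negative result for a positive modulus, the frame stays in the range 0 to 2 even for an AUG downstream of the main start codon.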
The Translatebase database can be accessed at http://www.ricebrowse.org. It uses the Generic Genome Browser (GBrowse) as its engine. The genome browser graphically displays a section of the genome and all features annotated on it. The user can zoom in and out, scroll through the genome and click on features to obtain more detailed information. Users can specify a genome segment to display, e.g. chr1:1000..9000, or query the database with a keyword, which may include wildcard characters, e.g. alkaline*. Such a query returns a list of matches to the search term.
For example, to find the translational signals (start codon, uAUG and uORF) of the gene AK069738, one would query the database with AK069738, which fetches the cDNA and all signals associated with it. To query a specific translational signal, the accession number (AK069738) is appended with -AUG, -uAUG or -uORF for the start codon, upstream AUG or upstream ORF, respectively. Clicking on one of the signals in the list displays the section of the genome where the signal occurs.
Annotated translational signals are displayed as segments (boxes). Each feature (a segment representing an AUG, uAUG or uORF) is labeled with an identifier made of the RAP accession number of the cDNA to which it belongs followed by the same suffix (-AUG, -uAUG or -uORF). Double-clicking a segment opens a page with detailed information about the signal. For the start codon and an upstream AUG, this information includes the chromosomal position, sequence context, sequence context strength and reading frame. For an upstream ORF, it includes the chromosomal position, the sequence context and context strength of its start codon, the reading frame and the ORF sequence. Translatebase is currently being tested.
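The two kinds of query described above, a segment such as chr1:1000..9000 and a name search with optional wildcards and signal suffixes, can be mimicked in a few lines of Python. This is not GBrowse's own search code: it matches feature names only (keyword matches against descriptions, as with alkaline*, are out of scope), it is case-sensitive, and the helper names and example feature list are hypothetical.

```python
import fnmatch
import re
from typing import List, Optional, Tuple

SEGMENT_RE = re.compile(r"^(?P<ref>[^:\s]+):(?P<start>\d+)\.\.(?P<end>\d+)$")
SIGNAL_SUFFIXES = ("-AUG", "-uAUG", "-uORF")


def parse_segment(query: str) -> Optional[Tuple[str, int, int]]:
    """Parse 'ref:start..end' into (ref, start, end); None if not a segment."""
    m = SEGMENT_RE.match(query.strip())
    if m is None:
        return None
    start, end = int(m["start"]), int(m["end"])
    return m["ref"], min(start, end), max(start, end)


def search_features(query: str, names: List[str]) -> List[str]:
    """Return the feature names that a name query would match."""
    query = query.strip()
    if any(ch in query for ch in "*?["):
        # Shell-style wildcards, e.g. AK06*
        return [n for n in names if fnmatch.fnmatchcase(n, query)]
    if query.endswith(SIGNAL_SUFFIXES):
        # One specific signal, e.g. AK069738-uORF
        return [n for n in names if n == query]
    # A bare accession fetches the cDNA and every signal labeled with it.
    return [n for n in names if n == query or n.startswith(query + "-")]


if __name__ == "__main__":
    names = ["AK069738", "AK069738-AUG", "AK069738-uAUG", "AK069738-uORF",
             "AK100001", "AK100001-AUG"]
    print(parse_segment("chr1:1000..9000"))         # ('chr1', 1000, 9000)
    print(search_features("AK069738-uAUG", names))  # ['AK069738-uAUG']
    print(search_features("AK069738", names))       # cDNA plus its three signals
```

Checking the suffix with `str.endswith` is safe here because "-AUG" never matches the end of an "-uAUG" label: the character before "AUG" is "u", not "-".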
C. Laboratory validation of predicted translational signals
Work in progress