<?php
if(empty($SITEDEF_H)){require('SITEDEF.php');}
if(empty($PARAM_H)){require('getPARAM.php');}
require('SSI_GDBprep.php');
virtual("${CGIPATH}SSI_GDBgui.pl/TWO_COLUMN_HEADER/" . $SSI_QUERYSTRING);
?>
<STYLE TYPE="text/css">
p{ font:normal 12pt Verdana,Arial,sans-serif;}
hr{ clear:both; width:95%; }
td { font:normal 10pt Verdana,Arial,sans-serif;}
h2 { font:bold 14pt Verdana,Arial,sans-serif;
clear:both; padding-top:15px;
}
h3{ font:bold 12pt Verdana,Arial,sans-serif; clear:both; padding-left:5px;}
h3#tim{ clear:right; }
h4 { font:bold 10pt Verdana,Arial,sans-serif; clear:both; padding-left:10px;}
a { color: blue; }
a.btt {float:right; clear:both;}
img.leader{ float:left; }
img.leader_pic{ float:left; margin-right:5px; border:2px solid blue;}
img#trans_img{ width:250px; }
p.student {clear:left; padding-top:5px;}
div#background_info{ float:left; clear:left; width:48%;}
div#definitions{ float:right; width:48%;}
div#dna_rep{ float:left; width:48%; }
div#mrna_trans{ float:right; width:48%; }
</STYLE>
<DIV ID="mainWLS">
<H1 style="text-align:center;">Spring 2004: Discovering Gene Structure</H1>
<!--
<table border="1" !cellpadding="0" !cellspacing="0" bordercolor="#111111" !width="94%" id="AutoNumber1" !height="241" bordercolorlight="#FFFFFF" bordercolordark="#FFFFFF" bgcolor="#FFFFFF">
<tr>
<td width="20%" height="241"><p><a href="#def">Definitions</a><p>
<a href="#dna" >DNA Replication</a>
<p><a href="#pro" >Protein Synthesis</a>
<p><a href="#back" >Background of Bioinformatics</a>
<p><a href="#fin" >Finding the Problem</a>
<p><a href="#anno" >Annotation Process</a>
<p><a href="#ex" >Examples</a>
<p><a href="#kids" >Who Are We?</a><p> </td>
<td width="180%" height="241" bordercolor="#FFFFFF" bordercolorlight="#FFFFFF" bordercolordark="#FFFFFF" nowrap>
-->
<DIV id='abstract'>
<a name="a_abstract"></a>
<p id='abstract'><img class='leader' src='./watson.gif'>Hello and welcome to the world of bioinformatics! We are Xin Pan and Anna
Kurkalova, and we did an internship at ISU in Dr. Brendel's lab in the spring of 2004. Our
internship included analyzing and annotating gene sequences as well as learning the biological
processes of DNA transcription and working our way through research papers. Bioinformatics
has a wide range of applications, but our internship was mainly focused on analyzing and
correcting gene sequences. Together we analyzed about 300 annotations. This internship
provided us with a great opportunity to experience this exciting new field of genetics.
We would like to thank Dr. Volker Brendel, Adah Ackerman and ISU for making this internship
possible. We would also like to thank Shannon Schlueter and Matthew Wilkerson for all of their help
with the annotation processes.
</p>
</DIV>
<HR>
<DIV id='background_info'>
<a name="a_background"></a>
<H2>Background of Bioinformatics</H2>
<P id='background'>
<ul id='background_topics'>
<li class='topic'>Bioinformatics defined
<ul class='comment_list'>
<li class='comment'>Modern bioinformatics is broadly comprised of three main disciplines
<ul>
<li>biological science</li>
<li>computer science</li>
<li>applied statistics</li>
</ul>
<li class='comment'>Bioinformatics itself is defined as the use of computers to analyze
biological information. The most common form of bioinformatics is studying the vast amounts
of DNA, RNA, and protein sequence that are now available. There are many other possible
applications of computers in biology, such as simulating populations, analyzing experimental
gels and storing information about the phenotypes of mutant organisms.</li>
</ul></li>
<li class='topic'>General objectives
<ul class='comment_list'>
<li class='comment'>To be able to explain normal biological processes through
understanding of how gene sequences code specific proteins</li>
<li class='comment'>To further drug discoveries by analyzing the cause of malfunctions
leading to a disease condition</li>
</ul></li>
<li class='topic'>General Principles
<ul class='comment_list'>
<li class='comment'>Molecular biology provides the information to be analyzed</li>
<li class='comment'>Computer science supplies the tools and networks for managing,
analyzing, and storing this information</li>
<li class='comment'>Applied statistics enables us to compare and evaluate the information
and the results of analysis in which it is used.</li>
</ul></li>
<li class='topic'>History
<ul class='hist_list'>
<li class='chrono_event'>(1865) Gregor Mendel, "The Father of Genetics," presents the results of his study of
genetic inheritance, which goes on to spur countless others and launches a new field of science</li>
<li class='chrono_event'>(1868) Friedrich Miescher discovers "nuclein" in the cell nucleus: an acidic
substance, rich in phosphate, that lacks the sulfur characteristic of protein. It is now known as nucleic acid</li>
<li class='chrono_event'>(1953) James Dewey Watson and Francis Harry Compton Crick propose the
double helix model for DNA based on X-ray diffraction data.</li>
<li class='chrono_event'>(1953) Frederick Sanger, E. O. P. Thompson and Hans Tuppy complete the
determination of the amino acid sequence of the A and B chains of insulin</li>
<li class='chrono_event'>(1958) Francis Harry Compton Crick announces that information flows from
DNA to RNA to protein, "The Central Dogma of Genetics".</li>
<li class='chrono_event'>(1961) Sidney Brenner, François Jacob, and Matthew Meselson identify
messenger RNA.</li>
<li class='chrono_event'>(1990) The Human Genome Project is underway</li>
</ul></li>
<li class='topic'>Computer Languages
<ul class='comment_list'>
<li class='comment'>Computer languages supply the tools for organizing the vast amounts of data
collected from / by researchers.
<H3>Commonly used programming, markup, and scripting languages</H3>
<ul class='comment_list'>
<li class='comment'>HTML</li>
<li class='comment'>XML</li>
<li class='comment'>C/C++</li>
<li class='comment'>Perl</li>
<li class='comment'>Java</li>
<li class='comment'>PHP</li>
</ul></li>
</ul></li>
<li class='topic'>Databases
<ul class='comment_list'>
<li class='comment'>The first bioinformatic/biological databases were constructed a few years after
the first protein sequences became available. A huge variety of divergent data resources of different
types and sizes is now available, either in the public domain or, more recently, from commercial third parties.
All of the original databases were organized in a very simple way, with data entries being stored in flat files,
either one per entry or as a single large text file.</li>
</ul></li>
<li class='topic'>Tools
<ul class='comment_list'>
<li class='comment'>Concurrent with the development of databases, tools became available for searching sequence
databases and for matching and aligning sequences.</li>
</ul></li>
</ul>
</P>
</DIV>
<DIV id='definitions'>
<a name="a_definitions"></a>
<H2>Definitions</H2>
<P id='definitions'>
<ul id='def_list'>
<li class='def'>
<span class='term'>Gene:</span>
<span class='description'>Segment of DNA that controls the expression of a protein</span>
<ul class='comment_list'>
<li>We don't know how many genes there are</li>
<li>Characteristics are usually created by many genes, not just one</li>
<li>Genes interact with each other</li>
</ul>
</li>
<li class='def'>
<span class='term'>Genome:</span>
<span class='description'>All the genetic material, including all the genes, of a particular species</span>
</li>
<li class='def'>
<span class='term'>Eugenics:</span>
<span class='description'>A movement that tried to control human evolution through selective breeding</span>
</li>
<li class='def'>
<span class='term'>Exons:</span>
<span class='description'>Coding segments of nucleic acid found in mRNA</span>
</li>
<li class='def'>
<span class='term'>Introns:</span>
<span class='description'>Segments of non-coding nucleic acid found in pre-mRNA and spliced out before the mRNA matures</span>
</li>
<li class='def'>
<span class='term'>DNA:</span>
<span class='description'>Deoxyribonucleic Acid is a nucleic acid that carries
the genetic information in the cell and is capable of self-replication and
synthesis of RNA. DNA consists of two long chains of nucleotides twisted into a
double helix and joined by hydrogen bonds between the complementary bases
adenine and thymine or cytosine and guanine. The sequence of nucleotides
determines individual hereditary characteristics.</span>
</li>
<li class='def'>
<span class='term'>Codon:</span>
<span class='description'>Three consecutive bases that code for an amino acid. There are 64 combinations but
only 20 different amino acids, meaning that most amino acids are specified by more than one
combination.</span>
</li>
<li class='def'>
<span class='term'>Stop Codon:</span>
<span class='description'>Three consecutive bases that signal the end of the chain of amino acids</span>
</li>
<li class='def'>
<span class='term'>RNA:</span>
<span class='description'>Ribonucleic Acid is a polymeric constituent of all
living cells and many viruses, consisting of a long, usually single-stranded
chain of alternating phosphate and ribose units with the bases adenine, guanine,
cytosine, and uracil bonded to the ribose. The structure and base sequence of
RNA are determinants of protein synthesis and the transmission of genetic
information.</span>
<H3>There are three main forms of RNA:</H3>
<ul class='def_comment_list'>
<li class='def'>
<span class='term'>tRNA:</span>
<span class='description'>Cloverleaf-shaped molecules that each bring one kind of amino acid to the codons. Each tRNA
carries an anticodon, which matches up with its complementary codon on the
mRNA</span>
</li>
<li class='def'>
<span class='term'>Messenger RNA (mRNA):</span>
<span class='description'>RNA that is synthesized and processed in the nucleus and then
translated by ribosomes in the cytoplasm. mRNA is the single-stranded complement of the DNA
template strand, with the base uracil in place of thymine</span>
</li>
<li class='def'>
<span class='term'>Ribosomal RNA (rRNA):</span>
<span class='description'>RNA that is a permanent structural part of a ribosome.</span>
</li>
</ul>
</li>
<li class='def'>
<span class='term'>Ribosome:</span>
<span class='description'>An organelle which consists of RNA and proteins and is found free in the cytoplasm or
on the outside of the rough endoplasmic reticulum</span>
</li>
<li class='def'>
<span class='term'>Polypeptide:</span>
<span class='description'>A small protein containing many molecules of
amino acids, typically between 10 and 100.</span>
</li>
<li class='def'>
<span class='term'>DNA polymerase:</span>
<span class='description'>Any of various enzymes that function in the replication and repair
of DNA using single-stranded DNA as a template</span>
</li>
<li class='def'>
<span class='term'>RNA polymerase:</span>
<span class='description'>A polymerase that catalyzes the synthesis
of a complementary strand of RNA from a DNA template, or, in some viruses, from
an RNA template.</span>
</li>
<li class='def'>
<span class='term'>cDNA:</span>
<span class='description'>Called complementary DNA, cDNAs are synthesized from an mRNA template by the
enzyme reverse transcriptase.</span>
</li>
<li class='def'>
<span class='term'>ORF:</span>
<span class='description'>Open Reading Frames. Reading frames where successive
nucleotide triplets can be read as codons specifying amino acids and where the
sequence of these triplets is not interrupted by stop codons.</span>
</li>
<li class='def'>
<span class='term'>BLAST (Basic Local Alignment Search Tool):</span>
<span class='description'>A set of similarity search programs that use a heuristic
algorithm to seek out local alignments and are designed to explore all of the
available sequence databases regardless of whether the query is protein or DNA.</span>
</li>
<li class='def'>
<span class='term'>GenBank:</span>
<span class='description'>A public database of nucleotide sequences, each
identified by an alphanumeric accession code.</span>
</li>
<li class='def'>
<span class='term'>GeneSeqer:</span>
<span class='description'>A method to identify potential exon/intron
structure in pre-mRNA by splice site prediction and spliced alignment.</span>
</li>
<li class='def'>
<span class='term'>UCA:</span>
<span class='description'>User Contributed Annotation</span>
</li>
<li class='def'>
<span class='term'>Alternative splicing:</span>
<span class='description'>The cutting and pasting of the primary mRNA transcript
into various combinations of mature mRNA.</span>
</li>
<li class='def'>
<span class='term'>GAEVAL:</span>
<span class='description'>The Genome Annotation EVALuation project was created
to assign quality scores to gene structure predictions and to note exceptional cases of
incongruence.</span>
</li>
</ul>
</P>
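<p>The codon arithmetic in the definitions above can be checked with a few lines of code. What follows is a minimal Python sketch, not one of the tools we used during the internship. It assumes the standard genetic code, and the names <code>CODON_TABLE</code>, <code>codons_per_residue</code>, <code>stop_codons</code> and <code>single_codon</code> are our own choices for illustration.</p>
<pre>
```python
from collections import Counter
from itertools import product

# Standard genetic code written in the usual T, C, A, G order.
# "*" marks a stop codon. (Assumption: standard code, DNA alphabet.)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Pair each of the 4 x 4 x 4 triplets with its one-letter amino acid.
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons specify each amino acid (or the stop signal)?
codons_per_residue = Counter(CODON_TABLE.values())

stop_codons = sorted(c for c, r in CODON_TABLE.items() if r == "*")
single_codon = sorted(r for r, n in codons_per_residue.items() if n == 1)

print(len(CODON_TABLE), "codons in total")
print(len(codons_per_residue) - 1, "amino acids plus the stop signal")
print("stop codons:", stop_codons)
print("amino acids with only one codon:", single_codon)
```
</pre>
<p>Counting this way shows why the code is called redundant: most amino acids have several codons, while methionine (M) and tryptophan (W) have only one each.</p>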
</DIV>
<a class='btt' href="#top">Back to Top</a>
<DIV id='dna_rep'>
<a name="a_dna_rep"></a>
<H2>DNA Replication</H2>
<P id='dna_rep'>
<img class='leader' src='./replication.jpg'>
<ol id='dna_rep_steps'>
<li class='step'>DNA uncoils and "unzips"</li>
<li class='step'>DNA polymerase then reads the "unzipped" strands of DNA and produces
a reverse complement which is attached to the single strand of original DNA. The
reverse complements are shown in green.</li>
</ol>
</P>
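<p>The reverse complement mentioned in the steps above is easy to compute. This is a small Python sketch under the assumption that the strand contains only the letters A, C, G and T; the function name <code>reverse_complement</code> is ours and is not part of any tool named on this page.</p>
<pre>
```python
# Map each base to its pairing partner: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    # Complement every base, then reverse so the result also reads 5' to 3'.
    return strand.upper().translate(COMPLEMENT)[::-1]


original = "ATGC"
partner = reverse_complement(original)
print(original, "pairs with", partner)
```
</pre>
<p>Applying the function twice gives back the original strand, which is a quick way to check it.</p>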
</DIV>
<DIV id='mrna_trans'>
<a name="a_mrna_trans"></a>
<H2>mRNA Transcription</H2>
<P id='mrna_trans'>
<img id='trans_img' class='leader' src='./transcription.gif'>
<ol id='mrna_trans_steps'>
<li class='step'>Transcription occurs in the nucleus
<H3>There are 3 stages of transcription</H3>
<ul id='trans_stages'>
<li class='step'>Initiation</li>
<li class='step'>Elongation</li>
<li class='step'>Termination</li>
</ul>
</li>
<li class='step'>RNA Processing (maturation) "edits" the pre-mRNA by splicing
out the intronic sequence from the pre-mRNA transcript. A 7-methylguanosine (m7G)
cap is added to the 5' end, and a poly-A tail is added to the 3' terminus to prevent
premature degradation, because once the mature mRNA leaves the nucleus it becomes
vulnerable to thousands of enzymes.</li>
</ol>
<H3 id='tim'>DEMO: Transcription in motion</H3>
<img id='trans' src='./transcription_mov.gif'>
</P>
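<p>Transcription and splicing can be imitated on a toy sequence. The Python sketch below is only an illustration: the sequences are made up, the input is assumed to be the coding (sense) strand so that transcription amounts to writing U for T, and the exon coordinates are assumed to be known in advance. The function names <code>transcribe</code> and <code>splice</code> are our own.</p>
<pre>
```python
def transcribe(coding_strand):
    """Return the pre-mRNA for a coding-strand DNA sequence (U replaces T)."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna, exons):
    """Join the exon segments; exons are (start, end) pairs, end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# A made-up gene: two exons separated by one intron (GT ... AG).
EXON_1 = "ATGGCC"
INTRON = "GTAAGTTTCAG"
EXON_2 = "GATTGA"
gene = EXON_1 + INTRON + EXON_2

# Exon coordinates computed from the pieces above.
exon_2_start = len(EXON_1) + len(INTRON)
exons = [(0, len(EXON_1)), (exon_2_start, len(gene))]

pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, exons)
print("pre-mRNA:   ", pre_mrna)
print("mature mRNA:", mature_mrna)
```
</pre>
<p>The cap and poly-A tail are left out here; the sketch covers only the base sequence.</p>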
</DIV>
<DIV id='prot_syn'>
<a name="a_protein_syn"></a>
<H2>Protein Synthesis (a.k.a. mRNA Translation)</H2>
<P id='prot_syn'>
<img class='leader' src='./expression.gif'>
<ul id='protein_syn'>
<li class='step'>Translation occurs via the ribosome where mRNA is "read" and
polypeptides are formed</li>
<li class='step'>The ribosome travels 5' to 3' on the single stranded mRNA
helping to generate the protein polypeptide from amino(N)-terminus to
carboxy(C)-terminus.</li>
<li class='step'>Translation occurs in such a way that multiple ribosomes
can read the same strand of mRNA at once thus generating multiple copies of the
encoded polypeptide in a short time.</li>
<li class='step'>Polypeptides produced by the ribosome are often routed to
the Golgi Apparatus, which is the "post office" of the cell, seeing to their proper
delivery.</li>
</ul>
</P>
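<p>Reading an mRNA codon by codon, as the ribosome does, can be sketched in Python. This example assumes the standard genetic code and an mRNA that is already in the correct reading frame; the function name <code>translate</code> and the sample sequence are ours, chosen for illustration.</p>
<pre>
```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): residue
    for triplet, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an in-frame mRNA, 5' to 3', stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break  # stop codon: release the polypeptide
        protein.append(residue)
    # The first residue is the amino (N) terminus, the last the carboxy (C) terminus.
    return "".join(protein)


print(translate("AUGGCCGAUUGA"))
```
</pre>
<p>The polypeptide is written with one-letter amino acid codes, so the sample mRNA gives methionine, alanine and aspartic acid before the stop codon is reached.</p>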
</DIV>
<a class='btt' href="#top">Back to Top</a>
<a name='internship'></a>
<H2>The Project</H2>
<H3>Defining the problem</H3>
<p>We'll start with the easiest and most frequently observed case, which is
when there are one or more full-length cDNAs. Full-length cDNAs are those that
are experimentally derived such that they should capture the entire span of
their mRNA precursor. Therefore these sequences should be as long as or longer
than their predicted gene model annotation. We are interested in any differences
between the alignment of these sequences and the predicted gene model.
<ul id='example_problems'>
<li class='problem_case'>If just a few exons on the original annotation do not agree
with those on the cDNA, the cDNA is almost always correct.</li>
<li class='problem_case'>If there is a cDNA that spans the length of two or more
original annotations, then it most likely means that the two or more annotations
need to be joined. This can be verified by prediction of an ORF.</li>
<li class='problem_case'>There may also be one or more short cDNAs
that do not span the length of the original annotation; this means that there is
a chance that the original annotation needs to be split. However, be sure to
check for ORFs once more. If the ORF for the original annotation is longer than
your corrected annotation, this may be an exceptional case which needs further attention.</li>
</ul>
</p>
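<p>The join and split cases above can be expressed as simple comparisons of start and end coordinates. The following Python sketch is hypothetical: it is not how AtGDB works internally, the gene and cDNA names and coordinates are invented, and it only flags candidates. Every suggestion would still need the ORF check described above.</p>
<pre>
```python
def overlaps(a, b):
    """True when two (start, end) spans share at least one position."""
    return min(a[1], b[1]) >= max(a[0], b[0])


def suggest_changes(annotations, cdnas):
    """Flag annotations that may need joining or splitting, based on cDNA spans."""
    suggestions = []
    # Case 1: one cDNA covering two or more annotations suggests a join.
    for cdna_id, cdna_span in cdnas.items():
        covered = sorted(g for g, span in annotations.items()
                         if overlaps(cdna_span, span))
        if len(covered) >= 2:
            suggestions.append(("join", cdna_id, covered))
    # Case 2: separate cDNAs inside one annotation suggest a possible split.
    for gene_id, span in annotations.items():
        inside = sorted(c for c in cdnas.values()
                        if c[0] >= span[0] and span[1] >= c[1])
        gaps = [pair for pair in zip(inside, inside[1:])
                if pair[1][0] > pair[0][1]]
        if gaps:
            suggestions.append(("possible split", gene_id, inside))
    return suggestions


# Invented coordinates, for illustration only.
annotations = {"geneA": (100, 500), "geneB": (600, 900), "geneC": (2000, 3000)}
cdnas = {"cdna1": (120, 880), "cdna2": (2010, 2400), "cdna3": (2600, 2990)}

for suggestion in suggest_changes(annotations, cdnas):
    print(suggestion)
```
</pre>
<p>In this made-up data, cdna1 covers both geneA and geneB, so they are flagged for joining, while geneC contains two separate cDNAs and is flagged as a possible split.</p>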
<a name="a_annotation_process"></a>
<H3>Processing our Annotations</H3>
<H4>Once a problem is found:</H4>
<p id='annotation_process'>
<ol id='annotation_steps'>
<li class='annot_step'>Make sure you are using the most recent genome assembly version. This
version information can be found in a pull-down menu at the top of every AtGDB page.</li>
<li class='annot_step'>Now click on Provide Expert Annotation.</li>
<li class='annot_step'>If you are not yet a registered user, click on Register HERE! Otherwise,
log in and continue.</li>
<li class='annot_step'>Type in a LOCUS ID (generally these begin with UCA- followed by the
sister gene model ID, e.g. UCA-At2g23500).</li>
<li class='annot_step'>Now click on the exons (the thick blue lines) that you believe to be
part of an accurate gene structure description to add them individually to your gene structure.
Click on the mRNA gi number in order to add the whole series of exons predicted by its alignment
to your UCA structure.</li>
<li class='annot_step'>You can now verify your UCA structure by checking for an
open reading frame. Do this by clicking on the ORF Finder button. If one open reading
frame is noticeably longer, select it. If there is no obvious case, use BLAST
(see tutorial on www.plantbdb.org/AtGDB) to determine the proper ORF.</li>
<li class='annot_step'>Write a brief description commenting on your reasons for submitting this
altered annotation. Here is a template: <i>This annotation corrects the current annotation
for gene model ***. This modifies *** by doing ***. It appears that this error was caused
by ***. These changes are supported by ***.</i></li>
<li class='annot_step'>Once this is done, you are finished. You should now click SUBMIT.
Once an AtGDB curator has seen and accepted your UCA you will be notified of its public
availability.</li>
</ol>
</p>
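<p>The idea behind checking for the longest open reading frame can be sketched in Python. This is not the ORF Finder used on AtGDB; it is a simplified illustration that assumes a spliced sequence on the forward strand only, requires an ATG start codon, and uses the standard stop codons. The function name <code>longest_orf</code> and the sample sequence are ours.</p>
<pre>
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(sequence):
    """Return the longest ATG-to-stop reading frame in the three forward frames."""
    seq = sequence.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the ATG that opened the current frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                candidate = seq[start:i + 3]  # include the stop codon
                if len(candidate) > len(best):
                    best = candidate
                start = None
    return best


# Made-up sequence containing two reading frames of different lengths.
print(longest_orf("CCATGGCCGATTGAATGAAATAG"))
```
</pre>
<p>When two frames have similar lengths, a script like this cannot say which one is real, which is why the step above recommends BLAST for unclear cases.</p>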
<a name="a_examples"></a>
<H3>Example User Contributed Annotations</H3>
<p id='examples'>Shown below are examples of annotations we've corrected as part of our internship
experience.
<H4>Incorrect / Unannotated Exon:</H4>
<img class='example_img' src="Exampl3.gif">
<ul>
<li>If you look at exon 10 on the cDNA (light blue), you will see that the gene model annotation
(dark blue) shows this region to be intronic. When counting exons you start from the end with the
green flag and count to the end with the red flag. In this case, just submit the coordinates for
the cDNA.</li>
</ul>
<H4>Gene Model Needs to Be Split:</H4>
<img class='example_img' src="Exampl4.gif">
<ul>
<li>By looking at the cDNAs (light blue) and ESTs (red), you can see that there is a definite
gap. In cases like this where the gap is very clear cut, it means that most likely the mRNA should
be split. However, still make sure to check the open reading frame for verification.</li>
</ul>
<H4>Gene Models Need to Be Combined:</H4>
<img class='example_img' src="Exampl5.gif">
<ul>
<li>If there is a cDNA that spans the length of two or more original annotations, then it most
likely means that the two or more annotations need to be joined. Verify by checking the ORF. The green strip
shown in the example is a user contributed annotation.</li>
</ul>
<H4>Ambiguous Boundary:</H4>
<img class='example_img' src="Exampl6.gif">
<ul>
<li>Annotations like this represent possible errors in gene structure
determination caused by the automated gene structure annotation routines used
for <i>Arabidopsis</i> genome annotation. Specifically, this situation occurs
when an EST or cDNA is aligned such that it may belong to either of two
overlapping annotations.</li>
</ul>
<H4>Alternative Splicing:</H4>
<img class='example_img' src="Exampl7.gif">
<ul>
<li>As you can see from the cDNAs, there are two possible
annotations for this mRNA. One possible annotation has 12 exons while
the other has only 11 (exons 2 and 3 are combined). This is an example of
alternative splicing, and both annotations should be submitted.</li>
</ul>
</p>
<a class='btt' href="#top">Back to Top</a>
<a name="a_interns"></a>
<H2>Interns: Spring 2004</H2>
<p class='student'><img class='leader_pic' src="./Anna_pic.jpg" alt='Annas photo'>
<b>Anna Kurkalova</b> is currently a junior at Ames High School. Anna
enjoys an academic challenge; she is currently taking AP Physics, Honors
American Literature, Pre-Calculus, French III, 2-D Art, and Western
Civilization. This summer, Anna will be participating in another
internship in which she will work 40 hours a week. Next Fall Anna will be
taking two ISU classes, Introduction to Design and Drawing I. Anna also
participates avidly in extra-curricular activities including Key Club, S.H.E.F.
(Students Helping to Eliminate Hunger), Dance, and Fashion Show. On
weekends, she enjoys ignoring homework, relaxing, sleeping and hanging out with
friends.</p>
<p class='student'><img class='leader_pic' src="./Xin_pic.jpg" alt='Xins photo'>
<b>Xin Pan</b> is currently a sophomore at Ames High School. Xin
really enjoys an academic challenge; he is currently taking AP Physics, AP
Calculus, AP U.S. History, Honors English 10, Spanish II, and Orchestra.
This summer, Xin plans on taking Physics 221, Differential Equations, and
attending HOBY. Next Fall Xin will be taking Calculus III and Physics
222. Outside of school, Xin participates in Science Olympiad, Math League,
GPML, and ARML. On weekends, Xin enjoys playing football, basketball,
tennis, and hanging out with friends. </p>
<a class='btt' href="#top">Back to Top</a>
</DIV>
<?php
require('SSI_GDBprep.php');
virtual("${CGIPATH}SSI_GDBgui.pl/STANDARD_FOOTER/" . $SSI_QUERYSTRING);
?>