- Codon bias, the unequal use of synonymous codons for encoding amino acids, has been found in many organisms,
both prokaryotes and eukaryotes. This bias varies considerably among organisms and even among
genes of the same organism. In different species, codon bias has been found to correlate weakly
with gene expression level. Two main processes have been proposed to explain codon bias: natural
selection acting on silent changes in DNA, mutational bias, or both. In unicellular organisms,
as well as in some eukaryotes, many studies over the past decade have shown that the bias toward
codons with high tRNA gene copy numbers increases with expression level. This suggests that
selection acts on codon bias to improve translation efficiency. However, this idea has not been
confirmed in mammals. Although Urrutia and Hurst (2003) observed a weak correlation between gene
expression level and codon bias in the human genome, this relation has not been
linked to tRNA abundance. In this study we present evidence, based on tRNA gene copy numbers,
suggesting that selection acts on codon bias in the human genome.
- Gene prediction, namely the computational methods for finding the locations of protein-coding regions in
uncharacterized genomic DNA sequences (Fickett, 1996; Salzberg et al., 1998; Pevzner, 2000; Mount, 2001),
is one of the most important issues in bioinformatics. For a given DNA sequence of an organism in which
the genes and other functional structures are not yet known, it is very important to have an accurate
and reliable tool for automatic annotation of the sequence: the number and locations of genes, the
locations of exons and introns (in eukaryotes), and their exact boundaries (Claverie, 1997). Therefore,
along with standard molecular methods, many new methods for finding distinctive features of protein-coding
regions have been proposed in the past two decades (Fickett and Tung, 1992; Fickett, 1996). In this
research we examine the use of a signal-processing approach to gene prediction. As part of this research
we shall introduce improved predictors based on wavelets and Fourier analysis.
The objectives of this research are:
- To extract new discriminative features of DNA sequences for the identification of protein-coding regions and other functional structures, using signal processing methods. In this research Fourier analysis and wavelets will be the main methods used.
- To construct improved gene predictors based on the above characteristics. These predictors will be based on adaptive algorithms, fitted to each organism. Such algorithms will be initialized using a small database of the organism's known genes, and their parameters will be updated as more genes are identified.
- To explore the new discriminative features of protein-coding regions along the phylogenetic tree, and examine the possibility of classifying organisms according to these features.
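
As an illustration of the kind of Fourier feature the objectives above refer to, the sketch below computes the spectral power at period 3, a periodicity widely reported in protein-coding regions. It is a minimal example under stated assumptions, not the predictors developed in this research: the choice of this particular feature, the function names `period3_power` and `sliding_period3`, the normalization by window length, and the window and step sizes are all hypothetical choices made for illustration.

```python
import cmath


def period3_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over the four base
    indicator sequences and divided by the sequence length.

    Assumption: this normalization is an illustrative choice only.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient at frequency 1/3 of the 0/1 indicator
        # sequence that marks where this base occurs.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / n


def sliding_period3(seq: str, window: int = 351, step: int = 3):
    """Return (start, power) pairs for windows sliding along the sequence.

    Assumption: window and step are hypothetical defaults; in practice
    they would be tuned per organism.
    """
    seq = seq.upper()
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Windows whose power stands well above the background level would be candidates for coding regions; a wavelet-based feature could replace `period3_power` in the same sliding-window scheme.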