Our genomes are not linear. They are instead a millefeuille of overlapping layers of information. By ignoring this layered structure, we risk not only misinterpreting patterns of genome evolution but also misidentifying the molecular bases of disease. In this thesis, I have studied a particular aspect of such multiple coding, namely, overlaps between coding sequence (CDS) and regulatory elements (notably splice signals). I have asked two major questions. Firstly, could CDS evolution be constrained not only by the need to preserve regulatory information but also by the need to avoid inappropriate signals? I hypothesized that intronless genes could be under selection to avoid exonic splice enhancers (ESEs) to prevent inappropriate processing of the mRNA. Surprisingly, not only did I find no evidence of avoidance, but ESEs in intronless genes are instead selectively maintained, underlining the splicing-independent roles of these motifs. I then broadened my investigations to a large set of motifs that had been shown experimentally to be recognized by various RNA-binding proteins (RBPs) in humans. I found that, whereas the target motifs of some RBPs (notably exonic binders) were conserved in human CDSs, those of others (notably intronic and UTR binders) were avoided in CDS evolution. It therefore appears that avoidance of regulatory signals does constrain CDS evolution. Secondly, I asked how frequent functional ESEs are in human CDSs. No previous evolutionary study had explicitly provided this estimate, because none had disentangled the frequency of functional motifs from the strength of the selection that maintains them. Using a variety of approaches, I determined that roughly 15-20% of human four-fold degenerate sites overlap with a functional ESE, and that most of these sites are under unexpectedly strong selection. The need to preserve ESEs thus severely constrains a considerable proportion of CDS.
Date of Award: 31 Oct 2018
Supervisor: Laurence Hurst