Retrocopy in a nutshell

In the late 1940s, Barbara McClintock discovered what she called controlling elements, later known as transposons [1].


Image of Barbara McClintock. Cold Spring Harbor Laboratory Archives. Copyright © 2016 by the Genetics Society of America

These elements, also called transposable elements (TEs), collectively make up more than half of mammalian genomes [2]; in humans, approximately two-thirds of the 3-billion-base-pair genome may derive from TE activity [3]. TEs are subdivided into DNA transposons and retrotransposons, the latter being generated by the retrotransposition process [4] [5]. Both classes include autonomous and non-autonomous elements, depending on whether or not they encode their own (retro)transposition machinery. Among retrotransposons, the most prominent autonomous elements are LINEs (Long Interspersed Nuclear Elements), while the most prominent non-autonomous ones are SINEs (Short Interspersed Nuclear Elements) and processed pseudogenes, also called retrocopies of mRNAs (retrotransposed protein-coding genes).


In terms of nucleotide content, LINEs are the most abundant class of transposable elements, accounting for approximately 17% of the human genome [6]. The most numerous LINE family in our genome is LINE-1 (L1). A full-length L1 element (about 6 kb) contains: i) a promoter region; ii) a 5’ UTR; iii) two open reading frames, ORF1 and ORF2, encoding the proteins ORF1p and ORF2p; iv) a 3’ UTR; and v) a poly-A tail. In addition, a third, primate-specific ORF (ORF0, 70 amino acids long, whose function is still poorly understood) was recently described [7] [8].


ORF1 encodes ORF1p, an RNA-binding protein responsible for binding the L1 transcript, and ORF2 encodes ORF2p, a dual-function protein with endonuclease and reverse transcriptase activities. Together, these proteins form the L1 retrotransposition machinery, which can act in cis, retrotransposing the L1 element itself, or in trans, retrotransposing non-autonomous elements such as SINEs and mRNA transcripts [9] [10]. In this process, an RNA molecule is reverse-transcribed into cDNA, which is then inserted at an essentially random site in the nuclear genome, giving rise to a (retro)copy of the original, parental element.
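
The steps above can be illustrated with a toy Python model. It is only a sketch under simplifying assumptions, not a real retrocopy simulator: sequences are written in the DNA alphabet (T instead of U), the exon coordinates of the parental gene are given, the insertion site is drawn uniformly at random, and the molecular details of integration are ignored. The function names (`reverse_complement`, `mature_mrna`, `retrocopy`) and the example sequences are hypothetical. The model shows the two features that distinguish a retrocopy from its parental gene: the introns are missing and a poly-A tail is present.

```python
import random

# Complement table for the DNA alphabet (toy assumption: RNA is written with T, not U).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mature_mrna(gene: str, exons: list[tuple[int, int]], polya_len: int = 20) -> str:
    """Splice the gene (keep exon intervals, 0-based half-open) and append a poly-A tail."""
    spliced = "".join(gene[start:end] for start, end in exons)
    return spliced + "A" * polya_len


def retrocopy(mrna: str, genome: str, rng: random.Random) -> tuple[str, int]:
    """Toy retrotransposition: reverse-transcribe the mRNA and insert the copy at a random site.

    Returns the modified genome and the insertion position.
    """
    cdna = reverse_complement(mrna)       # first-strand cDNA (antisense to the mRNA)
    insert = reverse_complement(cdna)     # second strand: same sequence as the mRNA
    site = rng.randrange(len(genome) + 1) # hypothetical uniform choice of target site
    return genome[:site] + insert + genome[site:], site


# Hypothetical parental gene: exon 1 + intron + exon 2.
rng = random.Random(0)
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "TTCTAA"
gene = exon1 + intron + exon2
mrna = mature_mrna(gene, exons=[(0, 6), (18, 24)], polya_len=8)
genome = "CGTACGTTAGCATCGATCGA"
new_genome, site = retrocopy(mrna, genome, rng)
print(site, new_genome)
```

The inserted copy carries the exon sequences joined without the intron and ends in a run of A's, which is why retrocopies are also called processed pseudogenes.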


SINEs, which are also mobilized by the L1 machinery, account for approximately 11% of the human genome; their most frequent family is Alu, whose elements are on average about 300 bp long [11]. Alu is a primate-specific element. A full-length Alu element has a 5’ part carrying the internal promoter elements recognized by RNA polymerase III, linked by an A-rich region to a 3’ part ending in an oligo(dA)-rich sequence that serves as the target for reverse transcription [12]. Like SINEs, retrocopies of coding genes depend on the L1 machinery; they are one of the major sources of de novo genetic variation [13] and may also contribute to genetic diseases [14]. Retrotransposition events are now known to be very frequent in many organisms: our genome alone contains more than 1 million Alu copies [9] and more than 7,800 retroduplication events of coding genes [15] [16].

Retrocopies and diseases

In somatic cells, retrotransposition is repressed by epigenetic and post-transcriptional mechanisms, but a temporary loss of these controls can lead to new insertions, causing structural changes that can contribute to diseases such as colorectal and lung cancers [17] [18] [19]. Recent studies have shown that colorectal cancer (CRC) progression is strongly correlated with loss of methylation in LINE-containing regions, from the most methylated tissue (normal mucosa) to the least methylated (CRC metastases), suggesting that LINE methylation could serve as an important marker of CRC progression [20] [21]. Alu elements are also rich in CpG dinucleotides and, as with LINEs, their methylation appears to decrease in many tumors. Alu elements can contribute to disease in several ways, for example by altering the expression of some genes or by disrupting a coding region or a splicing signal [11]. In 2016, Clayton et al. reported a potentially tumorigenic Alu insertion in the enhancer region of the tumor suppressor gene CBL in a breast cancer sample [22]. However, although many studies have highlighted Alu elements as sources of genetic instability and contributors to carcinogenesis [23] [24], high-throughput studies have often overlooked them because efficient methods to identify these elements in a tumor context are difficult to develop [11].

Retrocopies have also been described in tumorigenesis, the classical case being PTEN and its retrocopy PTENP1 [25]. Poliseno et al. showed that the interaction between PTEN and PTENP1 has critical consequences: the retrocopy (pseudogene) is biologically active, regulates the cellular levels of its parental coding gene PTEN, and is selectively deleted in cancer. Identifying these retrotransposed elements has therefore become important for understanding their potential roles in tumorigenesis and tumor heterogeneity.

References and Further Reading

[1] MCCLINTOCK, B. (1950). The origin and behavior of mutable loci in maize. Proceedings of the National Academy of Sciences of the United States of America, 36(6), 344–355.
[2] BURNS, K. H. (2017). Transposable elements in cancer. Nature reviews. Cancer, 17(7), 415–424.
[3] DE KONING, A. P. J. et al. (2011). Repetitive Elements May Comprise Over Two-Thirds of the Human Genome. PLoS genetics, 7(12), e1002384.
[4] KAESSMANN, H. (2010). Origins, evolution, and phenotypic impact of new genes. Genome research, 20(10), 1313–1326.
[5] HELMAN, E. et al. (2014). Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome research, 24(7), 1053–1063.
[6] LANDER, E. S. et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921.
[7] HANCKS, D. C. and KAZAZIAN, H. H. (2016). Roles for retrotransposon insertions in human disease. Mobile DNA, 7, 9.
[8] DENLI, A. M. et al. (2015). Primate-specific ORF0 contributes to retrotransposon-mediated diversity. Cell, 163(3), 583–593.
[9] BATZER, M. A. and DEININGER, P. L. (2002). Alu repeats and human genomic diversity. Nature reviews. Genetics, 3(5), 370–379.
[10] KAESSMANN, H. et al. (2009). RNA-based gene duplication: mechanistic and evolutionary insights. Nature reviews. Genetics, 10(1), 19–31.
[11] DEININGER, P. (2011). Alu elements: know the SINEs. Genome biology, 12(12), 236.
[12] BAKSHI et al. (2016). DNA methylation variation of human-specific Alu repeats. Epigenetics: official journal of the DNA Methylation Society, 11(2), 163–173.
[13] BECK et al. (2010). LINE-1 retrotransposition activity in human genomes. Cell, 141(7), 1159–1170.
[14] LEE, E. et al. (2012). Landscape of somatic retrotransposition in human cancers. Science, 337(6097), 967–971.
[15] NAVARRO, F. C. P. and GALANTE, P. A. F. (2013). RCPedia: a database of retrocopied genes. Bioinformatics, 29(9), 1235–1237.
[16] NAVARRO, F. C. P. and GALANTE, P. A. F. (2015). A Genome-Wide Landscape of Retrocopies in Primate Genomes. Genome biology and evolution, 7(8), 2265–2275.
[17] MIKI, Y. et al. (1992). Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer research, 52(3), 643–645.
[18] SOLYOM, S. et al. (2012). Extensive somatic L1 retrotransposition in colorectal tumors. Genome research, 22(12), 2328–2338.
[19] COOKE, S. L. et al. (2014). Processed pseudogenes acquired somatically during cancer development. Nature communications, 5, 3644.
[20] SUNAMI, E. et al. (2011). LINE-1 hypomethylation during primary colon cancer progression. PloS one, 6(4), e18884.
[21] HUR, K. et al. (2014). Hypomethylation of long interspersed nuclear element-1 (LINE-1) leads to activation of proto-oncogenes in human colorectal cancer metastasis. Gut, 63(4), 635–646.
[22] CLAYTON, E. A. et al. (2016). Patterns of Transposable Element Expression and Insertion in Cancer. Frontiers in molecular biosciences, 3, 76.
[23] DEININGER, P. L. and BATZER, M. A. (1999). Alu repeats and human disease. Molecular genetics and metabolism, 67(3), 183–193.
[24] BELANCIO et al. (2010). All y’all need to know ‘bout retroelements in cancer. Seminars in cancer biology, 20(4), 200–210.
[25] POLISENO, L. et al. (2010). A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature, 465(7301), 1033–1038.