************
Introduction
************
**sideRETRO** is a bioinformatics tool devoted to the detection
of somatic (*de novo*) **retrocopy insertions** in whole-genome
and whole-exome sequencing data (WGS, WES). The program was
written from scratch in C and uses the `HTSlib `_
and `SQLite3 `_ libraries to
manage SAM/BAM/CRAM reading and data analysis. The source code is
distributed under the **GNU General Public License**.
Wait, what is retrocopy?
========================
A retrocopy is the product of the **reverse transcription** of a
mature **mRNA** molecule into **cDNA**, followed by the insertion
of that cDNA into a new position in the genome.
.. image:: images/retrocopy.png
   :scale: 50%
   :align: center
Interested? For a more detailed explanation of what a retrocopy
is, please see our section :ref:`Retrocopy in a
nutshell `.
Features
========
When detecting retrocopy mobilization, sideRETRO can annotate
several other features related to the event:
Parental gene
   The **gene** which **underwent** the **retrotransposition**
   process, giving rise to the retrocopy.

Genomic position
   The genome **coordinate** where the retrocopy **integration**
   occurred (chromosome:start-end). It includes the
   **insertion point**.

Strandness
   The orientation of the insertion (+/-), i.e., whether the
   retrocopy was inserted in the **leading** (+) or **lagging** (-)
   DNA strand.

Genomic context
   The context of the retrocopy integration site: whether the
   retrotransposition event occurred in an **intergenic** or
   **intragenic** region. The latter can be split into **exonic**
   and **intronic** according to the host gene.

Genotype
   When **multiple** individuals are analysed, the events are
   annotated for each one. That way, it is possible to
   **distinguish** whether an event is **exclusive** to one
   individual or **shared** among the cohort.

Haplotype
   Our tool provides information about the zygosity of the event,
   i.e., whether it occurs in one or both **homologous** chromosomes
   (heterozygous or homozygous).
How it works
============
sideRETRO compiles to an executable called :code:`sider`,
which has three subcommands: :code:`process-sample`,
:code:`merge-call` and :code:`make-vcf`. The :code:`process-sample`
subcommand reads a list of SAM/BAM/CRAM files and captures
**abnormal reads** that are likely related to a retrocopy event.
All these data are saved to a **SQLite3 database**. The second
step, :code:`merge-call`, **processes** the database
and **annotates** all the retrocopies found. Finally, the
subcommand :code:`make-vcf` generates an annotated retrocopy
`VCF `_.
.. code-block:: sh

   # List of BAM files
   $ cat 'my-bam-list.txt'
   /path/to/file1.bam
   /path/to/file2.bam
   /path/to/file3.bam
   ...

   # Run process-sample step
   $ sider process-sample \
     --annotation-file='my-annotation.gtf' \
     --input-file='my-bam-list.txt'

   $ ls -1
   my-genome.fa
   my-annotation.gtf
   my-bam-list.txt
   out.db

   # Run merge-call step
   $ sider merge-call --in-place out.db

   # Run make-vcf step
   $ sider make-vcf \
     --reference-file='my-genome.fa' out.db
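
The list of BAM files does not have to be typed by hand. The following is a minimal sketch, assuming all alignments sit under a single directory; the variable ``bam_dir`` and the empty placeholder files are hypothetical and exist only so the snippet runs on its own. It writes one absolute path per line, in the same layout as the listing above.

```shell
# Hypothetical directory holding the alignments. The empty placeholder
# files are created only to make this sketch self-contained.
bam_dir="$(mktemp -d)"
touch "$bam_dir/file1.bam" "$bam_dir/file2.bam" "$bam_dir/file3.bam"

# Collect every *.bam file under bam_dir, one absolute path per line.
# mktemp -d returns an absolute path, so find prints absolute paths too.
# Sorting keeps the order stable between runs.
find "$bam_dir" -type f -name '*.bam' | sort > my-bam-list.txt

# Show the resulting list
cat my-bam-list.txt
```

In real use, ``bam_dir`` would point at an existing directory and the ``touch`` line would be dropped.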
See the manual pages for :ref:`installation `
and :ref:`usage ` information. For more details about
the algorithm, see our :ref:`methodology `.
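
Because each step depends on the output of the previous one, it can be convenient to chain the three subcommands so that a failure stops the run. The sketch below is one possible wrapper, not part of sideRETRO: the function name ``run_pipeline`` and the ``SIDER`` variable are hypothetical, and it assumes the database is written to ``out.db`` as in the example above. Only the options already shown in that example are used.

```shell
# Executable to call. Overriding it (e.g. SIDER='echo sider') turns the
# function into a dry run that only prints the commands.
SIDER="${SIDER:-sider}"

# run_pipeline ANNOTATION_GTF BAM_LIST REFERENCE_FASTA
# Runs the three steps in order; '&&' stops at the first step that fails.
run_pipeline() {
    annotation="$1"
    bam_list="$2"
    reference="$3"

    # $SIDER is left unquoted on purpose so 'echo sider' splits in two words
    $SIDER process-sample \
        --annotation-file="$annotation" \
        --input-file="$bam_list" &&
    $SIDER merge-call --in-place out.db &&
    $SIDER make-vcf --reference-file="$reference" out.db
}
```

With the files from the example, the call would be ``run_pipeline my-annotation.gtf my-bam-list.txt my-genome.fa``. The function returns the exit status of the first failing step, so the caller can test it in the usual way.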
Obtaining sideRETRO
===================
The source code for the program can be obtained from the `GitHub
`_ page. From the command
line you can clone our repository::

   $ git clone https://github.com/galantelab/sideRETRO.git
No Warranty
===========
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
`GNU General Public License
`_
for more details.
Reporting Bugs
==============
If you find a bug or have any other issue, please report it in the
`GitHub issues tab `_.
All bug reports should include:
- The version number of sideRETRO
- A description of the bug behavior
Citation
========
If sideRETRO was somehow useful in your research, please cite it:
.. code-block:: bib

   @article{10.1093/bioinformatics/btaa689,
     author = {Miller, Thiago L A and Orpinelli, Fernanda and Buzzo, José Leonel L and Galante, Pedro A F},
     title = "{sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies}",
     journal = {Bioinformatics},
     year = {2020},
     month = {07},
     issn = {1367-4803},
     doi = {10.1093/bioinformatics/btaa689},
     url = {https://doi.org/10.1093/bioinformatics/btaa689},
     note = {btaa689},
   }
Further Information
===================
If you need additional information, or closer contact with the authors -
*we are always looking for coffee and good company* - contact us by email;
see :ref:`authors `.
Our bioinformatics group has a website, so feel free to pay us a visit:
https://www.bioinfo.mochsl.org.br/.