TAQLoRE - Transcript Annotation and Quantification using Long Reads

TAQLoRE is a Snakemake-based pipeline for improving existing annotations and quantifying transcripts from long-read, amplicon-based cDNA sequencing technologies (Oxford Nanopore Technologies, PacBio). It has been tested on Linux (CentOS 6) and should also work on macOS. Briefly, it uses LAST to align all reads to the transcriptome, discovers new exons from insertions in those alignments, builds a meta-gene containing all known and novel exons, aligns all reads to this meta-gene, and generates TMM-normalised read counts together with expression heatmaps and PCA plots. It also identifies new splice sites from reads that align perfectly to the genome, and corrects each splice site to the closest, most abundant canonical site. For more information, refer to General concepts.
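The splice-site correction step is only outlined in the overview. Below is a minimal Python sketch of one way to "snap" observed splice-site coordinates to nearby well-supported sites. It assumes that a site counts as a correction target when at least `min_support` reads support it, and that a site may move by at most `max_distance` bp. The function name, both thresholds and the tie-breaking rule are hypothetical illustrations, not TAQLoRE's actual implementation, which may, for example, also require the GT-AG splice motif when deciding what is canonical.

```python
from collections import Counter
from typing import Dict, List


def correct_splice_sites(
    observed: List[int],
    min_support: int = 3,
    max_distance: int = 10,
) -> Dict[int, int]:
    """Map each observed splice-site coordinate to a corrected coordinate.

    Hypothetical parameters (assumptions, not TAQLoRE options):
      min_support  - reads needed for a site to be treated as a well-supported target
      max_distance - largest shift in bp allowed when correcting a site
    """
    counts = Counter(observed)
    # Assumption: well-supported sites stand in for "canonical" sites.
    targets = [pos for pos, n in counts.items() if n >= min_support]

    corrected: Dict[int, int] = {}
    for pos in counts:
        candidates = [t for t in targets if abs(t - pos) <= max_distance]
        if not candidates:
            # No well-supported site nearby: keep the coordinate as observed.
            corrected[pos] = pos
            continue
        # Prefer the closest target; break distance ties by higher read support,
        # then by lower coordinate so the result is deterministic.
        corrected[pos] = min(candidates, key=lambda t: (abs(t - pos), -counts[t], t))
    return corrected


if __name__ == "__main__":
    reads = [100] * 5 + [102] + [250] * 4 + [247] + [500]
    print(correct_splice_sites(reads))
```

Sorting candidates by distance first and read support second means a weakly supported site moves to its nearest well-supported neighbour, and a site lying exactly between two such neighbours goes to the more abundant one. A well-supported site is always its own nearest target, so it is never moved.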

Installation

Source code: GitHub
Issue tracker: Issue tracker

Citations of dependencies

Our pipeline is based on the following software:

Papers using the pipeline

The following papers and preprints using our pipeline have been published:

Clark M, Wrzesinski T, Garcia-Bea A, Kleinman J, Hyde T, Weinberger D, Haerty W, Tunbridge E. bioRxiv 260562.

Authors

Developers:

Wilfried Haerty (Earlham Institute)
Tomasz Wrzesinski (Earlham Institute)

Contributors:

Elizabeth Tunbridge (University of Oxford)
Michael Clark (University of Melbourne)
Nicola Hall (University of Oxford)
Syed Hussain (University of Oxford)
Hami Lee (University of Oxford)

Things to add

Splice-site-based pipeline (part4 and part5).
Usage of splice-site-based approach (part4 and part5).
Description of output files.
Description of scripts.
Description of example dataset.

Documentation index