Several processing steps must be completed before usable BAM files are available. The required steps are summarized briefly here:
QoRTs is designed to run on paired-end or single-end next-gen RNA-Seq data. The data must be aligned (or "mapped") to a reference genome before QoRTs can be run. RNA-Star[6], GSNAP[7], and TopHat2[8] are all popular and effective aligners for use with RNA-Seq data. The use of short-read or unspliced aligners such as Bowtie, ELAND, BWA, or Novoalign is NOT recommended.
For single-end data, the reads can be in any order, and sorting is unnecessary.
For paired-end data, QoRTs automatically accepts files sorted by either read name or position. Sorting can be done with the samtools or novosort tools (which are NOT included with QoRTs).
To sort by coordinate:
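samtools sort unsorted.bam sorted
OR
novosort unsorted.bam > sorted.bam
(With samtools 1.3 or later, the output file is given with -o rather than as a prefix: samtools sort -o sorted.bam unsorted.bam)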
Or, to sort by read name:
samtools sort -n unsorted.bam sortedByName
OR
novosort -n unsorted.bam > sortedByName.bam
(With samtools 1.3 or later: samtools sort -n -o sortedByName.bam unsorted.bam)
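To check which ordering a file already declares before sorting it again, the SAM header can be inspected. The sketch below is not part of QoRTs; the function name sam_sort_order is hypothetical. It assumes the aligner or sorting tool recorded the ordering in the SO field of the @HD header line, as described in the SAM specification. Files without that field are reported as "unknown".

```shell
# Hypothetical helper (not part of QoRTs): reads SAM header text on stdin
# and prints the value of the SO field from the @HD line.
# Prints "unknown" when there is no @HD line or no SO field.
sam_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
      }
    }
    END { if (!found) print "unknown" }
  '
}

# Typical use (requires samtools, which is not included with QoRTs):
#   samtools view -H sorted.bam | sam_sort_order
```

A result of "coordinate" or "queryname" corresponds to the position-sorted and name-sorted cases described above. The header only records what the producing tool declared, so it is a quick check rather than proof of the actual read order.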
In the default mode, QoRTs accepts both name-sorted and position-sorted BAM files. Technically, QoRTs can accept BAM files in any order. However, QoRTs stores unmatched mates in memory, so if a large number of paired mates are not located near one another in the file, memory usage may become too high.
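The effect of read ordering on memory can be illustrated with a small simulation. This is only a sketch of the general idea and not QoRTs's actual implementation; the function name max_pending_mates is hypothetical. It assumes each read name appears exactly twice in the input (one line per mate) and reports the largest number of reads that would be waiting for their mate at any one time.

```shell
# Hypothetical illustration (not QoRTs code): reads one read name per line
# on stdin and prints the peak number of names seen once but not yet twice.
# Assumes every read name occurs exactly twice (one line per mate).
max_pending_mates() {
  awk '
    ($1 in pending) { delete pending[$1]; n--; next }   # mate found: release it
    { pending[$1] = 1; n++; if (n > max) max = n }      # first mate: hold it
    END { print max + 0 }
  '
}

# Mates adjacent (as in a name-sorted file): at most one read is held.
printf 'a\na\nb\nb\nc\nc\n' | max_pending_mates   # prints 1

# Mates far apart: every read is held before any mate arrives.
printf 'a\nb\nc\na\nb\nc\n' | max_pending_mates   # prints 3

# Possible use on a real file (requires samtools; -F 0x900 drops secondary
# and supplementary alignments so each name appears at most twice):
#   samtools view -F 0x900 input.bam | cut -f1 | max_pending_mates
```

In a name-sorted file the peak stays at one, and in a position-sorted file it usually stays small because mates tend to align near each other. An arbitrary ordering can push the peak toward the total number of read pairs.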
QoRTs also has a separate mode designed only for name-sorted samples, which can be activated using the "--nameSorted" option. Under certain conditions this may improve speed and reduce memory usage. Under typical conditions any improvement is trivial.