Next: Visualization Up: QoRTs Package User Manual Previous: Dataset Organization   Contents



Processing of aligned RNA-Seq data

The first step is to process the aligned RNA-Seq data. The bulk of the data processing is performed by the QoRTs.jar Java utility, which produces an array of output files that analyze and tabulate the data in various ways. For most genomes this utility requires about 10-20 GB of RAM (see Memory Usage, below, for how to set the limit), and it takes roughly 4-7 minutes to process 1 million read-pairs.
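Because run time scales roughly with the number of read-pairs, the throughput figure above can be turned into a back-of-the-envelope estimate before launching a job. The sketch below only multiplies out the quoted 4-7 minutes per million read-pairs; the function name estimate_qc_minutes is hypothetical, and real run times will depend on your hardware, genome, and annotation.

```shell
#!/usr/bin/env bash
# Rough QoRTs QC run-time estimate, based only on the manual's figure of
# roughly 4-7 minutes per 1 million read-pairs (hypothetical helper).
estimate_qc_minutes() {
  local read_pairs=$1
  # Integer arithmetic: results are rounded down to whole minutes.
  local low=$(( read_pairs * 4 / 1000000 ))
  local high=$(( read_pairs * 7 / 1000000 ))
  echo "${low}-${high}"
}

# Example: a sample-run with 25 million read-pairs
echo "Estimated run time: $(estimate_qc_minutes 25000000) minutes"
```

For 25 million read-pairs this prints an estimate of 100-175 minutes.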

java -jar /path/to/jarfile/QoRTs.jar QC \
                   mybamfile.bam \
                   transcriptAnnotationFile.gtf.gz \
                   /qc/data/dir/path/

In the above command, the backslash at the end of each line is a shell line continuation; if you remove the backslashes, the command must be entered as a single line. Replace /path/to/jarfile/ with the path to the directory containing the jar file, and replace /qc/data/dir/path/ with the directory to which the QC data should be written. This directory should match the path listed in the qc.data.dir column of the decoder for this sample-run.
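Since the output directory should agree with the decoder, it can be convenient to read it from the decoder rather than type it by hand. The sketch below assumes, for illustration only, that the decoder is a tab-delimited text file with a header row and that each sample-run is identified by a column named unique.ID; the function get_qc_data_dir and the example file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Look up the qc.data.dir entry for one sample-run in a decoder file.
# ASSUMPTIONS (hypothetical, for illustration only): the decoder is a
# tab-delimited text file with a header row, and each sample-run is
# identified by a column named "unique.ID".
get_qc_data_dir() {
  local decoder=$1 run_id=$2
  awk -F '\t' -v id="$run_id" '
    NR == 1 {
      # Find the two columns of interest by their header names.
      for (i = 1; i <= NF; i++) {
        if ($i == "unique.ID")   idcol  = i
        if ($i == "qc.data.dir") dircol = i
      }
      next
    }
    # Print the qc.data.dir value of the first matching row, then stop.
    idcol && dircol && $idcol == id { print $dircol; exit }
  ' "$decoder"
}

# Usage example with a small, made-up decoder file:
example_decoder=$(mktemp)
trap 'rm -f "$example_decoder"' EXIT
printf 'unique.ID\tsample.ID\tqc.data.dir\n' > "$example_decoder"
printf 'SAMP1_RG1\tSAMP1\toutputData/qortsData/SAMP1_RG1/\n' >> "$example_decoder"
get_qc_data_dir "$example_decoder" SAMP1_RG1
# prints: outputData/qortsData/SAMP1_RG1/
```

If the requested sample-run (or either column) is not found, the function prints nothing, which makes a missing decoder entry easy to detect in a script.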

The bam-processing tool has numerous options. A full description of these options can be found in the online documentation of the jar utility, or by entering the command:

java -jar /path/to/jarfile/QoRTs.jar QC --man

There are a number of crucial points that require attention when using the QoRTs.jar QC command.

For example, to process the bam file for the first read group of SAMP1 from the example dataset (which is stranded, coordinate-sorted, and uses the fr_firstStrand library type), one would use the following command:

java -jar /path/to/jarfile/QoRTs.jar QC \
                   --stranded \
                   inputData/bamFiles/SAMP1_RG1.bam \
                   inputData/annoFiles/anno.gtf.gz \
                   outputData/qortsData/SAMP1_RG1/

This command must be run on each bam file (and possibly more than once on a given bam file, if it contains multiple separate read groups).
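With many bam files, the per-file command is usually run in a shell loop. The sketch below assumes the example dataset's layout (one read group per bam file in inputData/bamFiles/, named like SAMP1_RG1.bam, with output written to outputData/qortsData/SAMP1_RG1/) and that every run has the same library type as SAMP1, so it reuses only the --stranded option shown above. QORTS_JAR, DRY_RUN, and run_qc are hypothetical names; by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Run QoRTs QC once per bam file in the example dataset layout.
# ASSUMPTIONS: each bam file holds a single read group, is named
# <sample>_<readgroup>.bam, and all runs share the SAMP1 library type
# (stranded, fr_firstStrand, coordinate-sorted).
# Set DRY_RUN=0 to execute the commands; by default they are only printed.
set -eu

QORTS_JAR=${QORTS_JAR:-/path/to/jarfile/QoRTs.jar}
ANNO=inputData/annoFiles/anno.gtf.gz
DRY_RUN=${DRY_RUN:-1}

run_qc() {
  local bam=$1
  local run_id
  run_id=$(basename "$bam" .bam)              # e.g. SAMP1_RG1
  local outdir="outputData/qortsData/${run_id}/"
  if [ "$DRY_RUN" = 1 ]; then
    echo java -jar "$QORTS_JAR" QC --stranded "$bam" "$ANNO" "$outdir"
  else
    mkdir -p "$outdir"                        # make sure the output directory exists
    java -jar "$QORTS_JAR" QC --stranded "$bam" "$ANNO" "$outdir"
  fi
}

for bam in inputData/bamFiles/*.bam; do
  [ -e "$bam" ] || continue                   # skip the literal pattern if nothing matched
  run_qc "$bam"
done
```

Bam files with several read groups, or runs with a different library type, would need their own options and are not handled by this sketch.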


Memory Usage

The QoRTs QC utility requires at least 4 GB of RAM for most genomes and datasets. Larger genomes, genomes with more annotated genes or transcripts, and larger bam files may require more. You can set the maximum amount of RAM allocated to the Java Virtual Machine (JVM) with the -Xmx option (for example, -Xmx4000M), which must be placed before -jar on the command line. For example:

# Set the maximum to the recommended minimum of 4 gigabytes:
java -Xmx4000M -jar /path/to/jarfile/QoRTs.jar QC \
                   --stranded \
                   inputData/bamFiles/SAMP1_RG1.bam \
                   inputData/annoFiles/anno.gtf.gz \
                   outputData/qortsData/SAMP1_RG1/

# Or set the maximum to 16 gigabytes:
java -Xmx16G -jar /path/to/jarfile/QoRTs.jar QC \
                   --stranded \
                   inputData/bamFiles/SAMP1_RG1.bam \
                   inputData/annoFiles/anno.gtf.gz \
                   outputData/qortsData/SAMP1_RG1/

The -Xmx option is passed to the JVM rather than to QoRTs, so it can be used with any of the QoRTs java utilities.
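To avoid repeating the heap setting on every call, the java invocation could be wrapped in a small shell function. This is a hypothetical convenience wrapper, not part of QoRTs: qorts, qorts_cmd, QORTS_JAR, and QORTS_MEM are illustrative names, and the wrapper does nothing more than place -Xmx before -jar exactly as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of QoRTs) that applies one JVM heap limit
# to every QoRTs java utility. Override the defaults via the environment.
QORTS_JAR=${QORTS_JAR:-/path/to/jarfile/QoRTs.jar}
QORTS_MEM=${QORTS_MEM:-4000M}

# Print the java command that would be run (handy for checking arguments).
qorts_cmd() {
  # -Xmx must appear before -jar; everything after the jar goes to QoRTs.
  echo java "-Xmx${QORTS_MEM}" -jar "$QORTS_JAR" "$@"
}

# Actually run the command.
qorts() {
  java "-Xmx${QORTS_MEM}" -jar "$QORTS_JAR" "$@"
}
```

For example, `QORTS_MEM=16G qorts QC --stranded inputData/bamFiles/SAMP1_RG1.bam inputData/annoFiles/anno.gtf.gz outputData/qortsData/SAMP1_RG1/` runs the same 16-gigabyte command shown above, and `qorts_cmd QC --man` prints the command without running it.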


Dr Stephen William Hartley 2015-11-06