Several QoRTs functions require "decoder" information in some form, which describes each sample and all of its technical replicates (if any). The simplest method is to provide a decoder file. All of the columns are optional except for unique.ID; however, if group, lane, and/or technical replicate information is not supplied, QoRTs will not be able to produce plots that are organized or grouped by these factors.
Fields:
In addition, the decoder can contain any other additional columns as desired, as long as all of the column names are distinct.
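As a minimal sketch of what a decoder file might look like on disk, the following base R code builds a small decoder, checks the two constraints stated above (unique.ID values must be unique, column names must be distinct), writes it out, and reads it back. The sample names, the extra batch column, and the tab-delimited layout are assumptions made for illustration; no QoRTs functions are called.

```r
# Hypothetical decoder: two samples, each sequenced on two lanes.
decoder <- data.frame(
  unique.ID = c("S1_L1", "S1_L2", "S2_L1", "S2_L2"),
  sample.ID = c("S1", "S1", "S2", "S2"),
  lane.ID   = c("L1", "L2", "L1", "L2"),
  group.ID  = c("CASE", "CASE", "CTRL", "CTRL"),
  batch     = c("A", "A", "B", "B"),  # extra user-defined column (assumed name)
  stringsAsFactors = FALSE
)

# unique.ID must be unique, and all column names must be distinct.
stopifnot(!anyDuplicated(decoder$unique.ID))
stopifnot(!anyDuplicated(names(decoder)))

# Write the decoder as a tab-delimited text file with a header row
# (assumed layout), then read it back.
decoder.path <- file.path(tempdir(), "decoder.txt")
write.table(decoder, decoder.path, sep = "\t", quote = FALSE, row.names = FALSE)

decoder.check <- read.table(decoder.path, header = TRUE, sep = "\t",
                            stringsAsFactors = FALSE)
print(decoder.check)
```

Keeping one row per unique.ID, with sample.ID shared across the technical replicates of a sample, is what allows replicates to be grouped later.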
While QoRTs is primarily designed to allow comparisons between biological groups, lanes, sequencing runs, etc., it can also be used on simpler datasets, or even on individual samples. Thus, only the unique.ID variable is actually required. For testing purposes, you can produce a completed decoder (with all default values filled in) using the completeAndCheckDecoder function.
The simplest example would just be a character vector of unique.IDs:
completeAndCheckDecoder(c("SAMPLE1","SAMPLE2","SAMPLE3"));
##   unique.ID sample.ID lane.ID group.ID qc.data.dir
## 1   SAMPLE1   SAMPLE1 UNKNOWN  UNKNOWN     SAMPLE1
## 2   SAMPLE2   SAMPLE2 UNKNOWN  UNKNOWN     SAMPLE2
## 3   SAMPLE3   SAMPLE3 UNKNOWN  UNKNOWN     SAMPLE3
Alternatively, any of the optional fields can be included or left out, as desired:
incompleteDecoder <- data.frame(unique.ID = c("SAMPLE1","SAMPLE2"),
                                group.ID = c("CASE","CONTROL"));
completeAndCheckDecoder(incompleteDecoder);
##   unique.ID group.ID sample.ID lane.ID qc.data.dir
## 1   SAMPLE1     CASE   SAMPLE1 UNKNOWN     SAMPLE1
## 2   SAMPLE2  CONTROL   SAMPLE2 UNKNOWN     SAMPLE2
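The defaults visible in the output above (sample.ID and qc.data.dir copied from unique.ID, lane.ID and group.ID set to "UNKNOWN") can be imitated in a few lines of base R. This is only a sketch inferred from that output: fillDecoderDefaults is a hypothetical helper, not a QoRTs function, and it does not claim to reproduce how completeAndCheckDecoder is actually implemented or which additional checks it performs.

```r
# Hypothetical helper (NOT part of QoRTs): fills in the default values
# that appear in the completeAndCheckDecoder output shown above.
fillDecoderDefaults <- function(decoder) {
  # Accept a bare character vector of unique.IDs.
  if (is.character(decoder)) {
    decoder <- data.frame(unique.ID = decoder, stringsAsFactors = FALSE)
  }
  # unique.ID is the only mandatory column, and it must be unique.
  stopifnot("unique.ID" %in% names(decoder),
            !anyDuplicated(decoder$unique.ID))

  defaults <- list(
    sample.ID   = as.character(decoder$unique.ID),
    lane.ID     = rep("UNKNOWN", nrow(decoder)),
    group.ID    = rep("UNKNOWN", nrow(decoder)),
    qc.data.dir = as.character(decoder$unique.ID)
  )
  # Only add the columns that the user did not supply.
  for (field in names(defaults)) {
    if (!(field %in% names(decoder))) {
      decoder[[field]] <- defaults[[field]]
    }
  }
  decoder
}

filled <- fillDecoderDefaults(c("SAMPLE1", "SAMPLE2", "SAMPLE3"))
print(filled)
```

Columns supplied by the user are left untouched, so a group.ID column passed in keeps its values and only the missing fields are filled.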
The separate R package QoRTsExampleData contains an example dataset together with an example decoder.
Due to size constraints, the example dataset contained in this R package includes only the QC output data, not the raw BAM files themselves. The actual BAM files, along with a step-by-step example walkthrough that covers the entire analysis pipeline, are linked from the QoRTs GitHub website (https://github.com/hartleys/QoRTs).
The example dataset is derived from a set of rat pineal gland samples, which were multiplexed and sequenced across six sequencer lanes. For the sake of simplicity, the example dataset was limited to six samples and three lanes. However, the BAM files alone would still occupy 18 gigabytes of disk space, which would make them unsuitable for distribution as an example dataset. To further reduce the example BAM file sizes, only reads that mapped to chromosomes chr14, chr15, chrX, and chrM were included. Additionally, all of the selected chromosomes except chr14 were randomly downsampled to 30 percent of their original read counts.
THIS DATASET IS INTENDED FOR DEMONSTRATION AND TESTING PURPOSES ONLY. Due to the various alterations that have been made to reduce file sizes and improve portability, it is not representative of the original data and is not suitable for any actual analyses.
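To make the effect of this filtering concrete, here is a toy sketch of the same kind of reduction applied to a vector of chromosome labels: reads outside the four selected chromosomes are dropped, chr14 reads are all kept, and reads on the remaining chromosomes are each kept with probability 0.3. The read counts are invented, and the per-read random draw is an assumption; the procedure actually used to build the example BAM files is not described here.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Toy data (NOT the real dataset): one chromosome label per read.
read.chrom <- rep(c("chr1", "chr14", "chr15", "chrX", "chrM"),
                  times = c(1000, 1000, 1000, 1000, 200))

# Only reads on these chromosomes are retained at all.
kept.chroms <- c("chr14", "chr15", "chrX", "chrM")
on.kept.chrom <- read.chrom %in% kept.chroms

# chr14 reads are always kept; reads on the other selected chromosomes
# survive with probability 0.3 (assumed per-read random draw).
keep.prob <- ifelse(read.chrom == "chr14", 1, 0.3)
keep <- on.kept.chrom & (runif(length(read.chrom)) < keep.prob)

# Read counts per chromosome after filtering and downsampling.
print(table(read.chrom[keep]))
```

Because the downsampling differs between chromosomes, relative coverage in the reduced data no longer reflects the original libraries, which is one reason the dataset is unsuitable for real analyses.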
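# NOTE: the opening of this call is missing from the text; the variable name
# "directory" and the "extdata/" argument are assumed.
directory <- paste0(system.file("extdata/",
                                package="QoRTsExampleData",
                                mustWork=TRUE),"/");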
decoder.file <- system.file("extdata/decoder.txt",
                            package="QoRTsExampleData",
                            mustWork=TRUE);
decoder.data <- read.table(decoder.file,
                           header=TRUE,
                           stringsAsFactors=FALSE);
print(decoder.data);
##    sample.ID lane.ID unique.ID  qc.data.dir group.ID input.read.pair.count
## 1      SAMP1      L1 SAMP1_RG1 ex/SAMP1_RG1     CASE                465298
## 2      SAMP1      L2 SAMP1_RG2 ex/SAMP1_RG2     CASE                472241
## 3      SAMP1      L3 SAMP1_RG3 ex/SAMP1_RG3     CASE                500691
## 4      SAMP2      L1 SAMP2_RG1 ex/SAMP2_RG1     CASE                461405
## 5      SAMP2      L2 SAMP2_RG2 ex/SAMP2_RG2     CASE                467713
## 6      SAMP2      L3 SAMP2_RG3 ex/SAMP2_RG3     CASE                492322
## 7      SAMP3      L1 SAMP3_RG1 ex/SAMP3_RG1     CASE                485397
## 8      SAMP3      L2 SAMP3_RG2 ex/SAMP3_RG2     CASE                489859
## 9      SAMP3      L3 SAMP3_RG3 ex/SAMP3_RG3     CASE                516906
## 10     SAMP4      L1 SAMP4_RG1 ex/SAMP4_RG1     CTRL                460968
## 11     SAMP4      L2 SAMP4_RG2 ex/SAMP4_RG2     CTRL                468391
## 12     SAMP4      L3 SAMP4_RG3 ex/SAMP4_RG3     CTRL                484530
## 13     SAMP5      L1 SAMP5_RG1 ex/SAMP5_RG1     CTRL                469884
## 14     SAMP5      L2 SAMP5_RG2 ex/SAMP5_RG2     CTRL                475001
## 15     SAMP5      L3 SAMP5_RG3 ex/SAMP5_RG3     CTRL                494213
## 16     SAMP6      L1 SAMP6_RG1 ex/SAMP6_RG1     CTRL                452429
## 17     SAMP6      L2 SAMP6_RG2 ex/SAMP6_RG2     CTRL                458810
## 18     SAMP6      L3 SAMP6_RG3 ex/SAMP6_RG3     CTRL                477751
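In this decoder each sample.ID appears three times, once per lane, so each row is one technical replicate of a biological sample. As a small illustration of how these columns relate, the sketch below re-enters the SAMP1 and SAMP4 rows from the table above (so that it runs without QoRTsExampleData installed) and uses base R to count replicates and sum input read pairs per sample. The variable names are made up for this example.

```r
# Subset of the example decoder printed above (SAMP1 and SAMP4 only),
# typed in by hand so the sketch is self-contained.
decoder.subset <- data.frame(
  sample.ID = rep(c("SAMP1", "SAMP4"), each = 3),
  lane.ID   = rep(c("L1", "L2", "L3"), times = 2),
  unique.ID = paste0(rep(c("SAMP1", "SAMP4"), each = 3), "_RG", 1:3),
  group.ID  = rep(c("CASE", "CTRL"), each = 3),
  input.read.pair.count = c(465298, 472241, 500691,
                            460968, 468391, 484530),
  stringsAsFactors = FALSE
)

# Number of technical replicates (rows) per biological sample.
replicate.counts <- table(decoder.subset$sample.ID)
print(replicate.counts)

# Total input read pairs per sample, summed across its lanes.
reads.per.sample <- tapply(decoder.subset$input.read.pair.count,
                           decoder.subset$sample.ID, sum)
print(reads.per.sample)
```

With the full decoder loaded as decoder.data, the same table and tapply calls apply unchanged to all six samples.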