Stage Input Data
By default, input files are read from the input/ directory for the Preprocessing module and from the input/preprocessedBams directory for the Germline and Somatic modules. However, each module in the pipeline includes a parameter (--input_dir) that lets the user define a different input directory. All symbolic links are followed.
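Because symbolic links are followed, one way to stage data without copying it is to link the files into the input directory. The following is a minimal sketch, not part of the pipeline: the function name stage_inputs and the demo paths are hypothetical, and it assumes a filesystem that permits symlinks.

```python
import os
import tempfile

def stage_inputs(source_files, input_dir):
    """Symlink each source file into input_dir and return the link paths."""
    os.makedirs(input_dir, exist_ok=True)
    links = []
    for src in source_files:
        link = os.path.join(input_dir, os.path.basename(src))
        # Leave an existing entry alone rather than overwrite it.
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(src), link)
        links.append(link)
    return links

# Demo in a throwaway directory (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "storage", "sample_1.bam")
    os.makedirs(os.path.dirname(src))
    open(src, "w").close()
    staged = stage_inputs([src], os.path.join(tmp, "input"))
    print([os.path.basename(p) for p in staged])
```

The same helper could point at input/preprocessedBams, or at whatever directory is passed to --input_dir.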
The Preprocessing module currently supports fastq or bam as --input_format and WGS or WES as --seq_protocol. This includes lane split FASTQs (see below for validated naming conventions). Lane split files are merged internally as part of the module run, and the input files themselves are not altered.
There are two underlying logical assumptions:
- The input files are of a single format
- The FASTQs use an ‘R1/R2’ naming convention to designate paired-end read files
Example of input directory with FASTQs:
sample_1_R1.fastq.gz
sample_1_R2.fastq.gz
sample_2_R1.fastq.gz
sample_2_R2.fastq.gz
Example of input directory with BAMs:
sample_1.bam
sample_2.bam
Validated naming conventions for lane split FASTQs:
_L00[1-9]_R[12]_[\d\w]+
_L00[1-9]_R[12]
_00[1-9]_R[12]_[\d\w]+
_00[1-9]_R[12]
_R[12]_00[1-9]
Not sure if your FASTQs will work? Test their filenames against the regex strings listed above.
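A filename can also be checked locally. The sketch below copies the regex strings from the list above and reports which of them occur in a given name. How the pipeline itself anchors or orders these patterns is not stated here, so searching anywhere in the filename is an assumption, and the helper name matching_patterns is hypothetical.

```python
import re

# Patterns copied verbatim from the validated naming conventions.
LANE_SPLIT_PATTERNS = [
    r"_L00[1-9]_R[12]_[\d\w]+",
    r"_L00[1-9]_R[12]",
    r"_00[1-9]_R[12]_[\d\w]+",
    r"_00[1-9]_R[12]",
    r"_R[12]_00[1-9]",
]

def matching_patterns(filename):
    """Return every listed pattern that occurs somewhere in the filename."""
    return [p for p in LANE_SPLIT_PATTERNS if re.search(p, filename)]

for name in ["sample_1_L001_R1_001.fastq.gz", "sample_1_R1.fastq.gz"]:
    hits = matching_patterns(name)
    print(name, "->", hits if hits else "no lane split pattern matched")
```

A name with no match, such as sample_1_R1.fastq.gz, is not necessarily invalid. It simply follows the plain R1/R2 convention rather than a lane split one.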
The Germline module currently supports any GRCh38 aligned bam for either WGS or WES as --seq_protocol. The only additional inputs needed are an index (.bai) for each bam and a sample sheet CSV file.
Example of input directory:
N_sample_1.bam
N_sample_1.bam.bai
N_sample_2.bam
N_sample_2.bam.bai
Example of sample sheet CSV file:
normal,tumor
N_sample_1.bam,T_sample_1.bam
N_sample_2.bam,T_sample_2.bam
This CSV file should include only the filenames of bam files that can be found at the input directory path.
Notice: While the T_sample_1.bam file is listed in the sample sheet, the module only searches for the files in the normal column.
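Before launching the Germline module, the sample sheet can be checked against the input directory. This is a minimal sketch, not the module's own validation: the function name missing_germline_inputs is hypothetical, and it assumes the index is named by appending .bai to the bam filename, as in the example listing above.

```python
import csv
import os
import tempfile

def missing_germline_inputs(sample_sheet, input_dir):
    """List normal-column bams and .bai indexes absent from input_dir."""
    missing = []
    with open(sample_sheet, newline="") as handle:
        for row in csv.DictReader(handle):
            bam = row["normal"].strip()
            # The tumor column is ignored, mirroring the notice above.
            for name in (bam, bam + ".bai"):
                if not os.path.exists(os.path.join(input_dir, name)):
                    missing.append(name)
    return missing

# Demo: only sample 1 is staged, so sample 2 is reported as missing.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("N_sample_1.bam", "N_sample_1.bam.bai"):
        open(os.path.join(tmp, name), "w").close()
    sheet = os.path.join(tmp, "samples.csv")
    with open(sheet, "w") as handle:
        handle.write("normal,tumor\n"
                     "N_sample_1.bam,T_sample_1.bam\n"
                     "N_sample_2.bam,T_sample_2.bam\n")
    print(missing_germline_inputs(sheet, tmp))
```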
The Somatic module currently supports any GRCh38 aligned bam for WGS as --seq_protocol. The only additional inputs needed are an index (.bai) for each bam and a sample sheet CSV file.
Example of input directory:
N_sample_1.bam T_sample_1.bam
N_sample_1.bam.bai T_sample_1.bam.bai
N_sample_2.bam T_sample_2.bam
N_sample_2.bam.bai T_sample_2.bam.bai
Example of sample sheet CSV file:
normal,tumor
N_sample_1.bam,T_sample_1.bam
N_sample_2.bam,T_sample_2.bam
This CSV file should include only the filenames of bam files that can be found at the input directory path.
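One way to keep the sample sheet limited to files that exist is to draft it from the input directory itself. The sketch below assumes the N_ and T_ prefixes used in the example listing, which are only an illustrative naming scheme and not a pipeline requirement. The function name draft_sample_sheet is hypothetical.

```python
import csv
import os
import tempfile

def draft_sample_sheet(input_dir, sheet_path):
    """Pair N_<name>.bam with T_<name>.bam and write a normal,tumor CSV."""
    bams = {f for f in os.listdir(input_dir) if f.endswith(".bam")}
    pairs = []
    for normal in sorted(b for b in bams if b.startswith("N_")):
        tumor = "T_" + normal[2:]
        # Keep only pairs where both bams are present in input_dir.
        if tumor in bams:
            pairs.append((normal, tumor))
    with open(sheet_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["normal", "tumor"])
        writer.writerows(pairs)
    return pairs

# Demo: sample 2 has no tumor bam, so only sample 1 is written.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("N_sample_1.bam", "T_sample_1.bam", "N_sample_2.bam"):
        open(os.path.join(tmp, name), "w").close()
    print(draft_sample_sheet(tmp, os.path.join(tmp, "samples.csv")))
```

The drafted sheet should still be reviewed by hand, since prefix matching cannot confirm that a normal and a tumor bam truly belong to the same individual.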