illuminaBarcodeDist.py

Given a FASTQ file containing unmatched reads, tabulates the frequency of each barcode present. The first line of each FASTQ record must be in the standard Illumina format: @<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos>[:<UMI>] <read>:<is filtered>:<control number>:<index sequence> where anything in [] is optional. The output file has three tab-delimited fields: 1) Barcode, 2) Freq, and 3) Relative_Freq%.
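The header parsing and tabulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual implementation. It assumes that the barcode is the last colon-separated field after the space in the header. It also assumes that every record is counted regardless of its <is filtered> flag, and that rows are sorted from most to least frequent. The function names `barcode_from_header` and `tabulate`, as well as the example headers, are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def barcode_from_header(header: str) -> str:
    """Return the <index sequence> field of an Illumina FASTQ header line."""
    line = header.rstrip("\n")
    if not line.startswith("@"):
        raise ValueError(f"not a FASTQ header line: {line!r}")
    # The read identifier and the comment section are separated by one space.
    parts = line.split(" ", 1)
    if len(parts) != 2:
        raise ValueError(f"header has no comment section: {line!r}")
    # Comment section: <read>:<is filtered>:<control number>:<index sequence>
    fields = parts[1].split(":")
    if len(fields) != 4:
        raise ValueError(f"unexpected comment section: {parts[1]!r}")
    return fields[3]


def tabulate(barcodes: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (barcode, count, percent of total) rows, most frequent first.

    Sorting by descending count is an assumption of this sketch.
    """
    counts = Counter(barcodes)
    total = sum(counts.values())
    return [(bc, n, 100.0 * n / total) for bc, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up headers for illustration; the third one carries an optional UMI.
    headers = [
        "@M00123:45:000000000-ABCDE:1:1101:15589:1331 1:N:0:ATCACG",
        "@M00123:45:000000000-ABCDE:1:1101:15590:1332 1:N:0:ATCACG",
        "@M00123:45:000000000-ABCDE:1:1101:15591:1333:ACGTACGT 1:N:0:CGATGT",
        "@M00123:45:000000000-ABCDE:1:1101:15592:1334 1:Y:0:ATCACG",
    ]
    # One tab-delimited row per barcode: Barcode, Freq, Relative_Freq%
    for bc, n, pct in tabulate(barcode_from_header(h) for h in headers):
        print(f"{bc}\t{n}\t{pct:.2f}")
```

Because the optional UMI sits before the space, it does not affect which field is taken as the barcode.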


usage: illuminaBarcodeDist.py [-h] -i INFILE [-o OUTFILE] [-s SAMPLE_SIZE]
                              [-v]

Named Arguments

-i, --infile Input FASTQ file. May be gzip’d with a .gz extension.
-o, --outfile Name of the output file for the barcode histogram. Default is the --infile name with the added extension ‘_barcodeHist.txt’.
-s, --sample-size The number of reads to use to create the distribution, taken from the start of the file. Default is 100,000.

-v, --version show program’s version number and exit
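The options above correspond closely to Python's standard argparse module. The following is a minimal sketch of how they could be declared and used, under several stated assumptions. First, the version string is a placeholder. Second, the default output name is built by appending ‘_barcodeHist.txt’ to the full input name, which is a literal reading of the description above. Third, every FASTQ record is assumed to occupy exactly four lines. The helper names `build_parser`, `default_outfile`, `open_fastq`, and `sample_headers` are hypothetical.

```python
import argparse
import gzip
from itertools import islice
from typing import Iterable, Iterator, TextIO


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="illuminaBarcodeDist.py")
    parser.add_argument("-i", "--infile", required=True,
                        help="Input FASTQ file. May be gzip'd with a .gz extension.")
    parser.add_argument("-o", "--outfile",
                        help="Name of the output file for the barcode histogram.")
    parser.add_argument("-s", "--sample-size", type=int, default=100000,
                        help="Number of reads to sample from the start of the file.")
    # Placeholder version string; the real tool's version is not documented here.
    parser.add_argument("-v", "--version", action="version", version="%(prog)s 0.0.0")
    return parser


def default_outfile(infile: str) -> str:
    # Assumption: the suffix is appended to the full input name.
    return infile + "_barcodeHist.txt"


def open_fastq(path: str) -> TextIO:
    # Decompress transparently when the file name ends in .gz.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def sample_headers(lines: Iterable[str], sample_size: int) -> Iterator[str]:
    """Yield the header line of each of the first `sample_size` records."""
    # Assumes 4-line records, so the first sample_size records span 4 * sample_size lines.
    for i, line in enumerate(islice(lines, 4 * sample_size)):
        if i % 4 == 0:
            yield line.rstrip("\n")
```

With these declarations, argparse produces the same usage line shown above. A caller would resolve the output path with `args.outfile or default_outfile(args.infile)`. It would then pass `sample_headers(open_fastq(args.infile), args.sample_size)` to whatever routine extracts and counts barcodes.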