illumina_fastq.utils

illumina_fastq.utils.getPairedendReadId(read_id)[source]

Given either a forward read or reverse read identifier, returns the corresponding paired-end read identifier.

Parameters: read_id – str. The forward read or reverse read identifier. This should be the entire title line of a FASTQ record, minus any trailing whitespace.
Returns: The paired-end read identifier (title line).
Return type: str

Example

Setting read_id to “@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG” will return “@COOPER:74:HFTH3BBXX:3:1101:29894:1033 2:N:0:NATGAATC+NGATCTCG”.
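
The behaviour in this example can be sketched in a few lines. This is not the library's implementation: paired_end_read_id is a hypothetical stand-in, and it assumes the read number is the first colon-separated field after the space in the title line.

```python
def paired_end_read_id(read_id):
    """Hypothetical stand-in for getPairedendReadId: swap read number 1 <-> 2."""
    # Split the location part from "<read number>:<is filtered>:<control number>:<barcode>".
    location, attributes = read_id.split(" ", 1)
    read_number, rest = attributes.split(":", 1)
    mate = {"1": "2", "2": "1"}[read_number]
    return f"{location} {mate}:{rest}"


forward = "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG"
reverse = paired_end_read_id(forward)
print(reverse)  # @COOPER:74:HFTH3BBXX:3:1101:29894:1033 2:N:0:NATGAATC+NGATCTCG
```

Applying the function to the reverse identifier gives back the forward one, so it works in either direction.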

illumina_fastq.utils.isForwardRead(seqid)[source]

Indicates whether the passed-in read identifier is a forward or reverse read identifier.

Parameters: seqid – str. A read identifier of a FASTQ record.
Returns: True if seqid is a forward read identifier, False otherwise.
Return type: bool
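
A minimal sketch of such a check, assuming the CASAVA 1.8 title-line layout, in which the read number is the first field after the space. The name is_forward_read is hypothetical and the library's own logic may differ.

```python
def is_forward_read(seqid):
    """Hypothetical stand-in for isForwardRead (assumes CASAVA 1.8 title lines)."""
    # The read number is the first colon-separated field after the space.
    read_number = seqid.split(" ", 1)[1].split(":", 1)[0]
    return read_number == "1"


forward_id = "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG"
reverse_id = "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 2:N:0:NATGAATC+NGATCTCG"
print(is_forward_read(forward_id), is_forward_read(reverse_id))  # True False
```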

illumina_fastq.utils.parseIlluminaFastqAttLine(attLine)[source]

Given the title line of a FASTQ record, tokenizes the line and stores the tokens in a dict. The Illumina FASTQ Att line format (as of CASAVA 1.8 at least) is:

@<instrument-name>:<run ID>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read number>:<is filtered>:<control number>:<barcode sequence>

Parameters: attLine – str. The title line of a FASTQ record, minus any trailing whitespace.
Returns: The tokens of the title line. The keys are
  1. instrument,
  2. runId,
  3. flowcellId,
  4. lane,
  5. tile,
  6. xpos,
  7. ypos,
  8. readNumber,
  9. isFiltered,
  10. control,
  11. barcode
Return type: dict
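
The tokenization can be sketched as below, using the keys listed above. The function name parse_title_line is hypothetical, and this sketch keeps every value as a string; whether the library converts any field (for example lane to int) is not stated here.

```python
LOCATION_KEYS = ("instrument", "runId", "flowcellId", "lane", "tile", "xpos", "ypos")
ATTRIBUTE_KEYS = ("readNumber", "isFiltered", "control", "barcode")


def parse_title_line(att_line):
    """Hypothetical stand-in for parseIlluminaFastqAttLine; values stay strings."""
    # Drop the leading "@", then split the two space-separated halves.
    location, attributes = att_line.lstrip("@").split(" ", 1)
    tokens = dict(zip(LOCATION_KEYS, location.split(":")))
    tokens.update(zip(ATTRIBUTE_KEYS, attributes.split(":")))
    return tokens


fields = parse_title_line("@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG")
print(fields["flowcellId"], fields["readNumber"], fields["barcode"])
```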

illumina_fastq.utils.yieldRecs(fastqFile, log=<open file '<stdout>', mode 'w'>, barcodes=[])[source]

A generator function that reads a FASTQ file and yields records, one at a time. The records to yield can be restricted to the specified set of barcodes.

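Parameters:
  • fastqFile – str. Path to the FASTQ file to parse.
  • log – A file handle to write log messages to. Defaults to STDOUT.
  • barcodes – list. If given, only records whose barcode is in this list are yielded. Defaults to an empty list.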
Yields:

A list containing one element per line of a FASTQ record. Each element is whitespace stripped.
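
A generator of this shape might look like the sketch below. It is an illustration under stated assumptions, not the library's code: yield_records is a hypothetical name, records are assumed to be exactly four lines long, the barcode is taken to be the last colon-separated field of the title line, an empty barcodes argument is assumed to mean "no filtering", and the second demo record is made up.

```python
import os
import sys
import tempfile


def yield_records(fastq_path, log=sys.stdout, barcodes=()):
    """Hypothetical stand-in for yieldRecs: yield 4-line records as lists."""
    wanted = set(barcodes)
    count = 0
    with open(fastq_path) as handle:
        while True:
            # One record = title, sequence, "+" line, qualities (assumed 4 lines).
            record = [handle.readline().strip() for _ in range(4)]
            if not record[0]:  # end of file (a blank title line also stops the loop)
                break
            barcode = record[0].rsplit(":", 1)[-1]
            if wanted and barcode not in wanted:
                continue
            count += 1
            yield record
    log.write(f"{count} records yielded from {fastq_path}\n")


# Demo input: the first title line is from the example above, the rest is made up.
demo = (
    "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG\nACGT\n+\nFFFF\n"
    "@COOPER:74:HFTH3BBXX:3:1101:30000:1033 1:N:0:CCCCCCCC+GGGGGGGG\nTTTT\n+\nFFFF\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as tmp:
    tmp.write(demo)
    path = tmp.name

all_records = list(yield_records(path))
filtered = list(yield_records(path, barcodes=["NATGAATC+NGATCTCG"]))
os.remove(path)
```

Because the function is a generator, only one record is held in memory at a time; the calls to list() here are only for the demonstration.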