illumina_fastq.utils
-
illumina_fastq.utils.getPairedendReadId(read_id)
Given either a forward read or reverse read identifier, returns the corresponding paired-end read identifier.
Parameters: read_id – str. The forward read or reverse read identifier. This should be the entire title line of a FASTQ record, minus any trailing whitespace.
Returns: The paired-end read identifier (title line).
Return type: str
Example
Setting read_id to “@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG” will return “@COOPER:74:HFTH3BBXX:3:1101:29894:1033 2:N:0:NATGAATC+NGATCTCG”.
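The mapping in the example can be sketched without the package itself. The function name paired_end_read_id below is hypothetical and is not part of illumina_fastq; it assumes a CASAVA 1.8-style title line in which the read number is the first colon-separated field after the space, and it may differ from how the library actually does this.

```python
def paired_end_read_id(read_id):
    """Hypothetical stand-in: swap the read number (1 <-> 2) in a title line."""
    # Split "@instrument:...:y-pos" from "read:filtered:control:barcode".
    coords, _, attrs = read_id.partition(" ")
    read_number, sep, rest = attrs.partition(":")
    if read_number not in ("1", "2"):
        raise ValueError("unrecognized read number in %r" % read_id)
    mate = "2" if read_number == "1" else "1"
    return coords + " " + mate + sep + rest


forward = "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG"
reverse = paired_end_read_id(forward)
print(reverse)
```

Applying the function to the reverse identifier gives back the forward one, so the mapping is symmetric under these assumptions.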
-
illumina_fastq.utils.isForwardRead(seqid)
Indicates whether the passed-in read identifier is a forward or reverse read identifier.
Parameters: seqid – str. A read identifier of a FASTQ record.
Returns: True if a forward read identifier, False otherwise.
Return type: bool
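As a rough illustration of the check, the sketch below tests the read-number field of a CASAVA 1.8-style title line. The name is_forward_read is hypothetical, and treating a read number of "1" as the forward read is an assumption about the format, not a statement about the library's internals.

```python
def is_forward_read(seqid):
    """Hypothetical stand-in: True when the read-number field is "1"."""
    # The read number is the first colon-separated field after the space.
    attrs = seqid.partition(" ")[2]
    read_number = attrs.partition(":")[0]
    return read_number == "1"


forward_id = "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG"
reverse_id = "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 2:N:0:NATGAATC+NGATCTCG"
print(is_forward_read(forward_id), is_forward_read(reverse_id))
```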
-
illumina_fastq.utils.parseIlluminaFastqAttLine(attLine)
Given the title line of a FASTQ record, tokenizes the line and stores the tokens in a dict. The Illumina FASTQ Att line format (as of CASAVA 1.8 at least) is:
@<instrument-name>:<run ID>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos> <read number>:<is filtered>:<control number>:<barcode sequence>
Parameters: attLine – str. The title line of a FASTQ record, minus any trailing whitespace.
Return type: dict. The keys are
- instrument,
- runId,
- flowcellId,
- lane,
- tile,
- xpos,
- ypos,
- readNumber,
- isFiltered,
- control,
- barcode
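The tokenization can be sketched as follows. The function name parse_title_line is hypothetical and not part of illumina_fastq. The dict keys follow the list above, but keeping every value as a string is an assumption, since the documentation does not say whether the library converts any field.

```python
def parse_title_line(att_line):
    """Hypothetical stand-in: split a CASAVA 1.8-style title line into a dict."""
    # The single space separates the seven coordinate fields from the
    # four read-attribute fields.
    coords, _, attrs = att_line.partition(" ")
    instrument, run_id, flowcell_id, lane, tile, xpos, ypos = (
        coords.lstrip("@").split(":")
    )
    read_number, is_filtered, control, barcode = attrs.split(":")
    return {
        "instrument": instrument,
        "runId": run_id,
        "flowcellId": flowcell_id,
        "lane": lane,
        "tile": tile,
        "xpos": xpos,
        "ypos": ypos,
        "readNumber": read_number,
        "isFiltered": is_filtered,
        "control": control,
        "barcode": barcode,
    }


parsed = parse_title_line(
    "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG"
)
print(parsed["flowcellId"], parsed["lane"], parsed["barcode"])
```

Tuple unpacking makes the sketch fail loudly with a ValueError when a title line does not have exactly seven coordinate fields and four attribute fields, which is a deliberate choice here rather than documented library behavior.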
-
illumina_fastq.utils.yieldRecs(fastqFile, log=sys.stdout, barcodes=[])
A generator function that reads a FASTQ file and yields records, one at a time. The records to yield can be restricted to the specified set of barcodes.
Parameters: - fastqFile – str. Path to the FASTQ file to parse.
- log – A file handle to write log messages to. Defaults to STDOUT.
- barcodes – The set of barcodes to restrict the yielded records to. Defaults to an empty list.
Yields: A list containing one element per line of a FASTQ record. Each element is whitespace stripped.
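A minimal sketch of such a generator is shown below. The name yield_records is hypothetical and not part of illumina_fastq. The sketch assumes four-line records with no blank lines between them, takes the barcode to be the last colon-separated field of the title line, and treats an empty barcode collection as "no filtering". Logging is omitted. The second record and all sequence and quality lines are made-up demo data.

```python
import os
import tempfile


def yield_records(fastq_path, barcodes=()):
    """Hypothetical stand-in: yield each record as four stripped lines."""
    wanted = set(barcodes)
    with open(fastq_path) as handle:
        while True:
            record = [handle.readline().strip() for _ in range(4)]
            if not record[0]:
                break  # end of file (assumes no blank lines between records)
            # Assumption: the barcode is the last colon-separated field.
            barcode = record[0].rsplit(":", 1)[-1]
            if wanted and barcode not in wanted:
                continue
            yield record


# Made-up demo data: two records with different barcodes.
demo = (
    "@COOPER:74:HFTH3BBXX:3:1101:29894:1033 1:N:0:NATGAATC+NGATCTCG\n"
    "ACGT\n+\nFFFF\n"
    "@COOPER:74:HFTH3BBXX:3:1101:30000:1033 1:N:0:CCCCAAAA+GGGGTTTT\n"
    "TTGA\n+\nFFFF\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as tmp:
    tmp.write(demo)

all_records = list(yield_records(tmp.name))
filtered = list(yield_records(tmp.name, barcodes=["CCCCAAAA+GGGGTTTT"]))
os.remove(tmp.name)
print(len(all_records), len(filtered))
```

Because the function is a generator, only one record is held in memory at a time, which matters for FASTQ files that are often many gigabytes in size.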