Verifiers

Introduction

The verifiers subpackage of bio_util contains functions that verify the data of a biological file format, i.e. they ensure that a given file is properly formatted. These functions check file entries against a regex matching the file format. If the match fails, the verifier subdivides the entry and determines which part of the entry fails the regex. This closer inspection lets the verifiers return detailed error messages stating what failed and where in the file. Each verifier except entry_verifier is also a program with the simple syntax

[file]_verifier <file>

which reads through a file and prints whether or not it is valid.

entry_verifier

The guts of the verifiers subpackage, this versatile function matches a string against a regex. If the match fails, entry_verifier() splits both the regex and the string by a given delimiter and matches each regex fragment against its corresponding string fragment. When a string fragment fails, a custom FormatError containing details on the failure is raised.
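
The fallback described above can be illustrated with a minimal sketch. This is not the package's actual implementation: the function name verify_entry, its signature, the attributes on this stand-in FormatError, and the three-column example regex are all assumptions made for illustration.

```python
import re


class FormatError(Exception):
    """Hypothetical stand-in for the package's FormatError."""

    def __init__(self, message, part=None):
        super().__init__(message)
        self.part = part  # index of the failing fragment, if known


def verify_entry(entry, regex, delimiter):
    """Return None if entry matches regex, else raise FormatError.

    Assumes the delimiter appears literally in the regex and never
    inside a group or character class, so a plain split is safe.
    """
    # Fast path: the whole entry matches the whole regex.
    if re.fullmatch(regex, entry):
        return None

    # Slow path: compare fragments pairwise to locate the failure.
    regex_parts = regex.split(delimiter)
    entry_parts = entry.split(delimiter)
    for index, (pattern, fragment) in enumerate(zip(regex_parts, entry_parts)):
        if not re.fullmatch(pattern, fragment):
            raise FormatError(
                f"fragment {index} ({fragment!r}) does not match {pattern!r}",
                part=index,
            )

    # Every compared fragment matched, so the fragment count must differ.
    raise FormatError(
        f"expected {len(regex_parts)} fragments, found {len(entry_parts)}"
    )


# Illustrative tab-delimited format: name, position, strand.
REGEX = "\t".join([r"[A-Za-z0-9_]+", r"\d+", r"[+-]"])
```

With these assumptions, verify_entry("chr1\t100\t+", REGEX, "\t") returns without error, while verify_entry("chr1\tabc\t+", REGEX, "\t") raises a FormatError whose part attribute is 1, pointing at the second column.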

b6_verifier

Verifies the validity of a list of B6Entry.

binary_guesser

Heuristically guesses whether a file is binary or text. While not technically a “verifier”, this function fits well in this subpackage because it helps confirm a generic property of a file before a program uses it.
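
The heuristic used by binary_guesser is not described here. As a rough sketch of how such a guess can work, the code below applies a common approach: sample the start of the file, treat any NUL byte as a sign of binary data, and otherwise measure the share of bytes that rarely occur in text. The function names, the sample size, and the 30% threshold are assumptions, not the package's behaviour.

```python
# Bytes that commonly occur in text: a few control characters
# (bell, backspace, tab, newline, form feed, carriage return, escape),
# printable ASCII, and high bytes used by UTF-8 and Latin-1 text.
TEXT_BYTES = bytes(
    {7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x7F)) | set(range(0x80, 0x100))
)


def looks_binary(sample, threshold=0.3):
    """Guess whether a bytes sample comes from a binary file."""
    if not sample:
        return False  # an empty file is treated as text
    if b"\x00" in sample:
        return True  # NUL bytes almost never appear in text files
    # Delete every text-like byte; whatever remains is suspicious.
    non_text = sample.translate(None, TEXT_BYTES)
    return len(non_text) / len(sample) > threshold


def guess_file_is_binary(path, num_bytes=1024):
    """Apply looks_binary() to the first num_bytes bytes of a file."""
    with open(path, "rb") as handle:
        return looks_binary(handle.read(num_bytes))
```

A guess like this is cheap because it reads only the start of the file, but it can misclassify unusual inputs, so it is best used as a sanity check before parsing rather than as a guarantee.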

fasta_verifier

Verifies the validity of a list of FastaEntry.

fastq_verifier

Verifies the validity of a list of FastqEntry.

gff3_verifier

Verifies the validity of a list of GFF3Entry.

sam_verifier

Verifies the validity of a list of SamEntry.