Verifiers
Introduction
bio_util’s verifiers subpackage contains functions that verify the data of a biological file format, i.e. they ensure a given file is properly formatted. These functions check file entries against a regex matching the file format. If the match fails, the verifier subdivides the entry and determines which part of it fails the regex. This lets the verifiers return detailed error messages describing what failed and where. Each verifier except entry_verifier is also a program with the simple syntax
[file]_verifier <file>
which reads through a file and prints whether or not it is valid.
entry_verifier
The guts of the verifiers package, this versatile function matches a string to a regex. If the match fails, entry_verifier() splits both the regex and the string by a given delimiter and matches each regex fragment to its corresponding string fragment. When a string fragment fails, a custom FormatError containing details on the failure is raised.
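The split-and-match strategy described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function name sketch_entry_verifier, its signature, and the part attribute on the stand-in FormatError class are all assumptions made for the example. It also assumes that each regex fragment is a valid regex on its own once the pattern is split on the delimiter.

```python
import re


class FormatError(Exception):
    """Hypothetical stand-in for the package's FormatError."""

    def __init__(self, message, part=None):
        super().__init__(message)
        self.message = message
        self.part = part  # index of the failing fragment, if one was found


def sketch_entry_verifier(entries, regex, delimiter):
    """Match each entry against regex; on failure, find the failing fragment."""
    whole = re.compile(regex)
    # Split the pattern once; assumes every fragment is itself a valid regex.
    regex_parts = regex.split(delimiter)
    for entry in entries:
        if whole.fullmatch(entry):
            continue  # entry is well formed
        # The whole match failed, so compare fragment by fragment.
        entry_parts = entry.split(delimiter)
        for index, (pattern, text) in enumerate(zip(regex_parts, entry_parts)):
            if not re.fullmatch(pattern, text):
                raise FormatError(
                    f"fragment {index} ({text!r}) does not match {pattern!r}",
                    part=index,
                )
        # Every paired fragment matched, e.g. the fragment counts differ.
        raise FormatError("entry does not match the regex as a whole")
```

For a FASTA-like pattern such as "^>.+\n[ACGT]+\n$" with "\n" as the delimiter, the entry ">seq1\nACXT\n" fails the whole match, and the fragment loop then reports fragment 1 (the sequence line) as the culprit.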
binary_guesser
Heuristically guesses whether a file is binary or text. While not technically a “verifier”, this function fits well in this subpackage because it helps confirm a generic property of a file before a program uses it.
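The source does not say which heuristic binary_guesser uses. As an illustration only, the sketch below shows one common approach: sample the first bytes of the file, treat any NUL byte as a sign of binary data, and otherwise measure the fraction of bytes outside the printable ASCII range. The names looks_binary and sketch_binary_guesser, the 512-byte sample size, and the 0.30 threshold are all assumptions chosen for the example.

```python
# Bytes treated as "text": printable ASCII plus common whitespace/control codes.
TEXT_BYTES = bytes(range(32, 127)) + b"\n\r\t\f\b"


def looks_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Guess whether a byte sample is binary (assumed heuristic, see above)."""
    if not data:
        return False  # an empty sample gives no evidence of binary content
    if b"\x00" in data:
        return True  # NUL bytes almost never occur in text files
    # Delete every text byte; whatever remains counts as non-text.
    nontext = data.translate(None, TEXT_BYTES)
    return len(nontext) / len(data) > threshold


def sketch_binary_guesser(path, num_bytes=512):
    """Read the first num_bytes of a file and apply the heuristic."""
    with open(path, "rb") as handle:
        return looks_binary(handle.read(num_bytes))
```

A heuristic like this can misclassify text that is mostly non-ASCII (for example, UTF-8 text in a non-Latin script), which is why the result is only a guess.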
fasta_verifier
Verifies the validity of a list of FastaEntry objects.
fastq_verifier
Verifies the validity of a list of FastqEntry objects.