Add .bgz file decompression to data.table::fread() (compatible with .gz) #5461

@TMRHarrison

Currently, fread reads .bgz files as plain text, which fails because of invalid characters. .bgz files are compatible with gunzip and start with the same two-byte signature as .gz files (0x1F 0x8B). Renaming *.bgz files to *.gz allows them to be decompressed normally.
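A minimal sketch of the signature check described above. The test file is an assumption: it is ordinary gzip data saved with a .bgz extension, not a file written by a real BGZF tool (BGZF files are made up of a series of gzip blocks), so it only illustrates that the extension is irrelevant to decompression.

```r
# Stand-in for a real .bgz file: ordinary gzip data saved with a .bgz extension.
path <- tempfile(fileext = ".bgz")
con <- gzfile(path, open = "wt")
writeLines(c("a,b", "1,2"), con)
close(con)

# Read the first two bytes and compare them with the gzip signature 0x1F 0x8B.
gz_signature <- as.raw(c(0x1F, 0x8B))
file_signature <- readBin(path, what = "raw", n = 2L)
has_gz_signature <- identical(file_signature, gz_signature)

# gzfile() decompresses based on the file content, not on its extension.
con <- gzfile(path, open = "rt")
lines <- readLines(con)
close(con)
```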

Adding .bgz to the list of files that can be decompressed by data.table::fread shouldn't require anything other than R.utils. I think adding ".bgz" to the vector in this line:

if ((w <- endsWithAny(file, c(".gz",".bz2"))) || (gzsig <- identical(head(file_signature, 2L), gz_signature)) || identical(head(file_signature, 3L), bz2_signature)) {

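and checking for w<=2 on this line (which assumes ".bgz" is inserted as the second element of the vector, so that ".gz" and ".bgz" are matches 1 and 2)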

FUN = if (w==1L || gzsig) gzfile else bzfile

would allow fread to decompress .bgz files automatically. However, I haven't tested this.
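The proposed change could be sketched as a standalone function, outside of data.table. Everything named here is hypothetical: ends_with_any is a stand-in that assumes the internal endsWithAny returns the index of the first matching suffix (or 0 when none match), pick_decompressor is an illustrative wrapper, and ".bgz" is assumed to be placed second in the vector so that indices 1 and 2 both select gzfile.

```r
# Hypothetical stand-in for data.table's internal endsWithAny():
# index of the first matching suffix, or 0L when none match.
ends_with_any <- function(x, suffixes) {
  hit <- which(endsWith(x, suffixes))
  if (length(hit)) hit[1L] else 0L
}

gz_signature  <- as.raw(c(0x1F, 0x8B))        # gzip signature
bz2_signature <- as.raw(c(0x42, 0x5A, 0x68))  # "BZh", bzip2 signature

# Illustrative wrapper: returns the connection function to use,
# or NULL when the file does not look compressed.
pick_decompressor <- function(file, file_signature) {
  # Assumption: ".bgz" sits before ".bz2", so w <= 2L means gzip-compatible.
  w <- ends_with_any(file, c(".gz", ".bgz", ".bz2"))
  gzsig <- identical(head(file_signature, 2L), gz_signature)
  bzsig <- identical(head(file_signature, 3L), bz2_signature)
  if (!(w || gzsig || bzsig)) return(NULL)
  if ((w >= 1L && w <= 2L) || gzsig) gzfile else bzfile
}
```

With this ordering, a name ending in .bgz selects gzfile even when no signature bytes are available, and a name ending in .bz2 still selects bzfile.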
