Currently, .bgz files are read as plain text, which fails due to invalid characters. .bgz files are compatible with gunzip and start with the same header bytes as gzip files (0x1F 0x8B). Renaming *.bgz files to *.gz allows them to be decompressed normally.
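To illustrate the header claim, here is a minimal base R sketch. It assumes an ordinary gzip stream saved under a .bgz name rather than a true block-gzipped file, since `gzfile()` writes plain gzip; the file contents are made up for the example.

```r
# Write a small gzip-compressed file, but give it a .bgz extension.
# Assumption: gzfile() writes ordinary gzip, not true BGZF blocks.
path <- tempfile(fileext = ".bgz")
con <- gzfile(path, open = "wt")
writeLines(c("a,b", "1,2", "3,4"), con)
close(con)

# Check the first two bytes against the gzip magic number 0x1F 0x8B.
signature <- readBin(path, what = "raw", n = 2L)
is_gzip <- identical(signature, as.raw(c(0x1f, 0x8b)))

# gzfile() decompresses the file regardless of its extension.
con <- gzfile(path, open = "rt")
lines <- readLines(con)
close(con)
```

If the same holds for real .bgz files, the extension is the only thing that differs from the .gz case.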
Adding .bgz to the list of files that can be decompressed by data.table::fread shouldn't require anything other than R.utils. I think adding ".bgz" to the vector in this line (data.table/R/fread.R, line 121 in c4a2085):

    if ((w <- endsWithAny(file, c(".gz",".bz2"))) || (gzsig <- identical(head(file_signature, 2L), gz_signature)) || identical(head(file_signature, 3L), bz2_signature)) {

and checking for w<=2 on this line (data.table/R/fread.R, line 124 in c4a2085):

    FUN = if (w==1L || gzsig) gzfile else bzfile

would allow fread to decompress .bgz files automatically. Note that w<=2 only selects gzfile correctly if ".bgz" is inserted before ".bz2" in the vector, so that ".bz2" still falls through to bzfile. However, I haven't tested this.
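The suffix dispatch could be sketched as below. This is an untested illustration, not data.table's code: `ends_with_any()` and `pick_decompressor()` are hypothetical names, and `ends_with_any()` is only a stand-in that assumes the internal `endsWithAny()` returns the index of the matching suffix.

```r
# Hypothetical stand-in for data.table's internal endsWithAny():
# returns the index of the first matching suffix, or 0L if none match.
ends_with_any <- function(x, suffixes) {
  hit <- which(endsWith(x, suffixes))
  if (length(hit)) hit[1L] else 0L
}

# Hypothetical helper: choose the connection constructor for a file name.
# ".bgz" is placed second so that w <= 2L covers both gzip-style suffixes
# and ".bz2" (index 3) still maps to bzfile.
pick_decompressor <- function(file) {
  w <- ends_with_any(file, c(".gz", ".bgz", ".bz2"))
  if (w == 0L) return(NULL)  # no known compressed suffix
  if (w <= 2L) gzfile else bzfile
}
```

For example, `pick_decompressor("variants.bgz")` returns `gzfile`, while `pick_decompressor("variants.bz2")` returns `bzfile`. Appending ".bgz" at the end of the vector instead would need a check such as `w != 2L` rather than `w <= 2L`.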