Skip to content

fread cleanup prints error message in webR #5969

@maek-ies

Description

@maek-ies

data.table is working quite fine in webR, but when using fread an error occurs (System errno 8 unmapping file: Bad file descriptor). Interestingly, the file loads but it throws an error neverthless. Same error occurs also for .gz files. Perhaps, it's just the fact that this runs in browser using wasm is not recognized, resulting in an error.

install.packages("data.table")
library("data.table")

dt = data.table(1:10)
fwrite(dt,"dt.csv")
list.files()
dtcsv = fread("dt.csv", verbose = T)
print(dtcsv)

sessionInfo()

Resulting output:

`> install.packages("data.table")
Downloading webR package: data.table

library("data.table")
data.table 1.14.7 IN DEVELOPMENT built 2023-12-19 09:51:47 UTC using 1 threads (see ?getDTthreads). Latest news: r-datatable.com


This development version of data.table was built more than 4 weeks ago. Please update: data.table::update_dev_pkg()



This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.
This is Emscripten. This warning should not normally occur on Windows or Linux where OpenMP is turned on by data.table's configure script by passing -fopenmp to the compiler. If you see this warning on Windows or Linux, please file a GitHub issue.


dt = data.table(1:10)

fwrite(dt,"dt.csv")

list.files()
[1] "dt.csv"

dtcsv = fread("dt.csv", verbose = T)
This installation of data.table has not been compiled with OpenMP support.
omp_get_num_procs() 1
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 1
omp_get_max_threads() 1
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 1 threads with throttle==1024. See ?setDTthreads.
freadR.c has been passed a filename: dt.csv
[01] Check arguments
Using 1 threads (omp_get_max_threads()=1, nth=1)
NAstrings = [<>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as integer
[02] Opening the file
Opening file dt.csv
File opened, size = 24 bytes.
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
Positioned on line 1 starting: <>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
No sep and quote rule found a block of 2x2 or greater. Single column input.
Detected 1 columns on line 1. This line is either column names or first data row. Line starts as: <>
Quote rule picked = 0
fill=false and the most number of columns found is 1
[07] Detect column types, good nrow estimate and whether first row is column names
Number of sampling jump points = 1 because (23 bytes from row 1 to eof) / (2 * 23 jump0size) == 0
Type codes (jump 000) : 6 Quote rule 0
'header' determined to be true due to column 1 containing a string on row 1 and a lower type (int32) in the rest of the 10 sample rows
All rows were sampled since file is small so we know nrow=10 exactly
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 6
[10] Allocate memory for the datatable
Allocating 1 column slots (1 - 0 dropped) with 10 rows
[11] Read the data
jumps=[0..1), chunk_size=1048576, total_size=20
Read 10 rows x 1 columns from 24 bytes file in 00:00.004 wall clock time
[12] Finalizing the datatable
Type counts:
1 : int32 '6'
=============================
0.001s ( 25%) Memory map 0.000GB file
0.002s ( 50%) sep='' ncol=1 and header detection
0.000s ( 0%) Column type detection using 10 sample rows
0.001s ( 25%) Allocation of 10 rows x 1 cols (0.000GB) of which 10 (100%) rows used
0.000s ( 0%) Reading 1 chunks (0 swept) of 1.000MB (each chunk 10 rows) using 1 threads

  • 0.000s ( 0%) Parse to row-major thread buffers (grown 0 times)
  • 0.000s ( 0%) Transpose
  • 0.000s ( 0%) Waiting
    0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
    0.004s Total
    System errno 8 unmapping file: Bad file descriptor

print(dtcsv)
V1

1: 1
2: 2
3: 3
4: 4
5: 5
6: 6
7: 7
8: 8
9: 9
10: 10

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: wasm32-unknown-emscripten (32-bit)
Running under: emscripten

Matrix products: default

locale:
[1] en_US.UTF-8 C en_US.UTF-8 en_US.UTF-8 en_US.UTF-8 C

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.14.7

loaded via a namespace (and not attached):
[1] tools_4.3.2 webr_0.2.2.9000
error_fread
`

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions