Edge case inconsistency: droplevels() does not drop empty levels of factor columns in empty data.tables. This differs from the behavior of data.frame (see reprex). I tried to find out whether this is intentional, but could find no mention of it either way.
As far as I can tell, this is due to an early exit in setdroplevels() (data.table/R/fdroplevels.R, line 18 in 764809b):
if (!nrow(x)) return(invisible(x))
Going by git blame, this was apparently introduced as a fix for #5184.
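Until this is clarified, a possible workaround is to route zero-row tables through the data.frame method, which does drop the levels. The following is only a minimal sketch: droplevels_via_df is a hypothetical helper name, not part of data.table, and the round trip through as.data.frame() is assumed to be acceptable (it copies the data and may not preserve data.table attributes such as keys).

```r
library(data.table)

# Hypothetical helper (not part of data.table): for zero-row tables, fall back
# to droplevels.data.frame(); otherwise use the regular data.table method.
droplevels_via_df = function(x) {
  stopifnot(is.data.table(x))
  if (nrow(x) > 0L) return(droplevels(x))
  # Zero rows: convert to a plain data.frame, drop levels there, convert back.
  as.data.table(droplevels(as.data.frame(x)))
}

dt = data.table(x = factor(levels = c("A", "B")))
res = droplevels_via_df(dt)

nlevels(dt$x)   # levels of the untouched input
nlevels(res$x)  # levels after the workaround
```

If the data.frame behavior shown in the reprex holds, res$x should end up with no levels, while dt itself is left unchanged.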
Reprex
library(data.table)
dt = data.table(x = factor(levels = c("A", "B")))
df = data.frame(x = factor(levels = c("A", "B")))
str(dt)
#> Classes 'data.table' and 'data.frame': 0 obs. of 1 variable:
#> $ x: Factor w/ 2 levels "A","B":
#> - attr(*, ".internal.selfref")=<externalptr>
str(df)
#> 'data.frame': 0 obs. of 1 variable:
#> $ x: Factor w/ 2 levels "A","B":
str(droplevels(dt))
#> Classes 'data.table' and 'data.frame': 0 obs. of 1 variable:
#> $ x: Factor w/ 2 levels "A","B":
#> - attr(*, ".internal.selfref")=<externalptr>
str(droplevels(df))
#> 'data.frame': 0 obs. of 1 variable:
#> $ x: Factor w/ 0 levels:
Output of sessionInfo()
sessionInfo()
#> R version 4.5.0 (2025-04-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Linux Mint 22
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.17.4
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 fastmap_1.2.0 xfun_0.52 glue_1.8.0
#> [5] knitr_1.50 htmltools_0.5.8.1 rmarkdown_2.29 lifecycle_1.0.4
#> [9] cli_3.6.5 reprex_2.1.1 withr_3.0.2 compiler_4.5.0
#> [13] rstudioapi_0.17.1 tools_4.5.0 evaluate_1.0.3 yaml_2.3.10
#> [17] rlang_1.1.6 fs_1.6.6
Created on 2025-05-31 with reprex v2.1.1