encoding problem #1

@agricolamz

Hi, thank you for the package. I tried to use it with Cyrillic text (a Tabasaran example is below), but I get the following error:

library(diffmatchpatch)
diff_make("БицIидимиди швушв гъахирна гвачIнимиди уьл гъипIур швумал даршул",
          "БицIидимиди швушв гъахирна гвачIнимиди уьл гъипIур швумал дашул")

#> Error in gsub(st$close, st$open, txt, fixed = TRUE) : 
#>   input string 1 is invalid in this locale

The result can still be assigned to a variable, and the contents of the resulting table clearly show the encoding problem:

image

I've never had any encoding problems on my Linux machine, and I didn't find any encoding-related calls in your Rcpp code (though I don't know Rcpp). Here are some more details:

Linux Mint
R 4.4.1
diffmatchpatch v. 0.1.0
Sys.getlocale()
#> [1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=ru_RU.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
