Strict input files validation and user-friendly error reporting #2

@ssvb

Hunspell's validation of its input files is opaque and user-unfriendly. It silently interprets various suspicious constructs in the input files in some deterministic way, without reporting anything. For example:

  • the SFX/PFX directives can be indented by whitespace characters and are still recognized
  • some of the other directives, such as NEEDAFFIX, are ignored if they are indented
  • the word count in the first line of a .dic file is ignored, and words are loaded regardless of its actual value
  • indented words in a .dic file are ignored
  • the stems in a .dic file may list invalid, non-existent flags in the flags field. Hunspell seems to ignore just that particular invalid flag and process the rest of the data.
  • ... and many other peculiarities like this

This situation is not ideal. Hunaftool needs to precisely emulate Hunspell's behaviour so that it interprets the input data in the same way. At the same time, all suspicious or ambiguous constructs in the .aff and .dic input files need to be reported to the user. If the user is a dictionary maintainer, they can then be encouraged to resolve these ambiguities in the input files.
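To make the kind of reporting proposed here more concrete, below is a minimal sketch of a standalone checker. It is not Hunaftool code and not a Hunspell API. The names `Diagnostic`, `lint_aff` and `lint_dic` are hypothetical, the directive list covers only the three directives named above, and single-character flags are assumed (no `FLAG long`/`FLAG num`, no escaped `/` in stems). It only flags the constructs listed in this issue and does not emulate how Hunspell actually loads the data.

```python
from typing import NamedTuple


class Diagnostic(NamedTuple):
    """One reported problem (hypothetical type, for illustration only)."""
    filename: str
    line: int
    message: str


# Only the directives named in this issue; a real checker needs the full list.
DIRECTIVES = {"SFX", "PFX", "NEEDAFFIX"}


def lint_aff(lines, filename="example.aff"):
    """Report directives that are indented by whitespace."""
    found = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if tokens and line[0] in " \t" and tokens[0] in DIRECTIVES:
            found.append(Diagnostic(
                filename, lineno, f"indented {tokens[0]} directive"))
    return found


def lint_dic(lines, known_flags, filename="example.dic"):
    """Report a wrong word count, indented entries and unknown flags.

    Assumes single-character flags and no escaped '/' inside stems.
    """
    found = []
    if not lines:
        return found
    # Non-blank lines after the first one are treated as dictionary entries.
    entries = [(n, text) for n, text in enumerate(lines[1:], start=2)
               if text.strip()]
    declared = lines[0].strip()
    if not declared.isdigit():
        found.append(Diagnostic(filename, 1, "first line is not a word count"))
    elif int(declared) != len(entries):
        found.append(Diagnostic(
            filename, 1,
            f"word count is {declared}, but {len(entries)} entries follow"))
    for lineno, line in entries:
        if line[0] in " \t":
            found.append(Diagnostic(filename, lineno, "indented entry"))
            continue
        stem, _, rest = line.partition("/")
        fields = rest.split()          # flags first, then optional extra fields
        flags = fields[0] if fields else ""
        for flag in flags:
            if flag not in known_flags:
                found.append(Diagnostic(
                    filename, lineno,
                    f"unknown flag {flag!r} on stem {stem!r}"))
    return found
```

Calling `lint_dic(["5", "cat/S", " dog/S", "bird/SX"], {"S"})` yields three diagnostics: a word count mismatch on line 1, an indented entry on line 3, and an unknown flag on line 4. Printing them in a `file:line: message` form would give dictionary maintainers something they can act on directly.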
