Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GLib's regexp interface (backed by PCRE) #412

Merged
merged 7 commits into from
Jan 3, 2017

Conversation

LemonBoy
Copy link
Member

@LemonBoy LemonBoy commented Jan 29, 2016

Pros:

  • Uniform syntax between systems (as brought up by @wilhelmy)
  • More people are familiar with the syntax used by PCRE
  • Faster (?)
  • More robust utf8 support

Cons:

  • People may need to update their regexps
  • We need to make sure to feed utf8 encoded strings

This proposal is up for discussion 🎈


@ailin-nemui
I would really like to see some test cases wrt. UTF8 and with and without the raw flags and some thoughts about how to go best about this (compile 2 regexen and match them depending on utf state?) there is also some secret unicode command flag in pcre iirc, does this apply?

@ailin-nemui
Copy link
Contributor

I think this is a good idea. what we need is

  • examples of breakage in regexes when using the new engine
  • documentation for users how to manually adjust their regexes

@ailin-nemui
Copy link
Contributor

we (probably) need to conditionally use the raw flag. Two cases:

  • irssi is being used with 8bit term_type (-> everything is bytes)
  • incoming messages are not valid utf8

@ailin-nemui
Copy link
Contributor

we should consider if it shouldn't be possible to disable the RAW mode? otherwise we cannot benefit from the unicode advantage in gregex

@foice
Copy link

foice commented Sep 10, 2016

Does this support negative lookahead? I'd like to try it, I get irssi from brew on OS X if that matters.

@vague666
Copy link
Member

@foice , see man pcresyntax, I assume that's the place to read about pcre and what it supports (it mentions negative look ahead)

@vague666
Copy link
Member

@LemonBoy @ailin-nemui how about using both regex systems in parallel for a while to let people adjust to the change and use a setting to switch between the two? It would make the codebase bigger for the time being but might not give users as much headache, or is there a library that you know of(I didn't find anything with google) that would allow conversion of POSIX regex to PCRE?

@dequis
Copy link
Member

dequis commented Sep 10, 2016

is there a library that you know of(I didn't find anything with google) that would allow conversion of POSIX regex to PCRE?

Converting POSIX regex to PCRE would mean trying to emulate it while keeping bug-compatibility with it. Nope. Maybe it's a matter of adding backslashes to some characters and removing them from others, but we might end up learning about missed edge cases after releasing a new version breaks stuff. I don't want to go there.

Better allow switching regex engines with a global setting.

@ailin-nemui
Copy link
Contributor

I would really like to see some test cases wrt. UTF8 and with and without the raw flags and some thoughts about how to go best about this (compile 2 regexen and match them depending on utf state?) there is also some secret unicode command flag in pcre iirc, does this apply?

@ailin-nemui
Copy link
Contributor

I will merge this, issues caused by that can then be found and solved later

@dequis
Copy link
Member

dequis commented Nov 16, 2016

I really don't like the idea of replacing the regexp engine.

@dequis
Copy link
Member

dequis commented Nov 16, 2016

It's a breaking change, and a subtle one at that. People will upgrade irssi and find their stuff stopped working for no particular reason. Let's not do that. The improvements brought by this change aren't worth it. I'd rather have it under a setting.

@ailin-nemui
Copy link
Contributor

how about making it a compile time switch

@ailin-nemui
Copy link
Contributor

@LemonBoy

LemonBoy and others added 7 commits January 2, 2017 17:50
Also remove the prototype for regex_match since it has been removed.
It was made redundant by the introduction of the pointer to the GRegex
structure.
Silence the compiler warning in textbuffer.c about preg being
initialized by setting it to NULL.
Also, plugged a memory leak when retrieving the match position.
@ailin-nemui ailin-nemui merged commit 5787e2b into irssi:master Jan 3, 2017
@lotheac
Copy link
Contributor

lotheac commented Jan 9, 2017

dequis wrote:

It's a breaking change, and a subtle one at that. People will upgrade
irssi and find their stuff stopped working for no particular reason.
Let's not do that. The improvements brought by this change aren't
worth it. I'd rather have it under a setting.

Exactly this happened to me -- I realize that this is a compile-time
setting, but if subtle breakage was to be avoided, then it should not
have been the default.

edit: or, release notes should have been clearer about this.

@ailin-nemui
Copy link
Contributor

sorry for your inconvenience, I pushed for this change but dequis was against it. I added a further hint about it to the on-site release notes (cannot change the released notes anymore)

would you like to expand on what regex broke and how you had to modify it?

@lotheac
Copy link
Contributor

lotheac commented Jan 9, 2017 via email

@ailin-nemui ailin-nemui added this to the 1.0.0 milestone Jun 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants