Below is a function for converting Hebrew final characters to their
normal equivelants should they appear in the middle of a word.
The /b argument does not treat Hebrew letters as part of a word,
so I had to work around that limitation.
<?php
$text="עברית מבולגנת";
function hebrewNotWordEndSwitch ($from, $to, $text) {
$text=
preg_replace('/'.$from.'([א-ת])/u','$2'.$to.'$1',$text);
return $text;
}
do {
$text_before=$text;
$text=hebrewNotWordEndSwitch("ך","כ",$text);
$text=hebrewNotWordEndSwitch("ם","מ",$text);
$text=hebrewNotWordEndSwitch("ן","נ",$text);
$text=hebrewNotWordEndSwitch("ף","פ",$text);
$text=hebrewNotWordEndSwitch("ץ","צ",$text);
} while ( $text_before!=$text );
print $text; // עברית מסודרת!
?>
The do-while is necessary for multiple instances of letters, such
as "אנני" which would start off as "אןןי". Note that there's still the
problem of acronyms with gershiim but that's not a difficult one
to solve. The code is in use at http://gibberish.co.il which you can
use to translate wrongly-encoded Hebrew, transliterize, and some
other Hebrew-related functions.
To ensure that there will be no regular characters at the end of a
word, just convert all regular characters to their final forms, then
run this function. Enjoy!