Request #38138 russian encoding detection support
Submitted: 2006-07-19 09:41 UTC
Modified: 2011-04-08 18:09 UTC
Votes: 59
Avg. Score: 4.4 ± 1.0
Reproduced: 39 of 42 (92.9%)
Same Version: 14 (35.9%)
Same OS: 17 (43.6%)
From: [email protected]
Assigned:
Status: Open
Package: mbstring related
PHP Version: 5.*
OS: *
Private report: No
CVE-ID: None
 [2006-07-19 09:41 UTC] [email protected]
Description:
------------
Detection of Russian encodings in mb_detect_encoding is disabled, although they are present in the list of supported encodings. There are just three rather simple encodings (windows-1251, cp866 and koi8-r), yet they spoil the everyday routine of Russian programmers and make PHP less attractive to millions of potential PHP developers. I would be grateful if somebody took care of them by providing a default option for hosting providers, who are not too enthusiastic about experimenting with server-wide configuration.


Reproduce code:
---------------
<?php

$str = "?????? ?????? ??? ??????????? ????????? ???????. ??? ????? ?????????? ? ?????? farplugins ?? CVS. ???????? ?? ??????? ? ??????????? ????? ????????? ? ???????? ? ????????? project website ??? ? ?????? ???????? farplugins-devel.";
// $encoding = mb_detect_encoding($str, "UTF-8, Windows-1251, CP866, KOI8-R");
$encoding = mb_detect_encoding($str, array("UTF-8", "Windows-1251", "CP866", "KOI8-R"));

var_dump($encoding);


Expected result:
----------------
string(12) "Windows-1251"

Actual result:
--------------
bool(false)

 [2006-07-19 09:50 UTC] [email protected]
Reclassified as feature request, where it belongs.

techtonik, I'm sure you know email addresses of ext/mbstring maintainers and can contact them about it.
Although, I don't think this will ever appear in PHP6 (because mbstring itself doesn't make much sense there) and it definitely won't appear in PHP4 (it's time to upgrade, eh?).
 [2006-07-19 17:34 UTC] [email protected]
Well, I can't say this is OK for me. At first I thought that simply configuring with --enable-mbstring=all would solve the problem, but it turned out that my dream host already has this option turned on. So autodetection of Russian is simply not enabled at the code level, i.e. i18n support via mbstring is somewhat crippled. I evaluated PHP6 for a few days, but unfortunately it was very far from complete.
 [2006-07-20 06:27 UTC] [email protected]
>I evaluated PHP6 for a few days, but it was very far from
>being complete, unfortunately.
I wonder why.. probably because it's still 12+ months before the release? =) 
Feel free to help us, though. The documentation is not the only area that needs some help =)
 [2006-07-20 07:04 UTC] [email protected]
I would like to, if anybody can explain how to port PHP functions to Unicode "for dummies". It would also be nice to have an environment for monitoring changes (Trac?) and tracking requirements. The latter would help analyze deprecated, inconvenient and obscure APIs (logical bugs) and provide a means to improve usability, such as unifying include_path delimiters across all platforms. All of this is just to save some time and make occasional development (which is about all I can manage) effective.
 [2009-01-21 08:46 UTC] Roman dot Kyrylych at gmail dot com
Here's a Russian encoding autodetector that can be used after
mb_detect_encoding has returned false:
http://www.opennet.ru/base/dev/charset_autodetect.txt.html
 [2009-03-20 11:14 UTC] wips at mail dot ru
Another version of the encoding detector, which works with UTF-8 too: http://popoff.donetsk.ua/file/text/libs/a.charset.php
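
For readers who cannot reach the links above, the general idea of such a userland fallback can be sketched roughly as follows. This is only an illustration and is not taken from either linked script: the function name guess_cyrillic_encoding() is hypothetical, and the heuristic (decode the bytes with each candidate and keep the one that yields the most lowercase Russian letters) is an assumption that holds for ordinary running text but can be fooled by very short or all-uppercase input.

```php
<?php
// Hypothetical helper, not part of mbstring: guesses which of the three
// single-byte Russian encodings (or UTF-8) a byte string is written in.
function guess_cyrillic_encoding(string $bytes): ?string
{
    // 8-bit Russian text is very rarely valid UTF-8 by accident,
    // so a successful UTF-8 check is taken at face value.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }

    $best = null;
    $bestScore = 0;
    foreach (['Windows-1251', 'KOI8-R', 'CP866'] as $candidate) {
        // Decode the raw bytes as if they were in the candidate encoding.
        $utf8 = mb_convert_encoding($bytes, 'UTF-8', $candidate);
        // Count lowercase Russian letters (U+0430..U+044F plus U+0451).
        // A wrong guess tends to produce uppercase letters or
        // box-drawing symbols instead, which lowers the score.
        $score = preg_match_all('/[\x{0430}-\x{044F}\x{0451}]/u', $utf8);
        if ($score > $bestScore) {
            $best = $candidate;
            $bestScore = $score;
        }
    }

    // null means no candidate produced any lowercase Russian letters.
    return $best;
}
```

It could be called whenever mb_detect_encoding() returns false, passing the same raw string. A more careful detector would probably score letter-pair frequencies rather than letter case alone.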
 [2010-12-31 12:43 UTC] rustamabd at gmail dot com
Windows-1251, koi8-r, cp866 are all single-byte CHARSETs, not ENCODINGs.

mb_detect_encoding() is not intended to distinguish between charsets, especially 
single-byte charsets. Its primary purpose is to detect which multibyte encoding is 
in use, i.e. UTF-8, UTF-16, shift-JIS, etc.
 [2011-04-08 18:08 UTC] [email protected]
-Package: Feature/Change Request
+Package: *General Issues
 [2011-04-08 18:09 UTC] [email protected]
-Package: *General Issues
+Package: mbstring related
-Operating System:
+Operating System: *
-PHP Version: 4.4.2
+PHP Version: 5.*
 