Tuesday, June 30, 2015

Representing Additional Types of Flags

The UTC is considering a proposal to extend the types of flags which can be reliably represented by certain sequences of Unicode characters. In addition to the current mechanism using pairs of regional indicator symbols—already widely implemented—the proposal would use sequences of the TAG characters in the range U+E0030..U+E005A to represent other types of flags. The proposal also provides guidelines to specify valid sequences of TAG characters and how to interpret them. Full details of the proposal are provided in the background document.
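As a rough illustration of the two building blocks mentioned above, the sketch below forms a flag from a pair of regional indicator symbols and maps ASCII text onto TAG characters in the range U+E0030..U+E005A. The function names are hypothetical, and the sketch deliberately does not assemble a complete tag sequence, since the rules for valid sequences are defined only in the background document.

```python
import string

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
TAG_OFFSET = 0xE0000            # TAG characters mirror ASCII at this offset


def regional_indicator_flag(region: str) -> str:
    """Return the pair of regional indicator symbols for a two-letter region code."""
    if len(region) != 2 or any(c not in string.ascii_letters for c in region):
        raise ValueError("expected a two-letter ASCII region code")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in region.upper()
    )


def to_tag_characters(text: str) -> str:
    """Map ASCII characters to TAG characters, limited to U+E0030..U+E005A."""
    tags = []
    for c in text:
        cp = TAG_OFFSET + ord(c)
        if not 0xE0030 <= cp <= 0xE005A:
            raise ValueError(f"{c!r} has no TAG character in the proposed range")
        tags.append(chr(cp))
    return "".join(tags)


if __name__ == "__main__":
    # Print code points rather than glyphs, since rendering depends on fonts.
    print([f"U+{ord(c):04X}" for c in regional_indicator_flag("US")])
    print([f"U+{ord(c):04X}" for c in to_tag_characters("A1")])
```

Whether a given sequence actually displays as a flag depends on platform and font support.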

The UTC welcomes feedback on this proposed new mechanism. Feedback could consist of an indication of support or opposition to the proposal, with reasons why, or could consist of suggestions for improvement of the proposal.

For further information, please see the Public Review Issues page.

Wednesday, June 17, 2015

Announcing The Unicode® Standard, Version 8.0

Version 8.0 of the Unicode Standard is now available. It includes 41 new emoji characters (including five modifiers for diversity), 5,771 new ideographs for Chinese, Japanese, and Korean, the new Georgian lari currency symbol, and 86 lowercase Cherokee syllables. It also adds letters to existing scripts to support Arwi (the Tamil language written in the Arabic script), the Ik language in Uganda, Kulango in Côte d’Ivoire, and other languages of Africa. In total, this version adds 7,716 new characters and six new scripts.

The first version of Unicode Technical Report #51, Unicode Emoji, is being released at the same time. That document describes the new emoji characters. It provides design guidelines and data for improving emoji interoperability across platforms, gives background information about emoji symbols, and describes how they are selected for inclusion in the Unicode Standard. The data is used to support emoji characters in implementations, specifying which symbols are commonly displayed as emoji, how the new skin-tone modifiers work, and how composite emoji can be formed with joiners. The Unicode website now supplies charts of emoji characters, showing vendor variations and providing other useful information.
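A minimal sketch of how modifiers and joiners combine at the code point level is shown below. It assumes the five modifiers occupy U+1F3FB..U+1F3FF and that the joiner is ZERO WIDTH JOINER (U+200D); the helper names are hypothetical, and the sketch does not check whether a given base character actually accepts a modifier, which is determined by the data in UTR #51.

```python
# The five emoji modifiers added in Unicode 8.0 (U+1F3FB..U+1F3FF).
EMOJI_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def with_modifier(base: str, modifier_index: int) -> str:
    """Append one of the five modifiers directly after a base emoji."""
    return base + EMOJI_MODIFIERS[modifier_index]


def join_emoji(*parts: str) -> str:
    """Form a composite sequence by placing a joiner between each part."""
    return ZWJ.join(parts)


if __name__ == "__main__":
    waving_hand = "\U0001F44B"
    man, woman, girl = "\U0001F468", "\U0001F469", "\U0001F467"
    for label, seq in [
        ("modified", with_modifier(waving_hand, 2)),
        ("joined", join_emoji(man, woman, girl)),
    ]:
        print(label, [f"U+{ord(c):04X}" for c in seq])
```

Platforms that do not support a particular sequence typically fall back to showing the individual characters side by side.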

The 41 new emoji in Unicode 8.0 include the following:

  • Diversity: five emoji modifiers
  • Faces and Hands: NERD FACE, FACE WITH ROLLING EYES, ROBOT FACE
  • Food-Related: HOT DOG, TACO, CHEESE WEDGE, POPCORN
  • Sports: CRICKET BAT AND BALL, VOLLEYBALL, BOW AND ARROW
  • Animals: UNICORN FACE, LION FACE, CRAB, SCORPION
  • Religious: MOSQUE, SYNAGOGUE, PRAYER BEADS

(For the full list, including images, see emoji additions for Unicode 8.0.)

Phones and computers often need operating system updates to support new emoji, which may take some time. It is also now clear which existing characters, such as the often requested SHOPPING BAGS, can be used as emoji. Once phones and computers support these characters, people will be able to see colorful images such as the BOTTLE WITH POPPING CORK above.

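Three other important Unicode specifications are updated for Version 8.0:
  • UTS #10, Unicode Collation Algorithm
  • UTS #39, Unicode Security Mechanisms
  • UTS #46, Unicode IDNA Compatibility Processing
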
Some of the changes in Version 8.0 and associated Unicode technical standards may require modifications in implementations. For more information, see Unicode 8.0 Migration and the migration sections of UTS #10, UTS #39, and UTS #46. For full details on Version 8.0, see Unicode 8.0.

Monday, June 1, 2015

Join us in Santa Clara for IUC 39 (October 26-28)

The conference program has just been announced for this year's Internationalization and Unicode® Conference (IUC), October 26-28 in Santa Clara, California.

This is the premier annual event covering the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. The program focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.

Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

This highly rated conference features excellent technical content, industry-tested recommendations and updates on the latest standards and technology. Subject areas include web globalization, programming practices, endangered languages and unencoded scripts, integrating with social networking software, implementing mobile apps, and handling emoji. This year's conference will also highlight new features in Unicode and other relevant standards.

In addition, please join us in welcoming over 20 first-time speakers to the program! This is just another reason to attend: fresh talks, fresh faces, and fresh ideas!

Friday, May 22, 2015

Unicode 9.0 Candidate Emoji

The Unicode Consortium has accepted 38 emoji characters as candidates for Unicode 9.0, scheduled for release in mid-2016. At this point, these emoji are candidates—not yet finalized—so some may be removed from the candidate list, and others may be added. Names, images, and code points may also change, so these candidates are not yet ready for use in production systems.

These emoji have been accepted as candidates for Unicode 9.0 for a variety of reasons. They may be needed for compatibility with emoji characters in existing systems. For example, the FACE WITH COWBOY HAT was accepted for compatibility with the emoji used in Yahoo Messenger. Some are chosen based on expected high frequency of use or because they are highly popular requests from online communities. Others fill gaps in the existing set of Unicode emoji, for example by completing a gender pair.

Many other prospective emoji characters are still being assessed and could be approved in the future. For more information about selection criteria, see Selection Factors in UTR #51, Unicode Emoji.

The images shown below are draft black and white versions for the Unicode 9.0 charts. Once the emoji candidates have been finalized, vendors that support emoji will provide colorful and better-designed displays for each of these. For example, the emoji for shrug might appear as shown on the right.

Some of these new emoji would take the new emoji modifiers as discussed in Diversity. Some emoji may also get annotations to help guide design and usage. For example, the cucumber emoji could also be used to represent a pickle.

Candidates

Code   Candidate Unicode Name
1F920  FACE WITH COWBOY HAT
1F921  CLOWN FACE
1F922  NAUSEATED FACE
1F923  ROLLING ON THE FLOOR LAUGHING
1F924  DROOLING FACE
1F925  LYING FACE
1F919  CALL ME HAND
1F933  SELFIE
1F91A  RAISED BACK OF HAND
1F91B  LEFT-FACING FIST
1F91C  RIGHT-FACING FIST
1F91D  HANDSHAKE
1F91E  HAND WITH FIRST AND INDEX FINGER CROSSED
1F930  PREGNANT WOMAN
1F926  FACE PALM
1F937  SHRUG
1F57A  MAN DANCING
1F934  PRINCE
1F935  MAN IN TUXEDO
1F936  MOTHER CHRISTMAS
1F940  WILTED FLOWER
1F6F4  SCOOTER
1F6F5  MOTOR SCOOTER
1F6D1  OCTAGONAL SIGN
1F942  CLINKING GLASSES
1F5A4  BLACK HEART
1F950  CROISSANT
1F951  AVOCADO
1F952  CUCUMBER
1F953  BACON
1F954  POTATO
1F955  CARROT
1F98A  FOX FACE
1F985  EAGLE
1F986  DUCK
1F987  BAT
1F988  SHARK
1F989  OWL

Monday, April 27, 2015

Huawei Upgrades to Full Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Huawei has upgraded from associate member to full corporate member. We look forward to their contributions to the Unicode Standard and the Common Locale Data Repository (CLDR) project, and are grateful for their financial support of the consortium’s work. Full members of the consortium have a vote in all technical committees and in the governance of the consortium. For the list of members, see http://www.unicode.org/consortium/memblogo.html.

Huawei is a leading global information and communications technology (ICT) solutions provider, and in 2014 was the largest telecommunications equipment maker in the world.

Wednesday, March 25, 2015

Emoji Glyph and Annotation Recommendations


The Unicode Technical Committee has released a list of recommendations for changes in Unicode chart glyphs and/or annotations for many emoji characters, to promote better interchange across platforms. Feedback either for or against these changes is welcome. For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.

Thursday, March 19, 2015

CLDR Version 27 Released

Unicode CLDR 27 has been released, providing an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.

There was no Survey Tool data collection phase for CLDR 27. Instead, the release focused primarily on stability—cleaning up data inheritance and making specific fixes—as well as improvements to the JSON format of the data. Changes include the following:
  • Cleanup of region locales: A major cleanup effort was undertaken to resolve gratuitous differences between region-specific locales and the parent from which they inherit. In regional locales, it was determined where the parent value was an acceptable replacement for a child-specific value which could then be removed, providing greater consistency in behavior in the various region locales. A special effort was made to clean up country names in certain locales.
  • Changes to English inheritance: As an outcome of the cleanup effort above, the inheritance model for English locales is now simplified, making all en_XX locales inherit either from “en” directly (for current or former U.S. territories) or from the British-influenced “en_001 - World English”. This is also reflected in some changes for measurement systems.
  • Emoji: Data for emoji annotations and an emoji collation were added, to accompany Unicode Technical Report #51, Unicode Emoji.
  • Collation: There are new sort orders for emoji (as noted above), and an Austrian phonebook sort order. Scripts can be reordered individually, rather than only in specific groups. Fractional tertiary weights are now used that are lower than common, to allow shorter sort-keys with normal Hiragana letters.
  • Specification: The LDML specification has descriptions of new or modified structure, plus a number of fixes and clarifications. See Modifications for a list of changes.
    • Improved documentation of locale inheritance and matching, bundle versus item lookup, and parent locale information.
    • Extensive clarifications to the intended use of the language matching data.
    • Explicit new definitions of Unicode identifiers, such as Unicode Calendar Identifier, for use in citations.
  • Charts: The navigation within charts has been improved, and new charts have been added.
  • JSON on github: The JSON form of the data is now available on github, rather than being found through the Data link.

Details are provided in http://cldr.unicode.org/index/downloads/cldr-27, along with a detailed Migration section.
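
The English inheritance change described in the list above can be pictured with a small lookup sketch. The table below is a hypothetical, illustrative subset rather than actual CLDR data, and the function names are assumptions. Locales without an explicit parent fall back to simple truncation of the identifier.

```python
# Hypothetical, illustrative subset of explicit parent overrides.
# The authoritative parent locale data ships with CLDR itself.
EXPLICIT_PARENTS = {
    "en_GB": "en_001",
    "en_AU": "en_001",
}


def parent_locale(locale: str):
    """Return the parent of a locale, or None once the root is reached."""
    if locale == "root":
        return None
    if locale in EXPLICIT_PARENTS:
        return EXPLICIT_PARENTS[locale]
    if "_" in locale:
        # Default rule: drop the last subtag (en_PR -> en, en_001 -> en).
        return locale.rsplit("_", 1)[0]
    return "root"


def inheritance_chain(locale: str) -> list:
    """List a locale followed by each of its ancestors up to root."""
    chain = [locale]
    parent = parent_locale(locale)
    while parent is not None:
        chain.append(parent)
        parent = parent_locale(parent)
    return chain


if __name__ == "__main__":
    print(inheritance_chain("en_GB"))
    print(inheritance_chain("en_PR"))
```

An item missing from a regional locale would be looked up along this chain, which is why removing redundant child values leaves behavior unchanged.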

Tuesday, March 10, 2015

Unicode 8.0 Beta Review


Mountain View, CA, USA – The Unicode® Consortium today announced the start of the beta review for the forthcoming Unicode 8.0.0, which is scheduled for release in June 2015. All beta feedback must be submitted by April 27, 2015.

Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML, ...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard.

Unicode 8.0.0 comprises several changes which require careful migration in implementations, including the conversion of Cherokee to a bicameral script, a different encoding model for New Tai Lue, and additional character repertoire. Implementers need to change code and check assumptions regarding case mappings, New Tai Lue syllables, Han character ranges, and confusables. Character additions in Unicode 8.0.0 include emoji symbol modifiers for implementing skin tone diversity, other emoji symbols, a large collection of CJK unified ideographs, a new currency sign for the Georgian lari, and six new scripts. For more information on emoji in Unicode 8.0.0, see the associated draft Unicode Emoji report.
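
To make the Cherokee case-mapping concern concrete, here is a minimal sketch of the lowercase-to-uppercase mapping, assuming the new lowercase letters sit at U+AB70..U+ABBF and U+13F8..U+13FD and pair with the existing letters at U+13A0..U+13EF and U+13F0..U+13F5. The function name is hypothetical; production code should rely on a Unicode library built against Unicode 8.0 or later data rather than a hand-written table.

```python
def cherokee_to_upper(text: str) -> str:
    """Map lowercase Cherokee letters to their uppercase counterparts.

    Characters outside the two lowercase ranges are returned unchanged.
    """
    result = []
    for ch in text:
        cp = ord(ch)
        if 0xAB70 <= cp <= 0xABBF:
            # Main block of 80 lowercase letters pairs with U+13A0..U+13EF.
            result.append(chr(cp - 0xAB70 + 0x13A0))
        elif 0x13F8 <= cp <= 0x13FD:
            # Six further lowercase letters pair with U+13F0..U+13F5.
            result.append(chr(cp - 0x13F8 + 0x13F0))
        else:
            result.append(ch)
    return "".join(result)


if __name__ == "__main__":
    sample = "\uAB70\uABBF\u13F8"
    print([f"U+{ord(c):04X}" for c in cherokee_to_upper(sample)])
    # Comparing against the runtime's own mapping is a quick way to see
    # whether its Unicode data already includes the Cherokee case pairs.
    print(sample.upper() == cherokee_to_upper(sample))
```

Code that previously assumed Cherokee text was caseless, for example in search or identifier comparison, is the kind of assumption the beta review asks implementers to revisit.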

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 27, 2015. Feedback instructions are on the beta page.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.

For more information, please contact the Unicode Consortium at http://www.unicode.org/contacts.html.