Unicode and Keyboards on Windows

Michael S. Kaplan
Cathy Wissink
Windows Globalization, Microsoft Corporation

1. Introduction
To implementers, it seems inputting data into applications via keyboards should be one of the
fundamentally simple features on Windows. However, once additional complexities like fonts
and rendering engines are taken into consideration, input appears to be not quite so simple
anymore. Adding many different keyboard layouts on top of over 135 locales further complicates
the issue. And finally, once you include the ability to define keyboard layouts (whether by
Microsoft interfaces or third party products) where all of Unicode can be supported, it becomes
downright complex!
This paper will discuss the many features that keyboard layouts support (such as dead keys, shift
states and ligatures), the interaction between input, fonts, and rendering engines, the issue of
code pages vs. Unicode, when IMEs are preferred and when they are not, and the collation issues
that enter into the equation. In the end, it will be clear that on Windows, the input of virtually
any character in Unicode is possible, even if in some cases more work is required than was
originally expected.

2. The low-level details

Before diving into the details of a keyboard layout, it might be helpful to include a definition of a
keyboard layout. A keyboard layout is the collection of data for each keystroke and shift state
combination within a particular keyboard driver. It is not the physical keyboard that a user types
on, but rather, the software that the hardware calls to output text streams to applications.
Generally, anywhere this paper refers to a keyboard, keyboard layout is implied.

Starting with scan codes

Keyboard input starts at the hardware level. The keys on the physical keyboard each have a value
assigned to them called a scan code, and these scan codes are sent whenever you press a key. To
complicate things, keyboard hardware varies depending on the geographical market; in many of
the markets, you will find slightly different relationships between physical keys and scan codes.
Because of this, layout maps (like the full Windows XP list, which can be found at
http://microsoft.com/globaldev/keyboards/keyboards.asp) can be somewhat inaccurate in
some parts of the world, since the maps assume that: (a) physical key placement is identical and
(b) keys will have the same meaning, even if the hardware is different. For two examples of scan
code maps that cover the main part of the keyboard, see Figure 1 for US keyboards and Figure 2
for most European keyboards.

23rd Internationalization and Unicode Conference

Prague, Czech Republic, March 2003


Figure 1: Scan codes for US keyboard hardware

Figure 2: Scan codes for European keyboard hardware

Note the different placement of scan codes between these two types of keyboards. For example,
scan code 0x2b is on the second row of the US keyboard, but is at the end of the third row of the
European keyboard. Scan code 0x56 is an additional scan code on the European keyboard, which
is not on the US keyboard. The shape of the enter key is also different.
(Note also that these maps here do not show other keys such as the numeric keypad or the
function keys; since those types of keys do not change when the language of the keyboard
changes, they are not covered by this paper.)
Scan code values in the hardware are invariant. Allowing scan codes to change would make the
support of multiple languages exceptionally difficult. This brings us to the Virtual Key values....

Virtual Key (VK) values

As we progress from the hardware and move to the software level, what becomes crucial is the
VK or Virtual Key value. These values fit within a byte (0x00 to 0xff) and are defined in
winuser.h: the Platform SDK header file that contains procedure declarations, constant
definitions and macros for the USER subsystem of Windows. You can see the virtual keys for the
US English keyboard layout in Figure 3. The decision of how scan codes and virtual keys map to
each other is made in the keyboard layout.


Figure 3: Virtual keys in the US English keyboard

Unfortunately for the implementer, the bulk of the most important VKs are not officially defined
but are implied in the comments:
/*
* VK_0 - VK_9 are the same as ASCII '0' - '9' (0x30 - 0x39)
* 0x40 : unassigned
* VK_A - VK_Z are the same as ASCII 'A' - 'Z' (0x41 - 0x5A)
*/
The rest of the virtual keys in use are explicitly defined constants. No rule ties a virtual key to a
particular physical key; the assignment can differ from one keyboard layout to another. Note that
in Figure 3, the implicit keys are all light gray, while the explicit "OEM" keys are white¹. You can
obtain an array containing the state of every VK by calling the GetKeyboardState API.
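The implied values described in the header comment can be captured in a few lines. What follows is a minimal Python sketch rather than Win32 code; the helper name `implied_vk` is hypothetical, and it covers only the digit and letter keys that the comment describes.

```python
import string


def implied_vk(key: str) -> int:
    """Return the implied virtual-key value for a digit or Latin letter key.

    Per the winuser.h comment, VK_0..VK_9 equal ASCII '0'..'9' and
    VK_A..VK_Z equal ASCII 'A'..'Z'. Every other key needs an explicitly
    defined constant, so it is rejected here.
    """
    if len(key) != 1:
        raise ValueError("expected a single character")
    if key in string.digits:
        return ord(key)
    if key in string.ascii_letters:
        return ord(key.upper())
    raise ValueError(f"no implied VK value for {key!r}")


print(hex(implied_vk("q")))  # 0x51
print(hex(implied_vk("7")))  # 0x37
```

Keys such as the semicolon fall outside the implied ranges, which is why the sketch raises an error for them instead of guessing a value.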
The VK values are important for the window messages, such as WM_KEYDOWN, that have to deal
with keystrokes before they are processed by the USER subsystem in Windows. Although VK
positions shift slightly between keyboards even when the character values are the same, they do
not change much between different keyboard layouts. Here is an example of a typical change: the
letter "Q" is represented by VK_Q on both the French and US English keyboards, though on the
French keyboard the "Q" and "A" keys are in reversed positions relative to the US keyboard (see
the VK map for the French keyboard in Figure 4 for comparison with the US layout in Figure 3).
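The French and US English example can be pictured as two small tables that map the same scan codes to different virtual keys. This is an illustrative Python sketch only: the table `SCAN_TO_VK` and the helper `vk_for_scan_code` are hypothetical names, and the two scan code values (0x10 and 0x1E, from the common PC scan code set) are stated here as an assumption because the figures are not reproduced in this text.

```python
# Scan codes are fixed by the hardware; these two identify the physical keys
# labeled Q and A on US keyboard hardware (assumed values, see lead-in).
SCAN_Q_POSITION = 0x10
SCAN_A_POSITION = 0x1E

# Each keyboard layout decides which virtual key a scan code produces.
SCAN_TO_VK = {
    "US English": {SCAN_Q_POSITION: ord("Q"), SCAN_A_POSITION: ord("A")},
    "French": {SCAN_Q_POSITION: ord("A"), SCAN_A_POSITION: ord("Q")},
}


def vk_for_scan_code(layout: str, scan_code: int) -> int:
    """Look up the virtual key that a layout assigns to a scan code."""
    return SCAN_TO_VK[layout][scan_code]


for name in SCAN_TO_VK:
    vk = vk_for_scan_code(name, SCAN_Q_POSITION)
    print(f"{name}: scan code 0x10 -> VK 0x{vk:02X} ({chr(vk)})")
```

The virtual key for "Q" keeps the value 0x51 in both tables; only the physical key that produces it differs.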

1 The OEM keys are keys that add punctuation and symbols. The ones that commonly change
with different keyboards are OEM_1 through OEM_8, OEM_102, OEM_COMMA,
OEM_PERIOD, OEM_PLUS, and OEM_MINUS. On these keyboard layout maps they are
abbreviated with an O* prefix, followed by enough information to uniquely identify the key (e.g.
O2 for OEM_2 and OP for OEM_PERIOD).


Figure 4: Virtual Keys in the French (France) keyboard

The position of the OEM keys often changes between different layouts as well. Most of the other
VK positions are static. The changes are all quite minor when compared with the next step, where
those keystrokes are processed.

Processing keystrokes
When a Windows message loop handles a VK in the WM_KEYDOWN message, it can pass the
VK to the DefWindowProc API. To handle the message, the code in the USER subsystem will
process the keystroke and convert it (when appropriate) to a character, passed as a WM_CHAR
message. This processing requires a great deal of information:

the shift state

the virtual key

the current keyboard layout

Once all of this information is collected by the USER subsystem (that is, the keyboard layout is
known for each thread and the WM_KEYDOWN message contains the VK and shift state), the
code is then able to come up with the appropriate character, taking all the information about
shift states, VKs and current layout into account (obviously hitting arrow keys, for example,
would not be expected to insert characters; USER will not have any of this extra character-based
work run). You can mimic this behavior with several different Win32 APIs (see Table 1 for a list
of the APIs that can be useful for this).
Table 1: Keyboard input functions and what they do

Function         Description
keybd_event      Synthesizes a keystroke given a VK, a scan code, etc. (superseded by the SendInput API)
MapVirtualKey    Maps between scan codes, VKs, and characters for the current keyboard layout
MapVirtualKeyEx  Maps between scan codes, VKs, and characters for a specified keyboard layout (layout must be loaded)
OemKeyScan       Maps OEMASCII codes to OEM scan codes and shift states
SendInput        Synthesizes a keystroke given a VK, a scan code, etc.
ToAscii          Maps a VK and shift state to a character on the current keyboard layout's associated codepage
ToAsciiEx        Maps a VK and shift state to a character on the specified keyboard layout's associated codepage (layout must be loaded)
ToUnicode        Maps a VK and shift state to a Unicode character per the current keyboard layout
ToUnicodeEx      Maps a VK and shift state to a Unicode character per the specified keyboard layout (layout must be loaded)
VkKeyScan        Converts a character to a VK and shift state for the current keyboard layout
VkKeyScanEx      Converts a character to a VK and shift state for the specified keyboard layout (layout must be loaded)

The functions in Table 1 are interesting in that when you read the descriptions, the functions
appear to be duplicates of each other. However, once you start needing these functions in an
application, you will see that the small differences between them can matter a great deal for
obtaining the features you need.²
In any case, your code has now passed a character on to an application and inserted text! You can
look at a few of the many keyboards supported on Windows (Figures 5-8) to help you see the
wide variety of possible characters to be inserted.
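The lookup that USER performs can be pictured as a table keyed by virtual key and shift state. The following is a minimal Python sketch, not Win32 code: the `TOY_LAYOUT` table and the `to_character` helper are hypothetical, and a real layout holds many more entries and shift states than the four shown.

```python
from typing import Optional

VK_LEFT = 0x25  # left arrow key; it produces no character

# A toy layout: (virtual key, shift held?) -> character.
TOY_LAYOUT = {
    (0x41, False): "a",
    (0x41, True): "A",
    (0x31, False): "1",
    (0x31, True): "!",
}


def to_character(layout: dict, vk: int, shift_down: bool) -> Optional[str]:
    """Return the character for a key press, or None for non-character keys."""
    return layout.get((vk, shift_down))


print(to_character(TOY_LAYOUT, 0x41, True))      # A
print(to_character(TOY_LAYOUT, VK_LEFT, False))  # None
```

Returning `None` for the arrow key mirrors the point made above: keys that are not expected to insert text never reach the character-based step.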

Figure 5: The Divehi Phonetic keyboard layout

2 As an example, the definitions of MapVirtualKey and VkKeyScan seem similar, but the former
does not handle shifted characters while the latter does. For more information, you can look at
the Platform SDK:
http://msdn.microsoft.com/library/en-us/winui/WinUI/WindowsUserInterface/UserInput/KeyboardInput.asp


Figure 6: The Georgian keyboard layout

Figure 7: The Gujarati keyboard layout

Figure 8: The Thai Kedmanee keyboard layout

3. Language features and their influence on input

There are many features that keyboard input can require. These include:

single character keystrokes

ligatures

dead keys

shift states

AltGr shift states

Control shift states


Caps lock key

SGCap shift states

extended shift states

Each of them is described below.

Single character keystrokes

Obviously the mainstay of many keyboard layouts, a simple 1-1 mapping of keystrokes to
characters is what the bulk of most keyboard layouts will consist of. Some languages will use
many other features as well, but all of them are likely to have at least a few single character
keystrokes.

Ligatures
There are many times that a single keystroke needs to enter more than one character. In keyboard
nomenclature, these 1:many mappings are called ligatures.
Note that this definition of ligature is not identical to the one used in typography or in language
orthographies; "ligature" here is used to identify multiple UTF-16 code points that are input by a
single keystroke. This could be used in a number of ways: to represent a linguistic character
consisting of multiple UTF-16 code points (such as Sri and Ksa seen on the Tamil keyboard,
shown in Figure 9); to represent multiple linguistic characters which often work together in the
language; or to develop a keyboard layout to handle a language represented by supplementary
characters (such as the Deseret keyboard layout in Figure 10)³. (Technically, one could even
create a keyboard with a keystroke that would insert "mike" or "cath" or "hiya" using a legal
keyboard layout ligature -- as seen in the silly keyboard layout in Figure 11.)
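To make the counting concrete, here is a small Python sketch that splits a string into 16-bit UTF-16 values (what this paper calls UTF-16 code points) and checks it against the limit of four per key that the paper states. The helper names `utf16_units` and `fits_on_one_key` are hypothetical; Windows exposes no such functions.

```python
MAX_UNITS_PER_KEY = 4  # the per-key ligature limit stated in this paper


def utf16_units(text: str) -> list:
    """Split text into its 16-bit UTF-16 values (surrogates included)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fits_on_one_key(text: str) -> bool:
    """True if the text could be a single keystroke under the stated limit."""
    return 1 <= len(utf16_units(text)) <= MAX_UNITS_PER_KEY


deseret_long_i = "\U00010400"  # a supplementary character from the Deseret block
print([hex(u) for u in utf16_units(deseret_long_i)])  # ['0xd801', '0xdc00']
print(fits_on_one_key("mike"))                         # True
print(fits_on_one_key(deseret_long_i * 3))             # False
```

A supplementary character always occupies two of the four available slots, so at most two such characters fit on one key.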

Figure 9: The Tamil keyboard in the shifted state, showing linguistic characters Sri and Ksa as ligatures

3 Since keyboards support UTF-16 code points on Windows, the only way to handle
supplementary characters on keyboards is via ligatures (the high surrogate and the low surrogate
make a ligature). The process is seamless from the user perspective; the user will not experience
any difference between supplementary characters and characters on the BMP, aside from a
limitation of 4 UTF-16 code points on a single key.


Figure 10: A keyboard layout for Deseret, a script encoded with supplementary characters (each represented by
"ligatures" of UTF-16 high and low surrogates)

Figure 11: A very silly (but real!) keyboard layout (created by a developer for personal use). This shows the
4 UTF-16 character limit for a single keystroke.

Dead keys
The dead key mechanism is either very intuitive or incredibly confusing, depending on your
experience with legacy European keyboards. The basic concept is that you type a character
defined on the particular keyboard as a dead key, then type a specific second character known as
a base character. Rather than displaying these two characters, a unique third character known as
a combining character will be shown. The reason the first character is defined as a "dead" key is
that this character is not shown, and the cursor does not advance.
Dead keys are most commonly used in European keyboard layouts; a diacritic is generally used
as the dead key. An example of this can be found on the Finnish keyboard, where typing a
diaeresis (U+00A8) will initially do nothing, but then typing any of the characters in the first
column in Table 2 will cause the character in the second column of Table 2 to be displayed. For
example, if a user types a diaeresis, followed by a small letter a, Latin small letter A with
diaeresis (ä) will be displayed.


Table 2: The Diaeresis dead key on the Finnish keyboard

Base Character         Combining Character
U+0020                 U+00A8 (¨)
Any other character    U+00A8 + other character

The last two rows in gray of the above table are important to note. The first gray row is a
common convention on most keyboards with dead keys; if you type the dead key and then a
space, you will get the spacing version of the character. The second one is not a part of the
keyboard layout definition, but is simply what happens if you type a dead key followed by a
character that is not defined in the keyboard layout as a base character for that dead key: the
dead key is printed (input), followed by that second character. For example, Latin small letter C
is not defined in the keyboard layout as being a base character for the diaeresis dead key. If
U+00A8 is typed, followed by c, those two code points will be input. No combining character
will be created.
While dead keys are not limited to European keyboard layouts, that is where they are most
commonly used.
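The dead key rules above, including the two fallback cases, fit in a short lookup. This is a minimal Python sketch under stated assumptions: the names `DIAERESIS_DEAD_KEY` and `resolve_dead_key` are hypothetical, and the table contains only the two pairs this text spells out (the letter a and the space), not the full Finnish layout.

```python
DIAERESIS = "\u00a8"  # the spacing diaeresis used as the dead key

# Base character -> result for the diaeresis dead key. Only the pairs named
# in the text are listed; a real layout defines more base characters.
DIAERESIS_DEAD_KEY = {
    "a": "\u00e4",   # Latin small letter a with diaeresis
    " ": DIAERESIS,  # dead key + space gives the spacing character itself
}


def resolve_dead_key(dead: str, table: dict, next_char: str) -> str:
    """Return the text produced when next_char follows a pending dead key.

    If next_char is not a defined base character, both the dead key
    character and next_char are emitted unchanged.
    """
    return table.get(next_char, dead + next_char)


print(resolve_dead_key(DIAERESIS, DIAERESIS_DEAD_KEY, "a"))  # ä
print(resolve_dead_key(DIAERESIS, DIAERESIS_DEAD_KEY, "c"))  # ¨c
```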

Shift states
A keyboard layout typically has only 47 or 48 assigned physical keys on it; even the English
alphabet would not fit if you wanted both uppercase and lowercase A to Z (and there wouldn't
even be room for punctuation characters). Therefore, keyboards usually contain another set of 47
or 48 characters that can be accessed by pressing Shift in tandem with a key (for examples, see
Figures 12 and 13 for the Greek keyboard in both the unshifted and shifted states).


Figure 12: The Greek keyboard layout (unshifted)

Figure 13: The Greek keyboard layout (shifted)

Note how most of the letter keys are actually cased versions of each other (also note the light gray
keys; those are dead keys). By convention, a letter that has an uppercase form places that form in
the shifted state. However, some languages have no notion of case, so they do not need to use the
shift state for this purpose.

AltGr shift states

Some languages need more than 96 characters to be typed properly. Using just the shift
state is not sufficient, so an additional shift state is added when Control+Alt is pressed. A
shortcut to this key combination is to use the Right Alt key, also known as the AltGr key. This
behavior is only expected for the keyboard layouts that define characters in the Control+Alt shift
state. An example of this is the Polish keyboard layout (see Figures 14-16 for the unshifted,
shifted, and AltGr states of this keyboard).

Figure 14: The Polish keyboard layout (unshifted)


Figure 15: The Polish keyboard layout (shifted)

Figure 16: The Polish keyboard layout (Alt+Ctrl or AltGr)

You can also have an AltGr+Shift state; thankfully, few keyboards need this, as users find
it difficult to type such characters.
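One way to picture the equivalence of AltGr and Control+Alt is a bitmask per modifier, with AltGr setting both bits. The Python sketch below is illustrative only: the bit values, the `shift_state` helper and the `TOY_POLISH` table are assumptions made for this example, and the four entries are a hand-picked sample rather than the layout shown in Figures 14-16.

```python
# Illustrative modifier bits (assumed values for this sketch).
SHIFT, CTRL, ALT = 0x1, 0x2, 0x4


def shift_state(shift=False, ctrl=False, alt=False, altgr=False) -> int:
    """Combine modifier keys into one shift-state value.

    AltGr is treated as a shortcut for Control+Alt, so it sets both bits.
    """
    state = 0
    if shift:
        state |= SHIFT
    if ctrl or altgr:
        state |= CTRL
    if alt or altgr:
        state |= ALT
    return state


# (virtual key, shift state) -> character; a small sample for the A key.
TOY_POLISH = {
    (0x41, 0): "a",
    (0x41, SHIFT): "A",
    (0x41, CTRL | ALT): "\u0105",          # a with ogonek
    (0x41, SHIFT | CTRL | ALT): "\u0104",  # A with ogonek
}

print(TOY_POLISH[(0x41, shift_state(altgr=True))])            # ą
print(TOY_POLISH[(0x41, shift_state(ctrl=True, alt=True))])   # ą
```

Both lookups hit the same table entry, which is the sense in which AltGr is only a shortcut for the Control+Alt shift state.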

Control shift states

While it is technically possible to use the Control (CTRL) key as a shift key as well, it is
strongly discouraged. The reason is that many programs use the CTRL key for various command
functions (such as Ctrl+S to mean "Save..."), so keystrokes assigned to the Control shift state in
the keyboard layout will often not work properly in programs that specifically handle them for
other purposes.

Caps Lock key

The caps lock key is usually intended to be a version of the shift key that (a) only shifts characters
that are cased versions of each other, and (b) stays shifted without having to hold down the key.
On keyboard layouts for languages without a notion of case, the caps lock may do nothing, or it
may be used for some other purpose entirely.
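Point (a) above, that Caps Lock shifts only keys whose two characters are cased versions of each other, can be sketched as follows. This is a hypothetical Python helper, not how any layout driver is written; how Shift and Caps Lock combine is not modeled, and the sketch simply treats either one as enough to select the shifted character on a cased key.

```python
def character_for_key(unshifted: str, shifted: str,
                      shift_down: bool, caps_on: bool) -> str:
    """Pick the character for a key given the Shift and Caps Lock state.

    Caps Lock behaves like Shift only when the key's two characters are
    cased versions of each other (for example 'a' and 'A').
    """
    is_cased_pair = unshifted != shifted and unshifted.upper() == shifted
    use_shifted = shift_down or (caps_on and is_cased_pair)
    return shifted if use_shifted else unshifted


print(character_for_key("a", "A", shift_down=False, caps_on=True))  # A
print(character_for_key("1", "!", shift_down=False, caps_on=True))  # 1
```

The digit key is unaffected by Caps Lock because "1" and "!" are not a cased pair.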

SGCap shift states

Some keyboards use the Caps Lock key as an access point for an entirely independent shift state
for some of the keys. Originally named for its use in the "Swiss German" keyboard, the SGCaps
shift state is also used in the Czech and Hebrew keyboards to allow this extra shift state. Like
dead keys, SGCap shift states are either very intuitive if you are used to them or incredibly
confusing if you aren't familiar with them. The only real distinction of the SGCap shift states is
that the Caps Lock key opens one to two entirely new shift states (an additional 96 characters,
between the shifted and unshifted state). Using SGCap shift states in any other keyboards is
discouraged unless you want a keyboard layout to have the same feel as one of the keyboards
that uses the functionality.

Extended shift states

It is technically possible to add up to three additional keys as "Shift" keys. When combined with
all possible combinations of the other shift keys this would allow a total of 55 other shift states.
Thankfully this feature is not used in any keyboards to its fullest extent; the Canadian
Multilingual Standard keyboard layout is the only one that uses even a single extended shift
state.

4. Other technologies and their impact on keyboards

Many other features and functionalities in Windows can influence what is done with the text
created by keyboards, depending on the complexity of the writing system. Several of them are
listed in this section.

Rendering engines and what they do

The rendering engine has a difficult job. It is tasked with properly displaying complex script
text⁴ in Windows and any running applications, which is a job made much more difficult by the
wide variety of scripts and languages supported on Windows. On versions of Windows prior to
Windows 2000, many clues about the language/script came from the HKL ("handle to a
keyboard layout", now known as an input locale), since the LOWORD of the HKL is a language
ID⁵. This usage has largely been deprecated on the newer versions of Windows⁶, which use the
far more sophisticated Uniscribe (Microsoft's shaping engine technology) and its various
engines that render text based on the writing system of the appropriate language. On downlevel
platforms, however, a great deal of information is still obtained from this value.
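Extracting the language ID from an HKL is plain bit arithmetic. The Python sketch below mirrors the LOWORD, PRIMARYLANGID and SUBLANGID macros from the Windows headers; the function names are hypothetical, and the sample value 0x04090409 is assumed here as a typical handle for the US English layout rather than read from a live system.

```python
def loword(value: int) -> int:
    """Low 16 bits of a handle value (the language ID part of an HKL)."""
    return value & 0xFFFF


def primary_lang_id(lang_id: int) -> int:
    """Low 10 bits of a language ID: the primary language."""
    return lang_id & 0x3FF


def sub_lang_id(lang_id: int) -> int:
    """Remaining high bits of a language ID: the sublanguage."""
    return lang_id >> 10


sample_hkl = 0x04090409  # assumed sample value (US English)
lang_id = loword(sample_hkl)
print(hex(lang_id))                   # 0x409
print(hex(primary_lang_id(lang_id)))  # 0x9
print(hex(sub_lang_id(lang_id)))      # 0x1
```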

4 A complex script is any writing system that needs additional processing in order to properly
display. For example, Arabic needs contextual shaping as well as bidirectional behavior,
Vietnamese needs diacritic positioning, and Indic scripts sometimes need rearrangement of
vowel marks. Uniscribe handles this kind of processing.

5 For more information, see the Platform SDK
(http://msdn.microsoft.com/library/en-us/winui/WinUI/WindowsUserInterface/UserInput/KeyboardInput/KeyboardInputReference/KeyboardInputFunctions/GetKeyboardLayout.asp)

6 This includes any NT-based version of Windows after Windows NT 4 (Windows 2000,
Windows XP, and the upcoming Windows .NET Server 2003).


[Figure 17 diagram. Input method: Keyboard.dll (here Kbdinkan.dll; unshifted VK_I, unshifted VK_F) produces the code points U+0C97, U+0CBF, which go to storage, collation, etc. Uniscribe: Language? Kannada. Script? Indic. Basis of analysis? Syllable. The shaping engine breaks the run into syllables (0C97 0CBF |). OpenType Layout Services: glyph substitution and glyph positioning. Glyphs go to display.]

Figure 17: The relationship between a keyboard, the rendering engine and display in a complex
script (Kannada, an Indic script language).

Fonts
What has diminished the importance of the HKL of a keyboard has been the increased selection
of fonts available, as well as font linking (the borrowing of information from multiple fonts to
obtain glyphs not in the current font), which was introduced in Windows 2000 and improved for
Windows XP. Obviously, for a keyboard to work well, it is assumed that there will be at least one
font somewhere on the machine to assist in displaying the inputted text, lest every character be
replaced by a null glyph⁷.

IMEs -- when are they preferred?

An Input Method Editor (IME) is a program that allows computer users to enter complex
characters and symbols, such as Japanese Kanji characters, by using a standard keyboard. It is a
solution to the issue of ideographic languages having tens of thousands of characters, or more.
IMEs allow different, alternate means of input for such cases.
Attached to each IME is a keyboard layout. On Windows the convention has always been to
attach it to the US English keyboard layout, although some third-party IMEs might be attached to
other keyboards. The reason that the US English keyboard is usually preferred is that
non-Unicode applications using CJK languages would be relying on default system code pages that
would not include the text for other languages. Using a US English keyboard simplifies matters.
For more information on IMEs, see the Platform SDK.⁸

7 A null glyph is used when the font is not available on the system, generally in the shape of a
box.

Dealing with code pages

Although Windows keyboards are exclusively Unicode, it is important to note that if a keyboard
is used with a non-Unicode application, some effort should be made to support this application
when possible by choosing characters that fit with the appropriate Windows code page (ACP).
Obviously this is not always feasible, since some languages are only supported by Unicode on
Windows (e.g., Armenian, Georgian, and Hindi), and thus do not have a system code page.
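A layout author can check whether a candidate character fits a given code page by attempting to encode it. The sketch below uses Python's built-in `cp1252` and `cp1250` codecs as stand-ins for the Western European and Central European Windows code pages; the helper name `fits_code_page` is hypothetical, and Python's codec tables are assumed to be close enough to the system code pages for this illustration.

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """True if every character in text can be encoded in the code page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


a_ogonek = "\u0105"     # Latin small letter a with ogonek (used in Polish)
georgian_an = "\u10d0"  # Georgian letter an

print(fits_code_page(a_ogonek, "cp1250"))     # True
print(fits_code_page(a_ogonek, "cp1252"))     # False
print(fits_code_page(georgian_an, "cp1252"))  # False
```

The Georgian letter fails for both code pages, which matches the observation above that some languages are reachable only through Unicode.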

Sorting out collation issues

For the most part, collation and keyboards do not have to interact. There is one major area where
they can have an impact: many keyboards (both ones from Microsoft and those provided by
third parties) are not consistent in their use of composite versus precomposed characters. This
can require an extra normalization step if the input is going to be used in XML and other
technologies that expect normalized data.
Collation itself is handled well on Windows, with the proper equivalences between the
composite and precomposed forms being an important part of the sorting data kept by the OS⁹.

5. Keeping it under the covers

One of the most important features of keyboard layouts under Windows is the seamless
behavior: everything discussed in this paper (the USER subsystem, font technology, shaping) is
not noticed by the vast majority of the people using the OS. Users simply run setup and choose a
language, and everything seems to work. Obviously it is easy for this to not work properly if the
user does not know what the content of their keyboard layout is, and their assumptions about
what the layout should be turn out to be wrong. It is in fact the user's expectations and
assumptions around their keyboard choices that will often lead to the availability of multiple
keyboard layout choices for a single language. For example, there are both Divehi Phonetic and
Divehi Typewriter keyboard layouts in Windows XP, so that the user wanting to type Divehi text
is more likely to find a layout that they prefer.

6. Factors in keyboard layout creation

When developing keyboards for a particular market, a number of factors should be taken into
consideration:

Is there some kind of keyboard standard for the region or country? It is sometimes
required to have an input method which is sanctioned by the government or an
appropriate governing body. Implementers should consider contacting their local or
national standards body prior to developing a keyboard. In addition, implementers
should consider de facto standards (that is, standards which are not official, but are used
by so many people that they are considered standard).

8 See http://msdn.microsoft.com/library/en-us/intl/ime_5tiq.asp.

9 For more detailed information on collation on Windows, please see our talk "Sorting it all out: an
introduction to collation", available at
http://www.microsoft.com/globaldev/Presentations/unicode22/016.doc

What languages will the keyboard support? This should be explicitly determined before
allocating keys to characters.

Does the keyboard provide input of all needed linguistic characters for the appropriate
language(s)? This requirement can be met in a number of ways: via dead keys or
additional shift states, for example (not all characters need to be on the unshifted state).
High frequency linguistic characters should be positioned where they are easy to type,
ideally in the unshifted state. (Note that if the keyboard supports multiple languages, the
high frequency keys may change.)

Does the keyboard focus on code points, and not glyphs? It is important to not place the
burden of display or shaping onto the keyboard. All technologies related to visual
display are decoupled from the keyboard (and should be handled by fonts and a
rendering engine if needed; see section 4 for more information).

Do all characters on the keyboard exist in Unicode? Since all input on Windows is based
on Unicode (UTF-16), any code points not encoded in Unicode cannot be handled.

Are supplementary characters (non-BMP characters) encoded in UTF-16 and handled in
the ligature section of the keyboard? Is the limit of 2 supplementary characters (4 UTF-16
code points) met on each key?

Ideally, a keyboard should be consistent in its behavior concerning precomposed
vs. composite characters.

7. Myths about keyboard layouts

We hear many misconceptions about keyboards and what they can do. This section will
hopefully clear up a few of these.
I get the feeling Microsoft just makes up these keyboards by themselves. Why don't they represent my
language the way I expect them to?
New keyboards for a market always get tested in their respective market. A great deal of
research does go into the keyboards shipped with the system, with feedback from linguists,
government officials, other internationalization experts, and local software providers.
Unfortunately, it is often a case of Becker's law applying (that is, for each expert, there is an
equal and opposite expert).
I don't like the keyboard layout Windows ships for my language; can we remove it or change it?
In an ideal world, customers could customize their keyboard infinitely (and there are some
projects out there that will simplify this process, which will be discussed at the presentation), but
due to backwards compatibility, we cannot simply remove a keyboard or change keys. There are
simply too many customers who count on consistent behavior across releases (even if the
behavior is not ideal). In addition, while a customer may not like the keyboard, this may be a
national standard for the language, and there may be a requirement to support this particular
keyboard. There are a number of other input options to help users input characters not on their
keyboards, including:

Character Map (available from Accessories|System Tools)


The Insert Symbol Dialog (available in Office)

The ALT+X option, also available in Office. (Typing ALT+X after a character gives you
the Unicode value; typing ALT+X after a Unicode value gives you the character.)

I want to make sure I have every single visual variant of my characters on the keyboard; the canonical (or
isolate) version of the code point is not sufficient.
As is discussed in the other technologies section, keyboards on Windows only deal with code
points, not with glyphs. Code points are used exclusively for text processing, except for display.
At the point of display, technologies such as fonts and rendering engines map between code
points and glyphs. There is an important technical boundary between code points and glyphs,
and this exists in order to maintain at least a modicum of simplicity within the system. (Imagine if
every single visual variant of a code point had to be maintained for text processing!) For this
reason, keyboards focus exclusively on code points, and leave the work of linking code points to
the appropriate visual display to fonts and shaping engines.
I want to have an IME rather than a keyboard for my language.
This is generally heard from customers working with complex script languages who feel that
they need to have all visual variants of a code point on an input method. Input Method Editors
(IMEs) make the most sense for ideographic languages such as Chinese or Korean, where
literally thousands of characters are needed for the language. Each of these ideographic characters is
semantically distinct. Compare this with complex scripts, where the number of semantically
distinct characters is generally fewer than 100, but the number of visually distinct characters is
considerable (into the hundreds). Again, keyboards work with code points, not with glyphs.
Since code points are semantically distinct rather than visually distinct, a complex script language
can easily be handled via a keyboard; as noted earlier, the code points are linked to the
appropriate visual display by other non-keyboard technologies.

8. Summary
As this paper has described, the inner workings of keyboards are more complicated than
a developer would probably like them to be. What is crucial is understanding the association
between the virtual keys, the scan codes and the shift states in a keyboard. In addition,
developers should understand how input relates to the other technologies involved once the
keyboard passes on the code points (e.g., Uniscribe, font technologies and IMEs). This paper has
only touched upon many of the issues, but we hope that it has given implementers enough
knowledge to avoid pitfalls and to provide customers with a seamless input experience.
