General Punctuation

From Wikipedia, the free encyclopedia
Range: U+2000..U+206F (112 code points)
Plane: BMP
Scripts: Common (109 char.), Inherited (2 char.)
Symbol sets: Punctuation, Spaces, Format controls
Assigned: 111 code points
Unused: 1 reserved code point; 6 deprecated
Unicode version history:
1.0.0 (1991): 67 (+67)
1.1 (1993): 76 (+9)
3.0 (1999): 83 (+7)
3.2 (2002): 95 (+12)
4.0 (2003): 97 (+2)
4.1 (2005): 106 (+9)
5.1 (2008): 107 (+1)
6.3 (2013): 111 (+4)
Unicode documentation: Code chart ∣ Web page
Note: [1][2]

General Punctuation is a Unicode block containing punctuation, spacing, and formatting characters for use with all scripts and writing systems. Included are the defined-width spaces, joining formats, directional formats, smart quotes, archaic and novel punctuation such as the interrobang, and invisible mathematical operators.

Additional punctuation characters are found in the Supplemental Punctuation block and scattered across dozens of other Unicode blocks.

Block

General Punctuation[1][2][3]
Official Unicode Consortium code chart (PDF)
  0 1 2 3 4 5 6 7 8 9 A B C D E F
U+200x NQSP MQSP ENSP EMSP 3/MSP 4/MSP 6/MSP FSP PSP THSP HSP ZWSP ZWNJ ZWJ LRM RLM
U+201x ‐ NB ‒ – — ― ‖ ‗ ‘ ’ ‚ ‛ “ ” „ ‟
U+202x † ‡ • ‣ ․ ‥ … ‧ LSEP PSEP LRE RLE PDF LRO RLO NNBSP
U+203x ‰ ‱ ′ ″ ‴ ‵ ‶ ‷ ‸ ‹ › ※ ‼ ‽ ‾ ‿
U+204x ⁀ ⁁ ⁂ ⁃ ⁄ ⁅ ⁆ ⁇ ⁈ ⁉ ⁊ ⁋ ⁌ ⁍ ⁎ ⁏
U+205x ⁐ ⁑ ⁒ ⁓ ⁔ ⁕ ⁖ ⁗ ⁘ ⁙ ⁚ ⁛ ⁜ ⁝ ⁞ MMSP
U+206x WJ ƒ() × , + (reserved) LRI RLI FSI PDI ISS ASS IAFS AAFS NADS NODS
Notes
1.^ As of Unicode version 16.0
2.^ Grey area (shown here as "(reserved)") indicates non-assigned code point
3.^ Unicode code points U+206A - U+206F are deprecated as of Unicode version 3.0
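
The tallies in the chart can be checked against a Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `block_summary` is hypothetical, and the counts depend on the Unicode version bundled with the interpreter (version 6.3 or later is assumed for the figure of 111 assigned code points).

```python
import unicodedata
from collections import Counter

BLOCK_START, BLOCK_END = 0x2000, 0x206F  # General Punctuation range


def block_summary():
    """Return (assigned, unassigned, per-category counts) for the block.

    Hypothetical helper: relies on whatever Unicode version the running
    Python interpreter ships with.
    """
    categories = Counter(
        unicodedata.category(chr(cp))
        for cp in range(BLOCK_START, BLOCK_END + 1)
    )
    unassigned = categories.pop("Cn", 0)  # "Cn" marks non-assigned code points
    assigned = sum(categories.values())
    return assigned, unassigned, categories


assigned, unassigned, categories = block_summary()
print("assigned:", assigned, "unassigned:", unassigned)
for category, count in sorted(categories.items()):
    print(category, count)
```

The six deprecated code points U+206A to U+206F still count as assigned, which is why only U+2065 shows up as unassigned.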

Several characters in this block are usually not rendered with a directly visible glyph.

Ten whitespace characters, U+2002 through U+200B (the fixed en or 1⁄2 em, em, 1⁄3 em, 1⁄4 em, 1⁄6 em, figure and punctuation spaces; the variable thin or 1⁄5 em and hair spaces; and the fixed zero-width space), together with U+205F (the math medium or 2⁄9 em space), differ in horizontal width. U+2000 and U+2001 (en quad and em quad) are effectively aliases of U+2002 and U+2003, respectively. Two further characters prohibit line breaks: U+202F is a non-breaking variant of U+2009 or U+2004, and U+2060 (the ill-termed word joiner) is a non-breaking variant of U+200B.

Three zero-width characters, U+200B through U+200D (space, non-joiner and joiner), differ in how they affect ligation and shaping of adjacent letters, such as contextual forms in Arabic.

Eleven invisible characters control the directionality of text unless higher-level markup overrides them: U+200E and U+200F (left-to-right and right-to-left mark), U+202A through U+202E (embeds, pops and overrides) and U+2066 through U+2069 (isolates).

There are explicit line and paragraph separators at U+2028 and U+2029.
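
Some of these properties can be inspected programmatically. The sketch below, again using Python's `unicodedata` module, shows that the en quad and em quad normalize to the en space and em space, and lists the bidirectional class of each of the eleven directional controls. The helper `strip_bidi_controls` is a hypothetical illustration of how a program might drop those controls, not a recommendation that text should be processed this way.

```python
import unicodedata

# The eleven directional controls named in the text:
# U+200E, U+200F, U+202A..U+202E and U+2066..U+2069.
BIDI_CONTROLS = [0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)]


def normalized(ch):
    """Return the NFC form of a character (canonical equivalents collapse)."""
    return unicodedata.normalize("NFC", ch)


def strip_bidi_controls(text):
    """Remove the block's directional controls (hypothetical helper)."""
    return text.translate({cp: None for cp in BIDI_CONTROLS})


# En quad and em quad are canonically equivalent to en space and em space.
for quad in ("\u2000", "\u2001"):
    print(f"U+{ord(quad):04X} -> U+{ord(normalized(quad)):04X}")

# Bidirectional class and name of each directional control.
for cp in BIDI_CONTROLS:
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.bidirectional(ch), unicodedata.name(ch))
```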

Variation selectors

Starting with Unicode 16 (2024), the block has variation sequences defined for East Asian punctuation positional variants of the curly quotation marks ‘...’ and “...”. They use U+FE00 VARIATION SELECTOR-1 (VS01) and U+FE01 VARIATION SELECTOR-2 (VS02):[3]

Variation sequences for fullwidth quotation marks
U+ 2018 2019 201C 201D Description
base code point ‘ ’ “ ”
base + VS01 ‘︀ ’︀ “︀ ”︀ non-fullwidth form
base + VS02 ‘︁ ’︁ “︁ ”︁ justified fullwidth form
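
The sequences in the table are simply the base quotation mark followed by the selector. A minimal sketch in Python, where the function name `quote_variants` and the dictionary keys are hypothetical labels chosen for illustration; whether the two forms actually render differently depends on font and renderer support.

```python
VS1 = "\uFE00"  # VARIATION SELECTOR-1
VS2 = "\uFE01"  # VARIATION SELECTOR-2

# The four curly quotation marks that have variation sequences defined.
QUOTES = ("\u2018", "\u2019", "\u201C", "\u201D")


def quote_variants(quote):
    """Return both variation sequences for one curly quotation mark."""
    if quote not in QUOTES:
        raise ValueError(f"no variation sequence defined for {quote!r}")
    return {
        "non-fullwidth": quote + VS1,
        "fullwidth": quote + VS2,
    }


for q in QUOTES:
    for form, seq in quote_variants(q).items():
        code_points = " ".join(f"U+{ord(c):04X}" for c in seq)
        print(form, code_points)
```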

The non-fullwidth forms are expected to be separated by a space on one side; the fullwidth forms are not:

The red registration corners mark the glyph metrics and show how the glyph aligns within the space allotted to the character. For variable-width display (left), an adjacent space is expected; for full-width CJK display (right), a space is not necessary.

In vertical text, the fullwidth forms should display somewhat differently, and even as regular CJK quotation marks 「...」 and 『...』 if the vertical orientation property is set to "Hans":

CJK behaviour of generic quotation marks in horizontal and vertical text when variation selector VS02 is appended. The 'horizontal' column at left is the 'VS2' column of the preceding table.

Emoji

The General Punctuation block contains two emoji: U+203C and U+2049.[4][5]

The block has four standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the two emoji, both of which default to a text presentation.[6]

Emoji variation sequences
U+ 203C 2049
base code point ‼ ⁉
base+VS15 (text) ‼︎ ⁉︎
base+VS16 (emoji) ‼️ ⁉️
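
Going the other way, a program can read the requested presentation off a sequence. The following Python sketch assumes the input is one of the two emoji from this block, optionally followed by a single selector; the function name `presentation` is hypothetical.

```python
VS15 = "\uFE0E"  # text presentation selector
VS16 = "\uFE0F"  # emoji presentation selector

# The two emoji in the General Punctuation block.
EMOJI_IN_BLOCK = ("\u203C", "\u2049")


def presentation(sequence):
    """Return "text" or "emoji" for a base character plus optional selector."""
    base, selector = sequence[:1], sequence[1:]
    if base not in EMOJI_IN_BLOCK:
        raise ValueError(f"not an emoji from this block: {sequence!r}")
    if selector == VS16:
        return "emoji"
    if selector in ("", VS15):
        # Both characters default to text presentation when no selector follows.
        return "text"
    raise ValueError(f"unexpected trailing characters: {selector!r}")


for seq in ("\u203C", "\u203C" + VS15, "\u2049" + VS16):
    print([f"U+{ord(c):04X}" for c in seq], presentation(seq))
```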

History

The following Unicode-related documents record the purpose and process of defining specific characters in the General Punctuation block:

References

  1. ^ "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
  2. ^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
  3. ^ Lunde, Ken (2023-10-14). "L2/23-212R: Proposal to add standardized variation sequences for four quotation marks" (PDF).
  4. ^ "UTR #51: Unicode Emoji". Unicode Consortium. 2023-09-05.
  5. ^ "UCD: Emoji Data for UTR #51". Unicode Consortium. 2023-02-01.
  6. ^ "UTS #51 Emoji Variation Sequences". The Unicode Consortium.