HTML Charsets

Last Updated : 22 Feb, 2025

HTML charsets define how characters are represented in a web document. The character encoding ensures that text appears correctly across different devices and platforms.

The <meta> tag's charset attribute is used to specify which character encoding the HTML document uses. By setting the charset, we ensure proper rendering of special characters, symbols, and text.

Common Character Encodings

1. ASCII

ASCII stands for American Standard Code for Information Interchange. It is a 7-bit character encoding, and it is the basic character set commonly used in C/C++ programming.

It has 128 characters: control characters, the digits (0-9), the uppercase (A-Z) and lowercase (a-z) English letters, and special symbols such as + - * / ( ) @.
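
The 128-character limit can be checked directly. The following is a minimal Python sketch, not part of the original article; the helper name `ascii_code` is a hypothetical name chosen for illustration.

```python
def ascii_code(ch: str) -> int:
    """Return the ASCII code of a single character, or raise ValueError."""
    code = ord(ch)
    if code > 127:  # ASCII only defines codes 0 to 127 (7 bits)
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code


print(ascii_code("A"))        # 65
print(ascii_code("a"))        # 97
print(chr(64))                # @
print("A-Z".encode("ascii"))  # b'A-Z'

# Characters outside the 128-character range cannot be encoded as ASCII.
try:
    "é".encode("ascii")
except UnicodeEncodeError:
    print("'é' is not representable in ASCII")
```

The codes printed here match the numbers listed for the same characters in the tables later in this article.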

2. ANSI (Windows-1252)

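Windows-1252, often called "ANSI", is an 8-bit character encoding with 256 code positions. Despite the name, it was defined by Microsoft and is not a standard of the American National Standards Institute (ANSI). It was long the default character set for Western-language versions of Microsoft Windows.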

3. ISO-8859-1

ISO-8859-1 (also known as Latin-1) was the default character set of HTML 4 and also has 256 code positions. It belongs to the ISO-8859 family of character sets, which the International Organization for Standardization (ISO) defines for different alphabets and languages. It contains digits, uppercase and lowercase English letters, accented Western European letters, and some special characters.
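
As an illustration of the single-byte design, the sketch below uses Python's built-in `iso-8859-1` codec to show that each byte value maps to the Unicode code point with the same number, and that characters outside that range cannot be stored. The helper `fits_latin1` is a hypothetical name introduced here.

```python
# ISO-8859-1 maps byte value N to Unicode code point N for all 256 values.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
one_to_one = all(ord(ch) == i for i, ch in enumerate(decoded))
print(one_to_one)  # True

# Accented Western European letters fit in a single byte.
cafe = "café".encode("iso-8859-1")
print(cafe)       # b'caf\xe9'
print(len(cafe))  # 4


def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1("naïve"))  # True
print(fits_latin1("€"))      # False: the euro sign is not in ISO-8859-1
print(fits_latin1("日本語"))  # False
```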

4. UTF-8

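The Unicode Standard, maintained by the Unicode Consortium, was developed because the ISO-8859 character sets are limited in size and not suitable for a multilingual environment. UTF-8 and UTF-16 are encodings of Unicode. UTF-8 can represent every Unicode character, including letters, punctuation, and symbols from virtually all writing systems, and it is backward compatible with ASCII.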
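
UTF-8 is a variable-length encoding: a character takes between one and four bytes. A small Python sketch (the helper name `utf8_length` and the sample characters are choices made for this illustration):

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


# One sample each for the 1-, 2-, 3-, and 4-byte forms.
samples = ["A", "é", "€", "😀"]
for ch in samples:
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_length(ch)} byte(s)")

# ASCII text is byte-for-byte identical in UTF-8.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```

Because ASCII characters keep their single-byte form, an ASCII-only page is already valid UTF-8.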

Attribute

The web browser must know which character encoding an HTML page uses. The page declares it with a <meta> tag, as shown below.

Example: 

  • HTML 4
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
  • HTML 5
<meta charset="UTF-8">
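
To illustrate how a tool might read either form of the declaration, here is a simplified Python sketch. It is not the algorithm browsers actually use: the regular expression, the function name `sniff_meta_charset`, and the `limit` parameter are assumptions made for this example. The 1024-byte default reflects the HTML specification's requirement that the declaration appear within the first 1024 bytes of the document.

```python
import re
from typing import Optional

# Simplified pattern: finds charset=VALUE inside a <meta ...> tag.
# It covers both the HTML 4 (http-equiv) and HTML 5 (charset) forms.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Return the lowercased charset declared in the first `limit` bytes, if any."""
    match = _META_CHARSET.search(raw[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


html4 = b'<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'
html5 = b'<html><head><meta charset="UTF-8"></head></html>'

print(sniff_meta_charset(html4))               # iso-8859-1
print(sniff_meta_charset(html5))               # utf-8
print(sniff_meta_charset(b"<html></html>"))    # None
```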

Note:

  • Values from 0 to 127 form the "Standard" ASCII character set, which ASCII, Windows-1252, ISO-8859-1, and UTF-8 all share.
  • Values from 128 to 255 form the "Extended" character set, which differs from one encoding to another.
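
The split between the two ranges can be observed with Python's built-in codecs. In this sketch, `cp1252` is Python's name for Windows-1252, and `try_decode` is a hypothetical helper introduced for the example.

```python
from typing import Optional

ENCODINGS = ["ascii", "cp1252", "iso-8859-1", "utf-8"]

# Bytes 0 to 127 decode to the same text in all four encodings.
standard = bytes(range(128))
decodings = {enc: standard.decode(enc) for enc in ENCODINGS}
same_everywhere = len(set(decodings.values())) == 1
print(same_everywhere)  # True


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Decode raw with the given encoding, or return None if it is invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# A byte above 127 is interpreted differently, or rejected outright.
extended = b"\xe9"  # value 233
for enc in ENCODINGS:
    print(enc, repr(try_decode(extended, enc)))
```

Here the byte 233 decodes to é in Windows-1252 and ISO-8859-1, is rejected by ASCII, and is an incomplete sequence in UTF-8.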

Why Is Character Encoding Important?

  • Consistency: Encoding defines how text, numbers, and symbols are interpreted, ensuring that content appears correctly regardless of the user's device or browser.
  • Global Compatibility: Without proper encoding, characters in different languages or special symbols may display as unreadable or incorrect.
  • Web Development: By specifying the charset, you avoid issues with rendering characters and improve your site’s accessibility across diverse languages.
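
The "unreadable or incorrect" display mentioned above typically arises when bytes written in one encoding are read using another. A short Python sketch of the effect, with a sample string chosen for illustration:

```python
original = "café ©"

# The page is saved as UTF-8 but read as Windows-1252 (cp1252 in Python).
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # cafÃ© Â©

# Reversing the wrong step recovers the text, because no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True

# The opposite mistake: ISO-8859-1 bytes read as UTF-8 are invalid,
# so the decoder substitutes the replacement character U+FFFD.
latin1_bytes = "café".encode("iso-8859-1")
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced == "caf\ufffd")  # True
```

Declaring the charset that matches the bytes actually saved avoids both failure modes.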

Character Sets for Different Character Encoding Standards

The following tables list the characters defined by different character encoding standards, together with their assigned number codes.

Table 1 (ASCII Device Control Characters)

This table contains Characters which are designed to control hardware devices. These are also known as control characters.

Number  Character  Description
00      NUL        null character
01      SOH        start of header
02      STX        start of text
03      ETX        end of text
04      EOT        end of transmission
05      ENQ        enquiry
06      ACK        acknowledge
07      BEL        bell (ring)
08      BS         backspace
09      HT         horizontal tab
10      LF         line feed
11      VT         vertical tab
12      FF         form feed
13      CR         carriage return
14      SO         shift out
15      SI         shift in
16      DLE        data link escape
17      DC1        device control 1
18      DC2        device control 2
19      DC3        device control 3
20      DC4        device control 4
21      NAK        negative acknowledge
22      SYN        synchronize
23      ETB        end transmission block
24      CAN        cancel
25      EM         end of medium
26      SUB        substitute
27      ESC        escape
28      FS         file separator
29      GS         group separator
30      RS         record separator
31      US         unit separator
127     DEL        delete

Table 2: This table contains the characters that have the same numbers assigned in the different character encodings.

Number  Character  Description
32      (space)    Space
33      !          Exclamation Mark
34      "          Quotation Mark
35      #          Hash Sign
36      $          Dollar Sign
37      %          Percent Sign
38      &          Ampersand Sign
39      '          Apostrophe Sign
40      (          Opening Parenthesis
41      )          Closing Parenthesis
42      *          Asterisk Sign
43      +          Plus Sign
44      ,          Comma
45      -          Hyphen/Minus Sign
46      .          Full Stop
47      /          Slash/Divide Sign
48      0          Number Zero
49      1          Number One
50      2          Number Two
51      3          Number Three
52      4          Number Four
53      5          Number Five
54      6          Number Six
55      7          Number Seven
56      8          Number Eight
57      9          Number Nine
58      :          Colon
59      ;          Semicolon
60      <          Less-than Sign
61      =          Equals Sign
62      >          Greater-than Sign
63      ?          Question Mark
64      @          At Sign
65      A          Letter A
66      B          Letter B
67      C          Letter C
68      D          Letter D
69      E          Letter E
70      F          Letter F
71      G          Letter G
72      H          Letter H
73      I          Letter I
74      J          Letter J
75      K          Letter K
76      L          Letter L
77      M          Letter M
78      N          Letter N
79      O          Letter O
80      P          Letter P
81      Q          Letter Q
82      R          Letter R
83      S          Letter S
84      T          Letter T
85      U          Letter U
86      V          Letter V
87      W          Letter W
88      X          Letter X
89      Y          Letter Y
90      Z          Letter Z
91      [          Opening Square Bracket
92      \          Backslash
93      ]          Closing Square Bracket
94      ^          Circumflex Accent
95      _          Low Line
96      `          Grave Accent
97      a          Letter a
98      b          Letter b
99      c          Letter c
100     d          Letter d
101     e          Letter e
102     f          Letter f
103     g          Letter g
104     h          Letter h
105     i          Letter i
106     j          Letter j
107     k          Letter k
108     l          Letter l
109     m          Letter m
110     n          Letter n
111     o          Letter o
112     p          Letter p
113     q          Letter q
114     r          Letter r
115     s          Letter s
116     t          Letter t
117     u          Letter u
118     v          Letter v
119     w          Letter w
120     x          Letter x
121     y          Letter y
122     z          Letter z
123     {          Opening Curly Bracket
124     |          Vertical Line
125     }          Closing Curly Bracket
126     ~          Tilde
127     DEL        Delete

Table 3: This table contains the characters numbered 128 to 255, whose assignments differ between character encodings. The characters shown are those of Windows-1252 (ANSI). ISO-8859-1 assigns no printable characters to 128 to 159 and matches Windows-1252 from 160 to 255.

Number  Character
128     €
129     not used
130     ‚
131     ƒ
132     „
133     …
134     †
135     ‡
136     ˆ
137     ‰
138     Š
139     ‹
140     Œ
141     not used
142     Ž
143     not used
144     not used
145     ‘
146     ’
147     “
148     ”
149     •
150     en dash
151     em dash
152     ˜
153     ™
154     š
155     ›
156     œ
157     not used
158     ž
159     Ÿ
160     no-break space
161     ¡
162     ¢
163     £
164     ¤
165     ¥
166     ¦
167     §
168     ¨
169     ©
170     ª
171     «
172     ¬
173     soft hyphen
174     ®
175     ¯
176     °
177     ±
178     ²
179     ³
180     ´
181     µ
182     ¶
183     ·
184     ¸
185     ¹
186     º
187     »
188     ¼
189     ½
190     ¾
191     ¿
192     À
193     Á
194     Â
195     Ã
196     Ä
197     Å
198     Æ
199     Ç
200     È
201     É
202     Ê
203     Ë
204     Ì
205     Í
206     Î
207     Ï
208     Ð
209     Ñ
210     Ò
211     Ó
212     Ô
213     Õ
214     Ö
215     ×
216     Ø
217     Ù
218     Ú
219     Û
220     Ü
221     Ý
222     Þ
223     ß
224     à
225     á
226     â
227     ã
228     ä
229     å
230     æ
231     ç
232     è
233     é
234     ê
235     ë
236     ì
237     í
238     î
239     ï
240     ð
241     ñ
242     ò
243     ó
244     ô
245     õ
246     ö
247     ÷
248     ø
249     ù
250     ú
251     û
252     ü
253     ý
254     þ
255     ÿ
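
The unused positions in Table 3 can be cross-checked against Python's built-in `cp1252` codec (Python's name for Windows-1252). This is a sketch for verification only; the function name `undefined_positions` is hypothetical.

```python
from typing import List


def undefined_positions(encoding: str) -> List[int]:
    """Return the byte values (0 to 255) that the codec refuses to decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


print(undefined_positions("cp1252"))      # [129, 141, 143, 144, 157]
print(undefined_positions("iso-8859-1"))  # []

# From 160 to 255 the two encodings agree on every character.
upper_range_matches = all(
    bytes([v]).decode("cp1252") == bytes([v]).decode("iso-8859-1")
    for v in range(160, 256)
)
print(upper_range_matches)  # True
```

Note that Python's `iso-8859-1` codec decodes 128 to 159 as non-printing control codes, which is why it reports no undefined positions even though no visible characters live there.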

HTML Charsets - FAQs

What is a charset in HTML?

Charset (character set) defines the encoding used to represent characters in an HTML document, ensuring text displays correctly.

How to specify the charset in an HTML document?

Use the <meta charset="UTF-8"> tag in the <head> section to specify the character encoding, with UTF-8 being the most common.

What is UTF-8?

UTF-8 (Unicode Transformation Format - 8-bit) is a character encoding that supports all characters in the Unicode standard, widely used for web content.

Why is UTF-8 recommended?

UTF-8 supports a vast range of characters, including special symbols and emojis, and is compatible with most languages, making it ideal for global web content.

How to specify a different charset?

Replace UTF-8 with the desired charset in the <meta> tag, e.g., <meta charset="ISO-8859-1"> for Latin-1 encoding.

What happens if I don’t specify a charset?

If no charset is specified, the browser may use a default or detected encoding, potentially leading to incorrect character display.

