What is Internal Storage Encoding of Characters (ISCII)?
Last Updated: 17 Jul, 2020
We all know that computers do not store letters, numbers, and pictures directly. They convert them into small pieces called bits, each of which takes one of two values, 0 or 1. To store each letter or number correctly, we need rules for representing them; these rules make up an encoding scheme. We will look at the three most popular character encoding schemes:
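The idea that every character is stored as a pattern of bits can be seen directly in Python: `ord()` gives a character's code number, and `format(..., '08b')` renders that number as eight binary digits.

```python
# A character is stored as a pattern of bits. ord() gives the code
# number of a character; format(..., '08b') shows it as 8 binary digits.
for ch in "Hi!":
    code = ord(ch)
    bits = format(code, "08b")
    print(ch, code, bits)
# H 72 01001000
# i 105 01101001
# ! 33 00100001
```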
ASCII
ASCII stands for American Standard Code for Information Interchange. ASCII was introduced in 1963 by the American Standards Association (ASA). It is broadly classified into two sub-categories:
- Standard ASCII: Standard ASCII covers the first 128 characters, codes 0 to 127. It comprises non-printable ASCII and lower ASCII. Non-printable ASCII, codes 0 to 31, contains control characters that cannot be printed on the screen and serve as various system codes. Lower ASCII covers the remaining range, codes 32 to 127, and contains the alphabet, digits, and special symbols.
- Extended ASCII: Extended ASCII was proposed because Standard ASCII, while sufficient for English, could not represent the characters of many other languages. Extended ASCII addresses this by adding 128 more characters, bringing the total number of ASCII characters to 256.
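The split described above is easy to verify in Python: every Standard ASCII code fits in 7 bits (0 to 127), leaving the top bit of a byte free for the extra 128 Extended ASCII codes.

```python
# Every Standard ASCII code fits in 7 bits, i.e. is below 128.
print(ord('A'))          # 65
print(chr(65 + 32))      # 'a' -- lowercase letters sit 32 codes higher
assert all(ord(c) < 128 for c in "Hello, World!")  # all Standard ASCII
```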
ISCII
ISCII stands for the Indian Script Code for Information Interchange. It was standardized by the Bureau of Indian Standards (BIS) in 1991. It is an 8-bit standard in which the first 128 characters, codes 0 to 127, are the same as Standard ASCII, while the next 128 characters represent characters of Indian scripts. The encoding covers the scripts of the most widely spoken Indian languages, including Devanagari, Gujarati, Bengali, Oriya, Punjabi, Assamese, Kannada, Telugu, Malayalam, and Tamil.
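Python's standard library does not ship an ISCII codec, but the defining property of the layout, that byte values below 128 in an 8-bit ISCII stream are plain ASCII while values 128 and above carry script characters, can be sketched as below. The high bytes in the sample are illustrative placeholders, not real ISCII text.

```python
def split_iscii_halves(data: bytes):
    """Partition an 8-bit byte stream into its ASCII half (< 0x80)
    and its script half (>= 0x80), following the ISCII layout."""
    ascii_part = bytes(b for b in data if b < 0x80)
    script_part = bytes(b for b in data if b >= 0x80)
    return ascii_part.decode("ascii"), script_part

# Illustrative stream: ASCII text mixed with high bytes that would
# carry Indian-script characters in a real ISCII document.
text, script = split_iscii_halves(b"Year 1991 \xa4\xb3")
print(text)    # 'Year 1991 '
print(script)  # b'\xa4\xb3'
```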
Unicode
Even with ASCII and its extensions, character encoding remained limited and could not cover all the languages of the world, so a new encoding scheme was needed. The Unicode Consortium, a non-profit organization, designed and developed Unicode, first released in 1991. Initially only about 50,000 characters were present, but today Unicode covers more than 128,000 characters.
Types of Unicode encoding:
- UTF-8: A variable-width encoding that uses 1 to 4 bytes (units of 8 bits) per character. It is the standard encoding on the web and in most software, including email over the internet.
- UTF-16: A variable-width encoding that uses 2 or 4 bytes (units of 16 bits) per character.
- UTF-32: A fixed-width encoding that uses 4 bytes, i.e. 32 bits, per character.
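The size differences between the three UTF forms show up when encoding the same characters in Python. The `-le` (little-endian) codec variants are used so that no byte-order mark pads the byte counts.

```python
# UTF-8 is variable-width: an ASCII character takes 1 byte, while a
# Devanagari character (U+0928) takes 3. UTF-16 uses 2-byte units and
# UTF-32 always uses 4 bytes per character.
for ch in ("A", "\u0928"):          # 'A' and Devanagari letter NA
    print(ch,
          len(ch.encode("utf-8")),
          len(ch.encode("utf-16-le")),
          len(ch.encode("utf-32-le")))
# A 1 2 4
# न 3 2 4
```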
Why do we need Unicode?
- Unicode allows us to design a single application for many different platforms and languages; we do not need to rebuild the same application to launch it in another language.
- This leads to reduced application development costs.
- It prevents data corruption.
- It acts as a single encoding schema across all languages and platforms.
- It can be considered a superset of all other encoding schemes, so text in any of those encodings can be converted to Unicode and back.
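The superset property in the last point is exactly how conversion between encodings works in practice: decode the bytes into a Unicode string, then encode that string into the target encoding. A small sketch using Latin-1 as the legacy encoding:

```python
# Converting between encodings goes through a Unicode string:
# bytes --decode--> str (Unicode) --encode--> bytes.
latin = "caf\u00e9".encode("latin-1")   # legacy single-byte bytes: b'caf\xe9'
text = latin.decode("latin-1")          # back to a Unicode string
utf8 = text.encode("utf-8")             # re-encode as UTF-8
print(utf8)                             # b'caf\xc3\xa9'
assert utf8.decode("utf-8") == "caf\u00e9"
```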