What is UTF-8 in HTML?

Last Updated : 10 Oct, 2024

When creating websites and web applications, one important concern is ensuring that content displays correctly for users around the world. Text encoding plays a critical role in this, as it defines how characters are represented in digital form. UTF-8 (Unicode Transformation Format, 8-bit) is one of the most commonly used text encodings on the web. It ensures that a wide range of characters from different languages can be displayed properly.

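This article covers what UTF-8 is, why to use it in HTML, how to set it in a document, how it handles special characters, and how to make sure the server reports it correctly.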

What is UTF-8?

UTF-8 is a character encoding that represents each character using one to four bytes. It is capable of encoding all 1,112,064 valid code points in Unicode using a variable-length scheme. This flexibility makes it efficient for common ASCII characters, which use only one byte, while still supporting characters from other scripts with additional bytes.
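The variable-length behavior can be checked with a few lines of Python. This is a minimal sketch, not part of the original article: the sample characters and the helper name `utf8_length` are chosen here purely for illustration.

```python
# Sample characters that need 1, 2, 3 and 4 bytes in UTF-8 (illustrative picks).
samples = ["A", "é", "€", "😊"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs to encode one character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    # Print the code point rather than the glyph so any console can show it.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")

# Unicode spans U+0000..U+10FFFF. The 2,048 surrogate code points
# (U+D800..U+DFFF) cannot be encoded, which leaves the 1,112,064 figure.
encodable = 0x110000 - (0xDFFF - 0xD800 + 1)
print(encodable)
```

The ASCII letter takes one byte, the accented Latin letter two, the euro sign three, and the emoji four.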

Why Use UTF-8 in HTML?

  • Global Compatibility: UTF-8 allows websites to support characters from nearly all languages, making it suitable for international websites. It includes support for alphabets, symbols, emojis, and more.
  • Standard Encoding for the Web: UTF-8 is the encoding recommended by the World Wide Web Consortium (W3C) and expected for HTML5 documents. Most modern tools and web servers also default to UTF-8, but a browser may fall back to a legacy encoding when none is declared, so it is best to declare it explicitly.
  • Space Efficiency: For text primarily in English or containing common symbols, UTF-8 is space-efficient since it uses a single byte for each ASCII character. Non-ASCII characters may take up to four bytes, but the flexibility outweighs the modest storage overhead.
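The space-efficiency point can be made concrete by comparing encoded sizes. The sketch below is an illustration only: the helper `encoded_sizes` and the choice of UTF-16 and UTF-32 as comparison encodings are assumptions made for this example, not something the article prescribes.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in three Unicode encodings.

    The little-endian variants are used so that no byte-order mark
    is added to the counts.
    """
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


english = "Hello, World!"   # 13 ASCII characters
chinese = "你好"             # 2 CJK characters

print(encoded_sizes(english))
print(encoded_sizes(chinese))
```

For the ASCII string, UTF-8 needs 13 bytes against 26 for UTF-16 and 52 for UTF-32. For the two Chinese characters, UTF-8 needs 6 bytes against 4 for UTF-16, which is the overhead the last bullet refers to.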

Setting UTF-8 in HTML

To ensure that an HTML document uses UTF-8 encoding, specify it within the <head> section using the <meta> tag. Here is how you can do it:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>UTF-8 in HTML</title>
</head>
<body>
<h1>Welcome to UTF-8 Encoded Page</h1>
<p>This page is encoded with UTF-8, making it capable of displaying characters from various languages, like हिंदी, 中文, عربى, and more.</p>
</body>
</html>

In the above example:

  • The meta tag with charset="UTF-8" tells the browser to interpret the HTML file using UTF-8 encoding. The file itself must also be saved as UTF-8.
  • This allows the page to display special characters and symbols correctly.
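One way to confirm that a page declares its encoding is to parse it and look for the meta tag. The following is a rough sketch using Python's standard `html.parser` module. The class `MetaCharsetFinder` and the function `declared_charset` are hypothetical names introduced here, and the check only covers the `<meta charset="...">` form shown above.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the value of the first <meta charset="..."> tag seen."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "meta" and self.charset is None:
            value = dict(attrs).get("charset")
            if value:
                self.charset = value.strip().lower()


def declared_charset(html_text: str):
    """Return the declared charset in lowercase, or None if absent."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = (
    '<!DOCTYPE html><html lang="en"><head>'
    '<meta charset="UTF-8"><title>UTF-8 in HTML</title>'
    "</head><body></body></html>"
)
print(declared_charset(page))
```

A page without the tag yields `None`, which is a hint that the browser will have to guess the encoding.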

UTF-8 and Special Characters

UTF-8 enables you to include special characters directly in HTML content. This is particularly useful when you need to display non-English characters, emojis, or mathematical symbols. For example:

<p>Smiley Face Emoji: 😊</p>
<p>Math Symbol: ∑ (summation)</p>
<p>Chinese Characters: 汉字</p>

Without proper UTF-8 encoding, the browser might display such characters as garbled text (known as "mojibake").
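Mojibake is easy to reproduce: encode text as UTF-8, then decode the bytes with a different encoding. The sketch below assumes Windows-1252 as the wrong encoding, a common legacy fallback. The word used is only an example.

```python
original = "café"

# The bytes a UTF-8 encoded page would actually contain.
raw = original.encode("utf-8")

# Decoding those bytes with the wrong encoding produces mojibake:
# the two-byte sequence for "é" is shown as two separate characters.
garbled = raw.decode("cp1252")
print(garbled)

# Reversing the mistake recovers the text, which shows the bytes were
# never damaged; only their interpretation was wrong.
recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered)
```

The garbled string reads "cafÃ©", the kind of output a browser shows when it guesses the wrong encoding.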

Verifying UTF-8 Encoding

It is also essential that the server sends the correct Content-Type header specifying UTF-8. This can be done through server configuration files such as .htaccess for Apache or nginx.conf for Nginx.

For Apache, we can add the following line to the .htaccess file:

AddDefaultCharset UTF-8

For Nginx, add this to the nginx.conf file:

charset utf-8;

These configurations ensure that the server tells the browser to interpret the content as UTF-8.
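To verify what a server reports, inspect the charset parameter of its Content-Type header. Below is a minimal sketch using the standard library's `email.message.Message`, which can parse such header values. The helper name `charset_from_content_type` and the sample header strings are assumptions for illustration.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value, or None.

    The result is lowercased, so "UTF-8" and "utf-8" compare equal.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

The first call reports `utf-8`. The second returns `None`, meaning the server did not state an encoding and the browser must rely on the meta tag or its own guess. For a live check, the `headers` object of a response from `urllib.request.urlopen` offers the same `get_content_charset()` method.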

Example: This example demonstrates how to set up an HTML page with UTF-8 encoding and how to display text and symbols from various languages. We will create a simple webpage that showcases text in multiple languages, special characters, and emojis, all encoded with UTF-8.

HTML
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <title>UTF-8 HTML Example</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            line-height: 1.5;
            background-color: #f4f4f4;
            margin: 0;
            padding: 20px;
        }

        .container {
            max-width: 800px;
            margin: 0 auto;
            background: white;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
        }

        h1 {
            color: #333;
        }

        .example-text {
            margin-top: 20px;
            padding: 10px;
            background: #e9e9e9;
            border-radius: 5px;
        }
    </style>
</head>

<body>
    <div class="container">
        <h1>Welcome to the UTF-8 HTML Example</h1>
        <p>This page demonstrates UTF-8 encoding in an HTML 
            document, allowing the display of a wide range of
            characters, including various languages, symbols, and emojis.</p>

        <div class="example-text">
            <h2>Languages:</h2>
            <p>English: Hello, World!</p>
            <p>Spanish: ¡Hola, Mundo!</p>
            <p>Chinese: 你好,世界!</p>
            <p>Hindi: नमस्ते, दुनिया!</p>
            <p>Arabic: مرحباً بالعالم!</p>
            <p>Japanese: こんにちは、世界!</p>
            <p>Korean: 안녕하세요, 세계!</p>
        </div>

        <div class="example-text">
            <h2>Special Characters &amp; Symbols:</h2>
            <p>Mathematical Symbol: ∑ (summation)</p>
            <p>Greek Letter: Ω (Omega)</p>
            <p>Currencies: $ (Dollar), € (Euro), ¥ (Yen), ₹ (Rupee)</p>
        </div>

        <div class="example-text">
            <h2>Emojis:</h2>
            <p>Smiley Face: 😊</p>
            <p>Rocket: 🚀</p>
            <p>Thumbs Up: 👍</p>
            <p>Earth Globe: 🌍</p>
        </div>

        <footer>
            <p>&copy; 2024 UTF-8 HTML Example. All rights reserved.</p>
        </footer>
    </div>
</body>

</html>

Conclusion

UTF-8 is the most widely used character encoding for web pages. It ensures that a website can display diverse content from various languages, making it essential for creating a global user experience. By including <meta charset="UTF-8"> in the HTML, we can avoid character display issues and ensure that the content is accessible to everyone.

