UTF-8 to Wide Char Conversion in C++ STL

Last Updated : 18 Jul, 2024

UTF-8 is a variable-length encoding that represents each Unicode code point using 1 to 4 bytes. It is widely used for text storage and transmission because it is compact for ASCII-heavy text and backward compatible with ASCII. A wide character (wchar_t) is a type that holds a single code unit of a wide character encoding (usually UTF-16 or UTF-32). The size of wchar_t varies across platforms (e.g., 2 bytes on Windows, 4 bytes on most Unix-like systems).
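To make the "1 to 4 bytes" rule concrete, here is a minimal sketch that reads the sequence length from the lead byte and counts code points in a UTF-8 string. The function names utf8_sequence_length and utf8_code_point_count are hypothetical helpers introduced only for illustration, and the code assumes well-formed input rather than validating it.

```cpp
#include <cstddef>
#include <string>

// Returns the number of bytes in the UTF-8 sequence that starts with `lead`,
// or 0 if `lead` is a continuation byte or has no valid lead-byte pattern.
// Only the bit pattern is inspected: overlong lead bytes such as 0xC0 and
// out-of-range ones such as 0xF5 are not rejected here.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Counts code points by counting every byte that is not a
// continuation byte (10xxxxxx). Assumes well-formed UTF-8.
std::size_t utf8_code_point_count(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the string "Hello, 世界" used throughout this article, the byte length is 13 while the code point count is 9, because each of the two Chinese characters occupies 3 bytes.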

In this article, we’ll explore how to convert UTF-8 strings to wide character (wchar_t) strings using the C++ standard library and two platform-specific APIs.

Methods to Convert UTF-8 characters to Wide Char in C++

The C++ standard library and common platform APIs offer several ways to convert a UTF-8 string to a wide character (wchar_t) string. Four of them are shown below:

1. Convert UTF-8 characters to Wide Char using std::wstring_convert

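std::wstring_convert is a class template introduced in C++11 and defined in the <locale> header; the conversion facets it is typically used with, such as codecvt_utf8, are defined in <codecvt>. It converts between byte strings and wide strings using a codecvt facet. Note that both std::wstring_convert and the <codecvt> facets are deprecated since C++17, so compilers may emit deprecation warnings for this approach.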

Syntax to Create std::wstring_convert

wstring_convert<facet> converter;

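where facet is the codecvt facet that performs the conversion. For UTF-8 to wchar_t conversion, it is codecvt_utf8<wchar_t>. This facet produces UCS-2 or UCS-4 depending on the size of wchar_t; on platforms where wchar_t is 16 bits (such as Windows), codecvt_utf8_utf16<wchar_t> is the facet that handles characters outside the Basic Multilingual Plane using surrogate pairs.

We can then call the converter's from_bytes() member function to convert the given string, as shown in the example below.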

Example

C++
// C++ program to convert utf8 to wchar_t using wstring_convert
#include <iostream>
#include <string>
#include <codecvt>
#include <locale>

using namespace std;

int main() {
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界"; 

    // Create a wstring_convert object
    wstring_convert<codecvt_utf8<wchar_t>> converter;

    // Convert UTF-8 string to wide string
    wstring wide_str = converter.from_bytes(utf8_str);

    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;

    return 0;
}

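Output
Converted wide string: Hello, ??

The conversion itself succeeds here. The ?? appears because wcout has not been given a locale that can encode non-ASCII characters; calling setlocale(LC_ALL, "") before the first output typically fixes the display on systems whose environment locale uses UTF-8.

Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n), for the converted wide string.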
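from_bytes() throws std::range_error when the input is not valid UTF-8 (unless an error string was supplied to the converter's constructor), and the same converter type also offers to_bytes() for the reverse direction. The following is a minimal sketch of both, wrapped in two hypothetical helper functions, utf8_to_wide and wide_to_utf8, that report failure through a bool instead of an exception. These names are illustrative only and are not part of the standard library.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-8 -> wide string.
// Returns false (and clears `out`) if `utf8` is not valid UTF-8.
bool utf8_to_wide(const std::string& utf8, std::wstring& out)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    try {
        out = converter.from_bytes(utf8);
        return true;
    } catch (const std::range_error&) {
        out.clear();
        return false;
    }
}

// Hypothetical helper: wide string -> UTF-8.
// Returns false (and clears `out`) if `wide` cannot be encoded.
bool wide_to_utf8(const std::wstring& wide, std::string& out)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    try {
        out = converter.to_bytes(wide);
        return true;
    } catch (const std::range_error&) {
        out.clear();
        return false;
    }
}
```

A caller would check the return value before using the output, for example if (utf8_to_wide(input, wide)) { ... }. Because these facilities are deprecated, this sketch is best suited to existing C++11/14 code bases.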

2. Convert UTF-8 characters to Wide Char Using std::mbstowcs

The std::mbstowcs function converts a multibyte string, interpreted according to the current C locale, to a wide character string. It is defined in the <cstdlib> header.

Syntax

mbstowcs(dest, src, len);

where,

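  • dest: pointer to the destination wide character array.
  • src: pointer to the null-terminated multibyte source string.
  • len: maximum number of wide characters to write to dest.

It returns the number of wide characters written (excluding the terminating null), or (size_t)-1 if the input contains an invalid multibyte sequence.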

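Before calling this function, the C locale must use UTF-8 as its multibyte encoding. The following statement (setlocale is declared in <clocale>) selects the locale from the user's environment, which is a UTF-8 locale on most modern Unix-like systems: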

setlocale(LC_ALL, "");

Example

C++
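// C++ program to convert utf8 to wchar_t using mbstowcs
#include <clocale>
#include <cstdlib>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    // Use the environment's locale; it must be a UTF-8 locale
    setlocale(LC_ALL, "");

    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";

    // A UTF-8 string never has more code points than bytes,
    // so utf8_str.size() wide characters is always enough
    wstring wide_str(utf8_str.size(), L'\0');
    size_t written = mbstowcs(&wide_str[0], utf8_str.c_str(),
                              wide_str.size());
    if (written == static_cast<size_t>(-1)) {
        cerr << "Invalid multibyte sequence for this locale"
             << endl;
        return 1;
    }

    // Drop the unused trailing null characters
    wide_str.resize(written);

    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;

    return 0;
}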


Output

Converted wide string: Hello, 世界

Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n), for the converted wide string.
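Since mbstowcs only decodes UTF-8 when the active locale is a UTF-8 locale, it can be useful to verify that at run time. The sketch below probes the current locale by converting a known 3-byte sequence (U+4E16, the character 世) and checking the result. The function names current_locale_is_utf8 and select_utf8_locale are hypothetical, and the fallback locale names "C.UTF-8" and "en_US.UTF-8" are assumptions that may not exist on every system.

```cpp
#include <clocale>
#include <cstdlib>

// Returns true if the current C locale decodes multibyte input as UTF-8.
// Probe: the UTF-8 bytes E4 B8 96 must convert to the single
// wide character U+4E16.
bool current_locale_is_utf8()
{
    const char probe[] = "\xE4\xB8\x96";
    wchar_t wide[2] = {0, 0};
    std::size_t n = std::mbstowcs(wide, probe, 1);
    return n == 1 && wide[0] == static_cast<wchar_t>(0x4E16);
}

// Tries the environment's locale first, then a few commonly
// available UTF-8 locale names (assumed, system dependent).
// Returns true once a locale that decodes UTF-8 is active.
bool select_utf8_locale()
{
    const char* candidates[] = {"", "C.UTF-8", "en_US.UTF-8"};
    for (const char* name : candidates) {
        if (std::setlocale(LC_ALL, name) != nullptr
            && current_locale_is_utf8()) {
            return true;
        }
    }
    return false;
}
```

A program could call select_utf8_locale() once at startup in place of a bare setlocale(LC_ALL, "") and report an error if it returns false, rather than silently producing a failed or garbled conversion.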

3. Convert UTF-8 characters to Wide Char Using MultiByteToWideChar on Windows

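MultiByteToWideChar() is a Windows API function that converts a string from a multibyte character set to a wide character (UTF-16) string. It is part of the Windows SDK and is declared via the windows.h header. Passing CP_UTF8 as the code page tells it to treat the input as UTF-8. The function is usually called twice: once with a null output buffer to obtain the required length, and once to perform the conversion.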

Example

C++
// C++ program to convert utf8 to wchar_t using
// MultiByteToWideChar
#include <iostream>
#include <string>
#include <windows.h>

using namespace std;

int main()
{
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";

    // Determine the length of the wide string
    int len = MultiByteToWideChar(
        CP_UTF8, 0, utf8_str.c_str(), -1, nullptr, 0);
    if (len == 0) {
        cerr << "Error in MultiByteToWideChar: "
             << GetLastError() << endl;
        return 1;
    }

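    // Convert the string; len counts the terminating null
    // because the input length was passed as -1
    wstring wide_str(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8_str.c_str(), -1,
                        &wide_str[0], len);
    wide_str.resize(len - 1); // drop the extra terminator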

    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;

    return 0;
}


Output

Converted wide string: Hello, 世界

Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n), for the converted wide string.

4. Convert UTF-8 characters to Wide Char Using iconv on Unix-like Systems

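iconv is a POSIX API for converting between character encodings. On Unix-like systems it is declared in the iconv.h header. The "WCHAR_T" encoding name used below is accepted by glibc and GNU libiconv, but the set of supported encoding names is implementation-defined.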

Example

C++
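// C++ program to convert utf8 to wchar_t using iconv
#include <clocale>
#include <iconv.h>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    // Let wcout encode non-ASCII output using the
    // environment's locale
    setlocale(LC_ALL, "");

    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";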

    // Open iconv descriptor
    iconv_t conv = iconv_open("WCHAR_T", "UTF-8");
    if (conv == (iconv_t)-1) {
        perror("iconv_open");
        return 1;
    }

    // Set up conversion buffers
    size_t in_bytes = utf8_str.size();
    char* in_buf = const_cast<char*>(utf8_str.c_str());

    vector<wchar_t> wide_buf(in_bytes + 1);
    char* out_buf
        = reinterpret_cast<char*>(wide_buf.data());
    size_t out_bytes = wide_buf.size() * sizeof(wchar_t);

    // Perform conversion
    if (iconv(conv, &in_buf, &in_bytes, &out_buf,
              &out_bytes)
        == (size_t)-1) {
        perror("iconv");
        iconv_close(conv);
        return 1;
    }

    // Create wide string
    wstring wide_str(wide_buf.data());

    // Close iconv descriptor
    iconv_close(conv);

    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;

    return 0;
}

Output

Converted wide string: Hello, 世界

Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n), for the conversion buffer and the resulting wide string.
