UTF-8 to Wide Char Conversion in C++ STL
Last Updated: 18 Jul, 2024
UTF-8 is a variable-length encoding that represents each Unicode code point using 1 to 4 bytes. It is widely used for text storage and transmission because it is compact and backward compatible with ASCII. A wide character (wchar_t) is a type that holds a single unit of a wide character encoding (usually UTF-16 or UTF-32). The size of wchar_t varies across platforms (e.g., 2 bytes on Windows, 4 bytes on most Unix-like systems).
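The 1 to 4 byte rule can be read directly from the leading byte of each sequence. The following is a minimal sketch rather than a full validator: the helper names utf8_sequence_length and count_code_points are hypothetical, and the code assumes well-formed input (it does not reject overlong or truncated sequences).

```cpp
#include <cstddef>
#include <string>

// Returns the length in bytes (1 to 4) of the UTF-8 sequence that starts
// with `lead`, or 0 if `lead` cannot start a sequence.
// Hypothetical helper, used only for this illustration.
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation byte or invalid lead
}

// Counts code points by counting every byte that is not a continuation
// byte (10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the UTF-8 encoded string "Hello, 世界" used throughout this article, size() reports 13 bytes while count_code_points reports 9 code points, because each of the two CJK characters occupies 3 bytes.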
In this article, we’ll explore how to convert UTF-8 strings to wide character (wchar_t) strings in C++.
Methods to Convert UTF-8 characters to Wide Char in C++
There are multiple ways to convert a UTF-8 string to a wide character (wchar_t) string in C++. The first two methods below use the C++ standard library, while the last two rely on platform-specific APIs:
1. Convert UTF-8 characters to Wide Char using std::wstring_convert
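std::wstring_convert is a class template introduced in C++11 and declared in the <locale> header; the codecvt_utf8 facet used with it comes from the <codecvt> header. It performs conversions between byte strings and wide strings through a codecvt facet. Note that both std::wstring_convert and the <codecvt> facets are deprecated since C++17, so compilers may emit deprecation warnings for this method.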
Syntax to Create std::wstring_convert
wstring_convert<facet> converter;
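where, facet is the codecvt facet that performs the conversion. For UTF-8 to wchar_t conversion, it is codecvt_utf8<wchar_t>. Where wchar_t is 2 bytes (as on Windows), this facet produces UCS-2 and cannot represent characters outside the Basic Multilingual Plane; codecvt_utf8_utf16<wchar_t> produces UTF-16 instead.
We can then call the converter's from_bytes() member function to convert the given string, as shown in the example below.
Example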
C++
// C++ program to convert utf8 to wchar_t using wstring_convert
#include <iostream>
#include <string>
#include <codecvt>
#include <locale>
using namespace std;
int main()
{
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";
    // Create a wstring_convert object
    wstring_convert<codecvt_utf8<wchar_t>> converter;
    // Convert UTF-8 string to wide string
    wstring wide_str = converter.from_bytes(utf8_str);
    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;
    return 0;
}
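Output
Converted wide string: Hello, ??
The two non-ASCII characters are printed as ?? here, most likely because the program never sets a locale, so wcout writes through the default "C" locale, which cannot represent them. The conversion itself succeeds.
Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n), for the converted wide string.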
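from_bytes() throws std::range_error when the input is not valid UTF-8, and the same converter type offers to_bytes() for the reverse direction. The following is a minimal sketch of both, assuming the deprecated facilities are still available in your toolchain; the function names utf8_to_wide and wide_to_utf8 are hypothetical wrappers, not part of the standard library.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Converts UTF-8 to a wide string. Returns false instead of throwing
// when the input is not valid UTF-8. Hypothetical wrapper name.
bool utf8_to_wide(const std::string& utf8, std::wstring& out) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    try {
        out = converter.from_bytes(utf8);
        return true;
    } catch (const std::range_error&) {
        return false;  // malformed UTF-8 input
    }
}

// Converts a wide string back to UTF-8. May throw std::range_error
// if the wide string holds values the facet cannot encode.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

Returning a bool keeps the exception handling in one place; callers that prefer exceptions can call from_bytes() directly as in the example above.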
2. Convert UTF-8 characters to Wide Char Using std::mbstowcs
The std::mbstowcs function converts a null-terminated multibyte string to a wide character string, interpreting the input according to the current C locale. It is declared in the <cstdlib> header file.
Syntax
mbstowcs(dest, src, len);
where,
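- dest: pointer to the destination wide character array.
- src: pointer to the null-terminated multibyte source string.
- len: maximum number of wide characters to write to dest.
The function returns the number of wide characters written, excluding the terminating null character, or (size_t)-1 if the source contains an invalid multibyte sequence.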
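Before using this function, we need to set the locale to one that uses the UTF-8 encoding. The following statement (setlocale is declared in <clocale>) selects the locale configured in the user's environment, which is a UTF-8 locale on most modern Unix-like systems: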
setlocale(LC_ALL, "");
Example
C++
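// C++ program to convert utf8 to wchar_t using mbstowcs
#include <clocale>
#include <cstdlib>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    // Use the environment's locale, which must be a UTF-8 locale
    setlocale(LC_ALL, "");
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";
    // A wide string never needs more elements than the UTF-8
    // input has bytes, so this buffer is large enough
    wstring wide_str(utf8_str.size(), L'\0');
    size_t written = mbstowcs(&wide_str[0], utf8_str.c_str(),
                              wide_str.size());
    if (written == static_cast<size_t>(-1)) {
        cerr << "Invalid multibyte sequence for the current locale"
             << endl;
        return 1;
    }
    // Drop the unused trailing null characters
    wide_str.resize(written);
    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;
    return 0;
}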
Output
Converted wide string: Hello, 世界
Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n), for the converted wide string.
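Instead of over-allocating the destination, the required length can be queried first. The sketch below relies on a POSIX extension (also offered by other common implementations) in which calling mbstowcs with a null destination returns the number of wide characters needed; treat that behavior as an assumption about your platform. The function name multibyte_to_wide is hypothetical, and the caller is expected to have set a suitable locale beforehand.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Converts a multibyte string to a wide string using the current C locale.
// Returns false if `src` holds a sequence that is invalid in that locale.
// Conversion stops at the first null byte in `src`.
bool multibyte_to_wide(const std::string& src, std::wstring& out) {
    // First pass: null destination asks only for the required length
    // (POSIX extension; assumed to be supported here).
    std::size_t needed = std::mbstowcs(nullptr, src.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1)) return false;

    out.assign(needed, L'\0');
    if (needed == 0) return true;

    // Second pass: perform the conversion into the sized buffer.
    std::size_t written = std::mbstowcs(&out[0], src.c_str(), needed);
    if (written == static_cast<std::size_t>(-1)) return false;

    out.resize(written);
    return true;
}
```

Call setlocale(LC_ALL, "") once at program start before using this helper with UTF-8 input; in the default "C" locale only ASCII text is guaranteed to convert.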
3. Convert UTF-8 characters to Wide Char Using MultiByteToWideChar on Windows
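MultiByteToWideChar() is a Windows API function that converts a string from a multibyte character set to a wide character (UTF-16) string. It is part of the Windows SDK and is available through the windows.h header file. The function is typically called twice: first with a null output buffer and a size of 0 to obtain the required number of wide characters, and then again with a buffer of that size to perform the conversion.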
Example
C++
// C++ program to convert utf8 to wchar_t using
// MultiByteToWideChar
#include <iostream>
#include <string>
#include <windows.h>
using namespace std;
int main()
{
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";
    // Determine the length of the wide string; passing -1 as the
    // input length makes the count include the terminating null
    int len = MultiByteToWideChar(
        CP_UTF8, 0, utf8_str.c_str(), -1, nullptr, 0);
    if (len == 0) {
        cerr << "Error in MultiByteToWideChar: "
             << GetLastError() << endl;
        return 1;
    }
    // Convert the string
    wstring wide_str(len, 0);
    MultiByteToWideChar(CP_UTF8, 0, utf8_str.c_str(), -1,
                        &wide_str[0], len);
    // Drop the terminating null that was counted in len
    wide_str.resize(len - 1);
    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;
    return 0;
}
Output
Converted wide string: Hello, 世界
Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n), for the converted wide string.
4. Convert UTF-8 characters to Wide Char Using iconv on Unix-like Systems
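iconv is a POSIX-standardized interface for converting between character encodings. On Unix-like systems it is declared in the iconv.h header file. The "WCHAR_T" encoding name used below, which targets the platform's native wchar_t representation, is supported by glibc and GNU libiconv but is not guaranteed by POSIX.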
Example
C++
// C++ program to convert utf8 to wchar_t using iconv
#include <iconv.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
    // UTF-8 encoded string
    string utf8_str = "Hello, 世界";
    // Open iconv descriptor
    iconv_t conv = iconv_open("WCHAR_T", "UTF-8");
    if (conv == (iconv_t)-1) {
        perror("iconv_open");
        return 1;
    }
    // Set up conversion buffers
    size_t in_bytes = utf8_str.size();
    char* in_buf = const_cast<char*>(utf8_str.c_str());
    // Zero-initialized, so the result stays null-terminated
    vector<wchar_t> wide_buf(in_bytes + 1);
    char* out_buf = reinterpret_cast<char*>(wide_buf.data());
    size_t out_bytes = wide_buf.size() * sizeof(wchar_t);
    // Perform conversion
    if (iconv(conv, &in_buf, &in_bytes, &out_buf, &out_bytes)
        == (size_t)-1) {
        perror("iconv");
        iconv_close(conv);
        return 1;
    }
    // Create wide string
    wstring wide_str(wide_buf.data());
    // Close iconv descriptor
    iconv_close(conv);
    // Output the wide string
    wcout << L"Converted wide string: " << wide_str << endl;
    return 0;
}
Output
Converted wide string: Hello, 世界
Time Complexity: O(n), where n is the number of bytes in the UTF-8 string.
Space Complexity: O(n), for the conversion buffer and the resulting wide string.
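The same steps can be wrapped in a reusable function that derives the result length from the number of output bytes iconv actually produced, rather than relying on a terminating null. This is a sketch under the assumption that the platform's iconv accepts the "WCHAR_T" encoding name (as glibc does); the function name utf8_to_wide_iconv is hypothetical.

```cpp
#include <cstddef>
#include <iconv.h>
#include <string>
#include <vector>

// Converts UTF-8 to a wide string with iconv. Returns false if the
// descriptor cannot be opened or the input is not valid UTF-8.
bool utf8_to_wide_iconv(const std::string& utf8, std::wstring& out) {
    // "WCHAR_T" is assumed to be supported by the platform's iconv.
    iconv_t conv = iconv_open("WCHAR_T", "UTF-8");
    if (conv == (iconv_t)-1) return false;

    // One wide character per input byte is always enough room.
    std::vector<wchar_t> buffer(utf8.size() + 1);
    char* in_ptr = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out_ptr = reinterpret_cast<char*>(buffer.data());
    std::size_t out_left = buffer.size() * sizeof(wchar_t);
    const std::size_t out_total = out_left;

    std::size_t rc = iconv(conv, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(conv);
    if (rc == (std::size_t)-1) return false;

    // Bytes written divided by sizeof(wchar_t) gives the character count.
    out.assign(buffer.data(), (out_total - out_left) / sizeof(wchar_t));
    return true;
}
```

Closing the descriptor before checking the result keeps the cleanup on a single path. On systems where iconv lives in a separate library, the program may need to be linked with -liconv.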