CharsetDecoder Class in Java
Last Updated: 27 May, 2024
Java provides the CharsetEncoder and CharsetDecoder classes for encoding and decoding text. The CharsetDecoder class converts bytes to characters: it accepts a sequence of bytes in a specific charset as input and produces a sequence of 16-bit Unicode characters as output. The input must be encoded in the charset that created the decoder. UTF-8 is a common choice, but any supported charset can be used. In software development, decoders are used wherever text arrives as raw bytes and has to be turned back into characters.
What is CharsetDecoder?
The CharsetDecoder class belongs to the "java.nio.charset" package and complements the CharsetEncoder class. The input byte sequence can be supplied in a single buffer or in a series of buffers. The decoded characters are written to a character buffer (or a series of them), and together they form the decoded text.
A decoder should be used by making the following sequence of method invocations, referred to as a decoding operation:
- Reset the decoder with the reset method, unless it has not been used before.
- Invoke the decode method zero or more times, as long as more input may be available, passing "false" for the endOfInput argument and filling the input buffer and flushing the output buffer between invocations. The false value tells the decoder that the input may not be complete, so it decodes as many bytes as possible and leaves any incomplete trailing sequence in the input buffer.
- Invoke the decode method one final time, passing "true" for the endOfInput argument, and then invoke the flush method so that the decoder can flush any internal state to the output buffer.
Each invocation of the decode method decodes as many bytes as possible from the input buffer and writes the resulting characters to the output buffer. The method returns when more input is required, when the output buffer is full, or when a decoding error occurs.
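The sequence above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name ChunkedDecodeSketch and the helpers decodeChunks and drain are hypothetical, the buffer size of 16 is arbitrary, and each chunk is assumed to fit into the free space of the input buffer. The input is UTF-8 that arrives in chunks, with one multi-byte character split across two chunks.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ChunkedDecodeSketch {
    // Moves everything decoded so far from the output buffer into the builder.
    static void drain(CharBuffer out, StringBuilder text) {
        out.flip();
        text.append(out);
        out.clear();
    }

    // Hypothetical helper: decodes UTF-8 bytes that arrive in several chunks.
    // Assumption: every chunk fits into the free space of the 16-byte input buffer.
    static String decodeChunks(byte[][] chunks) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        decoder.reset(); // step 1 (optional for a brand-new decoder)
        ByteBuffer in = ByteBuffer.allocate(16);
        CharBuffer out = CharBuffer.allocate(16);
        StringBuilder text = new StringBuilder();
        CoderResult result;

        // Step 2: decode with endOfInput = false while more input may follow.
        for (byte[] chunk : chunks) {
            in.put(chunk); // fill the input buffer
            in.flip();
            do {
                result = decoder.decode(in, out, false);
                if (result.isError()) {
                    throw new IllegalArgumentException("Undecodable input: " + result);
                }
                drain(out, text); // flush the output buffer
            } while (result.isOverflow());
            in.compact(); // keep the bytes of an incomplete sequence for the next chunk
        }

        // Step 3: one final decode with endOfInput = true, then flush.
        in.flip();
        do {
            result = decoder.decode(in, out, true);
            if (result.isError()) {
                throw new IllegalArgumentException("Undecodable input: " + result);
            }
            drain(out, text);
        } while (result.isOverflow());
        do {
            result = decoder.flush(out);
            drain(out, text);
        } while (result.isOverflow());
        return text.toString();
    }

    public static void main(String[] args) {
        // "Hé!" in UTF-8, with the two bytes of 'é' (0xC3 0xA9) split across chunks.
        byte[][] chunks = { { 0x48, (byte) 0xC3 }, { (byte) 0xA9, 0x21 } };
        System.out.println(decodeChunks(chunks));
    }
}
```

Because the first decode call is told that more input may follow, it stops before the lone 0xC3 byte and reports an underflow. The compact call keeps that byte in the input buffer so the next chunk can complete the character.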
Syntax of CharsetDecoder
public abstract class CharsetDecoder extends Object
Constructor of CharsetDecoder
| Constructor | Modifier | Description |
|---|---|---|
| CharsetDecoder(Charset cs, float averageCharsPerByte, float maxCharsPerByte) | protected | Initializes a new decoder for the given charset. |
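Because the constructor is protected, it is only called from subclasses. The sketch below shows one way a subclass might use it. The class names AsciiDecoderSketch and SimpleAsciiDecoder and the helper decodeAscii are hypothetical, and the decoder reuses the built-in US-ASCII charset object purely for illustration. The subclass supplies the one abstract method, decodeLoop.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class AsciiDecoderSketch {
    // Hypothetical decoder for 7-bit ASCII: every byte yields exactly one char.
    static final class SimpleAsciiDecoder extends CharsetDecoder {
        SimpleAsciiDecoder() {
            // charset, averageCharsPerByte, maxCharsPerByte
            super(StandardCharsets.US_ASCII, 1.0f, 1.0f);
        }

        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            while (in.hasRemaining()) {
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW; // output buffer is full
                }
                byte b = in.get(in.position()); // peek without consuming
                if (b < 0) {
                    // High bit set: not ASCII. The position stays on the bad byte.
                    return CoderResult.malformedForLength(1);
                }
                in.get(); // consume the byte
                out.put((char) b);
            }
            return CoderResult.UNDERFLOW; // all input consumed
        }
    }

    // Decodes with the sketch decoder; wraps the checked exception for brevity.
    static String decodeAscii(byte[] bytes) {
        try {
            return new SimpleAsciiDecoder().decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeAscii(new byte[] { 0x4A, 0x61, 0x76, 0x61 })); // Java
    }
}
```

In everyday code a decoder is normally obtained from Charset.newDecoder() rather than by subclassing. The subclass here only illustrates what the constructor arguments and decodeLoop are for.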
Methods in CharsetDecoder
| Modifier and type | Method | Description |
|---|---|---|
| final float | averageCharsPerByte() | Returns the average number of characters produced for each byte of input. |
| final Charset | charset() | Returns the charset that created this decoder. |
| final CharBuffer | decode(ByteBuffer in) | Convenience method that decodes the remaining content of a single input byte buffer into a newly-allocated character buffer. |
| final CoderResult | decode(ByteBuffer in, CharBuffer out, boolean endOfInput) | Decodes as many bytes as possible from the input buffer and writes the results to the output buffer. |
| protected abstract CoderResult | decodeLoop(ByteBuffer in, CharBuffer out) | Decodes one or more bytes into one or more characters. |
| Charset | detectedCharset() | Retrieves the charset detected by this decoder (optional operation). |
| final CoderResult | flush(CharBuffer out) | Flushes this decoder. |
| protected CoderResult | implFlush(CharBuffer out) | Flushes this decoder. Overridden by decoders that need to write final characters to the output buffer. |
| protected void | implOnMalformedInput(CodingErrorAction newAction) | Reports a change to this decoder's malformed-input action. |
| protected void | implOnUnmappableCharacter(CodingErrorAction newAction) | Reports a change to this decoder's unmappable-character action. |
| protected void | implReplaceWith(String newReplacement) | Reports a change to this decoder's replacement value. |
| protected void | implReset() | Resets this decoder, clearing any charset-specific internal state. |
| boolean | isAutoDetecting() | Tells whether this decoder implements an auto-detecting charset. |
| boolean | isCharsetDetected() | Tells whether this decoder has detected a charset yet (optional operation). |
| CodingErrorAction | malformedInputAction() | Returns this decoder's current action for malformed-input errors. |
| final float | maxCharsPerByte() | Returns the maximum number of characters produced for each byte of input. |
| final CharsetDecoder | onMalformedInput(CodingErrorAction newAction) | Changes this decoder's action for malformed-input errors. |
| final CharsetDecoder | onUnmappableCharacter(CodingErrorAction newAction) | Changes this decoder's action for unmappable-character errors. |
| final String | replacement() | Returns this decoder's replacement value. |
| final CharsetDecoder | replaceWith(String newReplacement) | Changes this decoder's replacement value. |
| final CharsetDecoder | reset() | Resets this decoder, clearing any internal state. |
| CodingErrorAction | unmappableCharacterAction() | Returns this decoder's current action for unmappable-character errors. |
The above table lists each method's modifier, signature, and description according to the Java SE 21 API documentation. The Java SE 21 API defines the core Java platform for general-purpose computing.
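A few of the query and configuration methods from the table can be tried out as in the sketch below. The class name DecoderInfoSketch and the helper describe are hypothetical. The sketch reads a decoder's settings before and after they are changed.

```java
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecoderInfoSketch {
    // Hypothetical helper: summarizes a decoder's charset and error actions.
    static String describe(CharsetDecoder d) {
        return d.charset().name()
                + " malformed=" + d.malformedInputAction()
                + " unmappable=" + d.unmappableCharacterAction()
                + " autoDetecting=" + d.isAutoDetecting();
    }

    public static void main(String[] args) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

        // A new decoder reports both kinds of error by default.
        System.out.println(describe(decoder));
        System.out.println("average chars per byte: " + decoder.averageCharsPerByte());
        System.out.println("max chars per byte: " + decoder.maxCharsPerByte());

        // The on... and replaceWith methods return the decoder, so calls can be chained.
        decoder.onMalformedInput(CodingErrorAction.REPLACE)
               .onUnmappableCharacter(CodingErrorAction.IGNORE)
               .replaceWith("?");
        System.out.println(describe(decoder));
        System.out.println("replacement: " + decoder.replacement());
    }
}
```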
Error Handling during decoding
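Two kinds of errors can occur while decoding with the CharsetDecoder class: a malformed byte sequence (the bytes are not legal for the charset) and an unmappable character (the byte sequence is legal but cannot be mapped to a Unicode character). Each kind of error can be handled with one of three actions defined in CodingErrorAction: IGNORE, REPORT, or REPLACE. The action for malformed input is set with the onMalformedInput method and the action for unmappable characters with the onUnmappableCharacter method. The default action for both is REPORT.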
Java
// Java Program to Implement
// CharsetDecoder Class
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;    // Decoding operations
import java.nio.charset.CoderResult;       // Result of each decode call
import java.nio.charset.CodingErrorAction; // Error-handling actions

class GFG {
    public static void main(String[] args) {
        // '@' (0x40) is passed twice, encoded in UTF-8
        byte[] bytes = { (byte) 0x40, (byte) 0x40 };
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();

        // Error handling actions for malformed and unmappable characters
        decoder.onMalformedInput(CodingErrorAction.REPLACE);
        decoder.onUnmappableCharacter(CodingErrorAction.REPORT);

        // I/O buffer creation
        CharBuffer charStore = CharBuffer.allocate(bytes.length);
        ByteBuffer utfStore = ByteBuffer.wrap(bytes);

        // Output string instance to collect the decoded characters
        StringBuilder decodedText = new StringBuilder();
        CoderResult result;
        do {
            // true: the byte array holds the complete input
            result = decoder.decode(utfStore, charStore, true);
            charStore.flip();
            decodedText.append(charStore);
            charStore.clear();
            if (result.isError()) {
                // Error handling logic
                if (result.isMalformed()) {
                    System.err.println("Encountered malformed byte sequence!");
                } else if (result.isUnmappable()) {
                    System.err.println("Encountered unmappable character!");
                }
                // Skip the offending bytes so that the loop can make progress
                utfStore.position(utfStore.position() + result.length());
            }
        } while (!result.isUnderflow());

        // Flush any characters still held in the decoder's internal state
        decoder.flush(charStore);
        charStore.flip();
        decodedText.append(charStore);

        System.out.println("Decoded Text: " + decodedText); // Prints "Decoded Text: @@"
    }
}
Explanation of the above Program:
The above code shows how '@' is decoded using the CharsetDecoder class with error handling in place. The following steps are involved:
- bytes is a byte array containing the UTF-8 encoding of '@' (0x40) twice.
- The error actions are set so that malformed input is replaced and unmappable characters are reported.
- Two buffers, a CharBuffer and a ByteBuffer, are created to hold the output characters and the input byte sequence respectively.
- decodedText is a StringBuilder that collects the decoded characters for display.
- The do-while loop calls decode with these buffers until the result is an underflow, which means that all the input bytes have been consumed.
- After the loop completes, the decoded text is printed to the terminal as '@@'.
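The program above decodes only valid input, so none of the error actions is actually triggered. The following sketch feeds a deliberately invalid byte (0xFF never occurs in UTF-8) to a decoder configured with each of the three actions. The class name ErrorActionSketch and the helper decodeWith are hypothetical, and the convenience method decode(ByteBuffer) is used to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ErrorActionSketch {
    // Hypothetical helper: decodes UTF-8 bytes using the given action for both error kinds.
    static String decodeWith(CodingErrorAction action, byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // With REPORT, the convenience method throws instead of returning a CoderResult.
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] bad = { 0x41, (byte) 0xFF, 0x42 }; // 'A', an invalid byte, 'B'

        System.out.println(decodeWith(CodingErrorAction.REPLACE, bad)); // A, U+FFFD, B
        System.out.println(decodeWith(CodingErrorAction.IGNORE, bad));  // AB
        System.out.println(decodeWith(CodingErrorAction.REPORT, bad));  // error: MalformedInputException
    }
}
```

REPLACE substitutes the decoder's replacement value (U+FFFD by default) for the bad byte, IGNORE drops it silently, and REPORT stops decoding and surfaces the error to the caller.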