Package org.apache.directory.api.util
Class Unicode
- java.lang.Object
  - org.apache.directory.api.util.Unicode
public final class Unicode extends Object
Various Unicode manipulation methods that are more efficient than chaining operations: all the work is done in the same buffer, without creating a bunch of intermediate String objects.
- Author:
  Apache Directory Project
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| static char | bytesToChar(byte[] bytes) | Return the Unicode char which is coded in the bytes at position 0. |
| static char | bytesToChar(byte[] bytes, int pos) | Return the Unicode char which is coded in the bytes at the given position. |
| static byte[] | charToBytes(char car) | Return the byte array that encodes the given Unicode char. |
| static int | countBytes(char[] chars) | Count the number of bytes needed to encode the given char[]. |
| static int | countBytesPerChar(byte[] bytes, int pos) | Count the number of bytes needed to decode a Unicode char. |
| static int | countChars(byte[] bytes) | Count the number of chars encoded in the given byte[]. |
| static int | countNbBytesPerChar(char car) | Return the number of bytes needed to hold a Unicode char. |
| static boolean | isUnicodeSubset(byte b) | Check if the current byte is in the unicodeSubset: all chars but '\0', '(', ')', '*' and '\'. |
| static boolean | isUnicodeSubset(char c) | Check if the current char is in the unicodeSubset: all chars but '\0', '(', ')', '*' and '\'. |
| static boolean | isUnicodeSubset(String str, int pos) | Check if the char at the given position is in the unicodeSubset: all chars but '\0', '(', ')', '*' and '\'. |
| static String | readUTF(ObjectInput objectInput) | Reads in a string that has been encoded using a modified UTF-8 format. |
| static void | writeUTF(ObjectOutput objectOutput, String str) | Writes four bytes of length information to the output stream, followed by the modified UTF-8 representation of every character in the string str. |
Method Detail
-
countBytesPerChar
public static int countBytesPerChar(byte[] bytes, int pos)
Count the number of bytes needed to decode a Unicode char. This can be from 1 to 6.
- Parameters:
  bytes - The bytes to read
  pos - Position to start counting. It must be a valid start of an encoded char!
- Returns:
  The number of bytes needed to create a char, or -1 if the encoding is wrong. TODO: Should stop after the third byte, as a char is only 2 bytes long.
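The 1-to-6 range matches the original UTF-8 layout (RFC 2279), in which the number of leading 1 bits in the lead byte gives the length of the sequence. The sketch below is a hypothetical re-implementation of that rule, not the library's source: the class name LeadByteLengthSketch is invented for illustration, and returning -1 for a null array or an out-of-range position is an assumption.

```java
/**
 * Hypothetical sketch, not the library's source: derive the length of an
 * encoded sequence from its lead byte, using the original 1-to-6 byte UTF-8 layout.
 */
final class LeadByteLengthSketch {

    static int countBytesPerChar(byte[] bytes, int pos) {
        // Assumption: an invalid position is reported like a bad encoding.
        if (bytes == null || pos < 0 || pos >= bytes.length) {
            return -1;
        }

        int b = bytes[pos] & 0xFF; // unsigned value of the lead byte

        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        if ((b & 0xFC) == 0xF8) return 5; // 111110xx
        if ((b & 0xFE) == 0xFC) return 6; // 1111110x

        return -1; // 10xxxxxx continuation byte, or 0xFE / 0xFF
    }
}
```

A continuation byte (10xxxxxx) is rejected because it can never start a char, which is why the position must point at a valid start of an encoded char.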
-
bytesToChar
public static char bytesToChar(byte[] bytes)
Return the Unicode char which is coded in the bytes at position 0.
- Parameters:
  bytes - The byte[] representation of a Unicode string.
- Returns:
  The first char found.
-
bytesToChar
public static char bytesToChar(byte[] bytes, int pos)
Return the Unicode char which is coded in the bytes at the given position.
- Parameters:
  bytes - The byte[] representation of a Unicode string.
  pos - The current position to start decoding the char
- Returns:
  The decoded char, or -1 if no char can be decoded. TODO: Should stop after the third byte, as a char is only 2 bytes long.
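A minimal decoding sketch, assuming the bytes are standard UTF-8 and, as the TODO suggests, that only 1- to 3-byte sequences need handling (every 16-bit char fits in at most 3 bytes). Because the return type is char, the documented -1 is represented here as (char) -1, which is 0xFFFF. The class name Utf8DecodeSketch is hypothetical, and the sketch does not reject overlong encodings.

```java
/**
 * Hypothetical sketch, not the library's source: decode one char from
 * standard UTF-8 bytes, handling 1- to 3-byte sequences only.
 */
final class Utf8DecodeSketch {

    // The documented "-1" for a char-returning method: (char) -1 == 0xFFFF.
    static final char INVALID = (char) -1;

    static char bytesToChar(byte[] bytes) {
        return bytesToChar(bytes, 0);
    }

    static char bytesToChar(byte[] bytes, int pos) {
        if (bytes == null || pos < 0 || pos >= bytes.length) {
            return INVALID;
        }

        int b0 = bytes[pos] & 0xFF;

        if ((b0 & 0x80) == 0) {
            return (char) b0; // 0xxxxxxx: plain ASCII
        }

        if ((b0 & 0xE0) == 0xC0 && pos + 1 < bytes.length) {
            int b1 = bytes[pos + 1] & 0xFF;
            if ((b1 & 0xC0) == 0x80) { // continuation byte 10xxxxxx
                return (char) (((b0 & 0x1F) << 6) | (b1 & 0x3F));
            }
        } else if ((b0 & 0xF0) == 0xE0 && pos + 2 < bytes.length) {
            int b1 = bytes[pos + 1] & 0xFF;
            int b2 = bytes[pos + 2] & 0xFF;
            if ((b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80) {
                return (char) (((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
            }
        }

        // Stray continuation byte, truncated input, bad continuation,
        // or a 4+ byte sequence that cannot fit in one 16-bit char.
        return INVALID;
    }
}
```

The single-argument overload simply delegates with pos = 0, which is what "at position 0" means for bytesToChar(byte[]).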
-
countNbBytesPerChar
public static int countNbBytesPerChar(char car)
Return the number of bytes needed to hold a Unicode char.
- Parameters:
  car - The character to measure
- Returns:
  The number of bytes needed to hold the char. TODO: Should stop after the third byte, as a char is only 2 bytes long.
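For a single 16-bit char, standard UTF-8 never needs more than three bytes, which is what the TODO hints at. A sketch of that rule, assuming the count refers to UTF-8 (the class name CharUtf8LengthSketch is hypothetical):

```java
/** Hypothetical sketch: UTF-8 length of a single 16-bit char. */
final class CharUtf8LengthSketch {

    static int countNbBytesPerChar(char car) {
        if (car < 0x80) {
            return 1; // U+0000..U+007F: 0xxxxxxx
        }
        if (car < 0x800) {
            return 2; // U+0080..U+07FF: 110xxxxx 10xxxxxx
        }
        return 3;     // U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
    }
}
```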
-
countBytes
public static int countBytes(char[] chars)
Count the number of bytes needed to encode the given char[].
- Parameters:
  chars - The char array to measure
- Returns:
  The number of bytes needed to encode the char array
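One way to read countBytes is as the sum of the per-char lengths. The sketch below assumes that reading and reuses the 1/2/3-byte UTF-8 thresholds; CountBytesSketch is a hypothetical name, and treating null as empty is an assumption. Note that it counts each half of a surrogate pair as 3 bytes, whereas String.getBytes(StandardCharsets.UTF_8) emits 4 bytes for the whole pair.

```java
/** Hypothetical sketch: encoded size of a char[] as the sum of per-char UTF-8 lengths. */
final class CountBytesSketch {

    static int countBytes(char[] chars) {
        if (chars == null) {
            return 0; // assumption: treat null as empty
        }
        int total = 0;
        for (char c : chars) {
            total += bytesFor(c);
        }
        return total;
    }

    // Same 1/2/3-byte thresholds that UTF-8 uses for a single 16-bit char.
    private static int bytesFor(char c) {
        if (c < 0x80) {
            return 1;
        }
        return c < 0x800 ? 2 : 3;
    }
}
```

For text without surrogates, the result equals the length of the standard UTF-8 encoding.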
-
countChars
public static int countChars(byte[] bytes)
Count the number of chars encoded in the given byte[].
- Parameters:
  bytes - The byte array to decode
- Returns:
  The number of chars in the byte array
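Counting chars in a byte[] amounts to walking the buffer from one lead byte to the next. This sketch (hypothetical class CountCharsSketch) only accepts 1- to 3-byte sequences and trusts continuation bytes without checking them; returning -1 for malformed or truncated input is an assumption, since the documentation above does not say how errors are reported.

```java
/** Hypothetical sketch: number of chars encoded in a UTF-8 byte[] (1- to 3-byte sequences only). */
final class CountCharsSketch {

    static int countChars(byte[] bytes) {
        if (bytes == null) {
            return 0;
        }
        int count = 0;
        int pos = 0;
        while (pos < bytes.length) {
            int len = sequenceLength(bytes[pos] & 0xFF);
            if (len < 0 || pos + len > bytes.length) {
                return -1; // assumption: malformed or truncated input
            }
            pos += len; // jump to the next lead byte
            count++;
        }
        return count;
    }

    private static int sequenceLength(int lead) {
        if ((lead & 0x80) == 0x00) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return -1; // continuation byte, or a sequence longer than one char
    }
}
```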
-
charToBytes
public static byte[] charToBytes(char car)
Return the byte array that encodes the given Unicode char.
- Parameters:
  car - The character to be transformed into an array of bytes
- Returns:
  The byte array representing the char. TODO: Should stop after the third byte, as a char is only 2 bytes long.
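The inverse of bytesToChar, sketched under the assumption that the byte array is the standard UTF-8 encoding of the char (the class name CharToBytesSketch is hypothetical). For any char that is not a surrogate, the result should match String.valueOf(car).getBytes(StandardCharsets.UTF_8).

```java
/** Hypothetical sketch: standard UTF-8 encoding of a single 16-bit char. */
final class CharToBytesSketch {

    static byte[] charToBytes(char car) {
        if (car < 0x80) {
            return new byte[] {(byte) car}; // 0xxxxxxx
        }
        if (car < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (car >> 6)),  // 110xxxxx: top 5 bits
                (byte) (0x80 | (car & 0x3F)) // 10xxxxxx: low 6 bits
            };
        }
        return new byte[] {
            (byte) (0xE0 | (car >> 12)),         // 1110xxxx: top 4 bits
            (byte) (0x80 | ((car >> 6) & 0x3F)), // 10xxxxxx: middle 6 bits
            (byte) (0x80 | (car & 0x3F))         // 10xxxxxx: low 6 bits
        };
    }
}
```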
-
isUnicodeSubset
public static boolean isUnicodeSubset(String str, int pos)
Check if the char at the given position is in the unicodeSubset: all chars but '\0', '(', ')', '*' and '\'.
- Parameters:
  str - The string to check
  pos - Position of the current char
- Returns:
  True if the current char is in the Unicode subset
-
isUnicodeSubset
public static boolean isUnicodeSubset(char c)
Check if the current char is in the unicodeSubset: all chars but '\0', '(', ')', '*' and '\'.
- Parameters:
  c - The char to check
- Returns:
  True if the current char is in the Unicode subset
-
isUnicodeSubset
public static boolean isUnicodeSubset(byte b)
Check if the current byte is in the unicodeSubset: all chars but '\0', '(', ')', '*' and '\'.
- Parameters:
  b - The byte to check
- Returns:
  True if the current byte is in the Unicode subset
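The three isUnicodeSubset overloads share one rule: reject NUL, '(', ')', '*' and '\', the same five characters that RFC 4515 requires to be escaped in LDAP filter assertion values. The byte overload can be applied to UTF-8 data byte by byte because every byte of a multi-byte UTF-8 sequence has its high bit set, so none of them can collide with these ASCII values. A hypothetical sketch (UnicodeSubsetSketch); returning false for a null string or an out-of-range position is an assumption.

```java
/**
 * Hypothetical sketch of the isUnicodeSubset rule: everything except
 * NUL, '(', ')', '*' and backslash.
 */
final class UnicodeSubsetSketch {

    static boolean isUnicodeSubset(char c) {
        return c != '\0' && c != '(' && c != ')' && c != '*' && c != '\\';
    }

    // Works on raw UTF-8 bytes: bytes of multi-byte sequences are all >= 0x80
    // (unsigned), so they can never equal one of these ASCII values.
    static boolean isUnicodeSubset(byte b) {
        return b != 0x00 && b != '(' && b != ')' && b != '*' && b != '\\';
    }

    static boolean isUnicodeSubset(String str, int pos) {
        // Assumption: a null string or out-of-range position is "not in the subset".
        if (str == null || pos < 0 || pos >= str.length()) {
            return false;
        }
        return isUnicodeSubset(str.charAt(pos));
    }
}
```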
-
writeUTF
public static void writeUTF(ObjectOutput objectOutput, String str) throws IOException
Writes four bytes of length information to the output stream, followed by the modified UTF-8 representation of every character in the string str. If str is null, the string value "null" is written with a length of 0 instead of throwing a NullPointerException. Each character in str is converted to a group of one, two, or three bytes, depending on the value of the character. Because a single DataOutput.writeUTF call cannot write more than 65535 bytes, the total length is written first as four bytes (writeInt), and the string is then split into smaller parts if necessary and written part by part. As each character is converted to at most 3 bytes, writing chunks of at most 21845 (65535 / 3) characters at once is always safe.
See also: DataOutput.writeUTF(String)
- Parameters:
  objectOutput - The objectOutput to write to
  str - The value to write
- Throws:
  IOException - If the value can't be written to the output
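The chunking scheme described above can be sketched as follows. This is a hypothetical re-implementation (ChunkedUtfWriterSketch), not the library's code: it assumes the four-byte length is the string's char count and, as an extra assumption, writes a single empty chunk for an empty string so that a reader can always consume at least one chunk. Splitting at arbitrary char indexes is safe because modified UTF-8 encodes each char, including each half of a surrogate pair, independently.

```java
import java.io.IOException;
import java.io.ObjectOutput;

/** Hypothetical sketch of a length-prefixed, chunked writeUTF. */
final class ChunkedUtfWriterSketch {

    // 65535 / 3: each char needs at most 3 bytes in modified UTF-8.
    static final int MAX_CHARS_PER_CHUNK = 21845;

    static void writeUTF(ObjectOutput out, String str) throws IOException {
        if (str == null) {
            out.writeInt(0);      // length 0 ...
            out.writeUTF("null"); // ... plus the marker string "null"
            return;
        }

        out.writeInt(str.length()); // assumption: total length in chars

        if (str.isEmpty()) {
            out.writeUTF(""); // assumption: always emit at least one chunk
            return;
        }

        for (int start = 0; start < str.length(); start += MAX_CHARS_PER_CHUNK) {
            int end = Math.min(start + MAX_CHARS_PER_CHUNK, str.length());
            out.writeUTF(str.substring(start, end)); // at most 65535 encoded bytes
        }
    }
}
```

A 50,000-char string, for example, is written as three chunks of 21845, 21845 and 6310 chars.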
-
readUTF
public static String readUTF(ObjectInput objectInput) throws IOException
Reads in a string that has been encoded using a modified UTF-8 format. The general contract of readUTF is that it reads a representation of a Unicode character string encoded in modified UTF-8 format; this string of characters is then returned as a String. First, four bytes are read (readInt) to obtain the length information written by writeUTF. The following bytes are then converted to characters by considering them in groups. The length of each group is computed from the value of the first byte of the group. The byte following a group, if any, is the first byte of the next group.
See also: DataInput.readUTF()
- Parameters:
  objectInput - The objectInput to read from
- Returns:
  The read string
- Throws:
  IOException - If the value can't be read
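A matching reader sketch, assuming the stream layout described for writeUTF: a four-byte length (taken here to be the char count), then one or more DataOutput.writeUTF chunks, with a length of 0 followed by "null" standing for a null string. The class name ChunkedUtfReaderSketch, the char-count interpretation and the mapping back to null are assumptions, not the library's documented behavior.

```java
import java.io.IOException;
import java.io.ObjectInput;

/** Hypothetical sketch of reading a length-prefixed, chunked UTF string. */
final class ChunkedUtfReaderSketch {

    static String readUTF(ObjectInput in) throws IOException {
        int totalLength = in.readInt();             // assumption: total length in chars
        StringBuilder sb = new StringBuilder(in.readUTF()); // first chunk is always present

        if (totalLength == 0 && "null".equals(sb.toString())) {
            return null; // the writer's encoding of a null string
        }

        // Keep reading DataInput-style UTF chunks until the recorded length is reached.
        while (sb.length() < totalLength) {
            sb.append(in.readUTF());
        }
        return sb.toString();
    }
}
```

A real string "null" is still distinguishable because its recorded length is 4, not 0.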