Class StringUtil


  • @Internal
    public class StringUtil
    extends java.lang.Object
    Collection of string handling utilities
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  StringUtil.StringsIterator
      An Iterator over an array of Strings.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.nio.charset.Charset BIG5  
      static java.nio.charset.Charset UTF16LE  
      static java.nio.charset.Charset UTF8  
      static java.nio.charset.Charset WIN_1252  
    • Method Summary

      Modifier and Type Method Description
      static int countMatches​(java.lang.CharSequence haystack, char needle)
      Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
      static boolean endsWithIgnoreCase​(java.lang.String haystack, java.lang.String suffix)
      Tests if the string ends with the specified suffix, ignoring case consideration.
      static int getEncodedSize​(java.lang.String value)  
      static java.lang.String getFromCompressedUnicode​(byte[] string, int offset, int len)
      Read 8 bit data (in ISO-8859-1 codepage) into a (unicode) Java String and return.
      static java.lang.String getFromUnicodeLE​(byte[] string)
      Given a byte array of 16-bit unicode characters in little endian format (most important byte last), return a Java String representation of it.
      static java.lang.String getFromUnicodeLE​(byte[] string, int offset, int len)
      Given a byte array of 16-bit unicode characters in Little Endian format (most important byte last), return a Java String representation of it.
      static java.lang.String getFromUnicodeLE0Terminated​(byte[] string, int offset, int len)
      Given a byte array of 16-bit unicode characters in Little Endian format (most important byte last), return a Java String representation of it.
      static java.lang.String getPreferredEncoding()  
      static byte[] getToUnicodeLE​(java.lang.String string)
      Convert String to 16-bit unicode characters in little endian format
      static boolean hasMultibyte​(java.lang.String value)
      Checks whether the parameter contains a multibyte character.
      static boolean isUnicodeString​(java.lang.String value)
      Checks to see if a given String needs to be represented as Unicode
      static boolean isUpperCase​(char c)  
      static java.lang.String join​(java.lang.Object[] array)  
      static java.lang.String join​(java.lang.Object[] array, java.lang.String separator)  
      static java.lang.String join​(java.lang.String separator, java.lang.Object... array)  
      static void mapMsCodepoint​(int msCodepoint, int unicodeCodepoint)  
      static java.lang.String mapMsCodepointString​(java.lang.String string)
      Some strings may contain encoded characters of the unicode private use area.
      static void putCompressedUnicode​(java.lang.String input, byte[] output, int offset)
      Takes a unicode (java) string, and returns it as 8 bit data (in ISO-8859-1 codepage).
      static void putCompressedUnicode​(java.lang.String input, LittleEndianOutput out)  
      static void putUnicodeLE​(java.lang.String input, byte[] output, int offset)
      Takes a unicode string, and returns it as little endian (most important byte last) bytes in the supplied byte array.
      static void putUnicodeLE​(java.lang.String input, LittleEndianOutput out)  
      static java.lang.String readCompressedUnicode​(LittleEndianInput in, int nChars)  
      static java.lang.String readUnicodeLE​(LittleEndianInput in, int nChars)  
      static java.lang.String readUnicodeString​(LittleEndianInput in)
      InputStream in is expected to contain: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
      static java.lang.String readUnicodeString​(LittleEndianInput in, int nChars)
      InputStream in is expected to contain: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
      static boolean startsWithIgnoreCase​(java.lang.String haystack, java.lang.String prefix)
      Tests if the string starts with the specified prefix, ignoring case consideration.
      static java.lang.String toLowerCase​(char c)  
      static java.lang.String toUpperCase​(char c)  
      static void writeUnicodeString​(LittleEndianOutput out, java.lang.String value)
      OutputStream out will get: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
      static void writeUnicodeStringFlagAndData​(LittleEndianOutput out, java.lang.String value)
      OutputStream out will get: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • UTF16LE

        public static final java.nio.charset.Charset UTF16LE
      • UTF8

        public static final java.nio.charset.Charset UTF8
      • WIN_1252

        public static final java.nio.charset.Charset WIN_1252
      • BIG5

        public static final java.nio.charset.Charset BIG5
    • Method Detail

      • getFromUnicodeLE

        public static java.lang.String getFromUnicodeLE​(byte[] string,
                                                        int offset,
                                                        int len)
                                                 throws java.lang.ArrayIndexOutOfBoundsException,
                                                        java.lang.IllegalArgumentException
        Given a byte array of 16-bit unicode characters in Little Endian format (most important byte last), return a Java String representation of it.

        For example, the bytes { 0x16, 0x00 } are converted to the character 0x16.

        Parameters:
        string - the byte array to be converted
        offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit unicode character
        len - the length of the final string
        Returns:
        the converted string, never null.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
        java.lang.IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
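
        The contract above can be illustrated with a minimal JDK-only sketch. The class name Utf16LeDecodeSketch and the method decode are hypothetical, and the code only mirrors the documented checks; it is not necessarily how StringUtil implements them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI: mirrors the documented contract only.
final class Utf16LeDecodeSketch {
    static String decode(byte[] data, int offset, int len) {
        // Offset must point inside the array.
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        // len counts characters, so 2 * len bytes must be available.
        if (len < 0 || (data.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        // Each character occupies two bytes, low byte first.
        return new String(data, offset, len * 2, StandardCharsets.UTF_16LE);
    }
}
```

        Under these assumptions, decode(new byte[] { 0x41, 0x00 }, 0, 1) yields "A".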
      • getFromUnicodeLE

        public static java.lang.String getFromUnicodeLE​(byte[] string)
        Given a byte array of 16-bit unicode characters in little endian format (most important byte last), return a Java String representation of it.

        For example, the bytes { 0x16, 0x00 } are converted to the character 0x16.

        Parameters:
        string - the byte array to be converted
        Returns:
        the converted string, never null
      • getToUnicodeLE

        public static byte[] getToUnicodeLE​(java.lang.String string)
        Convert String to 16-bit unicode characters in little endian format
        Parameters:
        string - the string
        Returns:
        the byte array of 16-bit unicode characters
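
        As a rough illustration of the byte layout this produces, the following JDK-only sketch encodes a String as UTF-16LE. Utf16LeEncodeSketch is a hypothetical name, and the sketch assumes the result matches what the JDK's UTF_16LE charset emits (two bytes per char, low byte first, no byte order mark).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI.
final class Utf16LeEncodeSketch {
    static byte[] encode(String s) {
        // UTF_16LE writes the low byte of each 16-bit unit first and adds no BOM.
        return s.getBytes(StandardCharsets.UTF_16LE);
    }
}
```

        With this sketch, encode("A") gives the two bytes { 0x41, 0x00 }.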
      • getFromCompressedUnicode

        public static java.lang.String getFromCompressedUnicode​(byte[] string,
                                                                int offset,
                                                                int len)
        Read 8 bit data (in ISO-8859-1 codepage) into a (unicode) Java String and return. (In Excel terms, read compressed 8 bit unicode as a string)
        Parameters:
        string - the byte array to read
        offset - the offset in the byte array at which to start reading
        len - the number of bytes to read
        Returns:
        the String generated by reading the byte array
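
        A small JDK-only sketch of reading "compressed" 8 bit data follows. CompressedUnicodeReadSketch is a hypothetical name, and the sketch assumes a plain ISO-8859-1 decode in which each byte maps to the code point with the same value; any additional bounds handling in StringUtil is not reproduced.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI.
final class CompressedUnicodeReadSketch {
    static String decode(byte[] data, int offset, int len) {
        // One byte per character: byte value 0xE9 becomes U+00E9.
        return new String(data, offset, len, StandardCharsets.ISO_8859_1);
    }
}
```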
      • readCompressedUnicode

        public static java.lang.String readCompressedUnicode​(LittleEndianInput in,
                                                             int nChars)
      • readUnicodeString

        public static java.lang.String readUnicodeString​(LittleEndianInput in)
        InputStream in is expected to contain:
        1. ushort nChars
        2. byte is16BitFlag
        3. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.

        This structure is also known as a XLUnicodeString.
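        The three-field layout listed above can be sketched with a java.nio.ByteBuffer standing in for LittleEndianInput. XlUnicodeStringReadSketch and read are hypothetical names, and treating only the lowest bit of is16BitFlag as significant is an assumption of this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reader, not part of POI; a ByteBuffer replaces LittleEndianInput.
final class XlUnicodeStringReadSketch {
    static String read(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        int nChars = in.getShort() & 0xFFFF;          // 1. ushort nChars
        boolean is16Bit = (in.get() & 0x01) != 0;     // 2. byte is16BitFlag (assumed: bit 0)
        StringBuilder sb = new StringBuilder(nChars);
        for (int i = 0; i < nChars; i++) {            // 3. characterData
            // 16-bit data is read as little endian chars, 8-bit data as unsigned bytes.
            sb.append(is16Bit ? in.getChar() : (char) (in.get() & 0xFF));
        }
        return sb.toString();
    }
}
```

        Note that the flag byte is consumed even when nChars is 0, matching the statement that it is always present.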

      • readUnicodeString

        public static java.lang.String readUnicodeString​(LittleEndianInput in,
                                                         int nChars)
        InputStream in is expected to contain:
        1. byte is16BitFlag
        2. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
        This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.
      • writeUnicodeString

        public static void writeUnicodeString​(LittleEndianOutput out,
                                              java.lang.String value)
        OutputStream out will get:
        1. ushort nChars
        2. byte is16BitFlag
        3. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
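
        The write side of the same layout can be sketched as follows, with a ByteArrayOutputStream standing in for LittleEndianOutput. XlUnicodeStringWriteSketch is a hypothetical name. The sketch assumes that 16-bit data is chosen whenever any char exceeds 0xFF and that the string length fits in a ushort; both are assumptions, not confirmed behaviour of StringUtil.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical writer, not part of POI.
final class XlUnicodeStringWriteSketch {
    static byte[] write(String value) {
        // Assumption: switch to 16-bit data if any char does not fit in one byte.
        boolean is16Bit = false;
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                is16Bit = true;
                break;
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int nChars = value.length();              // assumed to be <= 0xFFFF
        out.write(nChars & 0xFF);                 // 1. ushort nChars, low byte first
        out.write((nChars >>> 8) & 0xFF);
        out.write(is16Bit ? 0x01 : 0x00);         // 2. byte is16BitFlag, always written
        for (int i = 0; i < nChars; i++) {        // 3. characterData
            char c = value.charAt(i);
            out.write(c & 0xFF);
            if (is16Bit) {
                out.write((c >>> 8) & 0xFF);
            }
        }
        return out.toByteArray();
    }
}
```

        For an empty string this sketch still emits three bytes: the two length bytes and the flag.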
      • writeUnicodeStringFlagAndData

        public static void writeUnicodeStringFlagAndData​(LittleEndianOutput out,
                                                         java.lang.String value)
        OutputStream out will get:
        1. byte is16BitFlag
        2. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
        This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.
      • putCompressedUnicode

        public static void putCompressedUnicode​(java.lang.String input,
                                                byte[] output,
                                                int offset)
        Takes a unicode (java) string, and returns it as 8 bit data (in ISO-8859-1 codepage). (In Excel terms, write compressed 8 bit unicode)
        Parameters:
        input - the String containing the data to be written
        output - the byte array to which the data is to be written
        offset - an offset into the byte array at which the data starts when written
      • putCompressedUnicode

        public static void putCompressedUnicode​(java.lang.String input,
                                                LittleEndianOutput out)
      • putUnicodeLE

        public static void putUnicodeLE​(java.lang.String input,
                                        byte[] output,
                                        int offset)
        Takes a unicode string, and returns it as little endian (most important byte last) bytes in the supplied byte array. (In Excel terms, write uncompressed unicode)
        Parameters:
        input - the String containing the unicode data to be written
        output - the byte array to hold the uncompressed unicode, should be twice the length of the String
        offset - the offset to start writing into the byte array
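
        The sizing rule for output (two bytes per character) is easiest to see in a short JDK-only sketch. PutUnicodeLeSketch and put are hypothetical names; the sketch assumes the caller has allocated at least offset + 2 * input.length() bytes.

```java
// Hypothetical helper, not part of POI.
final class PutUnicodeLeSketch {
    static void put(String input, byte[] output, int offset) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            output[offset + 2 * i] = (byte) (c & 0xFF);   // low byte first
            output[offset + 2 * i + 1] = (byte) (c >>> 8); // high byte last
        }
    }
}
```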
      • putUnicodeLE

        public static void putUnicodeLE​(java.lang.String input,
                                        LittleEndianOutput out)
      • readUnicodeLE

        public static java.lang.String readUnicodeLE​(LittleEndianInput in,
                                                     int nChars)
      • getPreferredEncoding

        public static java.lang.String getPreferredEncoding()
        Returns:
        the encoding we want to use, currently hardcoded to ISO-8859-1
      • hasMultibyte

        public static boolean hasMultibyte​(java.lang.String value)
        Checks whether the parameter contains a multibyte character.
        Parameters:
        value - string to check
        Returns:
        true if the string has at least one multibyte character
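
        One plausible reading of "multibyte" is a char that does not fit in a single byte, that is, a value above 0xFF. The sketch below encodes that assumption; HasMultibyteSketch is a hypothetical name, and returning false for null is also an assumption.

```java
// Hypothetical helper, not part of POI.
final class HasMultibyteSketch {
    static boolean hasMultibyte(String value) {
        if (value == null) {
            return false; // assumption: null has no multibyte characters
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return true; // this char needs more than 8 bits
            }
        }
        return false;
    }
}
```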
      • isUnicodeString

        public static boolean isUnicodeString​(java.lang.String value)
        Checks to see if a given String needs to be represented as Unicode
        Parameters:
        value - The string to look at.
        Returns:
        true if string needs Unicode to be represented.
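
        One way to test whether a string "needs Unicode" is an ISO-8859-1 round trip: if encoding and decoding changes the string, some character could not be stored in 8 bits. The sketch below shows that idea with the JDK only. IsUnicodeStringSketch is a hypothetical name, and the round-trip approach is an assumption about what the check means, not a statement of how StringUtil performs it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI.
final class IsUnicodeStringSketch {
    static boolean needsUnicode(String value) {
        // Unmappable characters are replaced with '?' by getBytes, so the
        // round-tripped string differs from the original when any are present.
        byte[] latin1 = value.getBytes(StandardCharsets.ISO_8859_1);
        return !value.equals(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```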
      • startsWithIgnoreCase

        public static boolean startsWithIgnoreCase​(java.lang.String haystack,
                                                   java.lang.String prefix)
        Tests if the string starts with the specified prefix, ignoring case consideration.
      • endsWithIgnoreCase

        public static boolean endsWithIgnoreCase​(java.lang.String haystack,
                                                 java.lang.String suffix)
        Tests if the string ends with the specified suffix, ignoring case consideration.
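
        Both case-insensitive tests (prefix and suffix) can be expressed with String.regionMatches, which avoids allocating lower-cased copies. The sketch below is JDK-only, uses the hypothetical class name IgnoreCaseAffixSketch, and assumes non-null arguments.

```java
// Hypothetical helper, not part of POI; arguments are assumed to be non-null.
final class IgnoreCaseAffixSketch {
    static boolean startsWithIgnoreCase(String haystack, String prefix) {
        return haystack.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String haystack, String suffix) {
        int length = suffix.length();
        int start = haystack.length() - length;
        // regionMatches returns false for a negative start, which covers
        // the case where the suffix is longer than the haystack.
        return haystack.regionMatches(true, start, suffix, 0, length);
    }
}
```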
      • toLowerCase

        @Internal
        public static java.lang.String toLowerCase​(char c)
      • toUpperCase

        @Internal
        public static java.lang.String toUpperCase​(char c)
      • isUpperCase

        @Internal
        public static boolean isUpperCase​(char c)
      • mapMsCodepoint

        public static void mapMsCodepoint​(int msCodepoint,
                                          int unicodeCodepoint)
      • join

        @Internal
        public static java.lang.String join​(java.lang.Object[] array,
                                            java.lang.String separator)
      • join

        @Internal
        public static java.lang.String join​(java.lang.Object[] array)
      • join

        @Internal
        public static java.lang.String join​(java.lang.String separator,
                                            java.lang.Object... array)
      • countMatches

        public static int countMatches​(java.lang.CharSequence haystack,
                                       char needle)
        Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
        Parameters:
        haystack - the CharSequence to check, may be null
        needle - the character to count the quantity of
        Returns:
        the number of occurrences, 0 if the CharSequence is null
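
        The documented behaviour, including the null case, fits in a few lines. The following JDK-only sketch uses the hypothetical class name CountMatchesSketch and is meant only to illustrate the contract.

```java
// Hypothetical helper, not part of POI.
final class CountMatchesSketch {
    static int countMatches(CharSequence haystack, char needle) {
        if (haystack == null) {
            return 0; // documented: 0 if the CharSequence is null
        }
        int count = 0;
        for (int i = 0; i < haystack.length(); i++) {
            if (haystack.charAt(i) == needle) {
                count++;
            }
        }
        return count;
    }
}
```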
      • getFromUnicodeLE0Terminated

        public static java.lang.String getFromUnicodeLE0Terminated​(byte[] string,
                                                                   int offset,
                                                                   int len)
                                                            throws java.lang.ArrayIndexOutOfBoundsException,
                                                                   java.lang.IllegalArgumentException
        Given a byte array of 16-bit unicode characters in Little Endian format (most important byte last), return a Java String representation of it. Scans the byte array for two consecutive 0 bytes and returns the string before them.

        #61881: some programs appear to write the 0-termination at the beginning of the string as well. In that case, check whether the next two bytes contain a valid ASCII char and correct the _recdata with a '?' char.

        Parameters:
        string - the byte array to be converted
        offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit unicode character
        len - the maximum length of the final string
        Returns:
        the converted string, never null.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
        java.lang.IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
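
        The basic scan for a 16-bit zero terminator can be sketched with the JDK alone. Utf16LeZeroTerminatedSketch is a hypothetical name. The sketch assumes the terminator is aligned on a 16-bit boundary relative to offset, and it deliberately omits the #61881 workaround for a leading terminator described above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of POI; the #61881 workaround is not reproduced.
final class Utf16LeZeroTerminatedSketch {
    static String decode(byte[] data, int offset, int maxLen) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        if (maxLen < 0 || (data.length - offset) / 2 < maxLen) {
            throw new IllegalArgumentException("Illegal length " + maxLen);
        }
        // Count 16-bit units until a unit with both bytes zero, or maxLen.
        int n = 0;
        while (n < maxLen
                && (data[offset + 2 * n] != 0 || data[offset + 2 * n + 1] != 0)) {
            n++;
        }
        return new String(data, offset, n * 2, StandardCharsets.UTF_16LE);
    }
}
```

        If no terminator is found within maxLen characters, this sketch returns all maxLen characters.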