Class WhitespaceTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public class WhitespaceTokenizer
    extends java.lang.Object
    This tokenizer splits the input text on whitespace. To obtain an instance of this tokenizer, use the static final INSTANCE field.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String[] tokenize(java.lang.String s)
      Splits a string into its atomic parts.
      Span[] tokenizePos(java.lang.String d)
      Finds the boundaries of atomic parts in a string.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • tokenizePos

        public Span[] tokenizePos(java.lang.String d)
        Description copied from interface: Tokenizer
        Finds the boundaries of atomic parts in a string.
        Parameters:
        d - The string to be tokenized.
        Returns:
        The Span[] with the spans (offsets into d) for each token as the individual array elements.
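
To illustrate what "boundaries of atomic parts" means for a whitespace tokenizer, the following is a minimal JDK-only sketch, not the library's implementation. The class name WhitespaceSpanSketch, the use of Character.isWhitespace as the whitespace test, and the representation of each span as an int pair {start, end} with an exclusive end are all assumptions made for this example; the real method returns Span objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the library's code.
class WhitespaceSpanSketch {

    // Returns {start, end} offset pairs into d, one per token.
    // Assumption: start is inclusive and end is exclusive.
    static List<int[]> tokenizePos(String d) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < d.length(); i++) {
            if (Character.isWhitespace(d.charAt(i))) {
                if (start >= 0) {
                    // Whitespace closes the token that was open.
                    spans.add(new int[] {start, i});
                    start = -1;
                }
            } else if (start < 0) {
                // First non-whitespace character opens a new token.
                start = i;
            }
        }
        if (start >= 0) {
            // The input ended while a token was still open.
            spans.add(new int[] {start, d.length()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String d = "  Hello   world. ";
        for (int[] span : tokenizePos(d)) {
            System.out.println(span[0] + ".." + span[1] + " -> " + d.substring(span[0], span[1]));
        }
        // prints:
        // 2..7 -> Hello
        // 10..16 -> world.
    }
}
```

Because the offsets index into the original string d, leading, trailing, and repeated whitespace never appears in any span, and the original text of each token can be recovered with substring.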
      • tokenize

        public java.lang.String[] tokenize(java.lang.String s)
        Description copied from interface: Tokenizer
        Splits a string into its atomic parts.
        Specified by:
        tokenize in interface Tokenizer
        Parameters:
        s - The string to be tokenized.
        Returns:
        The String[] with the individual tokens as the array elements.
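
As a rough picture of the kind of result to expect, here is a small JDK-only sketch of whitespace splitting. It is an approximation written for this page, not the library's code: the class name WhitespaceTokenizeSketch and the choice of Character.isWhitespace as the separator test are assumptions. With the real class, the equivalent call would go through the INSTANCE field described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT the library's code.
class WhitespaceTokenizeSketch {

    // Splits s into maximal runs of non-whitespace characters.
    static String[] tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                // A separator ends the token collected so far, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("Mr. Smith  isn't here.\n")));
        // prints: [Mr., Smith, isn't, here.]
    }
}
```

Note that punctuation stays attached to the neighbouring word ("Mr.", "here."), since only whitespace acts as a separator in this sketch.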