Class StandardAnalyzer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class StandardAnalyzer
    extends StopwordAnalyzerBase
    Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

    You must specify the required Version compatibility when creating StandardAnalyzer:

    • As of 3.4, Hiragana and Han characters are no longer wrongly split from their combining characters. If you specify an earlier version, you get the old, broken behavior, which is preserved for backwards compatibility.
    • As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords. ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
    • As of 2.9, StopFilter preserves position increments.
    • As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1068).
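The filter chain in the class description (tokenize, lowercase, remove stop words) can be illustrated with a simplified plain-Java model. This is only a sketch and does not use Lucene: the class name `AnalyzerChainSketch`, the regex-based splitting, and the four-word stop list are assumptions made for illustration. StandardTokenizer applies Unicode text segmentation rather than a regex, and the actual contents of STOP_WORDS_SET are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerChainSketch {
    // Hypothetical stop list for illustration; not the contents of STOP_WORDS_SET.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "and", "of"));

    // Models the chain: tokenize, then lowercase, then drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Crude stand-in for a tokenizer: split on runs of non-letter, non-digit characters.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty first element
            }
            String lower = raw.toLowerCase(Locale.ROOT); // lowercase step
            if (!STOP_WORDS.contains(lower)) {           // stop-word step
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, lazy, dog]
        System.out.println(analyze("The Quick Brown Fox and the Lazy Dog"));
    }
}
```

Note that lowercasing happens before the stop-word check, so a stop list written in lowercase also matches "The" in the input.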
    • Field Detail

      • DEFAULT_MAX_TOKEN_LENGTH

        public static final int DEFAULT_MAX_TOKEN_LENGTH
        Default maximum allowed token length
        See Also:
        Constant Field Values
      • STOP_WORDS_SET

        public static final CharArraySet STOP_WORDS_SET
        An unmodifiable set containing some common English words that are usually not useful for searching.
    • Constructor Detail

      • StandardAnalyzer

        public StandardAnalyzer(Version matchVersion,
                                CharArraySet stopWords)
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene version to match. See the version compatibility notes above.
        stopWords - the set of stop words to remove
      • StandardAnalyzer

        public StandardAnalyzer(Version matchVersion)
        Builds an analyzer with the default stop words (STOP_WORDS_SET).
        Parameters:
        matchVersion - Lucene version to match. See the version compatibility notes above.
      • StandardAnalyzer

        public StandardAnalyzer(Version matchVersion,
                                java.io.Reader stopwords)
                         throws java.io.IOException
        Builds an analyzer with the stop words from the given reader.
        Parameters:
        matchVersion - Lucene version to match. See the version compatibility notes above.
        stopwords - Reader to read stop words from
        Throws:
        java.io.IOException
        See Also:
        WordlistLoader.getWordSet(Reader, Version)
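As a rough illustration of the Reader-based constructor, the sketch below models loading a stop list from a Reader in plain Java. It does not call Lucene. It assumes a one-word-per-line format with surrounding whitespace trimmed, and skipping blank lines is a choice of this sketch rather than documented behavior. The class and method names (`StopWordReaderSketch`, `readStopWords`) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

class StopWordReaderSketch {
    // Reads one stop word per line, trimming whitespace (assumed format).
    static Set<String> readStopWords(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // this sketch ignores blank lines
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // prints [the, and, of]
        System.out.println(readStopWords(new StringReader("the\n  and \n\nof\n")));
    }
}
```

Because the analyzer lowercases tokens before stop-word filtering, the words supplied through the Reader should be in lowercase.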
    • Method Detail

      • setMaxTokenLength

        public void setMaxTokenLength(int length)
        Sets the maximum allowed token length. Any token longer than this length is discarded. The setting only takes effect the next time tokenStream is called.
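The effect of the length limit can be modeled in plain Java as follows. This is a sketch of the documented rule only (tokens that exceed the limit are discarded) and does not use Lucene. The class name `MaxTokenLengthSketch` and the limit of 6 are hypothetical values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MaxTokenLengthSketch {
    // Keeps tokens whose length does not exceed maxTokenLength; longer tokens are dropped.
    static List<String> applyLimit(List<String> tokens, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("search", "is", "fast", "internationalization");
        // prints [search, is, fast]
        System.out.println(applyLimit(tokens, 6));
    }
}
```

A token whose length equals the limit is kept; only tokens strictly longer than the limit are discarded.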