Class PersianAnalyzer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class PersianAnalyzer
    extends StopwordAnalyzerBase
    Analyzer for Persian.

    This Analyzer uses PersianCharFilter, which implies tokenizing around the zero-width non-joiner in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
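
    The behavior described above can be illustrated with a minimal, library-free sketch. It is not the Lucene implementation: the class and method names are hypothetical, and it assumes that Farsi yeh (U+06CC) is standardized to Arabic yeh (U+064A), that keheh (U+06A9) is standardized to Arabic kaf (U+0643), and that stopwords are matched after normalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; the real analyzer is built from Lucene
// char filters, tokenizers, and token filters.
final class PersianAnalysisSketch {
    private static final char ZWNJ = '\u200C';       // zero-width non-joiner
    private static final char FARSI_YEH = '\u06CC';
    private static final char ARABIC_YEH = '\u064A'; // assumed target form
    private static final char KEHEH = '\u06A9';
    private static final char ARABIC_KAF = '\u0643'; // assumed target form

    static List<String> analyze(String text, Set<String> stopwords) {
        // Treat the zero-width non-joiner like whitespace, then
        // standardize the two variant letters.
        String normalized = text
                .replace(ZWNJ, ' ')
                .replace(FARSI_YEH, ARABIC_YEH)
                .replace(KEHEH, ARABIC_KAF);

        List<String> tokens = new ArrayList<>();
        for (String token : normalized.trim().split("\\s+")) {
            // Drop empty fragments and stopwords (stopwords are expected
            // to be given in normalized form).
            if (!token.isEmpty() && !stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One written word containing a zero-width non-joiner,
        // followed by a single-letter stopword.
        String text = "\u0645\u06CC\u200C\u0631\u0648\u0645 \u0648";
        System.out.println(analyze(text, Set.of("\u0648")));
    }
}
```

    In this sketch the word containing the zero-width non-joiner is split into two tokens, and the trailing stopword is removed.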

    • Field Detail

      • DEFAULT_STOPWORD_FILE

        public static final java.lang.String DEFAULT_STOPWORD_FILE
        File containing default Persian stopwords. The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html and is BSD-licensed.
        See Also:
        Constant Field Values
      • STOPWORDS_COMMENT

        public static final java.lang.String STOPWORDS_COMMENT
        The comment character in the stopwords file. All lines prefixed with this character are ignored.
        See Also:
        Constant Field Values
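
        As a rough sketch of how a comment prefix is typically honored when reading a stopword file, the following stand-alone example skips prefixed lines and collects the rest. The class and method names are hypothetical, the "#" prefix used in the usage example is an assumption rather than a value stated on this page, and Lucene's own loader may differ in detail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not the loader used by PersianAnalyzer.
final class StopwordFileSketch {
    static Set<String> load(Reader source, String commentPrefix) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Lines that begin with the comment prefix are ignored.
                if (line.startsWith(commentPrefix)) {
                    continue;
                }
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Return a read-only view, mirroring the unmodifiable default set.
        return Collections.unmodifiableSet(words);
    }

    public static void main(String[] args) {
        String file = "# sample stopword file\n\u0648\n\n\u062F\u0631\n";
        // "#" is assumed here as the comment prefix.
        System.out.println(load(new StringReader(file), "#"));
    }
}
```

        Returning an unmodifiable view means callers who want a customized list need to copy the set first.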
    • Constructor Detail

      • PersianAnalyzer

        public PersianAnalyzer(Version matchVersion,
                               CharArraySet stopwords)
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene compatibility version
        stopwords - a stopword set
    • Method Detail

      • getDefaultStopSet

        public static CharArraySet getDefaultStopSet()
        Returns an unmodifiable instance of the default stop-words set.
        Returns:
        an unmodifiable instance of the default stop-words set.