Class SimilarityBase

  • Direct Known Subclasses:
    DFRSimilarity, IBSimilarity, LMSimilarity

    public abstract class SimilarityBase
    extends Similarity
    A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the score(org.apache.lucene.search.similarities.BasicStats, float, float) and toString() methods. Implementing explain(Explanation, BasicStats, int, float, float) is optional, because SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.
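The subclass contract above can be illustrated with a standalone model. This is a minimal sketch, not Lucene code: Stats, SimplifiedSimilarity and ToyTfIdf are hypothetical names, and the toy tf * idf formula is an assumption chosen only to show that a subclass supplies score(...) and toString() while the base class can build a default explanation on its own.

```java
// Standalone model of the SimilarityBase contract. All names are hypothetical
// and nothing here depends on Lucene.
public class SubclassContractSketch {

    /** Hypothetical stand-in for BasicStats: collection-level numbers a scorer sees. */
    static final class Stats {
        final long numberOfDocuments;
        final long docFreq;

        Stats(long numberOfDocuments, long docFreq) {
            this.numberOfDocuments = numberOfDocuments;
            this.docFreq = docFreq;
        }
    }

    /** Mirrors the simplified API: subclasses supply only score(...) and toString(). */
    abstract static class SimplifiedSimilarity {
        protected abstract float score(Stats stats, float freq, float docLen);

        @Override
        public abstract String toString();

        /** Default explanation built by the base class without subclass help. */
        public String explain(Stats stats, float freq, float docLen) {
            return score(stats, freq, docLen) + " = score(freq=" + freq + "), computed by " + this;
        }
    }

    /** Toy subclass: tf * idf with idf = ln(1 + N / df); it ignores docLen. */
    static final class ToyTfIdf extends SimplifiedSimilarity {
        @Override
        protected float score(Stats stats, float freq, float docLen) {
            double idf = Math.log(1.0 + (double) stats.numberOfDocuments / stats.docFreq);
            return (float) (freq * idf);
        }

        @Override
        public String toString() {
            return "ToyTfIdf";
        }
    }

    public static void main(String[] args) {
        SimplifiedSimilarity sim = new ToyTfIdf();
        Stats stats = new Stats(100, 10);
        System.out.println(sim.explain(stats, 2f, 50f));
    }
}
```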

    Note: multi-word queries such as phrase queries are scored differently than under Lucene's default ranking algorithm. Whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since it has no statistics for the phrase itself), this class scores a phrase as the sum of its individual term scores.
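A minimal sketch of the summation approach, assuming a hypothetical per-term scorer termScore(...) and assuming that every term of the phrase is scored at the same phrase frequency but with its own document frequency. Neither method is part of Lucene's API.

```java
import java.util.List;

// Hypothetical sketch: a phrase scored as the sum of its terms' scores.
// termScore(...) is a stand-in formula, not Lucene code.
public class PhraseSumSketch {

    /** Hypothetical per-term scorer: freq * ln(1 + N / df). */
    static double termScore(long numDocs, long docFreq, double freq) {
        return freq * Math.log(1.0 + (double) numDocs / docFreq);
    }

    /** Each phrase term keeps its own statistics; the phrase score is their sum. */
    static double phraseScore(long numDocs, List<Long> docFreqs, double phraseFreq) {
        double sum = 0.0;
        for (long df : docFreqs) {
            sum += termScore(numDocs, df, phraseFreq);
        }
        return sum;
    }

    public static void main(String[] args) {
        // "new york": two terms with different document frequencies
        System.out.println(phraseScore(1000, List.of(50L, 10L), 1.0));
    }
}
```

No IDF is invented for the phrase as a whole here; rarer terms simply contribute larger summands.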

    • Constructor Detail

      • SimilarityBase

        public SimilarityBase()
        Sole constructor. (For invocation by subclass constructors, typically implicit.)
    • Method Detail

      • setDiscountOverlaps

        public void setDiscountOverlaps(boolean v)
        Determines whether overlap tokens (tokens with a position increment of 0) are ignored when computing norms. The default is true, meaning overlap tokens do not count toward the document length used for norms.
        See Also:
        computeNorm(org.apache.lucene.index.FieldInvertState)
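The effect of the flag on the length used for norms can be sketched as follows, assuming (as in Lucene's FieldInvertState) that the raw field length counts every token and a separate counter records the overlap tokens. effectiveLength(...) is a hypothetical helper, not a Lucene method.

```java
// Hypothetical model of how discountOverlaps changes the length used for norms.
public class DiscountOverlapsSketch {

    static int effectiveLength(int length, int numOverlap, boolean discountOverlaps) {
        // Overlap tokens (position increment 0, e.g. injected synonyms) are
        // subtracted only when discounting is enabled, which is the default.
        return discountOverlaps ? length - numOverlap : length;
    }

    public static void main(String[] args) {
        // "quick fast fox": "fast" is a synonym injected at the position of "quick"
        int length = 3;      // tokens seen, including the overlap
        int numOverlap = 1;  // tokens with position increment 0
        System.out.println(effectiveLength(length, numOverlap, true));   // 2
        System.out.println(effectiveLength(length, numOverlap, false));  // 3
    }
}
```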
      • getDiscountOverlaps

        public boolean getDiscountOverlaps()
        Returns true if overlap tokens are discounted from the document's length.
        See Also:
        setDiscountOverlaps(boolean)
      • computeWeight

        public final Similarity.SimWeight computeWeight(float queryBoost,
                                                        CollectionStatistics collectionStats,
                                                        TermStatistics... termStats)
        Description copied from class: Similarity
        Compute any collection-level weight (e.g. IDF or average document length) needed for scoring a query.
        Specified by:
        computeWeight in class Similarity
        Parameters:
        queryBoost - the query-time boost.
        collectionStats - collection-level statistics, such as the number of tokens in the collection.
        termStats - term-level statistics, such as the document frequency of a term across the collection.
        Returns:
        SimWeight object with the information this Similarity needs to score a query.
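A rough, standalone sketch of the kind of precomputation computeWeight performs. TermStats, PerTermStats and the average-field-length formula (total tokens in the field divided by the number of documents) are assumptions for illustration; the point is that one statistics object is prepared per term, sharing the collection-level figures and the query boost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of collection-level precomputation: one stats object per
// query term, sharing collection-wide figures. None of these types are Lucene classes.
public class CollectionWeightSketch {

    record TermStats(String term, long docFreq, long totalTermFreq) {}

    record PerTermStats(String term, long numberOfDocuments, long docFreq,
                        long totalTermFreq, double avgFieldLength, float boost) {}

    static List<PerTermStats> computeWeight(float queryBoost, long numberOfDocuments,
                                            long sumTotalTermFreq, TermStats... termStats) {
        // Assumed: average field length = tokens in the field / number of documents.
        double avgFieldLength = (double) sumTotalTermFreq / numberOfDocuments;
        List<PerTermStats> result = new ArrayList<>();
        for (TermStats ts : termStats) {
            result.add(new PerTermStats(ts.term(), numberOfDocuments, ts.docFreq(),
                    ts.totalTermFreq(), avgFieldLength, queryBoost));
        }
        return result;
    }

    public static void main(String[] args) {
        List<PerTermStats> weights = computeWeight(1.0f, 1000, 250_000,
                new TermStats("new", 300, 450), new TermStats("york", 40, 55));
        weights.forEach(System.out::println);
    }
}
```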
      • toString

        public abstract java.lang.String toString()
        Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
        Overrides:
        toString in class java.lang.Object
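For example, a subclass with a single smoothing parameter might report it as shown below. ToyDirichlet and its mu field are hypothetical; including the parameter lets differently configured instances be told apart when printed.

```java
// Hypothetical example of the toString() contract: name plus parameter values.
// ToyDirichlet and mu are illustrative, not a Lucene class.
public class ToStringSketch {

    static final class ToyDirichlet {
        private final float mu;

        ToyDirichlet(float mu) {
            this.mu = mu;
        }

        @Override
        public String toString() {
            // Include the parameter so differently configured instances differ when printed.
            return "ToyDirichlet(mu=" + mu + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(new ToyDirichlet(2000f)); // ToyDirichlet(mu=2000.0)
    }
}
```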
      • computeNorm

        public long computeNorm(FieldInvertState state)
        Encodes the document length in the same way as TFIDFSimilarity.
        Specified by:
        computeNorm in class Similarity
        Parameters:
        state - current processing state for this field
        Returns:
        computed norm value
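A simplified sketch of length normalization, assuming the classic TF-IDF norm boost / sqrt(length). Because score(...) receives a document length (docLen), the stored norm has to be mapped back to a length; with a boost of 1 that inverse is 1 / (norm * norm). Lucene also quantizes the norm into a single lossy byte, and the exact encoding varies by version, so that step is omitted here; encodeLength and decodeLength are hypothetical names.

```java
// Simplified sketch of classic TF-IDF length normalization (assumed boost / sqrt(length)).
// The lossy one-byte quantization Lucene applies is deliberately omitted.
public class LengthNormSketch {

    /** Norm for a field of the given length (length >= 1, after overlap discounting). */
    static float encodeLength(float boost, int length) {
        return (float) (boost / Math.sqrt(length));
    }

    /** Recovers the length from a norm, assuming a boost of 1. */
    static float decodeLength(float norm) {
        return 1.0f / (norm * norm);
    }

    public static void main(String[] args) {
        float norm = encodeLength(1.0f, 4);
        System.out.println(norm);               // 0.5
        System.out.println(decodeLength(norm)); // 4.0
    }
}
```

Longer fields get smaller norms, which is what lets a scoring formula penalize long documents.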
      • log2

        public static double log2(double x)
        Returns the base two logarithm of x.
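Java's Math class offers natural and base-10 logarithms but no base-2 variant, so a helper like this is typically written with the change-of-base rule. This is a minimal sketch, not necessarily the library's exact implementation.

```java
// Minimal sketch of a base-2 logarithm via change of base: log2(x) = ln(x) / ln(2).
public class Log2Sketch {

    // Precompute ln(2) once instead of on every call.
    private static final double LN_2 = Math.log(2.0);

    static double log2(double x) {
        return Math.log(x) / LN_2;
    }

    public static void main(String[] args) {
        System.out.println(log2(1024.0));
    }
}
```

As with Math.log, an argument of 0 yields negative infinity and a negative argument yields NaN.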