Class CommonTermsQuery

  • All Implemented Interfaces:
    java.lang.Cloneable

    public class CommonTermsQuery
    extends Query
    A query that executes high-frequency terms in an optional sub-query to prevent slow queries due to "common" terms like stopwords. This query builds two queries from the added terms: low-frequency terms are added to a required boolean clause and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. Scores produced by this query differ slightly from those of a plain BooleanQuery scorer, mainly because the number of leaf queries in the required boolean clause differs. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where it applies, this query can improve query execution times significantly.

    CommonTermsQuery has several advantages over stopword filtering at index or query time: a term is "classified" based on its actual document frequency in the index, so the query can prevent slow queries even across domains and without specialized stopword files.

    Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e., all high-frequency terms must match for a document to match.
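
    The classification described above can be illustrated with a small, library-free sketch. It is not Lucene's implementation: the class name, the `classify` helper, and the `maxTermFrequency` cutoff (taken here as a fraction of all documents) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Conceptual sketch only; hypothetical names, not Lucene's implementation. */
final class TermClassificationSketch {

    /** Holds the two groups of terms produced by the classification. */
    static final class Split {
        final List<String> lowFreq = new ArrayList<>();   // would go into the required clause
        final List<String> highFreq = new ArrayList<>();  // would go into the optional clause
    }

    /**
     * Splits terms by document frequency. A term counts as high-frequency when
     * docFreq / maxDoc exceeds maxTermFrequency (an assumed cutoff, expressed
     * as a fraction of all documents in the index).
     */
    static Split classify(Map<String, Integer> docFreqs, int maxDoc, float maxTermFrequency) {
        Split split = new Split();
        for (Map.Entry<String, Integer> entry : docFreqs.entrySet()) {
            float ratio = (float) entry.getValue() / maxDoc;
            if (ratio > maxTermFrequency) {
                split.highFreq.add(entry.getKey());
            } else {
                split.lowFreq.add(entry.getKey());
            }
        }
        return split;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for an index of 1000 documents.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 950);
        docFreqs.put("lucene", 12);
        docFreqs.put("of", 900);

        Split split = classify(docFreqs, 1000, 0.1f);
        System.out.println("required (low-frequency): " + split.lowFreq);
        System.out.println("optional (high-frequency): " + split.highFreq);
    }
}
```

    Because the split depends on index statistics rather than a fixed word list, the same term may fall into different groups in different indexes.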

    • Constructor Detail

    • Method Detail

      • add

        public void add​(Term term)
        Adds a term to the CommonTermsQuery.
        Parameters:
        term - the term to add
      • rewrite

        public Query rewrite​(IndexReader reader)
                      throws java.io.IOException
        Description copied from class: Query
        Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.
        Overrides:
        rewrite in class Query
        Throws:
        java.io.IOException
      • collectTermContext

        public void collectTermContext​(IndexReader reader,
                                       java.util.List<AtomicReaderContext> leaves,
                                       TermContext[] contextArray,
                                       Term[] queryTerms)
                                throws java.io.IOException
        Throws:
        java.io.IOException
      • isCoordDisabled

        public boolean isCoordDisabled()
        Returns true iff Similarity.coord(int,int) is disabled in scoring for the high- and low-frequency query instances. The top-level query always disables coords.
      • setLowFreqMinimumNumberShouldMatch

        public void setLowFreqMinimumNumberShouldMatch​(float min)
        Specifies the minimum number of low-frequency optional BooleanClauses that must be satisfied in order to produce a match on the low-frequency terms query part. This method accepts either a float value in the range [0..1), interpreted as a fraction of the actual query terms in the low-frequency clause, or a number >=1, interpreted as an absolute number of clauses that need to match.

        By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

        Parameters:
        min - the number of optional clauses that must match
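
        The two interpretations of the float argument can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class and the `resolve` helper are hypothetical, and rounding the fractional case to the nearest integer and truncating the absolute case are assumptions that the description above does not specify.

```java
/** Conceptual sketch only; hypothetical helper, not part of the Lucene API. */
final class MinShouldMatchSketch {

    /**
     * Converts the float setting into a clause count.
     * Values in [0, 1) are treated as a fraction of numClauses;
     * values >= 1 are treated as an absolute number of clauses.
     */
    static int resolve(float min, int numClauses) {
        if (min < 0f) {
            throw new IllegalArgumentException("min must not be negative: " + min);
        }
        if (min >= 1.0f) {
            return (int) min;                 // absolute count (assumed: truncated)
        }
        return Math.round(min * numClauses);  // fraction (assumed: rounded to nearest)
    }

    public static void main(String[] args) {
        System.out.println(resolve(0.5f, 4)); // half of four clauses
        System.out.println(resolve(2.0f, 5)); // two clauses, regardless of the total
        System.out.println(resolve(0.0f, 4)); // default: no optional clause required
    }
}
```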
      • getLowFreqMinimumNumberShouldMatch

        public float getLowFreqMinimumNumberShouldMatch()
        Gets the minimum number of optional low-frequency BooleanClauses that must be satisfied.
      • setHighFreqMinimumNumberShouldMatch

        public void setHighFreqMinimumNumberShouldMatch​(float min)
        Specifies the minimum number of high-frequency optional BooleanClauses that must be satisfied in order to produce a match on the high-frequency terms query part. This method accepts either a float value in the range [0..1), interpreted as a fraction of the actual query terms in the high-frequency clause, or a number >=1, interpreted as an absolute number of clauses that need to match.

        By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

        Parameters:
        min - the number of optional clauses that must match
      • getHighFreqMinimumNumberShouldMatch

        public float getHighFreqMinimumNumberShouldMatch()
        Gets the minimum number of optional high-frequency BooleanClauses that must be satisfied.
      • extractTerms

        public void extractTerms​(java.util.Set<Term> terms)
        Description copied from class: Query
        Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten form.
        Overrides:
        extractTerms in class Query
      • toString

        public java.lang.String toString​(java.lang.String field)
        Description copied from class: Query
        Prints a query to a string, with field assumed to be the default field and omitted.
        Specified by:
        toString in class Query
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Query
      • equals

        public boolean equals​(java.lang.Object obj)
        Overrides:
        equals in class Query