PatternTokenizerFactory (The Adobe Experience Manager SDK 2022.11.9850.20221116T162329Z-220900)

java.lang.Object
- org.apache.lucene.analysis.util.AbstractAnalysisFactory
  - org.apache.lucene.analysis.util.TokenizerFactory
    - org.apache.lucene.analysis.pattern.PatternTokenizerFactory

```
public class PatternTokenizerFactory
extends TokenizerFactory
```
Factory for PatternTokenizer. This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" says which group to extract into tokens.
group=-1 (the default) is equivalent to "split". In this mode the tokens match the output of String.split(java.lang.String), except that empty tokens are dropped.
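
The split mode can be approximated with plain java.util.regex, without Lucene on the classpath. The sketch below is an illustration only, not the tokenizer's actual implementation: the class name `SplitModeSketch`, the helper `splitTokens`, and the sample pattern and input are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of group=-1 ("split") semantics using only the JDK.
class SplitModeSketch {

    // Splits the input on every match of the regex and drops zero-length pieces,
    // mirroring the documented "String.split without empty tokens" behavior.
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The doubled comma yields an empty piece, which is filtered out.
        System.out.println(splitTokens("\\s*,\\s*", "aaa, bbb,,ccc")); // [aaa, bbb, ccc]
    }
}
```

Here the pattern describes the delimiters, not the tokens: everything between two matches becomes a token.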

Using group >= 0 selects the matching group as the token. For example, if you have:
```
  pattern = \'([^\']+)\'
  group = 0
  input = aaa 'bbb' 'ccc'
```
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but group=1, the output would be bbb and ccc (no ' marks).
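
The difference between group=0 and group=1 can be reproduced with java.util.regex alone. This is a minimal sketch under the assumption that each successive match contributes one token. The class `GroupModeSketch` and the helper `groupTokens` are hypothetical names, not part of the Lucene API. The pattern is written without the backslashes before the quote marks, which a Java regex does not need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of group >= 0 semantics using only the JDK.
class GroupModeSketch {

    // Collects the requested capture group from every match, skipping
    // groups that did not participate or that matched zero characters.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            String token = m.group(group);
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String regex = "'([^']+)'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // [bbb, ccc]
    }
}
```

Note that the unquoted text aaa never appears in the output, because with group >= 0 only matched text becomes a token.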

NOTE: This Tokenizer does not output tokens that are of zero length.
```
 <fieldType name="text_ptn" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="\'([^\']+)\'" group="1"/>
   </analyzer>
 </fieldType>
```
Since:

solr1.2

See Also:

PatternTokenizer

Field Summary

Fields
Modifier and Type	Field	Description
`static java.lang.String`	`GROUP`	
`static java.lang.String`	`PATTERN`	

Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM

Constructor Summary

Constructors
Constructor	Description
`PatternTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args)`	Creates a new PatternTokenizerFactory

Method Summary

Modifier and Type	Method	Description
`PatternTokenizer`	`create(AttributeSource.AttributeFactory factory, java.io.Reader in)`	Split the input using configured pattern

Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
availableTokenizers, create, forName, lookupClass, reloadTokenizers

Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getChar, getClassArg, getLuceneMatchVersion, getOriginalArgs, getSet, isExplicitLuceneMatchVersion, require, require, require, requireChar, setExplicitLuceneMatchVersion

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

PATTERN

public static final java.lang.String PATTERN

See Also: Constant Field Values

GROUP

public static final java.lang.String GROUP

See Also: Constant Field Values

Constructor Detail

PatternTokenizerFactory

public PatternTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args)

Creates a new PatternTokenizerFactory

Method Detail

create

public PatternTokenizer create(AttributeSource.AttributeFactory factory,
                               java.io.Reader in)

Split the input using the configured pattern.

Specified by: create in class TokenizerFactory