Class TokenizerFactory

    • Constructor Detail

      • TokenizerFactory

        public TokenizerFactory()
        Creates a TokenizerFactory that provides the default implementation of the resources.
      • TokenizerFactory

        public TokenizerFactory(java.lang.String languageCode,
                                Dictionary abbreviationDictionary,
                                boolean useAlphaNumericOptimization,
                                java.util.regex.Pattern alphaNumericPattern)
        Creates a TokenizerFactory. Use this constructor to programmatically create a factory.
        Parameters:
        languageCode - the language of the natural text
        abbreviationDictionary - an abbreviations dictionary
        useAlphaNumericOptimization - if true, alphanumeric tokens are skipped
        alphaNumericPattern - null or a custom alphanumeric pattern (default is: "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC)
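
        The default pattern quoted above can be tried on its own with the JDK regex API. The following is a minimal sketch, not library code: the class AlphaNumericPatternDemo and its helper isAlphaNumeric are hypothetical names, and the constant is a local copy of the documented default expression. It only shows which strings the default pattern matches in full; it does not run a tokenizer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the documented API.
final class AlphaNumericPatternDemo {

    // Local copy of the documented default expression "^[A-Za-z0-9]+$".
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // True only if the whole token consists of ASCII letters and digits.
    static boolean isAlphaNumeric(String token) {
        return DEFAULT_ALPHANUMERIC.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"abc123", "U.S.", "don't", "2024"}) {
            System.out.println(token + " -> " + isAlphaNumeric(token));
        }
        // abc123 -> true
        // U.S. -> false
        // don't -> false
        // 2024 -> true
    }
}
```

        Tokens containing punctuation, such as abbreviations with periods, do not match the default pattern, which is one reason a custom alphaNumericPattern or an abbreviation dictionary may be supplied.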
    • Method Detail

      • createArtifactMap

        public java.util.Map<java.lang.String,java.lang.Object> createArtifactMap()
        Description copied from class: BaseToolFactory
        Creates a Map with pairs of keys and objects. The model implementation should call this method from the constructor that creates a model programmatically.

        The base implementation will return a HashMap that should be populated by sub-classes.

        Overrides:
        createArtifactMap in class BaseToolFactory
      • createManifestEntries

        public java.util.Map<java.lang.String,java.lang.String> createManifestEntries()
        Description copied from class: BaseToolFactory
        Creates the manifest entries that will be added to the model manifest.
        Overrides:
        createManifestEntries in class BaseToolFactory
        Returns:
        the manifest entries to be added to the model manifest
      • create

        public static TokenizerFactory create(java.lang.String subclassName,
                                              java.lang.String languageCode,
                                              Dictionary abbreviationDictionary,
                                              boolean useAlphaNumericOptimization,
                                              java.util.regex.Pattern alphaNumericPattern)
                                       throws InvalidFormatException
        Factory method the framework uses to create a new TokenizerFactory.
        Parameters:
        subclassName - the name of the class implementing the TokenizerFactory
        languageCode - the language code the tokenizer should use
        abbreviationDictionary - an optional dictionary containing abbreviations, or null if not present
        useAlphaNumericOptimization - indicates whether the alphanumeric optimization should be enabled
        alphaNumericPattern - the pattern the alphanumeric optimization should use
        Returns:
        the TokenizerFactory instance
        Throws:
        InvalidFormatException - if one of the input parameters doesn't comply with the expected format
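
        A factory method that takes a subclass name is commonly implemented by loading the named class reflectively and then initializing it. The sketch below illustrates that general pattern only; it is not the library's implementation, and this page does not confirm how create resolves subclassName. Every type in it (FactoryByNameSketch, BaseFactory, CustomFactory, InvalidFormatSketchException) is a hypothetical stand-in, and the fallback to a default factory when subclassName is null is an assumption.

```java
// Hypothetical illustration of a "create by class name" factory method.
final class FactoryByNameSketch {

    /** Hypothetical stand-in for a factory base class. */
    static class BaseFactory {
        String languageCode;

        void init(String languageCode) {
            this.languageCode = languageCode;
        }
    }

    /** Hypothetical subclass that a caller might name explicitly. */
    static class CustomFactory extends BaseFactory { }

    /** Hypothetical stand-in for a checked "invalid format" exception. */
    static class InvalidFormatSketchException extends Exception {
        InvalidFormatSketchException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static BaseFactory create(String subclassName, String languageCode)
            throws InvalidFormatSketchException {
        if (subclassName == null) {
            // Assumption: no subclass name means "use the default factory".
            BaseFactory factory = new BaseFactory();
            factory.init(languageCode);
            return factory;
        }
        try {
            // Load the named class and check that it really is a factory.
            Class<? extends BaseFactory> type =
                    Class.forName(subclassName).asSubclass(BaseFactory.class);
            BaseFactory factory = type.getDeclaredConstructor().newInstance();
            factory.init(languageCode);
            return factory;
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Report a bad class name as a format problem, keeping the cause.
            throw new InvalidFormatSketchException(
                    "Could not instantiate " + subclassName, e);
        }
    }

    public static void main(String[] args) throws Exception {
        BaseFactory custom = create(CustomFactory.class.getName(), "en");
        System.out.println(custom.getClass().getSimpleName() + " " + custom.languageCode);
    }
}
```

        In this sketch a class name that cannot be loaded, or that names a class outside the factory hierarchy, surfaces as the checked exception, which mirrors the Throws clause above in spirit only.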
      • getAlphaNumericPattern

        public java.util.regex.Pattern getAlphaNumericPattern()
        Gets the alpha numeric pattern.
        Returns:
        the user-specified alphanumeric pattern, or the default if none was specified
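
        One plausible reading of this return contract is a simple null check on the pattern passed at construction time. The sketch below assumes that behavior; PatternFallbackSketch is a hypothetical class, and the default expression is copied from the constructor documentation rather than taken from the library.

```java
import java.util.regex.Pattern;

// Hypothetical holder that mimics "user-specified pattern or a default".
final class PatternFallbackSketch {

    // Local copy of the documented default expression.
    static final Pattern DEFAULT = Pattern.compile("^[A-Za-z0-9]+$");

    // May be null, meaning the caller did not supply a pattern.
    private final Pattern userPattern;

    PatternFallbackSketch(Pattern userPattern) {
        this.userPattern = userPattern;
    }

    // Assumed behavior: fall back to the default when no pattern was given.
    Pattern getAlphaNumericPattern() {
        return userPattern != null ? userPattern : DEFAULT;
    }

    public static void main(String[] args) {
        Pattern custom = Pattern.compile("^[a-z]+$");
        System.out.println(new PatternFallbackSketch(null).getAlphaNumericPattern().pattern());
        System.out.println(new PatternFallbackSketch(custom).getAlphaNumericPattern().pattern());
        // ^[A-Za-z0-9]+$
        // ^[a-z]+$
    }
}
```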
      • isUseAlphaNumericOptmization

        public boolean isUseAlphaNumericOptmization()
        Gets whether to use alphanumeric optimization.
        Returns:
        true if the alpha numeric optimization is enabled, otherwise false
      • getAbbreviationDictionary

        public Dictionary getAbbreviationDictionary()
        Gets the abbreviation dictionary.
        Returns:
        null or the abbreviation dictionary
      • getLanguageCode

        public java.lang.String getLanguageCode()
        Retrieves the language code.
        Returns:
        the language code
      • getContextGenerator

        public TokenContextGenerator getContextGenerator()
        Gets the context generator.
        Returns:
        a new instance of the context generator