Class LanguageProfilerBuilder


  • @Deprecated
    public class LanguageProfilerBuilder
    extends java.lang.Object
    Deprecated.
    This class runs an ngram analysis over submitted text; the results can be used for automatic language identification. The similarity calculation is experimental, so treat its scores with caution. Methods are provided for building new ngram profiles.
    • Constructor Summary

      Constructors 
      Constructor Description
      LanguageProfilerBuilder​(java.lang.String name)
      Deprecated.
      Constructs a new ngram profile where minlen=3, maxlen=3
      LanguageProfilerBuilder​(java.lang.String name, int minlen, int maxlen)
      Deprecated.
      Constructs a new ngram profile
    • Method Summary

      Modifier and Type Method Description
      void add​(java.lang.StringBuffer word)
      Deprecated.
      Adds ngrams from a single word to this profile
      void analyze​(java.lang.StringBuilder text)
      Deprecated.
      Analyzes a piece of text
      static LanguageProfilerBuilder create​(java.lang.String name, java.io.InputStream is, java.lang.String encoding)
      Deprecated.
      Creates a new language profile from a text file (preferably a large one, 5-10k lines)
      java.lang.String getName()
      Deprecated.
       
      float getSimilarity​(LanguageProfilerBuilder another)
      Deprecated.
      Calculates a score for how well two ngram profiles match each other
      java.util.List<org.apache.tika.language.LanguageProfilerBuilder.NGramEntry> getSorted()
      Deprecated.
      Returns a sorted list of ngrams (sort done by 1. frequency 2. sequence)
      void load​(java.io.InputStream is)
      Deprecated.
      Loads an ngram profile from an InputStream (assumes UTF-8 encoded content)
      static void main​(java.lang.String[] args)
      Deprecated.
      main method used for testing only
      void save​(java.io.OutputStream os)
      Deprecated.
      Writes the ngram profile content to an OutputStream, encoded as UTF-8
      java.lang.String toString()
      Deprecated.
       
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • LanguageProfilerBuilder

        public LanguageProfilerBuilder​(java.lang.String name,
                                       int minlen,
                                       int maxlen)
        Deprecated.
        Constructs a new ngram profile
        Parameters:
        name - is the name of the profile
        minlen - is the min length of ngram sequences
        maxlen - is the max length of ngram sequences
      • LanguageProfilerBuilder

        public LanguageProfilerBuilder​(java.lang.String name)
        Deprecated.
        Constructs a new ngram profile where minlen=3, maxlen=3
        Parameters:
        name - the name of the profile, usually a two-character string
        Since:
        Tika 1.0
    • Method Detail

      • getName

        public java.lang.String getName()
        Deprecated.
        Returns:
        the name of this profile
      • add

        public void add​(java.lang.StringBuffer word)
        Deprecated.
        Adds ngrams from a single word to this profile
        Parameters:
        word - is the word to add
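        As a rough illustration of what collecting the ngrams of a single word can involve, here is a minimal standalone sketch that does not use Tika. The class NGramSketch and its ngrams method are hypothetical names, and the plain sliding window is an assumption; the tokenization and any word-boundary handling inside this class may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: extracts the ngrams of one word.
final class NGramSketch {

    // Returns every substring of word whose length lies in [minlen, maxlen].
    static List<String> ngrams(String word, int minlen, int maxlen) {
        List<String> result = new ArrayList<>();
        for (int n = minlen; n <= maxlen; n++) {
            // Slide a window of width n across the word.
            for (int start = 0; start + n <= word.length(); start++) {
                result.add(word.substring(start, start + n));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // minlen=3 and maxlen=3 mirror the one-argument constructor's defaults.
        System.out.println(ngrams("tika", 3, 3)); // prints [tik, ika]
    }
}
```

        A word shorter than minlen yields no ngrams in this sketch.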
      • analyze

        public void analyze​(java.lang.StringBuilder text)
        Deprecated.
        Analyzes a piece of text
        Parameters:
        text - the text to be analyzed
      • getSorted

        public java.util.List<org.apache.tika.language.LanguageProfilerBuilder.NGramEntry> getSorted()
        Deprecated.
        Returns a sorted list of ngrams (sort done by 1. frequency 2. sequence)
        Returns:
        sorted list of ngrams
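        The two-level ordering (frequency first, then sequence) can be sketched without Tika as follows. SortedNGramsSketch and sorted are hypothetical names, and the sort directions (highest count first, ties broken alphabetically) are assumptions, since the description above does not state them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: orders ngram counts.
final class SortedNGramsSketch {

    // Orders entries by descending count, breaking ties by the ngram text.
    static List<Map.Entry<String, Integer>> sorted(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("and", 5);
        counts.put("ing", 9);
        for (Map.Entry<String, Integer> e : sorted(counts)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```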
      • toString

        public java.lang.String toString()
        Deprecated.
        Overrides:
        toString in class java.lang.Object
      • getSimilarity

        public float getSimilarity​(LanguageProfilerBuilder another)
                            throws TikaException
        Deprecated.
        Calculates a score for how well two ngram profiles match each other
        Parameters:
        another - ngram profile to compare against
        Returns:
        the similarity score, where 0 means an exact match
        Throws:
        TikaException - if a score could not be calculated
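        To make the "0 means an exact match" convention concrete, here is one possible distance with that property, written as a standalone sketch. It is not the formula this class uses, which is not documented here; SimilaritySketch, distance, and the relative-frequency maps are all hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Tika: a distance between two profiles.
final class SimilaritySketch {

    // Sums |a - b| over the union of ngrams; a missing ngram counts as 0.
    static float distance(Map<String, Float> a, Map<String, Float> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        float sum = 0f;
        for (String key : keys) {
            sum += Math.abs(a.getOrDefault(key, 0f) - b.getOrDefault(key, 0f));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up relative frequencies, for illustration only.
        Map<String, Float> first = Map.of("the", 0.6f, "ing", 0.4f);
        Map<String, Float> second = Map.of("der", 0.5f, "ein", 0.5f);
        System.out.println(distance(first, first));       // prints 0.0
        System.out.println(distance(first, second) > 0f); // prints true
    }
}
```

        With a score of this kind, lower values indicate closer profiles, so a caller would pick the candidate with the smallest score.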
      • load

        public void load​(java.io.InputStream is)
                  throws java.io.IOException
        Deprecated.
        Loads an ngram profile from an InputStream (assumes UTF-8 encoded content)
        Parameters:
        is - the InputStream to read
        Throws:
        java.io.IOException
      • create

        public static LanguageProfilerBuilder create​(java.lang.String name,
                                                     java.io.InputStream is,
                                                     java.lang.String encoding)
                                              throws TikaException
        Deprecated.
        Creates a new language profile from a text file (preferably a large one, 5-10k lines)
        Parameters:
        name - to be given for the profile
        is - a stream to be read
        encoding - the character encoding of the stream
        Throws:
        TikaException - if a language profile could not be created
      • save

        public void save​(java.io.OutputStream os)
                  throws java.io.IOException
        Deprecated.
        Writes the ngram profile content to an OutputStream, encoded as UTF-8
        Parameters:
        os - the Stream to output to
        Throws:
        java.io.IOException
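        Because load assumes UTF-8 and save always writes UTF-8, a profile should survive a round trip only if both sides fix the charset explicitly. The standalone sketch below shows that pattern. ProfileIoSketch and the one-pair-per-line "ngram count" layout are assumptions for illustration, not the file format this class reads or writes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: UTF-8 round trip of ngram counts.
final class ProfileIoSketch {

    // Writes one "ngram count" pair per line, always encoded as UTF-8.
    static void save(Map<String, Integer> counts, OutputStream os) throws IOException {
        Writer writer = new OutputStreamWriter(os, StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            writer.write(e.getKey() + " " + e.getValue() + "\n");
        }
        writer.flush(); // flush, but leave the caller's stream open
    }

    // Reads the same layout back, decoding the bytes as UTF-8.
    static Map<String, Integer> load(InputStream is) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            int space = line.lastIndexOf(' ');
            if (space <= 0) {
                continue; // skip lines without a separator
            }
            counts.put(line.substring(0, space), Integer.parseInt(line.substring(space + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> original = new LinkedHashMap<>();
        original.put("f\u00fcr", 3); // non-ASCII ngram to exercise the encoding
        original.put("ein", 7);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        save(original, buffer);
        Map<String, Integer> restored = load(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(restored.equals(original)); // prints true
    }
}
```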
      • main

        public static void main​(java.lang.String[] args)
        Deprecated.
        main method used for testing only
        Parameters:
        args -