Class AutoDetectParser

    • Constructor Detail

      • AutoDetectParser

        public AutoDetectParser()
        Creates an auto-detecting parser instance using the default Tika configuration.
      • AutoDetectParser

        public AutoDetectParser(Detector detector)
      • AutoDetectParser

        public AutoDetectParser(Parser... parsers)
        Creates an auto-detecting parser instance using the specified set of parsers. This makes it possible to create a Tika configuration in which only a subset of the available parsers have their third-party JARs included; with the default TikaConfig, the missing JARs would otherwise cause various "ClassNotFound" exceptions.
        Parameters:
        parsers -
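        A minimal sketch of the subset constructor described above. It assumes the Tika modules that provide TXTParser and XMLParser are on the classpath; the class name SubsetParserExample and the choice of these two parsers are illustrative only and do not come from this page.

```java
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.txt.TXTParser;
import org.apache.tika.parser.xml.XMLParser;

// Hypothetical example class, not part of Tika.
public class SubsetParserExample {

    // Builds an auto-detecting parser that only knows about the listed
    // component parsers, so no other parser classes need to be loadable.
    public static AutoDetectParser build() {
        Parser text = new TXTParser();  // assumed available on the classpath
        Parser xml = new XMLParser();   // assumed available on the classpath
        return new AutoDetectParser(text, xml);
    }

    public static void main(String[] args) {
        AutoDetectParser parser = build();
        // Lists the component parsers the instance will delegate to.
        for (Parser p : parser.getAllComponentParsers()) {
            System.out.println(p.getClass().getName());
        }
    }
}
```

        Documents whose detected type is not handled by one of the listed parsers are not parsed by them, so the subset should cover every type the application expects to see.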
      • AutoDetectParser

        public AutoDetectParser(Detector detector,
                                Parser... parsers)
      • AutoDetectParser

        public AutoDetectParser(TikaConfig config)
    • Method Detail

      • getDetector

        public Detector getDetector()
        Returns the type detector used by this parser to auto-detect the type of a document.
        Returns:
        type detector
        Since:
        Apache Tika 0.4
      • setDetector

        public void setDetector(Detector detector)
        Sets the type detector used by this parser to auto-detect the type of a document.
        Parameters:
        detector - type detector
        Since:
        Apache Tika 0.4
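        The getDetector/setDetector pair can be used to swap the type detector on an existing instance. The sketch below assumes tika-core is on the classpath; the class DetectorSwapExample and its helper method are hypothetical names, and MimeTypes.getDefaultMimeTypes() is just one possible Detector to pass in.

```java
import org.apache.tika.detect.Detector;
import org.apache.tika.mime.MimeTypes;
import org.apache.tika.parser.AutoDetectParser;

// Hypothetical example class, not part of Tika.
public class DetectorSwapExample {

    // Creates a parser with the default configuration and then replaces
    // its type detector with the one supplied by the caller.
    public static AutoDetectParser withDetector(Detector detector) {
        AutoDetectParser parser = new AutoDetectParser();
        parser.setDetector(detector);
        return parser;
    }

    public static void main(String[] args) {
        // MimeTypes implements Detector, so it can be used directly here.
        Detector mimeOnly = MimeTypes.getDefaultMimeTypes();
        AutoDetectParser parser = withDetector(mimeOnly);
        System.out.println(parser.getDetector() == mimeOnly);
    }
}
```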
      • parse

        public void parse(java.io.InputStream stream,
                          org.xml.sax.ContentHandler handler,
                          Metadata metadata,
                          ParseContext context)
                   throws java.io.IOException,
                          org.xml.sax.SAXException,
                          TikaException
        Description copied from class: CompositeParser
        Delegates the call to the matching component parser.

        Potential RuntimeExceptions, IOExceptions and SAXExceptions unrelated to the given input stream and content handler are automatically wrapped into TikaExceptions to better honor the Parser contract.

        Specified by:
        parse in interface Parser
        Overrides:
        parse in class CompositeParser
        Parameters:
        stream - the document stream (input)
        handler - handler for the XHTML SAX events (output)
        metadata - document metadata (input and output)
        context - parse context
        Throws:
        java.io.IOException - if the document stream could not be read
        org.xml.sax.SAXException - if the SAX events could not be processed
        TikaException - if the document could not be parsed
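        A minimal sketch of a parse call using the four parameters listed above. It assumes tika-core is on the classpath and uses BodyContentHandler from org.apache.tika.sax as the content handler; the class ParseExample and the method parseBytes are hypothetical names. How much text is extracted depends on which component parsers are available at runtime, so no extracted text is shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical example class, not part of Tika.
public class ParseExample {

    // Parses the given bytes; body text is written to the handler (output)
    // and the populated metadata object is returned (input and output).
    public static Metadata parseBytes(byte[] data, BodyContentHandler handler)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        try (InputStream stream = new ByteArrayInputStream(data)) {
            parser.parse(stream, handler, metadata, context);
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        BodyContentHandler handler = new BodyContentHandler();
        byte[] data = "Hello, Tika".getBytes(StandardCharsets.UTF_8);
        Metadata metadata = parseBytes(data, handler);
        // The detected media type is recorded under the Content-Type key.
        System.out.println(metadata.get("Content-Type"));
        System.out.println(handler.toString());
    }
}
```

        The three checked exceptions are left to propagate here so that a caller can tell an unreadable stream (IOException) apart from a handler failure (SAXException) or an unparseable document (TikaException).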