Class CompositeParser

  • All Implemented Interfaces:
    java.io.Serializable, Parser
    Direct Known Subclasses:
    AutoDetectParser, CompositeExternalParser, DefaultParser

    public class CompositeParser
    extends AbstractParser
    Composite parser that delegates parsing tasks to a component parser based on the declared content type of the incoming document. A fallback parser is defined for cases where a parser for the given content type is not available.
    See Also:
    Serialized Form
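    The delegation described above can be sketched without any Tika types. The following is a simplified stand-in, not the Tika API: `DispatchSketch`, `register`, and the string-based "parsers" are hypothetical names chosen for illustration, and the sketch does an exact lookup on the declared type only, ignoring any type relationships a media type registry might supply.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Simplified stand-in for content-type dispatch; not the Tika API. */
final class DispatchSketch {
    // Component "parsers" keyed by media type string (hypothetical model).
    private final Map<String, Function<String, String>> parsers = new HashMap<>();
    // Used when no component is registered for the declared type.
    private final Function<String, String> fallback;

    DispatchSketch(Function<String, String> fallback) {
        this.fallback = fallback;
    }

    void register(String mediaType, Function<String, String> parser) {
        parsers.put(mediaType, parser);
    }

    /** Picks the component for the declared type, or the fallback if none matches. */
    String parse(String declaredType, String document) {
        return parsers.getOrDefault(declaredType, fallback).apply(document);
    }

    public static void main(String[] args) {
        DispatchSketch composite = new DispatchSketch(doc -> "fallback:" + doc);
        composite.register("text/plain", doc -> "text:" + doc);

        System.out.println(composite.parse("text/plain", "hello"));            // text:hello
        System.out.println(composite.parse("application/x-unknown", "hello")); // fallback:hello
    }
}
```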
    • Constructor Detail

      • CompositeParser

        public CompositeParser(MediaTypeRegistry registry,
                               java.util.List<Parser> parsers,
                               java.util.Collection<java.lang.Class<? extends Parser>> excludeParsers)
      • CompositeParser

        public CompositeParser()
    • Method Detail

      • findDuplicateParsers

        public java.util.Map<MediaType, java.util.List<Parser>> findDuplicateParsers(ParseContext context)
        Utility method that goes through all the component parsers and finds all media types for which more than one parser declares support. This is useful in tracking down conflicting parser definitions.
        Parameters:
        context - parsing context
        Returns:
        media types that are supported by at least two component parsers
        Since:
        Apache Tika 0.10
        See Also:
        TIKA-660
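        The idea behind this check can be illustrated with plain collections. This is a minimal sketch under assumptions, not Tika's implementation: `DuplicateTypesSketch` and `findDuplicates` are hypothetical names, parsers are represented by their names, and media types by strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified model of a duplicate-support check; not the Tika API. */
final class DuplicateTypesSketch {

    /**
     * Groups parser names by each media type they declare, then keeps only
     * the media types claimed by at least two parsers.
     */
    static Map<String, List<String>> findDuplicates(Map<String, Set<String>> supportedTypesByParser) {
        Map<String, List<String>> byType = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : supportedTypesByParser.entrySet()) {
            for (String type : entry.getValue()) {
                byType.computeIfAbsent(type, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Drop every media type that only one parser supports.
        byType.values().removeIf(names -> names.size() < 2);
        return byType;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> declared = new LinkedHashMap<>();
        declared.put("A", Set.of("text/plain", "text/html"));
        declared.put("B", Set.of("text/plain"));
        declared.put("C", Set.of("application/pdf"));

        System.out.println(findDuplicates(declared)); // {text/plain=[A, B]}
    }
}
```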
      • getMediaTypeRegistry

        public MediaTypeRegistry getMediaTypeRegistry()
        Returns the media type registry used to infer type relationships.
        Returns:
        media type registry
        Since:
        Apache Tika 0.8
      • setMediaTypeRegistry

        public void setMediaTypeRegistry(MediaTypeRegistry registry)
        Sets the media type registry used to infer type relationships.
        Parameters:
        registry - media type registry
        Since:
        Apache Tika 0.8
      • getAllComponentParsers

        public java.util.List<Parser> getAllComponentParsers()
        Returns all parsers registered with the composite parser, including any that are not currently active. The fallback parser, if defined, is not included.
      • getParsers

        public java.util.Map<MediaType, Parser> getParsers()
        Returns the component parsers.
        Returns:
        component parsers, keyed by media type
      • setParsers

        public void setParsers(java.util.Map<MediaType, Parser> parsers)
        Sets the component parsers.
        Parameters:
        parsers - component parsers, keyed by media type
      • getFallback

        public Parser getFallback()
        Returns the fallback parser.
        Returns:
        fallback parser
      • setFallback

        public void setFallback(Parser fallback)
        Sets the fallback parser.
        Parameters:
        fallback - fallback parser
      • getSupportedTypes

        public java.util.Set<MediaType> getSupportedTypes(ParseContext context)
        Description copied from interface: Parser
        Returns the set of media types supported by this parser when used with the given parse context.
        Parameters:
        context - parse context
        Returns:
        immutable set of media types
      • parse

        public void parse(java.io.InputStream stream,
                          org.xml.sax.ContentHandler handler,
                          Metadata metadata,
                          ParseContext context)
                   throws java.io.IOException,
                          org.xml.sax.SAXException,
                          TikaException
        Delegates the call to the matching component parser.

        Any RuntimeException, IOException, or SAXException that does not originate from the given input stream or content handler is automatically wrapped in a TikaException, to better honor the Parser contract.

        Parameters:
        stream - the document stream (input)
        handler - handler for the XHTML SAX events (output)
        metadata - document metadata (input and output)
        context - parse context
        Throws:
        java.io.IOException - if the document stream could not be read
        org.xml.sax.SAXException - if the SAX events could not be processed
        TikaException - if the document could not be parsed
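        The wrapping rule can be sketched in isolation. The following is a simplified model under stated assumptions, not Tika's source: `WrappingSketch`, `ParseFailure` (standing in for TikaException), `StreamIOException`, `TaggingStream`, and `Component` are all hypothetical names. The sketch covers only the input stream side, and only single-byte `read()` calls are tagged.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Simplified model of exception wrapping during delegation; not the Tika API. */
final class WrappingSketch {

    /** Stand-in for a parse-failure exception such as TikaException. */
    static final class ParseFailure extends Exception {
        ParseFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Marks IOExceptions that really came from the caller's stream. */
    static final class StreamIOException extends IOException {
        StreamIOException(IOException cause) {
            super(cause.getMessage(), cause);
        }
    }

    /** Wraps a stream so that its read failures can be recognized later. */
    static final class TaggingStream extends FilterInputStream {
        TaggingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            try {
                return super.read();
            } catch (IOException e) {
                throw new StreamIOException(e);
            }
        }
    }

    /** Hypothetical component parser that only consumes a stream. */
    interface Component {
        void parse(InputStream stream) throws IOException;
    }

    static void parse(Component component, InputStream stream) throws IOException, ParseFailure {
        try {
            component.parse(new TaggingStream(stream));
        } catch (StreamIOException e) {
            throw e; // the document stream itself could not be read
        } catch (IOException | RuntimeException e) {
            // Unrelated to the caller's stream: report as a parse failure.
            throw new ParseFailure("component parser failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            parse(s -> { throw new IllegalStateException("bug"); },
                  new ByteArrayInputStream(new byte[0]));
        } catch (ParseFailure e) {
            System.out.println("wrapped: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

        With this shape, a caller can treat an IOException as a problem with its own stream and a `ParseFailure` as a problem inside the delegated parser.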