Class OfficeParser

    • Constructor Detail

      • OfficeParser

        public OfficeParser()
    • Method Detail

      • getSupportedTypes

        public java.util.Set<MediaType> getSupportedTypes(ParseContext context)
        Description copied from interface: Parser
        Returns the set of media types supported by this parser when used with the given parse context.
        Parameters:
        context - parse context
        Returns:
        immutable set of media types
      • parse

        public void parse(java.io.InputStream stream,
                          org.xml.sax.ContentHandler handler,
                          Metadata metadata,
                          ParseContext context)
                   throws java.io.IOException,
                          org.xml.sax.SAXException,
                          TikaException
        Extracts properties and text from an MS Office document input stream.
        Parameters:
        stream - the document stream (input)
        handler - handler for the XHTML SAX events (output)
        metadata - document metadata (input and output)
        context - parse context
        Throws:
        java.io.IOException - if the document stream could not be read
        org.xml.sax.SAXException - if the SAX events could not be processed
        TikaException - if the document could not be parsed
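
        As a rough illustration of the handler parameter, the sketch below shows a SAX handler that collects the text carried by XHTML events. It uses only the JDK's org.xml.sax classes. The class name TextCollectingHandler and its demo method are hypothetical, and the events are fired by hand here rather than by OfficeParser, so this shows the shape of a handler, not Tika's own behavior.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers character data and ends a line after each </p>.
class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only the slice of the buffer that belongs to this event.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("p".equals(localName)) {
            text.append('\n');
        }
    }

    String getText() {
        return text.toString();
    }

    // Fires a few XHTML-style events by hand, standing in for a real parser.
    static String demo() {
        String xhtml = "http://www.w3.org/1999/xhtml";
        TextCollectingHandler handler = new TextCollectingHandler();
        Attributes none = new AttributesImpl();
        char[] chars = "Hello".toCharArray();
        try {
            handler.startDocument();
            handler.startElement(xhtml, "p", "p", none);
            handler.characters(chars, 0, chars.length);
            handler.endElement(xhtml, "p", "p");
            handler.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.getText();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

        With Tika on the classpath, an instance of such a handler could be passed as the handler argument of parse, and getText() read once the call returns.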
      • extractMacros

        public static void extractMacros(POIFSFileSystem fs,
                                         org.xml.sax.ContentHandler xhtml,
                                         EmbeddedDocumentExtractor embeddedDocumentExtractor)
                                  throws java.io.IOException,
                                         org.xml.sax.SAXException
        Helper to extract macros from an NPOIFS/vbaProject.bin. As of POI-3.15-final, there are still some bugs in VBAMacroReader, so for now NullPointerException and other runtime exceptions are swallowed.
        Parameters:
        fs - NPOIFS file system to extract macros from
        xhtml - SAX writer
        embeddedDocumentExtractor - extractor for embedded documents
        Throws:
        java.io.IOException - if an I/O error occurs while extracting the embedded document
        org.xml.sax.SAXException - if writing to xhtml fails
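
        The exception handling described for extractMacros (runtime exceptions swallowed, IOException still propagated) can be sketched roughly as follows. This is not the Tika source: MacroSource, readMacrosLeniently and demo are hypothetical names, and a plain list of strings stands in for the macros a real reader would return.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MacroExtractionSketch {
    // Hypothetical stand-in for a reader that may fail with a runtime exception.
    interface MacroSource {
        List<String> readMacros() throws IOException;
    }

    // Returns the macros, or an empty list if the reader throws a runtime
    // exception such as NullPointerException. IOException is not swallowed.
    static List<String> readMacrosLeniently(MacroSource source) throws IOException {
        try {
            return source.readMacros();
        } catch (RuntimeException e) {
            // Assumption for this sketch: a buggy reader is treated as "no macros".
            return new ArrayList<>();
        }
    }

    // Runs one well-behaved source and one failing source, returning both results.
    static List<List<String>> demo() {
        try {
            List<String> ok = readMacrosLeniently(() -> Arrays.asList("Module1"));
            List<String> failed = readMacrosLeniently(() -> {
                throw new NullPointerException("simulated reader bug");
            });
            return Arrays.asList(ok, failed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

        Catching RuntimeException this broadly hides real defects along with the known ones, which is presumably why the description frames it as a stopgap ("for now").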