Class XHTMLContentHandler

  • All Implemented Interfaces:
    org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

    public class XHTMLContentHandler
    extends SafeContentHandler
    Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers.
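    For comparison, the sketch below shows what emitting a single paragraph looks like against the plain SAX ContentHandler interface, which is the work this decorator is meant to shorten. It uses only JDK classes; the class name RawSaxParagraph and the recording handler are illustrative assumptions, not part of Tika.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it does not use or depend on Tika.
class RawSaxParagraph {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Emits one paragraph using only the plain SAX ContentHandler methods. */
    static void paragraph(ContentHandler handler, String text) throws SAXException {
        handler.startElement(XHTML, "p", "p", new AttributesImpl());
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement(XHTML, "p", "p");
    }

    /** Records the events as markup-like text so the result can be inspected. */
    static String demo(String text) {
        final StringBuilder out = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String name, Attributes atts) {
                out.append('<').append(name).append('>');
            }

            @Override
            public void endElement(String uri, String local, String name) {
                out.append("</").append(name).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        try {
            paragraph(recorder, text);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo("Hello"));
    }
}
```

    With an XHTMLContentHandler instance, the three calls in paragraph() correspond to the single convenience call element("p", "Hello") described under Method Detail.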
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.util.Set<java.lang.String> ENDLINE
      The elements that are followed by a newline (NL) character when they end.
      static java.lang.String XHTML
      The XHTML namespace URI.
    • Constructor Summary

      Constructors 
      Constructor Description
      XHTMLContentHandler(org.xml.sax.ContentHandler handler, Metadata metadata)
    • Method Summary

      Modifier and Type Method Description
      void characters(char[] ch, int start, int length)
      void characters(java.lang.String characters)
      void element(java.lang.String name, java.lang.String value)
      Emits an XHTML element with the given text content.
      void endDocument()
      Ends the XHTML document by writing the closing </body> and </html> tags and clearing the namespace mappings.
      void endElement(java.lang.String name)
      void endElement(java.lang.String uri, java.lang.String local, java.lang.String name)
      Ends the given element.
      void newline()  
      void startDocument()
      Starts an XHTML document by setting up the namespace mappings when called for the first time.
      void startElement(java.lang.String name)
      void startElement(java.lang.String name, java.lang.String attribute, java.lang.String value)
      void startElement(java.lang.String uri, java.lang.String local, java.lang.String name, org.xml.sax.Attributes attributes)
      Starts the given element.
      void startElement(java.lang.String name, org.xml.sax.helpers.AttributesImpl attributes)
      • Methods inherited from class org.xml.sax.helpers.DefaultHandler

        error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • XHTML

        public static final java.lang.String XHTML
        The XHTML namespace URI, http://www.w3.org/1999/xhtml.
        See Also:
        Constant Field Values
      • ENDLINE

        public static final java.util.Set<java.lang.String> ENDLINE
        The elements that are followed by a newline (NL) character when they end.
    • Constructor Detail

      • XHTMLContentHandler

        public XHTMLContentHandler(org.xml.sax.ContentHandler handler,
                                   Metadata metadata)
    • Method Detail

      • startDocument

        public void startDocument()
                           throws org.xml.sax.SAXException
        Starts an XHTML document by setting up the namespace mappings when called for the first time. The standard XHTML document prefix (the opening html, head and body structure) is not written here; it is generated lazily when the first element is started.
        Specified by:
        startDocument in interface org.xml.sax.ContentHandler
        Overrides:
        startDocument in class ContentHandlerDecorator
        Throws:
        org.xml.sax.SAXException
      • endDocument

        public void endDocument()
                         throws org.xml.sax.SAXException
        Ends the XHTML document by writing the following footer and clearing the namespace mappings:
          </body>
        </html>
        Specified by:
        endDocument in interface org.xml.sax.ContentHandler
        Overrides:
        endDocument in class SafeContentHandler
        Throws:
        org.xml.sax.SAXException
      • startElement

        public void startElement(java.lang.String uri,
                                 java.lang.String local,
                                 java.lang.String name,
                                 org.xml.sax.Attributes attributes)
                          throws org.xml.sax.SAXException
        Starts the given element. Table cells and list items are automatically indented by emitting a tab character as ignorable whitespace.
        Specified by:
        startElement in interface org.xml.sax.ContentHandler
        Overrides:
        startElement in class SafeContentHandler
        Throws:
        org.xml.sax.SAXException
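        The indentation rule can be illustrated with a small stand-in written against the JDK SAX classes only. This is a sketch of the documented behaviour, not Tika's implementation: the class IndentSketch, its INDENTED set (an assumed subset of the affected elements) and the Recorder helper are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in; it mimics the documented rule without using Tika.
class IndentSketch {
    // Assumption: an illustrative subset; the real handler's set may differ.
    static final Set<String> INDENTED = new HashSet<>(Arrays.asList("li", "td", "th"));

    /** Records events as text; ignorable whitespace is kept so the tab is visible. */
    static class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String name, Attributes atts) {
            out.append('<').append(name).append('>');
        }

        @Override
        public void endElement(String uri, String local, String name) {
            out.append("</").append(name).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Starts an element, preceded by a tab when the element is in INDENTED. */
    static void start(ContentHandler target, String name) throws SAXException {
        if (INDENTED.contains(name)) {
            target.ignorableWhitespace(new char[] {'\t'}, 0, 1);
        }
        target.startElement("", name, name, new AttributesImpl());
    }

    /** Emits a one-item list and returns the recorded event text. */
    static String demo() {
        Recorder recorder = new Recorder();
        try {
            start(recorder, "ul");
            start(recorder, "li");
            recorder.characters("one".toCharArray(), 0, 3);
            recorder.endElement("", "li", "li");
            recorder.endElement("", "ul", "ul");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return recorder.out.toString();
    }
}
```

        Because the tab arrives through ignorableWhitespace rather than characters, a downstream handler can choose to keep it (as Recorder does) or drop it.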
      • endElement

        public void endElement(java.lang.String uri,
                               java.lang.String local,
                               java.lang.String name)
                        throws org.xml.sax.SAXException
        Ends the given element. Block elements are automatically followed by a newline character.
        Specified by:
        endElement in interface org.xml.sax.ContentHandler
        Overrides:
        endElement in class SafeContentHandler
        Throws:
        org.xml.sax.SAXException
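        One practical effect of the trailing newline is that a handler collecting only text still sees paragraph boundaries. The sketch below imitates that with JDK SAX classes alone. It assumes the newline is delivered as ignorable whitespace, and the names NewlineSketch, BLOCK (an assumed subset of ENDLINE) and TextCollector are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in; it mimics the documented rule without using Tika.
class NewlineSketch {
    // Assumption: an illustrative subset of the elements listed in ENDLINE.
    static final Set<String> BLOCK = new HashSet<>(Arrays.asList("p", "div", "h1"));

    /** Collects character data and ignorable whitespace, ignoring all tags. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Ends an element and follows it with a newline when it is a block element. */
    static void end(ContentHandler target, String name) throws SAXException {
        target.endElement("", name, name);
        if (BLOCK.contains(name)) {
            target.ignorableWhitespace(new char[] {'\n'}, 0, 1);
        }
    }

    /** Emits a complete paragraph element containing the given text. */
    static void paragraph(ContentHandler target, String text) throws SAXException {
        target.startElement("", "p", "p", new AttributesImpl());
        target.characters(text.toCharArray(), 0, text.length());
        end(target, "p");
    }

    /** Emits two paragraphs and returns the text the collector saw. */
    static String demo() {
        TextCollector collector = new TextCollector();
        try {
            paragraph(collector, "first");
            paragraph(collector, "second");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collector.text.toString();
    }
}
```

        Without the newline after each closing tag, the collected text in this sketch would run together as "firstsecond".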
      • characters

        public void characters(char[] ch,
                               int start,
                               int length)
                        throws org.xml.sax.SAXException
        Specified by:
        characters in interface org.xml.sax.ContentHandler
        Overrides:
        characters in class SafeContentHandler
        Throws:
        org.xml.sax.SAXException
        See Also:
        TIKA-210
      • startElement

        public void startElement(java.lang.String name)
                          throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • startElement

        public void startElement(java.lang.String name,
                                 java.lang.String attribute,
                                 java.lang.String value)
                          throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • startElement

        public void startElement(java.lang.String name,
                                 org.xml.sax.helpers.AttributesImpl attributes)
                          throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • endElement

        public void endElement(java.lang.String name)
                        throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • characters

        public void characters(java.lang.String characters)
                        throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • newline

        public void newline()
                     throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • element

        public void element(java.lang.String name,
                            java.lang.String value)
                     throws org.xml.sax.SAXException
        Emits an XHTML element with the given text content. If the given text value is null or empty, then the element is not written.
        Parameters:
        name - XHTML element name
        value - element value, possibly null
        Throws:
        org.xml.sax.SAXException - if the content element could not be written
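        The null-or-empty guard means callers can pass optional values straight through without checking them first. The sketch below reproduces that contract with JDK SAX classes only; ElementSketch and its helper methods are hypothetical names, and the body is an approximation of the documented behaviour rather than Tika's code.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in; it mimics the documented contract without using Tika.
class ElementSketch {
    /** Writes the element with its text, or nothing when value is null or empty. */
    static void element(ContentHandler target, String name, String value) throws SAXException {
        if (value != null && !value.isEmpty()) {
            target.startElement("", name, name, new AttributesImpl());
            target.characters(value.toCharArray(), 0, value.length());
            target.endElement("", name, name);
        }
    }

    /** Emits one element per value and returns the recorded markup-like text. */
    static String demo(String... values) {
        final StringBuilder out = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String name, Attributes atts) {
                out.append('<').append(name).append('>');
            }

            @Override
            public void endElement(String uri, String local, String name) {
                out.append("</").append(name).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        try {
            for (String value : values) {
                element(recorder, "p", value);
            }
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

        Calling demo("Report", null, "") in this sketch records a single paragraph for "Report"; the null and empty values leave no trace in the output.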