Class ToTextContentHandler

  • All Implemented Interfaces:
    org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
    Direct Known Subclasses:
    ToXMLContentHandler

    public class ToTextContentHandler
    extends org.xml.sax.helpers.DefaultHandler
    SAX event handler that writes all character content out to a character stream. No escaping or other transformations are made on the character content.

    As of Tika 1.20, this handler ignores content within <script> and <style> tags.
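The filtering described above can be imitated with the JDK's own SAX classes. The sketch below is not Tika's implementation. The class names `ScriptStyleSkippingHandler` and `ScriptStyleDemo` are hypothetical. Tracking a nesting depth for `script`/`style` elements by name is an assumption about how such filtering might work.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified stand-in for the behaviour described above:
// character content is written verbatim, except inside <script>/<style>.
class ScriptStyleSkippingHandler extends DefaultHandler {
    private final Writer writer;
    private int skipDepth = 0; // > 0 while inside a script or style element

    ScriptStyleSkippingHandler(Writer writer) {
        this.writer = writer;
    }

    private static boolean isSkipped(String localName, String qName) {
        // localName is empty when the parser is not namespace-aware
        String name = localName.isEmpty() ? qName : localName;
        return name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isSkipped(localName, qName)) {
            skipDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isSkipped(localName, qName) && skipDepth > 0) {
            skipDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth > 0) {
            return; // drop script and style bodies
        }
        try {
            writer.write(ch, start, length); // no escaping or other transformation
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }
}

class ScriptStyleDemo {
    // Parses an XHTML string with the JDK's SAX parser and returns the text.
    static String extract(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            StringWriter out = new StringWriter();
            parser.parse(new InputSource(new StringReader(xhtml)),
                    new ScriptStyleSkippingHandler(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, `ScriptStyleDemo.extract("<html><head><style>p{}</style></head><body><p>Hello &amp; bye</p><script>x()</script></body></html>")` returns `Hello & bye`. The parser decodes `&amp;` and the handler writes the result unescaped, while the style and script bodies are dropped.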

    Since:
    Apache Tika 0.10
    • Constructor Summary

      Constructors 
      Constructor Description
      ToTextContentHandler()
      Creates a content handler that writes character events to an internal string buffer.
      ToTextContentHandler(java.io.OutputStream stream)
      Creates a content handler that writes character events to the given output stream using the platform default encoding.
      ToTextContentHandler(java.io.OutputStream stream, java.lang.String encoding)
      Creates a content handler that writes character events to the given output stream using the given encoding.
      ToTextContentHandler(java.io.Writer writer)
      Creates a content handler that writes character events to the given writer.
    • Method Summary

      Modifier and Type Method Description
      void characters(char[] ch, int start, int length)
      Writes the given characters to the given character stream.
      void endDocument()
      Flushes the character stream so that no characters are forgotten in internal buffers.
      void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
      void ignorableWhitespace(char[] ch, int start, int length)
      Writes the given ignorable characters to the given character stream.
      void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
      java.lang.String toString()
      Returns the contents of the internal string buffer where all the received characters have been collected.
      • Methods inherited from class org.xml.sax.helpers.DefaultHandler

        endPrefixMapping, error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • ToTextContentHandler

        public ToTextContentHandler(java.io.Writer writer)
        Creates a content handler that writes character events to the given writer.
        Parameters:
        writer - the writer that receives the character content
      • ToTextContentHandler

        public ToTextContentHandler(java.io.OutputStream stream)
        Creates a content handler that writes character events to the given output stream using the platform default encoding.
        Parameters:
        stream - output stream
      • ToTextContentHandler

        public ToTextContentHandler(java.io.OutputStream stream,
                                    java.lang.String encoding)
                             throws java.io.UnsupportedEncodingException
        Creates a content handler that writes character events to the given output stream using the given encoding.
        Parameters:
        stream - output stream
        encoding - output encoding
        Throws:
        java.io.UnsupportedEncodingException - if the encoding is unsupported
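The documented UnsupportedEncodingException is the same exception that java.io.OutputStreamWriter raises for an unknown charset name. The following sketch assumes, without confirmation from this page, that the stream-plus-encoding constructor wraps the stream in an OutputStreamWriter. `EncodedTextSink` and `EncodingDemo` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

// Hypothetical sketch: an (OutputStream, String) constructor that wraps the
// stream in an OutputStreamWriter. Not taken from the Tika sources.
class EncodedTextSink {
    private final Writer writer;

    EncodedTextSink(OutputStream stream, String encoding)
            throws UnsupportedEncodingException {
        // OutputStreamWriter(OutputStream, String) throws
        // UnsupportedEncodingException for an unknown charset name.
        this.writer = new OutputStreamWriter(stream, encoding);
    }

    void write(String text) throws IOException {
        writer.write(text);
        writer.flush(); // push the encoded bytes through to the stream
    }
}

class EncodingDemo {
    // Number of bytes produced when `text` is written with `encoding`.
    static int encodedLength(String text, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new EncodedTextSink(bytes, encoding).write(text);
        } catch (IOException e) { // also covers UnsupportedEncodingException
            throw new IllegalStateException(e);
        }
        return bytes.size();
    }

    // True if the constructor rejects the given encoding name.
    static boolean isRejected(String encoding) {
        try {
            new EncodedTextSink(new ByteArrayOutputStream(), encoding);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }
}
```

With this sketch, `EncodingDemo.encodedLength("\u00e9", "UTF-8")` is 2 while `"ISO-8859-1"` gives 1, and `EncodingDemo.isRejected("no-such-charset")` is true.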
      • ToTextContentHandler

        public ToTextContentHandler()
        Creates a content handler that writes character events to an internal string buffer. Use the toString() method to access the collected character content.
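One plausible way to provide the internal string buffer is to have the no-argument constructor delegate to the Writer constructor with a java.io.StringWriter. This is a minimal sketch under that assumption. `BufferingTextHandler` and `BufferingDemo` are hypothetical, and only characters() is handled.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: the no-argument constructor delegates to the Writer
// constructor with an in-memory StringWriter, and toString() reads it back.
class BufferingTextHandler extends DefaultHandler {
    private final Writer writer;

    BufferingTextHandler(Writer writer) {
        this.writer = writer;
    }

    BufferingTextHandler() {
        this(new StringWriter()); // internal string buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            writer.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public String toString() {
        return writer.toString(); // StringWriter returns its buffered text
    }
}

class BufferingDemo {
    // Feeds each chunk to the handler as a separate characters() event.
    static String collect(String... chunks) {
        BufferingTextHandler handler = new BufferingTextHandler();
        try {
            handler.startDocument();
            for (String chunk : chunks) {
                char[] chars = chunk.toCharArray();
                handler.characters(chars, 0, chars.length);
            }
            handler.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.toString();
    }
}
```

`BufferingDemo.collect("Hello, ", "world")` returns `Hello, world`, since each characters() event is appended to the same StringWriter.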
    • Method Detail

      • characters

        public void characters(char[] ch,
                               int start,
                               int length)
                        throws org.xml.sax.SAXException
        Writes the given characters to the given character stream.
        Specified by:
        characters in interface org.xml.sax.ContentHandler
        Overrides:
        characters in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • ignorableWhitespace

        public void ignorableWhitespace(char[] ch,
                                        int start,
                                        int length)
                                 throws org.xml.sax.SAXException
        Writes the given ignorable characters to the given character stream. The default implementation simply forwards the call to the characters(char[], int, int) method.
        Specified by:
        ignorableWhitespace in interface org.xml.sax.ContentHandler
        Overrides:
        ignorableWhitespace in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
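Because of this forwarding, whitespace that a parser reports as ignorable still appears in the output. A minimal sketch of the override follows. It uses a hypothetical `WhitespaceForwardingHandler` that collects into a StringBuilder instead of a writer, plus a hypothetical `WhitespaceDemo` that calls the callbacks directly.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the forwarding described above: ignorable
// whitespace is handed to characters() and so ends up in the output too.
class WhitespaceForwardingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        characters(ch, start, length); // same treatment as ordinary text
    }

    String text() {
        return buffer.toString();
    }
}

class WhitespaceDemo {
    // Simulates the event sequence characters("a"), ignorableWhitespace("\n  "), characters("b").
    static String run() {
        WhitespaceForwardingHandler handler = new WhitespaceForwardingHandler();
        char[] a = "a".toCharArray();
        char[] ws = "\n  ".toCharArray();
        char[] b = "b".toCharArray();
        try {
            handler.characters(a, 0, a.length);
            handler.ignorableWhitespace(ws, 0, ws.length);
            handler.characters(b, 0, b.length);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.text();
    }
}
```

`WhitespaceDemo.run()` yields `a`, a newline, two spaces, then `b`: the ignorable run is kept, not discarded.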
      • endDocument

        public void endDocument()
                         throws org.xml.sax.SAXException
        Flushes the character stream so that no characters are forgotten in internal buffers.
        Specified by:
        endDocument in interface org.xml.sax.ContentHandler
        Overrides:
        endDocument in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException - if the stream cannot be flushed
        See Also:
        TIKA-179
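The flush in endDocument matters when the target writer buffers its input. In the sketch below, a java.io.BufferedWriter sits in front of a StringWriter, and a short write stays in the buffer until endDocument() flushes it. `FlushingTextHandler` and `FlushDemo` are hypothetical names, not Tika's code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: write characters through, flush at end of document.
class FlushingTextHandler extends DefaultHandler {
    private final Writer writer;

    FlushingTextHandler(Writer writer) {
        this.writer = writer;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            writer.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            writer.flush(); // push anything still held in writer buffers
        } catch (IOException e) {
            throw new SAXException("Error flushing character output", e);
        }
    }
}

class FlushDemo {
    // Returns {text visible before endDocument(), text visible after it}.
    static String[] run() {
        StringWriter target = new StringWriter();
        // BufferedWriter holds small writes until it is flushed.
        FlushingTextHandler handler = new FlushingTextHandler(new BufferedWriter(target));
        char[] text = "short text".toCharArray();
        try {
            handler.characters(text, 0, text.length);
            String before = target.toString();
            handler.endDocument();
            return new String[] {before, target.toString()};
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`FlushDemo.run()` reports an empty string for the text visible before endDocument() and `short text` afterwards.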
      • startElement

        public void startElement(java.lang.String uri,
                                 java.lang.String localName,
                                 java.lang.String qName,
                                 org.xml.sax.Attributes atts)
                          throws org.xml.sax.SAXException
        Specified by:
        startElement in interface org.xml.sax.ContentHandler
        Overrides:
        startElement in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • endElement

        public void endElement(java.lang.String uri,
                               java.lang.String localName,
                               java.lang.String qName)
                        throws org.xml.sax.SAXException
        Specified by:
        endElement in interface org.xml.sax.ContentHandler
        Overrides:
        endElement in class org.xml.sax.helpers.DefaultHandler
        Throws:
        org.xml.sax.SAXException
      • toString

        public java.lang.String toString()
        Returns the contents of the internal string buffer where all the received characters have been collected. This only works when the object was constructed with the no-argument constructor or by passing a StringWriter to the ToTextContentHandler(Writer) constructor.
        Overrides:
        toString in class java.lang.Object
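The restriction on toString() follows naturally if the method simply returns the wrapped writer's own toString(). That is an assumption here, not something this page states. java.io.StringWriter overrides toString() to return its buffer, whereas java.io.OutputStreamWriter inherits Object.toString(). `DelegatingTextSink` and `ToStringDemo` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, assuming toString() simply returns writer.toString().
class DelegatingTextSink {
    private final Writer writer;

    DelegatingTextSink(Writer writer) {
        this.writer = writer;
    }

    void write(String text) {
        try {
            writer.write(text);
            writer.flush();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public String toString() {
        return writer.toString(); // meaningful only if the writer overrides toString()
    }
}

class ToStringDemo {
    static String viaStringWriter() {
        DelegatingTextSink sink = new DelegatingTextSink(new StringWriter());
        sink.write("Hello");
        return sink.toString(); // StringWriter: the collected text
    }

    static String viaOutputStreamWriter() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DelegatingTextSink sink = new DelegatingTextSink(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8));
        sink.write("Hello");
        return sink.toString(); // Object.toString(): class name and hash code
    }
}
```

`ToStringDemo.viaStringWriter()` returns `Hello`. `ToStringDemo.viaOutputStreamWriter()` returns something like `java.io.OutputStreamWriter@` followed by a hash code, even though the same text was written.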