Class BoilerpipeContentHandler

    • Constructor Summary

      Constructors 
      Constructor Description
      BoilerpipeContentHandler(java.io.Writer writer)
      Creates a content handler that writes XHTML body character events to the given writer.
      BoilerpipeContentHandler(org.xml.sax.ContentHandler delegate)
      Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler.
      BoilerpipeContentHandler(org.xml.sax.ContentHandler delegate, de.l3s.boilerpipe.BoilerpipeExtractor extractor)
      Creates a new boilerpipe-based content extractor, using the given extraction rules.
    • Method Summary

      Modifier and Type Method Description
      void characters(char[] chars, int offset, int length)
      void endDocument()
      void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
      de.l3s.boilerpipe.document.TextDocument getTextDocument()
      Retrieves the built TextDocument.
      boolean isIncludeMarkup()
      void setIncludeMarkup(boolean includeMarkup)
      void startDocument()
      void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
      void startPrefixMapping(java.lang.String prefix, java.lang.String uri)
      • Methods inherited from class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler

        addWhitespaceIfNecessary, endPrefixMapping, getTitle, ignorableWhitespace, processingInstruction, recycle, setDocumentLocator, setTitle, skippedEntity, toTextDocument
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • BoilerpipeContentHandler

        public BoilerpipeContentHandler(org.xml.sax.ContentHandler delegate)
        Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler.
        Parameters:
        delegate - the ContentHandler that receives the extracted main content
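        The delegate can be any org.xml.sax.ContentHandler. The sketch below is a minimal, hypothetical delegate (TextCollector is not part of Tika) that accumulates every character event it receives. It is driven by the JDK's own SAX parser so that it runs without Tika; wrapping it with this single-argument constructor is only described in a comment, because that step assumes Tika and boilerpipe are on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical delegate: collects all character data it is given.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only the slice reported for this event.
        text.append(ch, start, length);
    }

    String getText() {
        return text.toString();
    }

    // Parses an XML string with the JDK parser and returns the collected text.
    static String collect(String xml) {
        TextCollector collector = new TextCollector();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Any org.xml.sax.ContentHandler can be installed here. In a Tika
            // pipeline (assumed, not shown), the collector would instead be wrapped
            // as new BoilerpipeContentHandler(collector) and that wrapper handed to
            // the Tika parser, so the collector only sees the main content.
            reader.setContentHandler(collector);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return collector.getText();
    }

    public static void main(String[] args) {
        System.out.println(collect("<html><body><p>Hello, world</p></body></html>"));
    }
}
```

Running main prints the text content of the sample document, Hello, world.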
      • BoilerpipeContentHandler

        public BoilerpipeContentHandler(java.io.Writer writer)
        Creates a content handler that writes XHTML body character events to the given writer.
        Parameters:
        writer - the Writer that receives the XHTML body character events
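        As a rough, JDK-only illustration of the idea behind this constructor (not Tika's implementation, and without any boilerplate filtering), the hypothetical BodyWriterHandler below writes only the character events that occur inside <body> to a java.io.Writer and skips everything else, such as the <title>.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: forwards body character events to a Writer.
class BodyWriterHandler extends DefaultHandler {
    private final Writer out;
    private int bodyDepth = 0; // greater than 0 while inside <body>

    BodyWriterHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("body".equals(qName)) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("body".equals(qName)) {
            bodyDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (bodyDepth == 0) {
            return; // ignore text outside <body>, e.g. the <title>
        }
        try {
            out.write(ch, start, length);
        } catch (IOException e) {
            // SAX callbacks may only throw SAXException, so wrap I/O failures.
            throw new SAXException(e);
        }
    }

    // Parses the document and returns whatever was written to the Writer.
    static String bodyText(String xml) {
        StringWriter writer = new StringWriter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new BodyWriterHandler(writer));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Menu</title></head>"
                + "<body><p>Article text</p></body></html>";
        System.out.println(bodyText(page)); // prints: Article text
    }
}
```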
      • BoilerpipeContentHandler

        public BoilerpipeContentHandler(org.xml.sax.ContentHandler delegate,
                                        de.l3s.boilerpipe.BoilerpipeExtractor extractor)
        Creates a new boilerpipe-based content extractor, using the given extraction rules. The extracted main content will be passed to the content handler.
        Parameters:
        delegate - the ContentHandler that receives the extracted main content
        extractor - Extraction rules to use, e.g. ArticleExtractor
    • Method Detail

      • isIncludeMarkup

        public boolean isIncludeMarkup()
      • setIncludeMarkup

        public void setIncludeMarkup(boolean includeMarkup)
      • getTextDocument

        public de.l3s.boilerpipe.document.TextDocument getTextDocument()
        Retrieves the built TextDocument.
        Returns:
        the built TextDocument
      • startDocument

        public void startDocument()
                           throws org.xml.sax.SAXException
        Specified by:
        startDocument in interface org.xml.sax.ContentHandler
        Overrides:
        startDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
        Throws:
        org.xml.sax.SAXException
      • startPrefixMapping

        public void startPrefixMapping(java.lang.String prefix,
                                       java.lang.String uri)
                                throws org.xml.sax.SAXException
        Specified by:
        startPrefixMapping in interface org.xml.sax.ContentHandler
        Overrides:
        startPrefixMapping in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
        Throws:
        org.xml.sax.SAXException
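        startPrefixMapping is the SAX callback that reports namespace declarations. In XHTML the namespace is normally declared as the default namespace on <html>, and a namespace-aware SAX parser reports it with an empty prefix. The JDK-only sketch below (PrefixMappingRecorder is a hypothetical name) records those calls; it does not show what this class does with them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every prefix mapping it is told about.
class PrefixMappingRecorder extends DefaultHandler {
    final List<String> mappings = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // The default namespace arrives with an empty prefix.
        mappings.add("'" + prefix + "' -> " + uri);
    }

    static List<String> record(String xml) {
        PrefixMappingRecorder recorder = new PrefixMappingRecorder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefix-mapping events
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), recorder);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return recorder.mappings;
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        System.out.println(record(xhtml)); // prints: ['' -> http://www.w3.org/1999/xhtml]
    }
}
```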
      • startElement

        public void startElement(java.lang.String uri,
                                 java.lang.String localName,
                                 java.lang.String qName,
                                 org.xml.sax.Attributes atts)
                          throws org.xml.sax.SAXException
        Specified by:
        startElement in interface org.xml.sax.ContentHandler
        Overrides:
        startElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
        Throws:
        org.xml.sax.SAXException
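        For namespace-aware XHTML input, startElement receives the namespace URI, the local name, the qualified name, and the element's attributes. The hypothetical StartElementLogger below, which uses only JDK SAX classes, logs those arguments for a small document so the four parameters can be seen side by side.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that logs the arguments passed to startElement.
class StartElementLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Unprefixed attributes have no namespace URI, so look them up with "".
        // getValue returns null when the attribute is absent.
        String cssClass = atts.getValue("", "class");
        events.add("{" + uri + "}" + localName + " qName=" + qName + " class=" + cssClass);
    }

    static List<String> log(String xml) {
        StartElementLogger logger = new StartElementLogger();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // fills in uri and localName
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), logger);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return logger.events;
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p class=\"lead\">Hi</p></body></html>";
        log(xhtml).forEach(System.out::println);
    }
}
```

Running main prints one line per element; the last is {http://www.w3.org/1999/xhtml}p qName=p class=lead, while the <html> and <body> lines show class=null because those elements carry no class attribute.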
      • characters

        public void characters(char[] chars,
                               int offset,
                               int length)
                        throws org.xml.sax.SAXException
        Specified by:
        characters in interface org.xml.sax.ContentHandler
        Overrides:
        characters in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
        Throws:
        org.xml.sax.SAXException
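        The characters callback receives a slice of a buffer: only chars[offset] through chars[offset + length - 1] belong to the event, one text node may arrive in several calls, and the parser may reuse the array afterwards. The sketch below drives a hypothetical SliceAwareHandler by hand to show why a handler should copy the slice immediately and concatenate successive calls.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler illustrating the characters(char[], int, int) contract.
class SliceAwareHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] chars, int offset, int length) {
        // Only chars[offset .. offset + length - 1] belong to this event; copy it
        // now, because the array may be overwritten before the next call.
        text.append(chars, offset, length);
    }

    String text() {
        return text.toString();
    }

    // Simulates a parser that splits one text node into two events and reuses
    // a single buffer for both of them.
    static String demo() {
        SliceAwareHandler handler = new SliceAwareHandler();
        char[] buffer = new char[16];
        "Hello, ".getChars(0, 7, buffer, 3);
        handler.characters(buffer, 3, 7);
        "world".getChars(0, 5, buffer, 0); // overwrites part of the first chunk
        handler.characters(buffer, 0, 5);
        return handler.text();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: Hello, world
    }
}
```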
      • endElement

        public void endElement(java.lang.String uri,
                               java.lang.String localName,
                               java.lang.String qName)
                        throws org.xml.sax.SAXException
        Specified by:
        endElement in interface org.xml.sax.ContentHandler
        Overrides:
        endElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
        Throws:
        org.xml.sax.SAXException
      • endDocument

        public void endDocument()
                         throws org.xml.sax.SAXException
        Specified by:
        endDocument in interface org.xml.sax.ContentHandler
        Overrides:
        endDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
        Throws:
        org.xml.sax.SAXException
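        The methods above are SAX callbacks that a parser invokes in a fixed order: startDocument first, each startPrefixMapping immediately before the startElement that declares it, character events between the matching start and end tags, and endDocument last. The JDK-only EventOrderRecorder below (a hypothetical name) logs that sequence for a tiny XHTML document; it says nothing about what BoilerpipeContentHandler does inside each callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that logs the order of the SAX callbacks listed above.
class EventOrderRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        events.add("startPrefixMapping");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + localName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + localName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    static List<String> record(String xml) {
        EventOrderRecorder recorder = new EventOrderRecorder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // enables startPrefixMapping events
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), recorder);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return recorder.events;
    }

    public static void main(String[] args) {
        record("<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hi</p></body></html>")
                .forEach(System.out::println);
    }
}
```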