Class HTMLScanner

  • All Implemented Interfaces:
    Scanner, org.xml.sax.Locator

    public class HTMLScanner
    extends java.lang.Object
    implements Scanner, org.xml.sax.Locator
    This class implements a table-driven scanner for HTML that tolerates many kinds of malformed input. It implements the Scanner interface, which accepts a Reader object to fetch characters from and a ScanHandler object to report lexical events to.
    • Constructor Summary

      Constructors 
      Constructor Description
      HTMLScanner()  
    • Method Summary

      Modifier and Type Method Description
      int getColumnNumber()  
      int getLineNumber()  
      java.lang.String getPublicId()  
      java.lang.String getSystemId()  
      static void main(java.lang.String[] argv)
      Test procedure.
      void resetDocumentLocator(java.lang.String publicid, java.lang.String systemid)
      Reset document locator, supplying systemid and publicid.
      void scan(java.io.Reader r0, ScanHandler h)
      Scan HTML source, reporting lexical events.
      void startCDATA()
      A callback for the ScanHandler that lets it force the lexer into the CDATA content state, in which no markup is recognized except the end of the element.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • HTMLScanner

        public HTMLScanner()
    • Method Detail

      • getLineNumber

        public int getLineNumber()
        Specified by:
        getLineNumber in interface org.xml.sax.Locator
      • getColumnNumber

        public int getColumnNumber()
        Specified by:
        getColumnNumber in interface org.xml.sax.Locator
      • getPublicId

        public java.lang.String getPublicId()
        Specified by:
        getPublicId in interface org.xml.sax.Locator
      • getSystemId

        public java.lang.String getSystemId()
        Specified by:
        getSystemId in interface org.xml.sax.Locator
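        The four getters above come from org.xml.sax.Locator, so any code written against that interface can report the scanner's current position. The following is a minimal sketch of that idea; it uses the JDK's org.xml.sax.helpers.LocatorImpl as a stand-in for an HTMLScanner, and the class name LocatorDemo, the method describe, and the sample values are illustrative assumptions, not part of this API.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

public class LocatorDemo {
    // Formats a position as "systemId:line:column".
    // Falls back to "(unknown)" when the locator has no system id.
    static String describe(Locator loc) {
        String id = loc.getSystemId() != null ? loc.getSystemId() : "(unknown)";
        return id + ":" + loc.getLineNumber() + ":" + loc.getColumnNumber();
    }

    public static void main(String[] args) {
        // LocatorImpl stands in for a scanner here (assumption for the demo);
        // an HTMLScanner could be passed to describe() in the same way.
        LocatorImpl loc = new LocatorImpl();
        loc.setSystemId("page.html");
        loc.setLineNumber(12);
        loc.setColumnNumber(5);
        System.out.println(describe(loc));
    }
}
```

        Because describe depends only on the Locator interface, the same helper would work with any SAX component that exposes a position.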
      • resetDocumentLocator

        public void resetDocumentLocator(java.lang.String publicid,
                                         java.lang.String systemid)
        Reset document locator, supplying systemid and publicid.
        Specified by:
        resetDocumentLocator in interface Scanner
        Parameters:
        publicid - Public id
        systemid - System id
      • scan

        public void scan(java.io.Reader r0,
                         ScanHandler h)
                  throws java.io.IOException,
                         org.xml.sax.SAXException
        Scan HTML source, reporting lexical events.
        Specified by:
        scan in interface Scanner
        Parameters:
        r0 - Reader that provides characters
        h - ScanHandler that accepts lexical events
        Throws:
        java.io.IOException
        org.xml.sax.SAXException
      • startCDATA

        public void startCDATA()
        A callback for the ScanHandler that lets it force the lexer into the CDATA content state, in which no markup is recognized except the end of the element.
        Specified by:
        startCDATA in interface Scanner
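        To illustrate what "no markup is recognized except the end of the element" means in practice, here is a simplified sketch. It is not this class's table-driven implementation; the class CdataModeDemo and the method endOfRawContent are hypothetical names, and the rule used (raw content ends at the first "</name", compared case-insensitively) is an assumption made for the example.

```java
public class CdataModeDemo {
    // Returns the index at which raw (CDATA-style) content ends: the first
    // occurrence of "</" + name at or after 'from', ignoring case.
    // If no such end tag exists, the content runs to the end of the input.
    static int endOfRawContent(String input, int from, String name) {
        String needle = "</" + name;
        for (int i = from; i + needle.length() <= input.length(); i++) {
            if (input.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return input.length();
    }

    public static void main(String[] args) {
        String html = "<script>if (a<b) x();</script><p>";
        int start = "<script>".length();
        int end = endOfRawContent(html, start, "script");
        // The '<' inside the script body is treated as plain text.
        System.out.println(html.substring(start, end));
    }
}
```

        The point of the sketch is that the "<" in "a<b" does not open a tag while the lexer is in this state; only the matching end tag terminates the content.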
      • main

        public static void main(java.lang.String[] argv)
                         throws java.io.IOException,
                                org.xml.sax.SAXException
        Test procedure. Reads HTML from the standard input and writes PYX to the standard output.
        Throws:
        java.io.IOException
        org.xml.sax.SAXException
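        For readers unfamiliar with PYX, the line-oriented notation that main writes, the sketch below shows how lexical events might map to PYX lines: "(" for a start tag, "A" for an attribute, "-" for character data, and ")" for an end tag. The class PyxDemo and its method names are hypothetical and written only for illustration; the exact output of this class's test procedure may differ in detail.

```java
public class PyxDemo {
    private final StringBuilder out = new StringBuilder();

    // "(name" marks a start tag.
    void startTag(String name) {
        out.append('(').append(name).append('\n');
    }

    // "Aname value" marks an attribute of the preceding start tag.
    void attribute(String name, String value) {
        out.append('A').append(name).append(' ').append(escape(value)).append('\n');
    }

    // "-text" marks character data.
    void text(String s) {
        out.append('-').append(escape(s)).append('\n');
    }

    // ")name" marks an end tag.
    void endTag(String name) {
        out.append(')').append(name).append('\n');
    }

    // PYX is one event per line, so backslashes, newlines and tabs
    // inside values are written as two-character escapes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t");
    }

    @Override
    public String toString() {
        return out.toString();
    }

    public static void main(String[] args) {
        // Events for: <p class="x">Hi(newline)there</p>
        PyxDemo pyx = new PyxDemo();
        pyx.startTag("p");
        pyx.attribute("class", "x");
        pyx.text("Hi\nthere");
        pyx.endTag("p");
        System.out.print(pyx);
    }
}
```

        Keeping one event per line is what makes PYX convenient to process with line-based tools.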