TIKA-224

Missing body in HtmlParser

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate

    Description

      HTML input: <html><body>tika</body></html>
      Parser output: <?xml version="1.0" encoding="UTF-8"?>tika<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body/></html>

      The parser drops body text that appears before the first occurrence of a tag such as <li>, <a>, or <ul>. In the output above, "tika" is emitted before the root <html> element and <body> is left empty.
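
      The symptom can be checked mechanically. The sketch below is a hypothetical regression check that uses only the JDK's XML parser and does not call Tika itself. The class name BodyTextCheck, the helper bodyText, and the EXPECTED string are assumptions made for illustration; in particular, EXPECTED is only a guess at what a fixed parser would produce. It shows that the reported output is not even well-formed XML, because character data precedes the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika.
final class BodyTextCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Output quoted in this report: "tika" sits before the root element.
    static final String REPORTED =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>tika"
        + "<html xmlns=\"" + XHTML_NS + "\"><head><title/></head><body/></html>";

    // Assumed shape of a correct result (an illustration, not from the report).
    static final String EXPECTED =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<html xmlns=\"" + XHTML_NS + "\"><head><title/></head>"
        + "<body>tika</body></html>";

    /** Returns the text inside body, or null if the input is not well-formed XML. */
    static String bodyText(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList bodies = doc.getElementsByTagNameNS(XHTML_NS, "body");
            return bodies.getLength() == 0 ? "" : bodies.item(0).getTextContent();
        } catch (SAXException e) {
            return null; // e.g. character data outside the root element
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reported: " + bodyText(REPORTED));
        System.out.println("expected: " + bodyText(EXPECTED));
    }
}
```

      Running main should report null for the output quoted above and tika for the assumed correct output. A real regression test would feed the HTML input through HtmlParser and apply the same check to whatever it emits.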

          People

            jukkaz Jukka Zitting
            szetamas Tamas Szendrei
            Votes: 0
            Watchers: 0
