Nutch / NUTCH-906

Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not being valid XML tag names

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2
    • Component/s: web gui
    • Labels: None
    • Environment: Debian GNU/Linux 64-bit
    • Patch Info: Patch Available

    Description

      The Nutch FAQ explains that OpenSearch includes "all fields that are available at search result time." However, some Lucene column (field) names can start with a digit, and valid XML tag names cannot. If Nutch generates OpenSearch results for a document that has a Lucene column whose name starts with a digit, the underlying Xerces library throws this exception:

      org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
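
The failure can probably be reproduced outside Nutch with the DOM implementation bundled in the JDK. The sketch below is only an illustration, not code from Nutch: the class name `InvalidTagNameDemo`, the helper `tryCreateElement`, and the sample field name `2010_date` are all made up for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of Nutch.
class InvalidTagNameDemo {

    /** Returns the DOMException code raised by createElement(name), or -1 if none is raised. */
    static int tryCreateElement(String name) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        try {
            doc.createElement(name); // the DOM checks the name here
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        // An ordinary field name is accepted as a tag name.
        System.out.println(tryCreateElement("title") == -1);
        // A name starting with a digit (assumed example) is rejected.
        System.out.println(tryCreateElement("2010_date") == DOMException.INVALID_CHARACTER_ERR);
    }
}
```

Both lines should print `true` if the bundled DOM validates element names the way Xerces does in the report above.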

      So I have written a patch that tests strings before they are used to generate tags within OpenSearch.
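
A check of this kind might look like the following minimal sketch. It is not the attached patch: the class `XmlTagNames` and its methods are hypothetical, and it assumes a deliberately conservative ASCII-only subset of the XML name rules (letters and `_` at the start; letters, digits, `_`, `-` and `.` afterwards).

```java
// Hypothetical helper; not the code from the attached patch.
class XmlTagNames {

    // Conservative ASCII subset of the characters XML allows at the start of a name.
    static boolean isNameStart(char c) {
        return c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Conservative ASCII subset of the characters XML allows after the first one.
    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    /** True if the string can be used as a tag name under the subset above. */
    static boolean isSafeTagName(String name) {
        if (name == null || name.isEmpty() || !isNameStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!isNameChar(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Rewrites a field name into a safe tag name: prefixes '_' if needed, replaces bad characters with '_'. */
    static String toSafeTagName(String name) {
        if (name == null || name.isEmpty()) {
            return "_";
        }
        StringBuilder sb = new StringBuilder();
        if (!isNameStart(name.charAt(0))) {
            sb.append('_');
        }
        for (char c : name.toCharArray()) {
            sb.append(isNameChar(c) ? c : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(isSafeTagName("2010_date")); // false
        System.out.println(toSafeTagName("2010_date")); // _2010_date
    }
}
```

Skipping unsafe fields instead of renaming them would be an equally plausible choice. Renaming keeps the data in the output, but two different field names can map to the same tag name.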

      I hope you merge this, or a better version of the patch!

          People

            Assignee: Andrzej Bialecki (ab)
            Reporter: Asheesh Laroia (paulproteus)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 20m
                Remaining: 20m
                Logged: Not Specified