  Nutch / NUTCH-906

Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not being valid XML tag names


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1
    • Fix Version/s: 1.2
    • Component/s: web gui
    • Labels:
      None
    • Environment:

      Debian GNU/Linux 64-bit

    • Patch Info:
      Patch Available

      Description

      The Nutch FAQ explains that OpenSearch output includes "all fields that are available at search result time." However, some Lucene column (field) names can start with a digit, and valid XML tag names cannot. When Nutch generates OpenSearch results for a document that has a Lucene column whose name starts with a digit, the underlying Xerces library throws this exception:

      org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.

      I have therefore written a patch that checks each name before it is used to generate a tag in the OpenSearch output.

      I hope you merge this, or a better version of the patch!
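      The report does not include the patch itself, so the following is only a minimal sketch of the idea, not the attached patch. It assumes the OpenSearch output is built with the standard org.w3c.dom API, where Document.createElement throws INVALID_CHARACTER_ERR for names such as "1stfield". The class XmlNameGuard and the methods isSafeXmlName and addFields are hypothetical. The check is deliberately conservative and ASCII-only: it rejects some names XML 1.0 would allow (non-ASCII letters, ':'), but should never accept one the parser rejects. Whether the real patch skips or renames offending fields is not stated; this sketch skips them.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlNameGuard {

    // Hypothetical, conservative ASCII-only check: the name must start with a
    // letter or '_' and continue with letters, digits, '.', '-' or '_'.
    static boolean isSafeXmlName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(isAsciiLetter(first) || first == '_')) {
            return false; // e.g. "1stfield" fails here
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '_';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Adds one child element per {name, value} pair, skipping names that would
    // make Document.createElement throw INVALID_CHARACTER_ERR.
    static int addFields(Document doc, Element parent, String[][] fields) {
        int added = 0;
        for (String[] field : fields) {
            if (!isSafeXmlName(field[0])) {
                continue; // skip this field instead of failing the whole response
            }
            Element e = doc.createElement(field[0]);
            e.setTextContent(field[1]);
            parent.appendChild(e);
            added++;
        }
        return added;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element item = doc.createElement("item");
        doc.appendChild(item);
        String[][] fields = {
            {"title", "Example"}, {"1stfield", "oops"}, {"url", "http://example.com/"}
        };
        System.out.println(addFields(doc, item, fields)); // prints 2
    }
}
```

      Skipping keeps the rest of the result usable; an alternative would be to rename offending fields (for example by prefixing '_'), at the cost of clients seeing names that differ from the Lucene index.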


            People

            • Assignee:
              Andrzej Bialecki (ab)
              Reporter:
              Asheesh Laroia (paulproteus)
            • Votes:
              0
              Watchers:
              0


                Time Tracking

                Estimated:
                20m
                Remaining:
                20m
                Logged:
                Not Specified