Solr / SOLR-2960

XPathEntityProcessor does not clear nulls from empty multi-valued fields

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7
    • Labels:
      None

      Description

      I can't confidently say I completely understand everything these classes so boldly tackle (that is, XPathEntityProcessor and XPathRecordReader), but there may be someone who does. Nonetheless, I think I've got some or most of this right, and more likely there are others in the same position. So I won't qualify everything I say with a maybe; let this note stand in for those qualifications.

      Whenever an XML file is mapped into a Solr index, the XPathRecordReader (used by the XPathEntityProcessor within the DataImportHandler) behaves as follows. (A) If a field is multivalued and is perceived to be null, a value of null is pushed onto it (on top of any other values it previously had). Otherwise, (B) for multivalued fields, any found value is pushed onto the field's existing list of values, and the field is marked as found within the frame (a.k.a. record).

      In general, when the end tag of a record is seen, (C) the XPathRecordReader clears the values of every field that has been marked as found, since tidiness is a virtue and the values are supposedly no longer useful.
      However, suppose that for a given record and multivalued field a value is never found (though values may have been found for other fields in the record). Then only (A) will have occurred for that field and never (B), so the field will never have been marked as found, and thus (C) will never have occurred for it.

      So, the field will remain, with its list of nulls.
      This list of nulls will grow until either the last record or a non-null value is seen.
      And so, (1) an out-of-memory error may occur, given sufficiently many records and a mortal computer.
      Moreover, (2), a transformer cannot reliably depend on the number of nulls in the field (and this information cannot be guaranteed to be determined by some other value).
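      The sequence (A), (B), (C) above can be imitated with a toy model. This is only a sketch of the bookkeeping as described in this report, not the actual XPathRecordReader code; the class name, the method names, and the field names "id" and "tag" are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the behaviour described in this issue; NOT the real XPathRecordReader. */
public class NullAccumulationDemo {
    // Values collected per multivalued field; they survive across records unless cleared.
    private final Map<String, List<String>> values = new HashMap<>();
    // Fields marked as found within the current record.
    private final Set<String> found = new HashSet<>();

    /** (A)/(B): push a value; only a non-null value marks the field as found. */
    void push(String field, String value) {
        values.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        if (value != null) {
            found.add(field); // (B)
        }
    }

    /** (C): at the record's end tag, clear only the fields marked as found. */
    void endRecord() {
        for (String field : found) {
            values.remove(field);
        }
        found.clear();
    }

    int size(String field) {
        List<String> v = values.get(field);
        return v == null ? 0 : v.size();
    }

    /** Feeds n records whose multivalued field "tag" is always empty. */
    public static int leakedNullsAfter(int n) {
        NullAccumulationDemo reader = new NullAccumulationDemo();
        for (int i = 0; i < n; i++) {
            reader.push("id", "doc" + i); // found, so cleared by (C)
            reader.push("tag", null);     // (A) only, so never cleared
            reader.endRecord();
        }
        return reader.size("tag");
    }

    public static void main(String[] args) {
        // One null is left behind per record processed.
        System.out.println(leakedNullsAfter(1000));
    }
}
```

      In this model the list for "tag" grows by one null per record, which is the unbounded growth behind (1) and the unreliable null count behind (2).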

      I will try to provide more information, if this seems an issue and if there doesn't seem to be an answer.
      At this point, if I understand the problem correctly, it seems the answer is to 'mark' those null fields as well, considering 'null' an added value.
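      A minimal sketch of that proposal, under the same toy-model assumptions (hypothetical class, method, and field names; this is not the attached patch and not the code that was eventually committed). A flag switches between the reported behaviour and marking null pushes as found, and the method returns how many values the field holds at each record's end tag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy comparison of "mark only non-null values" versus "mark nulls too". */
public class MarkNullFieldsSketch {
    private final Map<String, List<String>> values = new HashMap<>();
    private final Set<String> found = new HashSet<>();
    private final boolean markNulls; // true = the proposed behaviour

    MarkNullFieldsSketch(boolean markNulls) {
        this.markNulls = markNulls;
    }

    /** Push a value; with markNulls set, a null also marks the field as found. */
    void push(String field, String value) {
        values.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        if (value != null || markNulls) {
            found.add(field);
        }
    }

    /** Reports how many values the field holds at the end tag, then clears marked fields. */
    int endRecord(String field) {
        List<String> v = values.get(field);
        int seen = (v == null) ? 0 : v.size();
        for (String f : found) {
            values.remove(f);
        }
        found.clear();
        return seen;
    }

    /** Sizes of the always-empty field "tag" as seen at the end of each record. */
    public static List<Integer> sizesAtRecordEnd(int records, boolean markNulls) {
        MarkNullFieldsSketch reader = new MarkNullFieldsSketch(markNulls);
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            reader.push("tag", null);
            sizes.add(reader.endRecord("tag"));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(sizesAtRecordEnd(3, false)); // [1, 2, 3]
        System.out.println(sizesAtRecordEnd(3, true));  // [1, 1, 1]
    }
}
```

      With the flag set, each record sees exactly the nulls pushed for that record, so the count stays bounded and a transformer could rely on it.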

      1. SOLR-2960.patch
        11 kB
        James Dyer
      2. SOLR-2960.patch
        2 kB
        Michael Watts

          Activity

          Michael Watts added a comment -

          for branch_3x

          Hoss Man added a comment -

          Bulk changing fixVersion 3.6 to 4.0 for any open issues that are unassigned and have not been updated since March 19.

          Email spam suppressed for this bulk edit; search for hoss20120323nofix36 to identify all issues edited

          Hoss Man added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Hoss Man added a comment -

          Removing fixVersion=4.0 since there is no evidence that anyone is currently working on this issue. (This can certainly be revisited if volunteers step forward.)

          But also assigning to James Dyer to triage (in spite of its age, the patch still applies cleanly, but it does not have any sort of test).

          James Dyer added a comment -

          Here is an update of Michael Watts's patch for current trunk, along with a unit test. I plan to commit this soon.

          ASF subversion and git services added a comment -

          Commit 1553285 from James Dyer in branch 'dev/trunk'
          [ https://svn.apache.org/r1553285 ]

          SOLR-2960: XPathEntityProcessor was adding spurious nulls to multi-valued fields

          ASF subversion and git services added a comment -

          Commit 1553305 from James Dyer in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1553305 ]

          SOLR-2960: XPathEntityProcessor was adding spurious nulls to multi-valued fields

          James Dyer added a comment -

          Thanks, Michael. I apologize it took so long to commit this!


            People

            • Assignee:
              James Dyer
              Reporter:
              Michael Watts
            • Votes:
              0
              Watchers:
              2
