Solr / SOLR-3779

LineEntityProcessor processes only one document

    Details

      Description

      LineEntityProcessor processes only one document when combined with FileListEntityProcessor.

      <dataConfig>
        <dataSource type="FileDataSource" encoding="UTF-8" name="fds"/>
        <document>
          <entity name="f" processor="FileListEntityProcessor" fileName=".*txt" baseDir="/Volumes/data/Documents" recursive="false" rootEntity="false" dataSource="null" transformer="TemplateTransformer">
            <entity onError="skip" name="jc" processor="LineEntityProcessor" url="${f.fileAbsolutePath}" dataSource="fds" rootEntity="true" transformer="TemplateTransformer">
              <field column="link" template="hello${f.fileAbsolutePath},${jc.rawLine}" />
              <field column="rawLine" name="rawLine" />
            </entity>
          </entity>
        </document>
      </dataConfig>
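      With this config, every line of every matching .txt file is expected to become its own document. The outer FileListEntityProcessor emits one row per file, and the nested LineEntityProcessor (rootEntity="true") emits one row per line of that file, with TemplateTransformer building `link`. The Java sketch below models that expected iteration in memory. It does not use the DIH API. The class name, the in-memory file map, and the sample paths are hypothetical. Approximating the fileName regex with a full match on the file name is also an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical in-memory model of the nested FLEP/LEP config; not DIH code.
public class ExpectedRows {

    // Approximates fileName=".*txt" on the outer entity (full match assumed).
    static final Pattern FILE_NAME = Pattern.compile(".*txt");

    public static List<String> expectedLinks(Map<String, List<String>> files) {
        List<String> links = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            String path = file.getKey();
            String name = path.substring(path.lastIndexOf('/') + 1);
            // Outer entity: skip files whose name does not match the pattern.
            if (!FILE_NAME.matcher(name).matches()) {
                continue;
            }
            // Inner entity: one row (and so one document) per line of the file.
            for (String rawLine : file.getValue()) {
                // Mirrors template="hello${f.fileAbsolutePath},${jc.rawLine}".
                links.add("hello" + path + "," + rawLine);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("/Volumes/data/Documents/a.txt", List.of("a1", "a2"));
        files.put("/Volumes/data/Documents/b.txt", List.of("b1"));
        files.put("/Volumes/data/Documents/c.pdf", List.of("ignored"));
        List<String> links = expectedLinks(files);
        System.out.println(links.size());
        links.forEach(System.out::println);
    }
}
```

      Under this model, lines from both .txt files show up. The bug report is that only the first file's lines are actually indexed.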
      
      Attachments
      1. SOLR-3779.patch (5 kB, James Dyer)
      2. SOLR-3779.patch (1.0 kB, Ahmet Arslan)

        Activity

        Ahmet Arslan added a comment -

        With this patch I was able to index lines of multiple files.

        James Dyer added a comment -

        Ahmet, thanks for reporting this and providing a fix! I'm pretty sure this was caused by SOLR-2382, see item #6 in the description "change the semantics of entity.destroy()". And I do think your fix is correct: just close the reader when it runs out of data so that the next time around it will open a new reader on the next file in the list. LEP is the only EntityProcessor that depended on the old semantics of destroy().

        The disturbing thing here is that TestLineEntityProcessor passes, so clearly it is not testing the combination of FLEP/LEP correctly, even though the code comments indicate this was the intention. Likely we need to replace this test with something in the spirit of the test included with SOLR-3307, or at least improve the mock-up LEP with something more realistic. In any case, we'll need a unit test that actually fails prior to your patch and then passes with it applied...
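        The fix described in this comment is to close the reader once it runs out of data so the next file gets a fresh reader. The Java sketch below illustrates that lifecycle. It also shows the kind of before/after check the comment asks for. It is a hypothetical, simplified model and NOT the real LineEntityProcessor. The class and method names are invented for illustration, and an in-memory map stands in for FileDataSource. Nothing resets the reader between files here, which mirrors the situation after the destroy() semantics change described above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a line-reading entity processor's reader lifecycle.
public class LineReaderModel {
    private final Map<String, String> contents; // stands in for the file data source
    private final boolean closeAtEof;           // true = behavior with the fix
    private BufferedReader reader;              // survives across outer-entity rows

    public LineReaderModel(Map<String, String> contents, boolean closeAtEof) {
        this.contents = contents;
        this.closeAtEof = closeAtEof;
    }

    // Returns the next line for url, or null once the current reader is exhausted.
    public String nextRow(String url) throws IOException {
        if (reader == null) {
            // A reader is opened only when none is open, so a stale,
            // exhausted reader from the previous file would be reused.
            reader = new BufferedReader(new StringReader(contents.get(url)));
        }
        String line = reader.readLine();
        if (line == null && closeAtEof) {
            // The fix: release the exhausted reader so the next file opens a new one.
            reader.close();
            reader = null;
        }
        return line;
    }

    // Drives one processor instance over each url, as the outer entity would.
    public static List<String> readAll(List<String> urls, Map<String, String> contents,
                                       boolean closeAtEof) throws IOException {
        LineReaderModel lep = new LineReaderModel(contents, closeAtEof);
        List<String> rows = new ArrayList<>();
        for (String url : urls) {
            String line;
            while ((line = lep.nextRow(url)) != null) {
                rows.add(line);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> contents = Map.of("a.txt", "a1\na2", "b.txt", "b1\nb2");
        List<String> urls = List.of("a.txt", "b.txt");
        System.out.println("without fix: " + readAll(urls, contents, false));
        System.out.println("with fix:    " + readAll(urls, contents, true));
    }
}
```

        Without the fix, the second file hits the already-exhausted reader and contributes no rows, so only a1 and a2 come back. With the fix, all four lines are returned. A check that compares these two results fails before the change and passes after it.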

        Simon Boyle added a comment (edited) -

        We've noticed similar issues in 3.6.1 after upgrading from 3.5: only the first file is processed in a multi-file FileListEntityProcessor/LineEntityProcessor combination, and only the first value of a multi-valued entry is listed in a nested SqlEntityProcessor.

        James Dyer added a comment -

        Here is a patch with a unit test. I will commit this shortly as this needs to be fixed in 4.0.

        Hoss Man added a comment -

        James: should this be backported to the 3.6 branch for inclusion in a (probable) 3.6.2 as well?

        James Dyer added a comment -

        The problem was introduced in 3.6, so I'm fixing it in the 3.6 branch also.

        James Dyer added a comment -

        Committed. Thank you, Ahmet.

        Trunk: r1384816
        4x: r1384828
        3x: r1384834

        Commit Tag Bot added a comment -

        [branch_4x commit] James Dyer
        http://svn.apache.org/viewvc?view=revision&revision=1384828

        SOLR-3779/SOLR-3791: fix for DIH LineEntityProcessor & CachedSqlEntityProcessor

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee: James Dyer
          • Reporter: Ahmet Arslan
          • Votes: 0
          • Watchers: 3
