SOLR-3307: DIH FileListEntityProcessor not multi-threading after applying patch SOLR-3011

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.6
    • Fix Version/s: 3.6
    • Labels:
      None

      Description

      As reported in SOLR-3011, the FileListEntityProcessor no longer recurses through all sub-directories and files once SOLR-3011.patch is applied.

      1. SOLR-3307.patch
        7 kB
        James Dyer
      2. SOLR-3307-UnitTest.patch
        4 kB
        Bernd Fehling

        Activity

        Bernd Fehling created issue -
        Bernd Fehling added a comment - edited

        Unit test patch TestMultiThreadedFileReader added.

        Bernd Fehling made changes -
        Field Original Value New Value
        Attachment SOLR-3307-UnitTest.patch [ 12520979 ]
        Bernd Fehling added a comment -

        As far as I can tell, the differences between 3.5 and 3.6 with SOLR-3011 are:

        • with 3.5
          • I get a single FileListEntityProcessor with a multi-threaded XPathEntityProcessor (according to the number of "threads")
          • the "threads" parameter affects only the rootEntity
        • with 3.6 and SOLR-3011
          • I get a multi-threaded FileListEntityProcessor (according to the number of "threads") with a multi-threaded XPathEntityProcessor
          • the "threads" parameter also affects all entities above the rootEntity
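        To make the 3.5 layout described above concrete, here is a simplified Java model. It is hypothetical: it is not DIH source code and not the SOLR-3307 patch. A single lister walks the directory tree recursively, and only the per-file work (standing in for XPathEntityProcessor) is spread across `threads` workers. The names `SingleListerPool`, `listFiles`, `importAll` and `sampleTree` are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model, NOT DIH source: one lister, N worker threads for per-file work.
public class SingleListerPool {

    // Plays the role of a single FileListEntityProcessor: walks root and every sub-directory.
    static List<Path> listFiles(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Only the per-file work (standing in for XPathEntityProcessor) uses `threads` workers.
    static Set<Path> importAll(Path root, int threads) {
        List<Path> files = listFiles(root);            // listed once, on the calling thread
        Set<Path> processed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Path f : files) {
                pool.submit(() -> processed.add(f));   // placeholder for parsing one file
            }
        } finally {
            pool.shutdown();
        }
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }

    // Builds a small temporary tree with files at two different depths.
    static Path sampleTree() {
        try {
            Path root = Files.createTempDirectory("dih-sketch");
            Path deep = Files.createDirectories(root.resolve("a").resolve("b"));
            Files.createFile(root.resolve("top.xml"));
            Files.createFile(deep.resolve("deep.xml"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = sampleTree();
        System.out.println(importAll(root, 4).size()); // prints 2
    }
}
```

        Because the tree is listed exactly once before any worker starts, every file in every sub-directory is handed to the pool exactly once, whatever the thread count.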
        James Dyer added a comment -

        Bernd,

        Here's a patch that passes your new unit test. Could you give it a quick review?

        All other DIH tests pass too, including Mikhail's new tests from SOLR-3011.

        If all's good, I'd like to try to slip this into 3.6, since it fixes a 3.6-only regression (the feature has been removed in trunk).

        James Dyer made changes -
        Attachment SOLR-3307.patch [ 12521068 ]
        Bernd Fehling added a comment -

        Excellent, it works as expected now.

        Now it's getting more difficult: when loading a large number of records, each index segment gets a ".del" file of a different size.
        Nevertheless, all data is loaded without loss, and an optimize cleans everything up.
        Obviously another bad side effect of multi-threading, and one NOT seen with version 3.5.
        It will be hard to find out what is happening.
        Could the SolrWriters be overlapping?

        Does it make sense to spend more time looking into this if multi-threading will be removed in 4.x anyway?

        Robert Muir added a comment -

        Now it's getting more difficult because when loading a large amount of records each index segment gets a ".del" file of different size.
        Nevertheless all data is loaded without loss and an optimize will clean up everything.
        Obviously another bad side effect of multi-threading and NOT seen with version 3.5.
        Will be hard to find out what happens.
        An overlapping of solrwriter?

        Are you sure it's not just because DIH no longer optimizes by default in 3.6?

        * SOLR-3142: Imports no longer default optimize to true, instead false. If you want to force all segments to be merged
                     into one, you can specify this parameter yourself. NOTE: this can be a very expensive operation and usually
                     does not make sense for delta-imports.
        

        Of course, if you are seeing a lot of .del files after importing data, it sounds like you have
        some kind of impedance mismatch (duplicate unique IDs) in your source data...
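        Per the SOLR-3142 note quoted above, from 3.6 on an optimize after an import has to be requested explicitly through the DIH `optimize` request parameter. A minimal Java sketch that only builds such a request URL, assuming a DIH handler registered at `/dataimport` on a local Solr at port 8983 (both assumptions); `DihImportUri` and `fullImport` are hypothetical names. Sending the request, with a browser or any HTTP client, is left out.

```java
import java.net.URI;

// Hypothetical helper: builds a DIH full-import request URL with an explicit optimize flag.
public class DihImportUri {

    // handlerUrl is an assumption, e.g. the conventional /dataimport handler of a local Solr.
    static URI fullImport(String handlerUrl, boolean optimize) {
        // Since SOLR-3142 (3.6), "optimize" defaults to false, so pass it explicitly when wanted.
        return URI.create(handlerUrl + "?command=full-import&optimize=" + optimize);
    }

    public static void main(String[] args) {
        URI uri = fullImport("http://localhost:8983/solr/dataimport", true);
        // prints http://localhost:8983/solr/dataimport?command=full-import&optimize=true
        System.out.println(uri);
    }
}
```

        As the note warns, forcing an optimize merges all segments into one and can be expensive, so it is usually left off for delta-imports.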

        Robert Muir added a comment -

        If all's good, I'd like to try and slip this into 3.6 as this is to fix a 3.6-only regression

        Sounds like the right thing to do James. Thanks for working on this.

        James Dyer added a comment -

        Committed to 3_x: r1309004
        Thank you, Bernd.

        James Dyer made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: James Dyer
          • Reporter: Bernd Fehling
          • Votes: 0
          • Watchers: 1
