Solr / SOLR-2382 DIH Cache Improvements / SOLR-2933

DIHCacheSupport ignores left side of where="xid=x.id" attribute

    Details

      Description

      DIHCacheSupport, introduced in SOLR-2382, uses the new config attributes cachePk and cacheLookup. However, support for the old where="xid=x.id" attribute is broken by DIHCacheSupport.<init>: it never puts the two sides of where="" into the context. The breakage is masked by SortedMapBackedCache.<init>, which simply takes the first column as the primary key. That's why all tests are green.
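As a rough illustration of what the old attribute is expected to express, the sketch below splits where="xid=x.id" into its two sides. It is not the DIHCacheSupport code: the class and method names (WhereAttributeSketch, splitWhere) are hypothetical, and mapping the left side to cachePk and the right side to cacheLookup is an assumption based on this issue's title.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of DIH: shows how the two sides of
// where="xid=x.id" could be mapped onto the newer attribute names.
class WhereAttributeSketch {

  static Map<String, String> splitWhere(String where) {
    String[] sides = where.split("=", 2);
    if (sides.length != 2) {
      throw new IllegalArgumentException("expected 'column=variable' but got: " + where);
    }
    Map<String, String> attrs = new LinkedHashMap<>();
    attrs.put("cachePk", sides[0].trim());     // left side: key column of the cached entity (assumed)
    attrs.put("cacheLookup", sides[1].trim()); // right side: variable looked up per parent row (assumed)
    return attrs;
  }

  public static void main(String[] args) {
    System.out.println(splitWhere("xid=x.id"));
  }
}
```

If neither side reaches the context, the cache has no configured key and has to fall back to some default, which is the situation this issue describes.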

      To reproduce the issue, I only need to reorder the entry at line 219 so that desc comes first and is picked up as the primary key.

      To do that, I propose choosing the concrete map class randomly for all DIH test cases in createMap().
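A minimal sketch of that idea follows. It is not the attached patch: the class name RandomMapSketch, the createMap(Random, Object...) signature, and the three candidate Map classes are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical variant of a test helper; the real createMap() signature may differ.
class RandomMapSketch {

  // Builds a row from alternating key/value arguments, backed by a randomly chosen Map class.
  static Map<String, Object> createMap(Random random, Object... keyVals) {
    Map<String, Object> row;
    switch (random.nextInt(3)) {
      case 0:  row = new HashMap<>(); break;       // iteration order depends on hashing
      case 1:  row = new LinkedHashMap<>(); break; // iteration order is insertion order
      default: row = new TreeMap<>(); break;       // iteration order is sorted by key
    }
    for (int i = 0; i + 1 < keyVals.length; i += 2) {
      row.put(keyVals[i].toString(), keyVals[i + 1]);
    }
    return row;
  }

  public static void main(String[] args) {
    Random random = new Random(42L); // fixed seed so a failing column order can be reproduced
    for (int i = 0; i < 5; i++) {
      Map<String, Object> row = createMap(random, "id", 1, "desc", "one");
      System.out.println(row.getClass().getSimpleName()
          + " -> first column: " + row.keySet().iterator().next());
    }
  }
}
```

With a sorted map, "desc" iterates before "id", so any code that silently treats the first column as the key is exercised with the wrong column.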

      I'm attaching a test-breaking patch and a seed.

        Activity

        Mikhail Khludnev added a comment -

        James,

        Could you have a look at my new SOLR-3011? I'm sure it rocks.

        Thanks

        Mikhail Khludnev added a comment -

        Great, James! Thank you.

        Let me refresh the SOLR-3011 patch next week. I would also like to think about the same thread-safe paging for the plain JDBC EntityProcessor (without caches).

        James Dyer added a comment -

        Fixed. This bug was caused by SOLR-2382, trunk only, so no CHANGES.txt entry.

        Trunk: r1245109 (bug fix & test improvement)
        Branch_3x: r1245129 (test improvement only)

        Thanks again, Mikhail.

        James Dyer added a comment -

        I will commit this one shortly.

        James Dyer added a comment -

        For 3.6, we should backport the test improvement (only).

        James Dyer added a comment -

        Why not add randomization for choosing map class?

        createMap() is but a trifle to facilitate testing. Personally, I think randomizing over various Map implementations here is confusing, and it would not be obvious why it was done. What is important is the order of the fields being sent to the MockDataSource. In this case, there are 2 fields: "id" and "desc". Having some rows put "id" first and the others put "desc" first is adequate to test all the possible variants. I also think it is a good idea to always use a Map that preserves order because, when writing these tests, it is not intuitive (for me anyhow) that the Map you're creating is going to iterate in an order different from the one you specify. That is why I changed the abstract class to always use a LinkedHashMap.

        "assuming that first column is a primary key" is a really user-"hostile" feature.

        I didn't investigate whether this is just a function of the test or if DIH "in real life" behaves this way. In any case, if this is truly a DIH feature, I wouldn't consider it hostile. The primary key comes first most of the time. In any case, I wanted to solve the bug that this JIRA issue addresses and no more. If we have questions about default behaviors and want to change how the features work, that is probably best left to a separate non-"bug" issue.

        test methods names withWhereClause() and withKeyAndLookup() at TestCachedSqlEntityProcessor should be swapped each other.

        I think you're right, but then again I didn't write these tests, so I'm not sure why they were named this way. Also, changing the names is not directly related to fixing the bug here. Personally, I would love to see the DIH tests get revamped someday so you could just glance through them and understand what they do and how.

        In any case, now that I understand the bug and how to fix it, I would rather create a lean patch that someone can quickly evaluate and commit. If we add bloat or make it hard for a committer to understand, it might just languish and remain unfixed for a long time.

        Mikhail Khludnev added a comment -

        James,

        I reviewed the patch. It solves the issue. I have minor questions:

        • You changed AbstractDataImportHandlerTestCase.createMap(), but only for this particular issue. Why not add randomization for choosing the map class, as I did in my patch? It keeps all current tests green but gives us a chance to catch similar cases that may be introduced later;
        • I still think that "assuming that first column is a primary key" is a really user-"hostile" feature. IMHO, a DIH config can be tuned by someone who is not necessarily a developer. For me, it's quite natural to insert a new field at the top of the field list, so the current implementation can be unexpectedly affected by reordering fields. Can't we prohibit SortedMapBackedCache.iterator(Object) when no cachePk="" is specified, and remove primaryKeyName = rec.keySet().iterator().next()? In other words, what's the motivation for this implicit case?
        • At first I proposed dropping one of the "features": keep only where="", or keep only cachePk="" and cacheLookup="". But I now see that cachePk="" is used for keying the cache's multimap, so it can be useful without cacheLookup="". Now I agree that both ways should be supported;
        • The test method names withWhereClause() and withKeyAndLookup() in TestCachedSqlEntityProcessor should be swapped with each other.
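The implicit behaviour questioned in the second point can be sketched as follows. This is a simplified stand-in, not the SortedMapBackedCache code: ImplicitKeySketch, resolvePrimaryKey, and row are hypothetical names, and only the fallback expression rec.keySet().iterator().next() is taken from the comment above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the implicit "first column is the key" fallback.
class ImplicitKeySketch {

  // Uses the configured key if there is one, otherwise the first column of the record.
  static String resolvePrimaryKey(String configuredPk, Map<String, Object> rec) {
    if (configuredPk != null) {
      return configuredPk;
    }
    return rec.keySet().iterator().next();
  }

  // Builds a two-column record whose iteration order matches the argument order.
  static Map<String, Object> row(String k1, Object v1, String k2, Object v2) {
    Map<String, Object> rec = new LinkedHashMap<>();
    rec.put(k1, v1);
    rec.put(k2, v2);
    return rec;
  }

  public static void main(String[] args) {
    System.out.println(resolvePrimaryKey(null, row("id", 1, "desc", "one")));  // prints "id"
    System.out.println(resolvePrimaryKey(null, row("desc", "one", "id", 1)));  // prints "desc"
    System.out.println(resolvePrimaryKey("id", row("desc", "one", "id", 1)));  // prints "id"
  }
}
```

The second call shows the concern: moving a field to the top of the list silently changes which column is used as the key unless the key is configured explicitly.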
        James Dyer added a comment -

        Mikhail,

        I appreciate your being persistent with this one. I apologize for not understanding what you were getting at the first couple times you tried to explain, but I think I finally have a fix. See SOLR-2933.patch, attached.

        The problem was caused by my not copying code properly from EntityProcessorBase to DIHCacheSupport. This broke support for the "where" parameter for cached entity processors. ("cachePk" & "cacheLookup" still work and can be used instead of "where" as a workaround).

        Further, a shortcoming in TestCachedSqlEntityProcessor masked the problem. Although the primary key was not being picked up from the "where" property, the primary key was the first property being passed in. DIH then assumed (correctly in this case) that the first column must be the primary key. So the test (#withKeyAndLookup) would pass, but only by accident.

        This patch:
        1. Fixes the bug in DIHCacheSupport.
        2. Modifies TestCachedSqlEntityProcessor to reverse the order of one of the incoming documents so that the primary key is not first.
        3. Changes AbstractDataImportHandlerTestCase#createMap to use a LinkedHashMap so that the tested order will be preserved. The HashMap that was used previously would always put the primary key column before the others, even if the test specified a different order, further masking the failure.
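Items 2 and 3 can be pictured with the sketch below. It is not the code from SOLR-2933.patch: OrderedRowSketch and its createMap(Object...) signature are assumptions, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical order-preserving test helper in the spirit of the patch description.
class OrderedRowSketch {

  // Builds a row from alternating key/value arguments; LinkedHashMap keeps the given order.
  static Map<String, Object> createMap(Object... keyVals) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (int i = 0; i + 1 < keyVals.length; i += 2) {
      row.put(keyVals[i].toString(), keyVals[i + 1]);
    }
    return row;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows = new ArrayList<>();
    rows.add(createMap("id", 1, "desc", "one"));
    rows.add(createMap("desc", "two", "id", 2)); // reversed on purpose: the key is not first
    for (Map<String, Object> row : rows) {
      System.out.println(row.keySet());
    }
  }
}
```

Because the second row really does iterate with "desc" first, a cache that ignores the "where" attribute and falls back to the first column can no longer pass by accident.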

        Mikhail, thank you for both pointing out my error and also the prior problems with the test.

        Mikhail Khludnev added a comment - edited

        AbstractDataImportHandlerTestCase.java-choose-map-randomly.patch picks the map class randomly in all DIH test cases.
        It breaks TestCachedSqlEntityProcessor.withKeyAndLookup. More explanations are 1 2 3

        Seed which reproduces the failure:

        ant test -Dtestcase=TestCachedSqlEntityProcessor -Dtestmethod=withKeyAndLookup -Dtests.seed=7735f677498f3558:-29c15941cc37921e:-32c8bd2280b92536 -Dargs="-Dfile.encoding=UTF-8"
        

        Let me attach the fix tomorrow. It's not a big deal anyway.

          People

          • Assignee: James Dyer
          • Reporter: Mikhail Khludnev
          • Votes: 2
          • Watchers: 2

              Time Tracking

              Estimated: 1h
              Remaining: 1h
              Logged: Not Specified
