Solr / SOLR-9284

The HDFS BlockDirectoryCache should not let its keysToRelease or names maps grow indefinitely.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4, 7.0
    • Component/s: hdfs
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels: None
    • Attachments:
      1. SOLR-9284.patch (11 kB) by Mark Miller
      2. SOLR-9284.patch (6 kB) by Mark Miller

      Issue Links

        Activity

        Mark Miller added a comment:

        A fix for keysToRelease is relatively straightforward, but names is a little tougher. We may just need a configurable maximum on the number of names to track, defaulting to something fairly healthy for a normal index.

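        For illustration only, here is a minimal sketch of that idea, assuming a size-bounded Caffeine cache replaces the plain map; the class name, field names, and default size below are hypothetical, not the actual patch:

            import java.util.concurrent.atomic.AtomicInteger;

            import com.github.benmanes.caffeine.cache.Cache;
            import com.github.benmanes.caffeine.cache.Caffeine;

            public class BoundedNamesSketch {
              // Hypothetical default for "something fairly healthy for a normal index";
              // a real fix would presumably make this configurable.
              private static final int DEFAULT_MAX_NAMES = 50_000;

              private final AtomicInteger counter = new AtomicInteger();

              // Bounded replacement for an unbounded names map: once more than
              // DEFAULT_MAX_NAMES entries exist, Caffeine evicts entries instead of
              // letting the map grow indefinitely.
              private final Cache<String, Integer> names = Caffeine.newBuilder()
                  .maximumSize(DEFAULT_MAX_NAMES)
                  .build();

              int idForName(String name) {
                // Assigns a stable integer id per file name, computed on first use.
                return names.get(name, n -> counter.incrementAndGet());
              }
            }
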
        Mark Miller added a comment:

        Just a quick patch, needs a bit of review and polish, but this is along the lines I was thinking of for a fix.

        Not super satisfied with the 'names' issue, but not sure what we should do at the moment. We might want a higher upper limit and/or to make it configurable. Or perhaps there should be some kind of cleaner thread that prunes occasionally?

        Michael Sun added a comment:

        Thanks Mark Miller for the patch. Here are some of my thoughts.

        1. In BlockDirectoryCache, the map between name and integer is changed to a Caffeine cache without setting up a removal listener. I am not sure that's correct. If the Caffeine cache removes a name from the cache, the underlying BlockCache needs to delete all blocks related to that name (see the sketch after this list).
        2. There are a few occurrences of System.out.println() in BlockDirectoryCache. It's better to use a logger.

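        To illustrate point 1, here is a rough sketch of what such a removal listener could do, assuming the three-argument listener of recent Caffeine versions; the BlockStore interface and deleteBlocksFor method are hypothetical stand-ins, not Solr's actual API:

            import com.github.benmanes.caffeine.cache.Cache;
            import com.github.benmanes.caffeine.cache.Caffeine;
            import com.github.benmanes.caffeine.cache.RemovalCause;

            public class NamesWithRemovalListenerSketch {

              /** Hypothetical stand-in for the underlying block cache. */
              interface BlockStore {
                void deleteBlocksFor(int fileId); // drop all cached blocks belonging to one file
              }

              private final Cache<String, Integer> names;

              NamesWithRemovalListenerSketch(BlockStore blockStore, int maxNames) {
                this.names = Caffeine.newBuilder()
                    .maximumSize(maxNames)
                    // Runs when Caffeine drops a name on its own (e.g. size-based eviction),
                    // so the related blocks are released rather than left for LRU to reclaim.
                    .removalListener((String name, Integer fileId, RemovalCause cause) -> {
                      if (fileId != null && cause.wasEvicted()) {
                        blockStore.deleteBlocksFor(fileId);
                      }
                    })
                    .build();
              }
            }
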
        Mark Miller added a comment:
        • 1. In BlockDirectoryCache, the map between name and integer is changed to use Caffeine cache without setting up removal listener...

        It never had a removal listener. This is the names map mentioned above. We can't easily remove names; that is the point of just setting an upper bound on its size.

        • 2. There are a few occurrences of System.out.println() in BlockDirectoryCache. It's better to use log.

        Those are just debug statements from development; there is no need to log them, they just need to be removed before commit.

        Mark Miller added a comment:

        Updated patch attached.

        Michael Sun added a comment:

        It never had a removal listener. This is the names map mentioned above. We can't easily remove names, that is the point of just setting an upper size on it.

        I understand this patch doesn't want to remove names. However, the Caffeine cache may decide to remove a name on its own, even before the map reaches the max size, so removal can happen under the hood at any time. A removal listener is necessary if a Caffeine cache is used. Here is the doc of maximumSize() which explains the behavior: https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/Caffeine.java#L277

        If names are not intended to be removed, it's probably better to keep ConcurrentHashMap. I think the growth of the names map is less critical than the growth of BlockKeys, and there is no evidence from tests so far that the names map grows enough to be a concern either.

        Mark Miller added a comment:

        A removal listener is necessary if Caffeine cache is used.

        Why though? The underlying data will just be reused as the cache is LRU - do we really need to explicitly free anything here?

        Michael Sun added a comment:

        Why though (A removal listener is necessary if Caffeine cache is used.)

        As long as a name can be removed (in this case, implicitly by the Caffeine cache), the block cache items related to that name need to be removed as well. That's the reason a removal listener is necessary.

        do we really need to explicitly free anything here

        We don't explicitly free anything. The point is that names can be freed implicitly by the Caffeine cache. The doc describing this behavior is linked in my previous comment.

        Ben Manes added a comment:

        Hopefully I didn't break this behavior when upgrading from ConcurrentLinkedHashMap (Caffeine's predecessor). That code used an eviction listener, so I think it was a direct translation. Can you take a look and see if the prior version was more correct?

        Note that the cache, in its current form, will only evict after the maximum size threshold is crossed. However, Guava does evict earlier, due to being split into multiple segments that are operated on exclusively during a write. I kept that wording in the JavaDoc to provide flexibility, just in case.

        Michael Sun added a comment:

        Can you take a look and see if the prior version was more correct?

        For the names map (BlockDirectoryCache.names) mentioned in my previous comments, it's currently a ConcurrentHashMap, not a ConcurrentLinkedHashMap. ConcurrentHashMap doesn't evict items implicitly, so there was no need to set up an eviction listener and the prior version is ok.

        The patch changes the names map from ConcurrentHashMap to Caffeine, which can evict items implicitly. Therefore it's necessary to set up a removal listener, or to keep ConcurrentHashMap, since the names map usually doesn't grow much in test results.

        Ben Manes, I guess you are talking about BlockCache.cache, which was using ConcurrentLinkedHashMap and now uses Caffeine. There is a removal listener set up in the code and it looks ok. Feel free to open a JIRA if you have any specific concern about it.

        Mark Miller added a comment:

        As long as the name can be removed (in this case, implicitly by Caffeine cache), the block cache items related to that name needs to be removed as well. That's the reason a removal listener is necessary.

        But that does not address "why". It just restates what you said:

        "the block cache items related to that name needs to be removed as well."

        I have the same questions though. Even if the name is removed, why do we care if the data remains in the cache?

        Why does the underlying data in the cache need to be removed? The underlying cache locations should simply be reclaimed by the LRU cache replacement policy.

        Do we gain much by working to explicitly free anything early in this case?

        Michael Sun added a comment:

        Why does the underlying data in the cache need to be removed? The underlying cache locations should simply be reclaimed by the LRU cache replacement policy.

        Ah, I see your question. I agree that inaccessible data can be removed by the LRU logic of the block cache eventually. The main gain of my suggestion is cache efficiency. For example, if the cached data related to the removed name was cached recently, the LRU policy may decide to push out some older cached data that is still useful instead of pushing out those now-unreachable entries.

        And releasing unused memory early is in general a good practice.

        With that said, the names map implementation in the patch is better than the current implementation (using ConcurrentHashMap). I was hoping to make maximum use of memory by removing related items once a name is deleted, but if that's hard to achieve, the current patch is good to go IMO.

        Ben Manes added a comment:

        Michael Sun: If you upgrade to Caffeine 2.x then it will take advantage of frequency in addition to recency. A patch is available in SOLR-8241, but it's been stalled because Shawn hasn't had the bandwidth to drive the changes forward.

        Mark Miller added a comment:

        But if it's hard to achieve

        It's not exactly simple - that is why the current delete methods take a file name but do not release any of the block keys. I think if we wanted to do that, we would either have to do long scans of cache keys, or start storing lists of cache keys keyed by file name (roughly as sketched below).

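        For illustration, one possible reading of the "lists of cache keys keyed by file name" idea, with hypothetical names; this is only a sketch of the bookkeeping, not what the committed patch does:

            import java.util.Map;
            import java.util.Set;
            import java.util.concurrent.ConcurrentHashMap;
            import java.util.function.Consumer;

            public class PerFileKeyTrackingSketch {

              // Hypothetical: the block keys currently cached, grouped by file name.
              private final Map<String, Set<Long>> keysByFile = new ConcurrentHashMap<>();

              /** Record that a block of the given file was inserted into the block cache. */
              void onBlockCached(String fileName, long blockKey) {
                keysByFile.computeIfAbsent(fileName, f -> ConcurrentHashMap.newKeySet()).add(blockKey);
              }

              /**
               * On file delete, release that file's blocks eagerly instead of waiting
               * for the LRU policy to reclaim them, at the cost of the extra map above.
               */
              void onFileDeleted(String fileName, Consumer<Long> releaseBlock) {
                Set<Long> keys = keysByFile.remove(fileName);
                if (keys != null) {
                  keys.forEach(releaseBlock);
                }
              }
            }
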
        Mark Miller added a comment:

        Let's spin that off into a new issue if you want to tackle it. I'll commit the progress we have now.

        ASF subversion and git services added a comment:

        Commit 0325722e675c336ba71f5d47b19133753c2a42e5 in lucene-solr's branch refs/heads/master from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0325722 ]

        SOLR-9284: The HDFS BlockDirectoryCache should not let it's keysToRelease or names maps grow indefinitely.

        ASF subversion and git services added a comment:

        Commit 3d9101601448f0a69b91de2151e64f1f48895fab in lucene-solr's branch refs/heads/branch_6x from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3d91016 ]

        SOLR-9284: The HDFS BlockDirectoryCache should not let it's keysToRelease or names maps grow indefinitely.

        Steve Rowe added a comment:

        OOM issues likely caused by commit here: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/18288/

        Also reproducible, from my Jenkins:

          [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlockDirectoryTest -Dtests.method=testEOF -Dtests.seed=81253F7E7D614B6C -Dtests.slow=true -Dtests.locale=bg -Dtests.timezone=Etc/UCT -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
           [junit4] ERROR   1.41s | BlockDirectoryTest.testEOF <<<
           [junit4]    > Throwable #1: java.lang.OutOfMemoryError: Direct buffer memory
           [junit4]    >        at java.nio.Bits.reserveMemory(Bits.java:693)
           [junit4]    >        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
           [junit4]    >        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
           [junit4]    >        at org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
           [junit4]    >        at org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
           [junit4]    >        at java.lang.Thread.run(Thread.java:745)Throwable #2: java.lang.NullPointerException
           [junit4]    >        at org.apache.solr.store.blockcache.BlockDirectoryTest.tearDown(BlockDirectoryTest.java:131)
         
        ASF subversion and git services added a comment:

        Commit 358c164620f774820bd22278fcf425c599a254b2 in lucene-solr's branch refs/heads/master from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=358c164 ]

        SOLR-9284: Reduce off heap cache size and fix test asserts.

        ASF subversion and git services added a comment:

        Commit b90b4dc694edb9b31c5afd69b477e6d90f24adfd in lucene-solr's branch refs/heads/branch_6x from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b90b4dc ]

        SOLR-9284: Reduce off heap cache size and fix test asserts.

        Steve Rowe added a comment:

        My Jenkins found a reproducing seed half an hour ago (after the commits above) - note that I had to run the test without -Dtests.method=ensureCacheConfigurable to get it to reproduce:

          [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlockDirectoryTest -Dtests.method=ensureCacheConfigurable -Dtests.seed=281E6C2B5FD2D4E1 -Dtests.slow=true -Dtests.locale=tr-TR -Dtests.timezone=PST8PDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          [junit4] ERROR   1.39s J3  | BlockDirectoryTest.ensureCacheConfigurable <<<
          [junit4]    > Throwable #1: java.lang.OutOfMemoryError: Direct buffer memory
          [junit4]    > 	at java.nio.Bits.reserveMemory(Bits.java:693)
          [junit4]    > 	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
          [junit4]    > 	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
          [junit4]    > 	at java.lang.Thread.run(Thread.java:745)Throwable #2: java.lang.NullPointerException
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.tearDown(BlockDirectoryTest.java:131)
        
        Steve Rowe added a comment:

        Three more seeds, but none reproduce for me - note that all three include an NPE as a second Throwable, which I just noticed in the trace in my previous comment here:

        From https://builds.apache.org/job/Lucene-Solr-SmokeRelease-6.x/182:

          [smoker]    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlockDirectoryTest -Dtests.method=testEOF -Dtests.seed=9A2D36FC5487E440 -Dtests.multiplier=2 -Dtests.locale=hi -Dtests.timezone=America/North_Dakota/Beulah -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
           [smoker]    [junit4] ERROR   1.66s J1 | BlockDirectoryTest.testEOF <<<
           [smoker]    [junit4]    > Throwable #1: java.lang.OutOfMemoryError: Direct buffer memory
           [smoker]    [junit4]    > 	at java.nio.Bits.reserveMemory(Bits.java:693)
           [smoker]    [junit4]    > 	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
           [smoker]    [junit4]    > 	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
           [smoker]    [junit4]    > 	at org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
           [smoker]    [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
           [smoker]    [junit4]    > 	at java.lang.Thread.run(Thread.java:745)Throwable #2: java.lang.NullPointerException
           [smoker]    [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.tearDown(BlockDirectoryTest.java:131)
        

        From my Jenkins on branch_6.x:

          [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlockDirectoryTest -Dtests.method=testRandomAccessWritesLargeCache -Dtests.seed=79BD96B775734799 -Dtests.slow=true -Dtests.locale=ar-TN -Dtests.timezone=Europe/Lisbon -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
          [junit4] ERROR   1.95s J7  | BlockDirectoryTest.testRandomAccessWritesLargeCache <<<
          [junit4]    > Throwable #1: java.lang.OutOfMemoryError: Direct buffer memory
          [junit4]    > 	at java.nio.Bits.reserveMemory(Bits.java:693)
          [junit4]    > 	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
          [junit4]    > 	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
          [junit4]    > 	at java.lang.Thread.run(Thread.java:745)Throwable #2: java.lang.NullPointerException
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.tearDown(BlockDirectoryTest.java:131)
        

        And from my Jenkins on master:

          [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlockDirectoryTest -Dtests.method=testRandomAccessWrites -Dtests.seed=39545A949FB2DD31 -Dtests.slow=true -Dtests.locale=sr-ME -Dtests.timezone=America/Indiana/Vevay -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          [junit4] ERROR   0.86s J7  | BlockDirectoryTest.testRandomAccessWrites <<<
          [junit4]    > Throwable #1: java.lang.OutOfMemoryError: Direct buffer memory
          [junit4]    > 	at java.nio.Bits.reserveMemory(Bits.java:693)
          [junit4]    > 	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
          [junit4]    > 	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
          [junit4]    > 	at java.lang.Thread.run(Thread.java:745)Throwable #2: java.lang.NullPointerException
          [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.tearDown(BlockDirectoryTest.java:131)
        
        Mark Miller added a comment:

        It probably just matters what is running in parallel and also eating away at the artificial direct memory governor.

        ASF subversion and git services added a comment:

        Commit 53a0748f4345b540da598c25500f4fc402dbbf38 in lucene-solr's branch refs/heads/master from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=53a0748 ]

        SOLR-9284: Reduce off heap cache size.

        ASF subversion and git services added a comment:

        Commit 6962381180c7c9d26f22fb09b3b673f2a9f8ef7b in lucene-solr's branch refs/heads/branch_6x from markrmiller
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6962381 ]

        SOLR-9284: Reduce off heap cache size.

        Mark Miller added a comment:

        It probably just matters what is running in parallel and also eating away at the artificial direct memory governor.

        Although I suppose anything running in parallel is in its own JVM and should have its own limit. Perhaps a lack of releasing direct memory somewhere then.

        Steve Rowe added a comment:

        Looks like the NPE in HdfsDirectoryTest.testEOF() is still happening: this reproducing master seed is from my Jenkins:

           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=HdfsDirectoryTest -Dtests.method=testEOF -Dtests.seed=FA1024A704DD72C3 -Dtests.slow=true -Dtests.locale=en-GB -Dtests.timezone=Africa/Johannesburg -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] ERROR   0.11s J11 | HdfsDirectoryTest.testEOF <<<
           [junit4]    > Throwable #1: java.lang.NullPointerException
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([FA1024A704DD72C3:6B7B66AF46F9D4BF]:0)
           [junit4]    > 	at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:69)
           [junit4]    > 	at org.apache.solr.store.hdfs.HdfsDirectoryTest.testEof(HdfsDirectoryTest.java:158)
           [junit4]    > 	at org.apache.solr.store.hdfs.HdfsDirectoryTest.testEOF(HdfsDirectoryTest.java:150)
        [...]
           [junit4]   2> 429308 ERROR (SUITE-HdfsDirectoryTest-seed#[FA1024A704DD72C3]-worker) [    ] o.a.h.m.l.MethodMetric Error invoking method getBlocksTotal
           [junit4]   2> java.lang.reflect.InvocationTargetException
           [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           [junit4]   2> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           [junit4]   2> 	at java.lang.reflect.Method.invoke(Method.java:498)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:401)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:194)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassName(DefaultMBeanServerInterceptor.java:1804)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.safeGetClassName(DefaultMBeanServerInterceptor.java:1595)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanPermission(DefaultMBeanServerInterceptor.java:1813)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:430)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
           [junit4]   2> 	at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
           [junit4]   2> 	at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:81)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stopMBeans(MetricsSourceAdapter.java:226)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stop(MetricsSourceAdapter.java:211)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSources(MetricsSystemImpl.java:463)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:213)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:594)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics.shutdown(NameNodeMetrics.java:171)
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:872)
           [junit4]   2> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1726)
           [junit4]   2> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1705)
           [junit4]   2> 	at org.apache.solr.cloud.hdfs.HdfsTestUtil.teardownClass(HdfsTestUtil.java:198)
           [junit4]   2> 	at org.apache.solr.store.hdfs.HdfsDirectoryTest.afterClass(HdfsDirectoryTest.java:65)
           [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           [junit4]   2> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           [junit4]   2> 	at java.lang.reflect.Method.invoke(Method.java:498)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:870)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
           [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
           [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> Caused by: java.lang.NullPointerException
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.size(BlocksMap.java:203)
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getTotalBlocks(BlockManager.java:3370)
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlocksTotal(FSNamesystem.java:5729)
           [junit4]   2> 	... 54 more
           [junit4]   2> 429312 INFO  (SUITE-HdfsDirectoryTest-seed#[FA1024A704DD72C3]-worker) [    ] o.a.s.SolrTestCaseJ4 ###deleteCore
           [junit4]   2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/jobs/Lucene-Solr-tests-6.x/workspace/solr/build/solr-core/test/J11/temp/solr.store.hdfs.HdfsDirectoryTest_FA1024A704DD72C3-001
           [junit4]   2> Nov 22, 2016 11:23:44 AM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
           [junit4]   2> WARNING: Will linger awaiting termination of 130 leaked thread(s).
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62), sim=RandomSimilarity(queryNorm=true,coord=yes): {}, locale=en-GB, timezone=Africa/Johannesburg
           [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 (64-bit)/cpus=16,threads=2,free=193596736,total=526385152
           [junit4]   2> NOTE: All tests run in this JVM: [TestReversedWildcardFilterFactory, TestTestInjection, TestGraphMLResponseWriter, ReplicaListTransformerTest, UpdateRequestProcessorFactoryTest, TestMissingGroups, CursorPagingTest, LukeRequestHandlerTest, TestDistributedStatsComponentCardinality, TestRangeQuery, TestHdfsBackupRestoreCore, AtomicUpdatesTest, TestRealTimeGet, TestNumericTerms64, TestRandomFlRTGCloud, TestSolrCoreProperties, OverseerModifyCollectionTest, TestExpandComponent, DocValuesMissingTest, ReplaceNodeTest, DistribCursorPagingTest, TestClassicSimilarityFactory, MoreLikeThisHandlerTest, TestSystemIdResolver, TestLegacyFieldCache, TestRTGBase, WordBreakSolrSpellCheckerTest, CheckHdfsIndexTest, HdfsDirectoryTest]
           [junit4] Completed [339/655 (1!)] on J11 in 44.08s, 4 tests, 1 error <<< FAILURES!
        
        Steve Rowe added a comment:

        A couple more "OOM: Direct buffer memory" failures today on Apache Jenkins:

        From https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1160/:

          [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=HdfsDirectoryFactoryTest -Dtests.method=testInitArgsOrSysPropConfig -Dtests.seed=200C6D6D2F8C2C5F -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-master/test-data/enwiki.random.lines.txt -Dtests.locale=zh-TW -Dtests.timezone=America/Argentina/Buenos_Aires -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
           [junit4] ERROR   1.30s J0 | HdfsDirectoryFactoryTest.testInitArgsOrSysPropConfig <<<
           [junit4]    > Throwable #1: java.lang.RuntimeException: The max direct memory is likely too low.  Either increase it (by adding -XX:MaxDirectMemorySize=<size>g -XX:+UseLargePages to your containers startup args) or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap, your java heap size might not be large enough. Failed allocating ~134.217728 MB.
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([200C6D6D2F8C2C5F:D7A3A446D205C674]:0)
           [junit4]    > 	at org.apache.solr.core.HdfsDirectoryFactory.createBlockCache(HdfsDirectoryFactory.java:304)
           [junit4]    > 	at org.apache.solr.core.HdfsDirectoryFactory.getBlockDirectoryCache(HdfsDirectoryFactory.java:280)
           [junit4]    > 	at org.apache.solr.core.HdfsDirectoryFactory.create(HdfsDirectoryFactory.java:220)
           [junit4]    > 	at org.apache.solr.core.HdfsDirectoryFactoryTest.testInitArgsOrSysPropConfig(HdfsDirectoryFactoryTest.java:108)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]    > Caused by: java.lang.OutOfMemoryError: Direct buffer memory
           [junit4]    > 	at java.nio.Bits.reserveMemory(Bits.java:693)
           [junit4]    > 	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
           [junit4]    > 	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
           [junit4]    > 	at org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
           [junit4]    > 	at org.apache.solr.core.HdfsDirectoryFactory.createBlockCache(HdfsDirectoryFactory.java:302)
           [junit4]    > 	... 42 more
        [...]
           [junit4]   2> 415746 ERROR (SUITE-HdfsDirectoryFactoryTest-seed#[200C6D6D2F8C2C5F]-worker) [    ] o.a.h.m.l.MethodMetric Error invoking method getBlocksTotal
           [junit4]   2> java.lang.reflect.InvocationTargetException
           [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           [junit4]   2> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           [junit4]   2> 	at java.lang.reflect.Method.invoke(Method.java:498)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MethodMetric$2.snapshot(MethodMetric.java:111)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MethodMetric.snapshot(MethodMetric.java:144)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MetricsRegistry.snapshot(MetricsRegistry.java:401)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.MetricsSourceBuilder$1.getMetrics(MetricsSourceBuilder.java:79)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:194)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:172)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:151)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassName(DefaultMBeanServerInterceptor.java:1804)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.safeGetClassName(DefaultMBeanServerInterceptor.java:1595)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanPermission(DefaultMBeanServerInterceptor.java:1813)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:430)
           [junit4]   2> 	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
           [junit4]   2> 	at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
           [junit4]   2> 	at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:81)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stopMBeans(MetricsSourceAdapter.java:226)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stop(MetricsSourceAdapter.java:211)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSources(MetricsSystemImpl.java:463)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:213)
           [junit4]   2> 	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.shutdown(MetricsSystemImpl.java:594)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdownInstance(DefaultMetricsSystem.java:72)
           [junit4]   2> 	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.shutdown(DefaultMetricsSystem.java:68)
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics.shutdown(NameNodeMetrics.java:171)
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.stop(NameNode.java:872)
           [junit4]   2> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1726)
           [junit4]   2> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1705)
           [junit4]   2> 	at org.apache.solr.cloud.hdfs.HdfsTestUtil.teardownClass(HdfsTestUtil.java:198)
           [junit4]   2> 	at org.apache.solr.core.HdfsDirectoryFactoryTest.teardownClass(HdfsDirectoryFactoryTest.java:61)
           [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           [junit4]   2> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           [junit4]   2> 	at java.lang.reflect.Method.invoke(Method.java:498)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:870)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
           [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
           [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
           [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
           [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> Caused by: java.lang.NullPointerException
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.size(BlocksMap.java:203)
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.getTotalBlocks(BlockManager.java:3370)
           [junit4]   2> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlocksTotal(FSNamesystem.java:5729)
           [junit4]   2> 	... 54 more
        [...]
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=174, maxMBSortInHeap=6.915978870333232, sim=RandomSimilarity(queryNorm=false): {}, locale=zh-TW, timezone=America/Argentina/Buenos_Aires
           [junit4]   2> NOTE: Linux 3.13.0-85-generic amd64/Oracle Corporation 1.8.0_102 (64-bit)/cpus=4,threads=2,free=364870536,total=525860864
           [junit4]   2> NOTE: All tests run in this JVM: [TestFileDictionaryLookup, HdfsChaosMonkeySafeLeaderTest, DistributedDebugComponentTest, TestExactSharedStatsCache, TestLFUCache, TestFieldCacheSort, HdfsUnloadDistributedZkTest, TestLockTree, TestHighlightDedupGrouping, TestDFISimilarityFactory, SolrRequestParserTest, SyncSliceTest, CreateCollectionCleanupTest, HdfsDirectoryFactoryTest]
        

        From https://builds.apache.org/job/Lucene-Solr-NightlyTests-6.x/207:

           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlockDirectoryTest -Dtests.method=testRandomAccessWritesLargeCache -Dtests.seed=85E88260B81B20E2 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-6.x/test-data/enwiki.random.lines.txt -Dtests.locale=id-ID -Dtests.timezone=Africa/Libreville -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
           [junit4] ERROR   1.64s J1 | BlockDirectoryTest.testRandomAccessWritesLargeCache <<<
           [junit4]    > Throwable #1: java.lang.OutOfMemoryError: Direct buffer memory
           [junit4]    > 	at java.nio.Bits.reserveMemory(Bits.java:693)
           [junit4]    > 	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
           [junit4]    > 	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
           [junit4]    > 	at org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
           [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]    > Throwable #2: java.lang.NullPointerException
           [junit4]    > 	at org.apache.solr.store.blockcache.BlockDirectoryTest.tearDown(BlockDirectoryTest.java:131)
        [...]
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): {}, docValues:{}, maxPointsInLeafNode=1406, maxMBSortInHeap=7.589330986925872, sim=RandomSimilarity(queryNorm=false,coord=yes): {}, locale=id-ID, timezone=Africa/Libreville
           [junit4]   2> NOTE: Linux 3.13.0-85-generic amd64/Oracle Corporation 1.8.0_102 (64-bit)/cpus=4,threads=1,free=317294960,total=496500736
           [junit4]   2> NOTE: All tests run in this JVM: [WordBreakSolrSpellCheckerTest, DocumentBuilderTest, TestHashQParserPlugin, TestDynamicFieldResource, DateMathParserTest, CollectionsAPIDistributedZkTest, HdfsRecoveryZkTest, TestDocBasedVersionConstraints, TestCloudManagedSchema, SpellCheckCollatorTest, HdfsBasicDistributedZkTest, TestLuceneMatchVersion, SpatialFilterTest, CustomCollectionTest, TestUseDocValuesAsStored2, TestCharFilters, BlockDirectoryTest]
        
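        For reference, the two remedies named in the error message above are a larger -XX:MaxDirectMemorySize in the JVM startup args, or disabling direct allocation for the HDFS block cache in solrconfig.xml. A minimal sketch of the latter, assuming a typical HdfsDirectoryFactory setup (the home path and the other values here are placeholders, not taken from the failing test configs):

            <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
              <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
              <bool name="solr.hdfs.blockcache.enabled">true</bool>
              <!-- false puts the block cache on the JVM heap instead of direct memory;
                   the heap must then be sized to hold the cache, as the error message notes -->
              <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
            </directoryFactory>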
        Hide
        mdrob Mike Drob added a comment - - edited

        https://github.com/apache/lucene-solr/blob/5738c293f0c3f346b3e3e52c937183060d59cba1/solr/core/src/java/org/apache/solr/store/blockcache/BlockDirectoryCache.java#L53

            if (releaseBlocks) {
              keysToRelease = Collections.newSetFromMap(new ConcurrentHashMap<BlockCacheKey,Boolean>(1024, 0.75f, 512));
              blockCache.setOnRelease(new OnRelease() {
                
                @Override
                public void release(BlockCacheKey key) {
                  keysToRelease.remove(key);
                }
              });
            }
        

        If we're using the global block cache option and create multiple directories using the same factory, we will lose the release hook for the first directory. I think we can verify that by creating a server with multiple cores.
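        For illustration, a minimal sketch of how the second registration clobbers the first (the wrapper method and the A/B set names are hypothetical; only setOnRelease, OnRelease and BlockCacheKey come from the snippet above):

            // Hypothetical sketch: two directory caches sharing one global BlockCache.
            void registerTwoDirectories(BlockCache globalBlockCache) {
              final Set<BlockCacheKey> keysToReleaseA = Collections.newSetFromMap(new ConcurrentHashMap<>());
              final Set<BlockCacheKey> keysToReleaseB = Collections.newSetFromMap(new ConcurrentHashMap<>());

              // Directory A registers its release hook...
              globalBlockCache.setOnRelease(new OnRelease() {
                @Override
                public void release(BlockCacheKey key) {
                  keysToReleaseA.remove(key);
                }
              });

              // ...and directory B's registration replaces it, since the cache keeps a
              // single OnRelease reference rather than a list of listeners.
              globalBlockCache.setOnRelease(new OnRelease() {
                @Override
                public void release(BlockCacheKey key) {
                  keysToReleaseB.remove(key);
                }
              });

              // From here on, blocks evicted for directory A are never removed from
              // keysToReleaseA, so that set can keep growing until the core is closed.
            }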

        Edit: Filed SOLR-10104


          People

          • Assignee:
            markrmiller@gmail.com Mark Miller
          • Reporter:
            markrmiller@gmail.com Mark Miller
          • Votes:
            0
          • Watchers:
            7
