SOLR-14373

HDFS block cache allows overallocation

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 4.10
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels: None

    Description

      For the HDFS block cache, when we allocate more slabs than the available direct memory can hold, the error message seems to be hidden.

      In such cases, creating the block cache in HdfsDirectoryFactory throws an OutOfMemoryError, which seems to be caught in HdfsDirectoryFactory itself and rethrown wrapped in a RuntimeException:

      try {
        blockCache = new BlockCache(metrics, directAllocation, totalMemory, slabSize, blockSize);
      } catch (OutOfMemoryError e) {
        throw new RuntimeException(
            "The max direct memory is likely too low.  Either increase it (by adding -XX:MaxDirectMemorySize=<size>g -XX:+UseLargePages to your containers startup args)"
                + " or disable direct allocation using solr.hdfs.blockcache.direct.memory.allocation=false in solrconfig.xml. If you are putting the block cache on the heap,"
                + " your java heap size might not be large enough."
                + " Failed allocating ~" + totalMemory / 1000000.0 + " MB.",
            e);
      }
      

      This then manifests as a NullPointerException during core load, and the original error message is not shown:

      2020-02-24 06:50:23,492 ERROR (coreLoadExecutor-5-thread-8)-c: collection1-s:shard2-r:core_node2-x: collection1_shard2_replica1-o.a.s.c.SolrCore: Error while closing
      java.lang.NullPointerException
              at org.apache.solr.core.SolrCore.close(SolrCore.java:1352)
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:967)
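
Because the OutOfMemoryError is wrapped in a RuntimeException, code that inspects only the top-level exception type will not recognize it. The following is a minimal sketch of a cause-chain check, not the actual Solr fix; the names `CauseChain` and `findCause` are hypothetical and do not exist in Solr.

```java
// Hypothetical helper, not part of Solr: finds a throwable of a given type
// anywhere in the cause chain of a caught exception.
final class CauseChain {
    private CauseChain() {}

    /**
     * Returns the first throwable in the cause chain of {@code t} (including
     * {@code t} itself) that is an instance of {@code type}, or null if none is.
     */
    static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        // Bound the walk so that an unusual, cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < 64; cur = cur.getCause(), depth++) {
            if (type.isInstance(cur)) {
                return type.cast(cur);
            }
        }
        return null;
    }
}
```

A caller that catches Throwable during core load could then test `CauseChain.findCause(e, OutOfMemoryError.class) != null` instead of `e instanceof OutOfMemoryError`, and log the wrapped message before attempting any cleanup.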
      

      When directAllocation is true, HdfsDirectoryFactory logs an approximation of the memory it is about to allocate:

      2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Number of slabs of block cache [16384] with direct memory allocation set to [true]
      2020-02-24 06:49:53,153 INFO (coreLoadExecutor-5-thread-8)-c:collection1-s:shard2-r:core_node2-x:collection1_shard2_replica1-o.a.s.c.HdfsDirectoryFactory: Block cache target memory usage, slab size of [134217728] will allocate [16384] slabs and use ~[2199023255552] bytes
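
The figure in the second log line is simply the slab count multiplied by the slab size: 16384 slabs of 134217728 bytes (128 MiB) each come to 2199023255552 bytes, or 2 TiB. A small sketch of that arithmetic follows; `BlockCacheSizing` and `totalBytes` are hypothetical names used only for illustration.

```java
// Hypothetical illustration of the arithmetic behind the log lines above.
final class BlockCacheSizing {
    private BlockCacheSizing() {}

    /** Total bytes requested: slab count times slab size, failing loudly on long overflow. */
    static long totalBytes(int slabCount, long slabSizeBytes) {
        return Math.multiplyExact((long) slabCount, slabSizeBytes);
    }

    public static void main(String[] args) {
        long total = totalBytes(16384, 134217728L);
        System.out.println(total);              // prints 2199023255552
        System.out.println(total / (1L << 40)); // prints 2 (whole TiB)
    }
}
```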
      

      This was detected on Solr 4.10, but it seems to affect current versions as well; I will double check.

      Plan to resolve:

      • Correct the logging and the Throwable instance checking so that the failure does not manifest as a NullPointerException during core load.
      • Add a check that detects whether the memory to be allocated is higher than the available direct memory. If it is, fall back to a smaller slab count and log a warning message.
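
The second item of the plan could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed patch: `SlabFallback` and `fitSlabCount` are hypothetical names, the direct memory limit is passed in by the caller (how it is obtained is JVM-specific and not covered here), and the warning goes to standard error rather than to Solr's logger.

```java
// Hypothetical sketch of a slab count fallback; not part of Solr.
final class SlabFallback {
    private SlabFallback() {}

    /**
     * Returns the requested slab count if it fits into maxDirectBytes,
     * otherwise the largest slab count that does fit (possibly 0).
     */
    static int fitSlabCount(int requestedSlabs, long slabSizeBytes, long maxDirectBytes) {
        if (requestedSlabs <= 0 || slabSizeBytes <= 0 || maxDirectBytes < 0) {
            throw new IllegalArgumentException("slab count and slab size must be positive, limit non-negative");
        }
        // Integer division: how many whole slabs fit into the limit.
        long maxSlabs = maxDirectBytes / slabSizeBytes;
        if (requestedSlabs <= maxSlabs) {
            return requestedSlabs;
        }
        System.err.println("WARN: requested " + requestedSlabs + " slabs of " + slabSizeBytes
            + " bytes, but only " + maxSlabs + " fit into " + maxDirectBytes
            + " bytes of direct memory; falling back to " + maxSlabs + " slabs");
        // Safe cast: maxSlabs is smaller than requestedSlabs, which is an int.
        return (int) maxSlabs;
    }
}
```

With the values from the log above and an assumed limit of 4 GiB, `SlabFallback.fitSlabCount(16384, 134217728L, 4L << 30)` returns 32. A result of 0 means that not even one slab fits, which the caller would still have to treat as a configuration error.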

          People

            Assignee: Unassigned
            Reporter: Istvan Farkas (warper)