SOLR-14251: Shard Split on HDFS

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 8.4
    • Fix Version/s: None
    • Component/s: hdfs
    • Labels:
      None

      Description

      A shard split on an HDFS-backed index checks local disk space instead of HDFS space

      When a shard split is performed on an index stored on HDFS, SplitShardCmd nevertheless evaluates the free disk space on the local file system of the server where Solr is installed.

      SplitShardCmd assumes that its main phase (when the Lucene index is being split) always executes on the local file system of the shard leader. Accordingly, SplitShardCmd.checkDiskSpace() checks the free disk space of the local file system, even though the actual data is written to the HDFS directory and so (almost) does not affect the local file system (the exception is the core.properties file).
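The mismatch described above can be illustrated with a minimal Python sketch. It is not Solr's implementation: `local_free_gb` and `check_disk_space` are hypothetical helpers, the unit is assumed to be GB (consistent with the roughly 5 GB local disk in the setup below), and the two numbers are copied from the error response in this report. The point is only that a local free-space figure gets compared against a requirement derived from an index that lives on HDFS.

```python
import shutil


def local_free_gb(path="."):
    """Free space (in GiB) of the LOCAL file system that holds `path`.

    This is the kind of figure the check ends up using; it says nothing
    about the space available on HDFS.
    """
    return shutil.disk_usage(path).free / 1024 ** 3


def check_disk_space(required_gb, available_gb, node):
    """Hypothetical stand-in for the pre-split check: fail when short on space."""
    if available_gb < required_gb:
        raise RuntimeError(
            "not enough free disk space to perform index split on node "
            f"{node}, required: {required_gb}, available: {available_gb}"
        )


# Values copied from the error response in this report (unit assumed to be GB).
required = 294.64909074269235
local_available = 5.4632568359375

try:
    check_disk_space(required, local_available, "<solr instance>:8983_solr")
except RuntimeError as err:
    # The check fails even though the split output would be written to HDFS.
    print(err)
```

A fix would presumably need to obtain the available space from the file system that backs the index directory rather than from the node's local disk; how that is best done inside Solr is left open here.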

      See also: https://lucene.472066.n3.nabble.com/HDFS-Shard-Split-td4449920.html

      My setup to reproduce the issue:

      • Solr deployed on OpenShift with a local disk of about 5 GB
      • HDFS configured in solrconfig.xml with
      <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
          <str name="solr.hdfs.home">hdfs://path/to/index/</str>
      ...
      
      • Split command:
      .../admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1&async=1234
      • Response:
      {
        "responseHeader":{"status":0,"QTime":32},
        "Operation splitshard caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: not enough free disk space to perform index split on node <solr instance>:8983_solr, required: 294.64909074269235, available: 5.4632568359375",
        "exception":{
          "msg":"not enough free disk space to perform index split on node <solr instance>:8983_solr, required: 294.64909074269235, available: 5.4632568359375",
          "rspCode":500},
        "status":{"state":"failed","msg":"found [1234] in failed tasks"}
      }
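For scripting the reproduction, the following Python sketch builds the split request and extracts the two numbers from a failure response. The base URL is an assumption (the report elides the real host and path prefix), `collections_url` and `parse_space_error` are hypothetical helpers, and the sample response is an abbreviated copy of the one above. The status URL uses the Collections API's REQUESTSTATUS action with the same async id, which appears to be what produced the response shown.

```python
import json
import re
from urllib.parse import urlencode

# Assumption: placeholder base URL; the report does not state the real one.
BASE_URL = "http://localhost:8983/solr"


def collections_url(base_url, params):
    """Build a Collections API URL from a dict of query parameters."""
    return f"{base_url}/admin/collections?{urlencode(params)}"


# "async" is a reserved word in Python, so the parameters are passed as a dict.
split_url = collections_url(BASE_URL, {
    "action": "SPLITSHARD",
    "collection": "collection1",
    "shard": "shard1",
    "async": "1234",
})
status_url = collections_url(BASE_URL, {
    "action": "REQUESTSTATUS",
    "requestid": "1234",
})


def parse_space_error(response_text):
    """Return (required, available) from a failure response, or None."""
    body = json.loads(response_text)
    msg = body.get("exception", {}).get("msg", "")
    match = re.search(r"required: ([0-9.]+), available: ([0-9.]+)", msg)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Abbreviated copy of the response from this report.
SAMPLE = """{
  "responseHeader": {"status": 0, "QTime": 32},
  "exception": {
    "msg": "not enough free disk space to perform index split on node <solr instance>:8983_solr, required: 294.64909074269235, available: 5.4632568359375",
    "rspCode": 500},
  "status": {"state": "failed", "msg": "found [1234] in failed tasks"}
}"""

print(split_url)
print(status_url)
print(parse_space_error(SAMPLE))
```

Comparing the parsed "available" value with the free space on the Solr node's local disk is a quick way to confirm that the check looked at the local file system and not at HDFS.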
      

       

       

              People

              • Assignee: Unassigned
              • Reporter: Johannes Brucher (johannes.brucher@db.com)

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 20m