Solr / SOLR-8335

HdfsLockFactory does not allow core to come up after a node was killed

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.0, 5.1, 5.2, 5.2.1, 5.3, 5.3.1
    • Fix Version/s: None
    • Component/s: Hadoop Integration, hdfs
    • Labels: None

    Description

      When using HdfsLockFactory, if a node is killed instead of being shut down gracefully, the write.lock file remains in HDFS. The next time the node starts, the core fails to load with a LockObtainFailedException.

      I was able to reproduce this in all 5.x versions of Solr. The problem did not occur when I tested 4.10.4.
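      The failure mode can be illustrated with a local-filesystem analogy. This is a minimal sketch, not HdfsLockFactory's actual code. It assumes, as the symptoms above suggest, that the 5.x lock is a plain file with a fixed name (write.lock) whose existence alone means "locked". It uses java.nio instead of the HDFS client, and the names StaleLockDemo, tryObtain and release are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaleLockDemo {

    /** Hypothetical fixed lock file name, the same on every start. */
    static final String LOCK_NAME = "write.lock";

    /**
     * Tries to obtain the lock by atomically creating the lock file.
     * Returns false if the file already exists, whether the holder is
     * alive or a process that was killed without cleaning up.
     */
    static boolean tryObtain(Path indexDir) throws IOException {
        try {
            Files.createFile(indexDir.resolve(LOCK_NAME)); // fails if the file exists
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    /** Graceful shutdown path: the lock file is deleted. */
    static void release(Path indexDir) throws IOException {
        Files.deleteIfExists(indexDir.resolve(LOCK_NAME));
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index");

        // Run 1, graceful: obtain the lock, then release it on shutdown.
        boolean graceful = tryObtain(indexDir);
        release(indexDir);

        // Run 2, killed: obtain the lock, but release() never runs.
        boolean killed = tryObtain(indexDir);

        // Run 3, restart after the kill: the leftover file blocks the lock.
        boolean restart = tryObtain(indexDir);

        System.out.println("graceful run obtained lock: " + graceful);
        System.out.println("killed run obtained lock:   " + killed);
        System.out.println("restart obtained lock:      " + restart);
    }
}
```

      Running main reports that the graceful run and the killed run both obtained the lock, while the restart did not. An OS-level lock, like the one Lucene's NativeFSLockFactory takes, is dropped by the operating system when the owning process exits. An existence-based lock file has nothing that removes it after a kill.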

      Steps to reproduce on 5.x:

      1. Create a directory in HDFS: bin/hdfs dfs -mkdir /solr
      2. Start Solr: bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.data.dir=hdfs://localhost:9000/solr -Dsolr.updatelog=hdfs://localhost:9000/solr
      3. Create a core: ./bin/solr create -c test -n data_driven
      4. Kill Solr instead of stopping it gracefully.
      5. The lock file, write.lock, is still present in HDFS.
      6. Start Solr again. The core fails to load with a stack trace like this:

      2015-11-23 13:28:04.287 ERROR (coreLoadExecutor-6-thread-1) [   x:test] o.a.s.c.CoreContainer Error creating core [test]: Index locked for write for core 'test'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
      org.apache.solr.common.SolrException: Index locked for write for core 'test'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
              at org.apache.solr.core.CoreContainer.create(CoreContainer.java:723)
              at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443)
              at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked for write for core 'test'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
              at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528)
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761)
              ... 9 more
      2015-11-23 13:28:04.289 ERROR (coreContainerWorkExecutor-2-thread-1) [   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be created
      java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [test]
              at java.util.concurrent.FutureTask.report(FutureTask.java:122)
              at java.util.concurrent.FutureTask.get(FutureTask.java:192)
              at org.apache.solr.core.CoreContainer$2.run(CoreContainer.java:472)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.solr.common.SolrException: Unable to create core [test]
              at org.apache.solr.core.CoreContainer.create(CoreContainer.java:737)
              at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:443)
              at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:434)
              ... 5 more
      Caused by: org.apache.solr.common.SolrException: Index locked for write for core 'test'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:820)
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:659)
              at org.apache.solr.core.CoreContainer.create(CoreContainer.java:723)
              ... 7 more
      Caused by: org.apache.lucene.store.LockObtainFailedException: Index locked for write for core 'test'. Solr now longer supports forceful unlocking via 'unlockOnStartup'. Please verify locks manually!
              at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:528)
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:761)
              ... 9 more
      

      In 4.10.4 I saw two differences:

      1. The lock file name was different. It looked something like: /solr/index/HdfsDirectory@46ad6bd3 lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@4b44b5f6-write.lock
      2. When the node was started again after being killed, it loaded the core without problems, but there were now two lock files in HDFS. The one ending in 4b44b5f6-write.lock is the latest:

      /solr/index/HdfsDirectory@46ad6bd3 lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@4b44b5f6-write.lock
      /solr/index/HdfsDirectory@52959724 lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@9d59d3f-write.lock
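      The 4.10.4 file names above look like Java's default Object.toString() output (class name, '@', hexadecimal identity hash code) followed by -write.lock. If the lock name was derived that way, every restart would produce a new name. A file left behind by a killed process could then never block the next start, and old lock files would accumulate. The sketch below only illustrates that reasoning on the local filesystem. FakeDirectory, lockName and tryObtain are hypothetical names, not Solr or Lucene APIs.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class PerInstanceLockNameDemo {

    /** Hypothetical directory stand-in; it keeps the default Object.toString(). */
    static class FakeDirectory {
    }

    /** Hypothetical naming scheme: per-instance prefix + "-write.lock". */
    static String lockName(Object directory) {
        return directory + "-write.lock";
    }

    /** Obtains a lock by atomically creating the named file; false if it already exists. */
    static boolean tryObtain(Path indexDir, String name) throws IOException {
        try {
            Files.createFile(indexDir.resolve(name));
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index");

        // First start: take the lock, then get "killed" (the file is never deleted).
        String staleName = lockName(new FakeDirectory());
        boolean first = tryObtain(indexDir, staleName);

        // Second start: a new instance normally has a different identity hash code,
        // hence a different lock name, so the stale file does not block it.
        String freshName = lockName(new FakeDirectory());
        boolean second = tryObtain(indexDir, freshName);

        System.out.println(staleName + " obtained: " + first);
        System.out.println(freshName + " obtained: " + second);

        // Both files are now on disk, similar to the two lock files seen in 4.10.4.
        try (Stream<Path> files = Files.list(indexDir)) {
            System.out.println("lock files present: " + files.count());
        }
    }
}
```

      In this sketch, the per-instance name also means that two processes opening the same index would each create their own lock file, so the lock would no longer keep two writers apart. Avoiding the startup failure this way comes at the cost of that guarantee.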
      

    Attachments

      1. SOLR-8335.patch (82 kB, Mihaly Toth)

    People

      Assignee: Mark Miller
      Reporter: Varun Thacker
      Votes: 3
      Watchers: 11

    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 1h