Solr / SOLR-7113

Multiple calls to UpdateLog#init is not thread safe with respect to the HDFS FileSystem client object usage.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels: None

      Description

      I noticed this issue while doing some heavy indexing into Solr (700K docs per minute).

      Solr log errors:

      15:42:47 ERROR HdfsTransactionLog Exception closing tlog.
      java.io.IOException: Filesystem closed
      	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:765)
      	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1898)
      	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1859)
      	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
      	at org.apache.solr.update.HdfsTransactionLog.close(HdfsTransactionLog.java:303)
      	at org.apache.solr.update.TransactionLog.decref(TransactionLog.java:504)
      	at org.apache.solr.update.UpdateLog.addOldLog(UpdateLog.java:335)
      	at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:628)
      	at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:600)
      	at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      15:42:47 ERROR CommitTracker auto commit error...:org.apache.solr.common.SolrException: java.io.IOException: Filesystem closed
      
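The trace shows an auto-commit releasing an old tlog. `UpdateLog.addOldLog` calls `TransactionLog.decref`, which calls `HdfsTransactionLog.close`, and its `hflush` then fails because the underlying `DFSClient` has already been closed. The sketch below simulates that failure mode with no Hadoop dependency. It assumes, as the issue title suggests, that a second `init` call closes the shared FileSystem client while a tlog opened against that client is still in use. The two `init` calls run sequentially so the outcome is deterministic. `SharedFsClient`, `Tlog`, and `UpdateLogSim` are hypothetical stand-ins, not Solr or Hadoop classes.

```java
import java.io.IOException;

public class FilesystemClosedDemo {

    /** Hypothetical stand-in for a shared HDFS client (like DFSClient). */
    static final class SharedFsClient {
        private volatile boolean open = true;

        void checkOpen() throws IOException {
            // Mirrors the check that produced "Filesystem closed" in the trace.
            if (!open) {
                throw new IOException("Filesystem closed");
            }
        }

        void close() {
            open = false;
        }
    }

    /** Hypothetical stand-in for a transaction log bound to one client. */
    static final class Tlog {
        private final SharedFsClient fs;

        Tlog(SharedFsClient fs) {
            this.fs = fs;
        }

        void close() throws IOException {
            fs.checkOpen(); // a real tlog would hflush() through the client here
        }
    }

    /** Hypothetical update log whose init() replaces (and closes) its client. */
    static final class UpdateLogSim {
        SharedFsClient fs;
        Tlog tlog;

        synchronized void init() {
            if (fs != null) {
                fs.close(); // closes the client the current tlog still depends on
            }
            fs = new SharedFsClient();
        }

        synchronized void openTlog() {
            tlog = new Tlog(fs);
        }
    }

    /** Returns the error message from closing the old tlog, or "ok". */
    static String run() {
        UpdateLogSim ulog = new UpdateLogSim();
        ulog.init();
        ulog.openTlog();      // tlog is bound to the first client
        Tlog old = ulog.tlog;
        ulog.init();          // second init closes the first client
        try {
            old.close();      // analogous to postCommit -> addOldLog -> decref -> close
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Filesystem closed"
    }
}
```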


          Activity

          Matthew Byng-Maddick added a comment:

          I'm very confused about this. We're seeing tlogs being held open on HDFS-backed Solr (and, in particular, holding open datanode transceivers):

          Looking at the GitHub version of the commit (because I know how to link to it), we see the following lines here: https://github.com/apache/lucene-solr/commit/f2c9067e59b81b3dea7903315431babcd2506167#diff-c796f1f2f2f362c18bd89a85688fbebfR295

          tlog = ntlog;
          if (tlog != ntlog) {

          When is that if condition ever true? Immediately after the assignment, tlog != ntlog is false unless another thread changes tlog in between. What was this if condition supposed to do? It does appear to be part of a reasonable explanation for why the old, rotated tlogs are being held open by the Solr HDFS client.

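Right after `tlog = ntlog`, the test `tlog != ntlog` can only succeed if another thread writes `tlog` in between. A check like this is usually meant to capture the old reference before the assignment and then release the old log only if it differs from the new one. The commit does not say so; that reading is an assumption. The sketch below shows that pattern. `RefCountedLog`, `LogHolder`, and `swapIn` are hypothetical names. `decref` is loosely modeled on the `TransactionLog.decref` seen in the stack trace, not copied from it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TlogSwapDemo {

    /** Hypothetical reference-counted log, loosely modeled on TransactionLog. */
    static final class RefCountedLog {
        final String name;
        private final AtomicInteger refs = new AtomicInteger(1);
        volatile boolean closed = false;

        RefCountedLog(String name) {
            this.name = name;
        }

        void decref() {
            // "Close" (release the underlying stream) when the last reference goes away.
            if (refs.decrementAndGet() == 0) {
                closed = true;
            }
        }
    }

    /** Hypothetical holder for the current log. */
    static final class LogHolder {
        private RefCountedLog tlog;

        synchronized void swapIn(RefCountedLog ntlog) {
            RefCountedLog old = tlog; // capture BEFORE assigning
            tlog = ntlog;
            // Comparing old against the new value is meaningful; comparing
            // tlog against ntlog at this point would always be false.
            if (old != null && old != ntlog) {
                old.decref(); // release the rotated log so its stream can be closed
            }
        }

        synchronized RefCountedLog current() {
            return tlog;
        }
    }

    public static void main(String[] args) {
        LogHolder holder = new LogHolder();
        RefCountedLog first = new RefCountedLog("tlog.0");
        RefCountedLog second = new RefCountedLog("tlog.1");
        holder.swapIn(first);
        holder.swapIn(second);
        System.out.println(first.closed + " " + second.closed); // prints "true false"
    }
}
```

Swapping in the same instance twice leaves it open, because `old != ntlog` guards against releasing the log that just became current.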
          Timothy Potter added a comment:

          Bulk close after 5.1 release

          ASF subversion and git services added a comment:

          Commit 1662330 from Mark Miller in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1662330 ]

          SOLR-7113: Multiple calls to UpdateLog#init is not thread safe with respect to the HDFS FileSystem client object usage.

          ASF subversion and git services added a comment:

          Commit 1662324 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1662324 ]

          SOLR-7113: Multiple calls to UpdateLog#init is not thread safe with respect to the HDFS FileSystem client object usage.

          Mark Miller added a comment:

          I'm going to add an annotation to ignore resource close checks for this to work around SOLR-7115.

          Mark Miller added a comment:

          A quick first patch.

          Mark Miller added a comment:

          Thanks, Vamsee. I have a test and a patch for this.

          We are jumping through hoops to support the tlog location changing on a subsequent call to init, and that is not something we need or even want to support.

          So rather than trying to manage multiple FileSystem instances here (which, at this point, would require some form of reference counting), we can simply fix the code so that it does not accept a location change.

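The sketch below illustrates the approach described above. It assumes the tlog location is fixed by the first `init` call. Later calls keep the existing client, so a tlog that depends on it is never left holding a closed FileSystem. In this sketch a different location is ignored with a warning. The actual patch may handle it differently, for example by throwing. `FsClient`, `UpdateLogSim`, `clientsCreated`, and the `hdfs://nn/...` paths are hypothetical.

```java
import java.util.Objects;

public class FixedTlogLocationDemo {

    /** Hypothetical stand-in for the HDFS FileSystem client. */
    static final class FsClient {
        final String dir;

        FsClient(String dir) {
            this.dir = dir;
        }
    }

    /** Hypothetical update log that binds its tlog location on the first init. */
    static final class UpdateLogSim {
        private FsClient fs;          // created once, never closed or replaced by init
        private int clientsCreated = 0;

        /**
         * Returns the directory actually in use. synchronized so concurrent
         * init calls cannot race on creating the client.
         */
        synchronized String init(String tlogDir) {
            Objects.requireNonNull(tlogDir, "tlogDir");
            if (fs == null) {
                fs = new FsClient(tlogDir);
                clientsCreated++;
            } else if (!fs.dir.equals(tlogDir)) {
                // Refuse the location change instead of swapping clients.
                System.err.println("Ignoring tlog location change: " + fs.dir + " -> " + tlogDir);
            }
            return fs.dir;
        }

        synchronized int clientsCreated() {
            return clientsCreated;
        }
    }

    public static void main(String[] args) {
        UpdateLogSim ulog = new UpdateLogSim();
        System.out.println(ulog.init("hdfs://nn/solr/core1/tlog")); // prints the first dir
        System.out.println(ulog.init("hdfs://nn/other/tlog"));      // still prints the first dir
        System.out.println(ulog.clientsCreated());                  // prints 1
    }
}
```

The design trades flexibility for simplicity. With only one client per update log there is nothing to reference-count, which is the alternative the comment above sets aside.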

            People

            • Assignee: Mark Miller
            • Reporter: Vamsee Yarlagadda
            • Votes: 0
            • Watchers: 7
