Solr / SOLR-7113

Multiple calls to UpdateLog#init is not thread safe with respect to the HDFS FileSystem client object usage.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels: None

      Description

      I noticed this issue while doing some heavy indexing into Solr (700K docs per minute).

      Solr log errors

      15:42:47 ERROR HdfsTransactionLog Exception closing tlog.
      java.io.IOException: Filesystem closed
      	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:765)
      	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1898)
      	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1859)
      	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
      	at org.apache.solr.update.HdfsTransactionLog.close(HdfsTransactionLog.java:303)
      	at org.apache.solr.update.TransactionLog.decref(TransactionLog.java:504)
      	at org.apache.solr.update.UpdateLog.addOldLog(UpdateLog.java:335)
      	at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:628)
      	at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:600)
      	at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      15:42:47 ERROR CommitTracker auto commit error...:org.apache.solr.common.SolrException: java.io.IOException: Filesystem closed
      

          Activity

          Mark Miller added a comment -

          Thanks Vamsee - I have a test and patch for this.

          We are kind of jumping through hoops to try to support the tlog location changing on a new call to init. This is not even something we need or want to support.

          So rather than try and deal with multiple Filesystem instances here (which would require some sort of reference counting at this point), we can simply fix the code to not accept a location change.
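
          The approach described above can be sketched roughly as follows. This is not the committed patch: SketchUpdateLog and FakeFsClient are hypothetical stand-ins for the update log and the HDFS client, and throwing on a changed location is an assumption about what "not accept a location change" could look like. The point is that a repeated init never replaces or closes a client that in-flight tlogs may still be using.

```java
public class InitOnceSketch {

    /** Hypothetical stand-in for a closable filesystem client. */
    static final class FakeFsClient {
        private volatile boolean open = true;

        boolean isOpen() { return open; }

        void close() { open = false; }
    }

    /** Hypothetical stand-in for an update log that owns a single client. */
    static final class SketchUpdateLog {
        private String tlogDir;
        private FakeFsClient fs;

        /**
         * Returns true when this call performed the initialization and
         * false when it was a repeated call for the same location.
         * A different location is rejected (assumption, see lead-in).
         */
        synchronized boolean init(String dir) {
            if (fs != null) {
                if (!dir.equals(tlogDir)) {
                    throw new IllegalStateException(
                        "tlog location cannot change: " + tlogDir + " -> " + dir);
                }
                return false; // same location: keep the existing client
            }
            tlogDir = dir;
            fs = new FakeFsClient();
            return true;
        }

        synchronized FakeFsClient client() { return fs; }
    }

    public static void main(String[] args) {
        SketchUpdateLog log = new SketchUpdateLog();
        System.out.println("first init did work: " + log.init("/tlog/a"));
        FakeFsClient first = log.client();

        // A second init for the same location keeps the same client open.
        System.out.println("second init did work: " + log.init("/tlog/a"));
        System.out.println("same client: " + (log.client() == first));
        System.out.println("client still open: " + first.isOpen());

        // A location change is refused instead of swapping clients.
        try {
            log.init("/tlog/b");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

          Because only one client ever exists per log in this sketch, no reference counting is needed to decide when it is safe to close it.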

          Mark Miller added a comment -

          A quick first patch.

          Mark Miller added a comment -

          I'm going to add an annotation to ignore resource close checks for this to work around SOLR-7115.

          ASF subversion and git services added a comment -

          Commit 1662324 from Mark Miller in branch 'dev/trunk'
          [ https://svn.apache.org/r1662324 ]

          SOLR-7113: Multiple calls to UpdateLog#init is not thread safe with respect to the HDFS FileSystem client object usage.

          ASF subversion and git services added a comment -

          Commit 1662330 from Mark Miller in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1662330 ]

          SOLR-7113: Multiple calls to UpdateLog#init is not thread safe with respect to the HDFS FileSystem client object usage.

          Timothy Potter added a comment -

          Bulk close after 5.1 release

          Matthew Byng-Maddick added a comment -

          I'm very confused about this. We're seeing that tlogs get held open (and in particular hold open datanode transceivers) on HDFS Solr:

          Using the github version of the commit (because I know how to link to it): https://github.com/apache/lucene-solr/commit/f2c9067e59b81b3dea7903315431babcd2506167#diff-c796f1f2f2f362c18bd89a85688fbebfR295 we see the following lines:

          tlog = ntlog
          
          if (tlog != ntlog) {
          

          When is that if condition ever true? What was this if condition supposed to do? This does appear to be one part of a reasonable explanation as to why the old rotated tlogs are being held open by the Solr HDFS client.
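
          To make the question concrete, here is a small self-contained sketch of the pattern. RotateSketch and FakeLog are hypothetical names, not Solr classes, and the second method is only one possible reordering, shown as an assumption rather than as what the Solr code does or was meant to do. Comparing after the assignment can never take the branch; comparing before the assignment can.

```java
public class RotateSketch {

    /** Hypothetical log handle that records whether it was released. */
    static final class FakeLog {
        final String name;
        boolean released;

        FakeLog(String name) { this.name = name; }
    }

    private FakeLog current;

    /** Same order as the quoted lines: assign first, then compare. */
    boolean assignThenCompare(FakeLog next) {
        current = next;
        if (current != next) { // both refer to the same object here
            return true;       // unreachable in practice
        }
        return false;
    }

    /** One possible reordering: compare first, release the old log, then assign. */
    boolean compareThenAssign(FakeLog next) {
        FakeLog old = current;
        boolean changed = old != next;
        if (changed && old != null) {
            old.released = true; // stand-in for closing or decref-ing the old log
        }
        current = next;
        return changed;
    }

    public static void main(String[] args) {
        FakeLog a = new FakeLog("tlog.1");
        FakeLog b = new FakeLog("tlog.2");

        RotateSketch s = new RotateSketch();
        s.assignThenCompare(a);
        System.out.println("branch taken: " + s.assignThenCompare(b));
        System.out.println("old log released: " + a.released);

        FakeLog c = new FakeLog("tlog.3");
        RotateSketch t = new RotateSketch();
        t.compareThenAssign(c);
        System.out.println("branch taken: " + t.compareThenAssign(b));
        System.out.println("old log released: " + c.released);
    }
}
```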


            People

            • Assignee: Mark Miller
            • Reporter: Vamsee Yarlagadda
            • Votes: 0
            • Watchers: 6
