Solr / SOLR-7092

Stop the HDFS lease recovery retries on HdfsTransactionLog on close and try to avoid lease recovery on closed files.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels: None
    • Attachments:
      1. SOLR-7092.patch (6 kB, Mark Miller)
      2. SOLR-7092.patch (4 kB, Mark Miller)


        Activity

        Mark Miller added a comment (edited):

        Patch to stop retrying if the log is closed. Previously, it would keep retrying for roughly 15 minutes even after close.
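        A minimal Java sketch of the idea behind this first patch: a retry loop that checks a closed flag before every attempt, so closing the owner ends the retries instead of letting them run on for minutes. The class and method names (ClosableRetry, retryUntilDoneOrClosed) are hypothetical. They are not taken from HdfsTransactionLog or the attached patches.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch (not the SOLR-7092 patch): retries an operation until it
 * succeeds, a timeout elapses, or close() is called.
 */
final class ClosableRetry {
    // Set by close(); read by the retrying thread before every attempt.
    private final AtomicBoolean closed = new AtomicBoolean(false);

    void close() {
        closed.set(true);
    }

    boolean isClosed() {
        return closed.get();
    }

    /**
     * Calls attempt until it returns true. Returns false once close() has been
     * observed, when timeoutMs has elapsed, or if the thread is interrupted.
     */
    boolean retryUntilDoneOrClosed(BooleanSupplier attempt, long timeoutMs, long pauseMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        // Overflow-safe comparison of System.nanoTime() values.
        while (!closed.get() && System.nanoTime() - deadline < 0) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
        return false;
    }
}
```

        Because the flag is an AtomicBoolean, close() can be called from a different thread than the one doing the retries. At most one further attempt can start after close() is called.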

        Mark Miller added a comment:

        Patch that seems to behave better.

        Mark Miller added a comment:

        There still appears to be an internal HDFS lease renewer that I don't see how to stop, but with the latest patch, I've had good results so far.

         [junit4]   2> 504053 T130 oahh.LeaseRenewer.run WARN Failed to renew lease for [DFSClient_NONMAPREDUCE_-1506464199_13] for 429 seconds.  Will retry shortly ... java.net.ConnectException: Call From totalmetal/127.0.1.1 to localhost:43747 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
           [junit4]   2> 	at sun.reflect.GeneratedConstructorAccessor179.newInstance(Unknown Source)
           [junit4]   2> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           [junit4]   2> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
           [junit4]   2> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
           [junit4]   2> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
           [junit4]   2> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
           [junit4]   2> 	at org.apache.hadoop.ipc.Client.call(Client.java:1359)
           [junit4]   2> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
           [junit4]   2> 	at com.sun.proxy.$Proxy39.renewLease(Unknown Source)
           [junit4]   2> 	at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
           [junit4]   2> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           [junit4]   2> 	at java.lang.reflect.Method.invoke(Method.java:497)
           [junit4]   2> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
           [junit4]   2> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
           [junit4]   2> 	at com.sun.proxy.$Proxy39.renewLease(Unknown Source)
           [junit4]   2> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:519)
           [junit4]   2> 	at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:773)
           [junit4]   2> 	at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
           [junit4]   2> 	at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
           [junit4]   2> 	at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
           [junit4]   2> 	at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
           [junit4]   2> 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> Caused by: java.net.ConnectException: Connection refused
        ASF subversion and git services added a comment:

        Commit 1668311 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1668311 ]

        SOLR-7092: Stop the HDFS lease recovery retries in HdfsTransactionLog on close and try to avoid lease recovery on closed files.

        ASF subversion and git services added a comment:

        Commit 1668313 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1668313 ]

        SOLR-7092: Stop the HDFS lease recovery retries in HdfsTransactionLog on close and try to avoid lease recovery on closed files.

        Mark Miller added a comment:

        I'm still looking into this area of the code, but this should reduce some of the stdout/stderr errors that have become more common.

        ASF subversion and git services added a comment:

        Commit 1668412 from Mark Miller in branch 'dev/trunk'
        [ https://svn.apache.org/r1668412 ]

        SOLR-7092: Do a little better at clean up in new test code.

        ASF subversion and git services added a comment:

        Commit 1668862 from Mark Miller in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1668862 ]

        SOLR-7092: Do a little better at clean up in new test code.

        Mark Miller added a comment:

        This should be fixed now. I've also improved thread-leak handling in SOLR-7289, so it is harder for this kind of problem to slip by.
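        The following sketch shows the kind of check that catches a lingering thread, such as the HDFS LeaseRenewer in the log above. It takes a before/after snapshot of live threads. ThreadLeakCheck is a hypothetical helper. It is not the SOLR-7289 change, and Solr's test framework has its own, more thorough thread-leak detection.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: compares live threads against a baseline snapshot. */
final class ThreadLeakCheck {
    /** Snapshot of the threads that are alive right now. */
    static Set<Thread> liveThreads() {
        return new HashSet<>(Thread.getAllStackTraces().keySet());
    }

    /** Threads that are alive now but were not present in the baseline snapshot. */
    static Set<Thread> leakedSince(Set<Thread> baseline) {
        Set<Thread> leaked = liveThreads();
        leaked.removeAll(baseline);
        leaked.removeIf(t -> !t.isAlive()); // ignore threads that finished meanwhile
        return leaked;
    }
}
```

        A test would take the baseline in its setup, then fail or wait in its teardown if leakedSince(baseline) is not empty.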

        Mark Miller added a comment:

        Re: "and try to avoid lease recovery on closed files."

        This was added to the issue to reduce how often we start lease recovery when it is not needed.

        I also wanted this change because I had seen a case where recovering leases on closed files took a very long time and caused odd issues with Solr, so I rolled it in here.

        HDFS sometimes seems to struggle when it receives many lease recovery calls (hopefully only because they are all on files that don't need recovery). Usually, trying to recover the lease on a closed file simply returns, but on larger indexes something problematic seems to happen instead. I still have more to investigate in that area, but this change improves the situation.
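        The "avoid lease recovery on closed files" part can be sketched as a guard that checks whether a file is already closed before it issues a recovery call. LeaseClient and LeaseRecoveryGuard are hypothetical names. Their two methods loosely mirror recoverLease(Path) and isFileClosed(Path) on Hadoop's DistributedFileSystem, but the sketch uses plain String paths and is not taken from the actual Solr change.

```java
/** Hypothetical abstraction over the two HDFS calls this sketch needs. */
interface LeaseClient {
    /** True if the file is closed, so no lease recovery is needed. */
    boolean isFileClosed(String path);

    /**
     * Starts lease recovery. True means the file is closed;
     * false means recovery was started but has not finished yet.
     */
    boolean recoverLease(String path);
}

/** Hypothetical guard: only calls recoverLease for files that are still open. */
final class LeaseRecoveryGuard {
    private final LeaseClient client;

    LeaseRecoveryGuard(LeaseClient client) {
        this.client = client;
    }

    /** Returns true if the file is (now) closed; skips the recovery call when it already is. */
    boolean ensureClosed(String path) {
        if (client.isFileClosed(path)) {
            return true; // already closed: avoid an unnecessary recovery request
        }
        return client.recoverLease(path);
    }
}
```

        When ensureClosed returns false, a caller would still have to poll until recovery completes. That polling loop is where a close-aware retry like the first patch's applies.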

        Timothy Potter added a comment:

        Bulk close after 5.1 release


          People

          • Assignee: Mark Miller
          • Reporter: Mark Miller
          • Votes: 0
          • Watchers: 3
