Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-6891 Hadoop 2 unit test failures
  3. HBASE-8349

TestLogRolling#TestLogRollOnDatanodeDeath hangs under hadoop2 profile

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.98.0, 0.95.0
    • 0.98.0, 0.95.1
    • hadoop2
    • None
    • Reviewed

    Description

      TestLogRolling has been hanging – after a data node is killed the client attempts to recover a lease and fails forever. (This example ran for a while and shows recovery attempt 541888).

      2013-04-15 16:37:49,074 INFO  [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(72): Attempt 541888 to recoverLease on file hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta returned false, trying for 2642865ms
      2013-04-15 16:37:49,075 ERROR [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(86): Can't recoverLease after 541888 attempts and 2642866ms  for hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta - continuing without the lease, but we could have a data loss.
      2013-04-15 16:37:49,075 INFO  [IPC Server handler 9 on 41333] namenode.FSNamesystem(1957): recoverLease: recover lease [Lease.  Holder: DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091, pendingcreates: 1], src=/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta from client DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091
      2013-04-15 16:37:49,075 INFO  [IPC Server handler 9 on 41333] namenode.FSNamesystem(2981): Recovering lease=[Lease.  Holder: DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091, pendingcreates: 1], src=/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta
      2013-04-15 16:37:49,078 WARN  [IPC Server handler 9 on 41333] namenode.FSNamesystem(3096): DIR* NameSystem.internalReleaseLease: File /user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta has not been closed. Lease recovery is in progress. RecoveryId = 543317 for block blk_7636447875270454121_1019{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[127.0.0.1:38288|RBW], ReplicaUnderConstruction[127.0.0.1:35956|RWR]]}
      2013-04-15 16:37:49,078 INFO  [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(72): Attempt 541889 to recoverLease on file hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta returned false, trying for 2642869ms
      2013-04-15 16:37:49,079 ERROR [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(86): Can't recoverLease after 541889 attempts and 2642870ms  for hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta - continuing without the lease, but we could have a data loss.
      2013-04-15 16:37:49,079 INFO  [IPC Server handler 4 on 41333] namenode.FSNamesystem(1957): recoverLease: recover lease [Lease.  Holder: DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091, pendingcreates: 1], src=/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta from client DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091
      2013-04-15 16:37:49,079 INFO  [IPC Server handler 4 on 41333] namenode.FSNamesystem(2981): Recovering lease=[Lease.  Holder: DFSClient_hb_rs_localhost,39898,1366065830907_1890639591_1091, pendingcreates: 1], src=/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta
      2013-04-15 16:37:49,082 WARN  [IPC Server handler 4 on 41333] namenode.FSNamesystem(3096): DIR* NameSystem.internalReleaseLease: File /user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta has not been closed. Lease recovery is in progress. RecoveryId = 543318 for block blk_7636447875270454121_1019{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[127.0.0.1:38288|RBW], ReplicaUnderConstruction[127.0.0.1:35956|RWR]]}
      2013-04-15 16:37:49,083 INFO  [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(72): Attempt 541890 to recoverLease on file hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta returned false, trying for 2642874ms
      2013-04-15 16:37:49,083 ERROR [SplitLogWorker-localhost,39898,1366065830907] util.FSHDFSUtils(86): Can't recoverLease after 541890 attempts and 2642874ms  for hdfs://localhost:41333/user/jon/hbase/.logs/localhost,41341,1366065830879-splitting/localhost%2C41341%2C1366065830879.1366065836654.meta - continuing without the lease, but we could have a data loss.
      

      It initially starts with permissions errors similar to HBASE-7636. For now we disable in this test and will address with a fix in HBASE-8337 with an assist from HDFS folks.

      Attachments

        1. hbase-8349.patch
          1 kB
          Jonathan Hsieh

        Issue Links

          Activity

            People

              jmhsieh Jonathan Hsieh
              jmhsieh Jonathan Hsieh
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: