Hadoop HDFS
  HDFS-198

org.apache.hadoop.dfs.LeaseExpiredException during dfs write

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hdfs-client, namenode
    • Labels: None

      Description

      Many long-running, CPU-intensive map tasks failed due to org.apache.hadoop.dfs.LeaseExpiredException.
      See a comment below for the exceptions from the log.

        Activity

        sukhendu chakraborty added a comment -

        I am seeing the LeaseExpiredException ("No lease on ...") error for partitioned Hive tables in CDH 4.5 MR1. I have a similar use case to Sujesh's above: I am using dynamic date partitioning for a year (365 partitions), but have 1B rows (300 GB of data for that year). I also want to cluster the data in each partition into 32 buckets.

        Here is part of the error trace:
        3:58:18.531 PM ERROR org.apache.hadoop.hdfs.DFSClient
        Failed to close file /tmp/hive-user/hive_2014-01-29_15-33-51_510_4099525102053071439/_task_tmp.-ext-10000/trn_dt=20090531/_tmp.000012_0
        org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-user/hive_2014-01-29_15-33-51_510_4099525102053071439/_task_tmp.-ext-10000/trn_dt=20090531/_tmp.000012_0: File does not exist. Holder DFSClient_NONMAPREDUCE_-1745484980_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2543)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2535)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2601)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2578)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:556)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44958)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

        at org.apache.hadoop.ipc.Client.call(Client.java:1238)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy10.complete(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.complete(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:330)
        at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1796)
        at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1783)
        at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:709)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:726)
        at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:561)
        at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2399)
        at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2415)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

        Harsh J made changes -
        Status changed from Reopened to Resolved; Resolution set to Not A Problem
        Harsh J added a comment -

        This one has gone very stale, and we have not seen any substantiated reports of lease renewals going amiss during long-waiting tasks recently. Marking as 'Not a Problem' (anymore). If there is a genuinely new report of this behaviour, please file a new JIRA with the newer data.

        Dhanasekaran Anbalagan - Your problem is quite different from what the OP appears to have reported in an older version. Yours arises from MR tasks not utilising an attempt-ID-based directory (which Hive appears to do sometimes), in which case two different running attempts (from speculative execution or otherwise) can cause one of them to run into this error as a result of the file overwrite, as sketched below. Best to investigate further on a mailing list rather than here.

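        A minimal sketch of the collision described above, using the old mapred API. The class, method, and parameter names (AttemptScopedOutput, sharedPart, attemptScopedPart, partName) are illustrative, not from this issue; FileOutputFormat.getOutputPath and FileOutputFormat.getWorkOutputPath are the real framework hooks.

            import java.io.IOException;

            import org.apache.hadoop.fs.Path;
            import org.apache.hadoop.mapred.FileOutputFormat;
            import org.apache.hadoop.mapred.JobConf;

            public class AttemptScopedOutput {
                // Unsafe: every attempt of the same task shares one path, so a
                // killed or speculative sibling can remove the file while this
                // attempt is still writing; the losing attempt then sees
                // LeaseExpiredException / "File does not exist" on addBlock/complete.
                static Path sharedPart(JobConf conf, String partName) {
                    return new Path(FileOutputFormat.getOutputPath(conf), partName);
                }

                // Safe: getWorkOutputPath resolves to a per-attempt _temporary
                // subdirectory, and the committer promotes only the winning
                // attempt's files to the final output directory.
                static Path attemptScopedPart(JobConf conf, String partName)
                        throws IOException {
                    return new Path(FileOutputFormat.getWorkOutputPath(conf), partName);
                }
            }
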
        Dhanasekaran Anbalagan added a comment -

        Hi All,

        I am getting the same error on a Hive external table. I am using hive-common-0.10.0-cdh4.4.0.

        In my case we are using Sqoop to import data into the table; the table stores its data in RCFile format. I am only facing the issue with the external table.

        14/01/08 12:21:40 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_000000_0, Status : FAILED
        org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_000000_0/part-m-00000: File is not open for writing. Holder DFSClient_NONMAPREDUCE_-794488327_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2452)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2262)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2175)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
        at org.apache.hadoop.hdfs.protocol.pro
        attempt_201312121801_0049_m_000000_0: SLF4J: Class path contains multiple SLF4J bindings.
        attempt_201312121801_0049_m_000000_0: SLF4J: Found binding in [jar:file:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-simple-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        attempt_201312121801_0049_m_000000_0: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        attempt_201312121801_0049_m_000000_0: SLF4J: Found binding in [jar:file:/disk1/mapred/local/taskTracker/tech/distcache/-6782344428220505463_-433811577_1927241260/nameservice1/user/tech/.staging/job_201312121801_0049/libjars/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        attempt_201312121801_0049_m_000000_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
        14/01/08 12:21:55 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_000000_1, Status : FAILED
        org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_000000_1/part-m-00000: File is not open for writing. Holder DFSClient_NONMAPREDUCE_-390991563_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2452)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2262)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2175)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
        at org.apache.hadoop.hdfs.protocol.pro
        attempt_201312121801_0049_m_000000_1: SLF4J: Class path contains multiple SLF4J bindings.
        attempt_201312121801_0049_m_000000_1: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        attempt_201312121801_0049_m_000000_1: SLF4J: Found binding in [jar:file:/disk1/mapred/local/taskTracker/tech/distcache/7281954290425601736_-433811577_1927241260/nameservice1/user/tech/.staging/job_201312121801_0049/libjars/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
        attempt_201312121801_0049_m_000000_1: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
        14/01/08 12:22:12 INFO mapred.JobClient: Task Id : attempt_201312121801_0049_m_000000_2, Status : FAILED
        org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /dv_data_warehouse/dv_eod_performance_report/DYN0.337789259996055/trade_date=__HIVE_DEFAULT_PARTITION__/client=__HIVE_DEFAULT_PARTITION__/install=__HIVE_DEFAULT_PARTITION__/_temporary/_attempt_201312121801_0049_m_000000_2/part-m-00000: File is not open for writing. Holder DFSClient_NONMAPREDUCE_1338126902_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2452)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2262)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2175)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
        at org.apache.hadoop.hdfs.protocol.pro
        attempt_201312121801_0049_m_000000_2: SLF4J: Class path contains multiple SLF4J bindings.

        Sujesh Chirackkal added a comment -

        Getting the same error in CDH 4 / Hive 0.8.1 while running a query with a lot of dynamic partitions. Setting the number of reducers to a larger value resolves the issue for me for now; a sketch of that kind of session tuning follows.

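        A hedged sketch of the session tuning described above. The property names are standard Hive/MapReduce settings of that era; the values are illustrative, not taken from this issue.

            -- Illustrative values only; tune to the cluster and data volume.
            SET mapred.reduce.tasks=256;                       -- spread the dynamic partitions over more reducers
            SET hive.exec.dynamic.partition=true;
            SET hive.exec.dynamic.partition.mode=nonstrict;
            SET hive.exec.max.dynamic.partitions=1000;
            SET hive.exec.max.dynamic.partitions.pernode=365;  -- e.g. one year of date partitions

        Raising the reducer count presumably helps because each reduce task then holds fewer partition files open for a shorter time, so its DFSClient has fewer leases to keep alive before the task completes.
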
        Eli Collins added a comment -

        Yes, what I meant earlier was that you can still see this issue because loaded TTs that don't respond for over 10 minutes are presumed dead, and therefore their leases expire. Perhaps work similar to HDFS-599 needs to be done on the TT/JT?

        Hairong Kuang added a comment -

        Eli, a zombie TT causes lease expiration. This is expected, right?

        Eli Collins added a comment -

        A zombie TT can't result in lease expiration?

        Hairong Kuang added a comment -

        The org.apache.hadoop.dfs.LeaseExpiredException that occurred in 0.18 might have been caused by this bug: https://issues.apache.org/jira/browse/HADOOP-6498. The lease renewal thread might hang on the lease-renew RPC.

        Eli's description did not convince me. By default, the hard lease limit is 1 hour; a 10-minute delay in renewing the lease won't cause it to expire. (A sketch of the limits and of forced recovery follows.)

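        To make the timeline concrete, here is a minimal sketch assuming the later 2.x client API (recoverLease came in with the append work, so it is not in every 0.18-era build); the class and method names are hypothetical. The soft lease limit defaults to 60 seconds (another writer may preempt the lease) and the hard limit to 60 minutes (the NameNode itself reclaims it), so only a renewer wedged for over an hour, such as the hung renew RPC of HADOOP-6498, would let a lease truly expire.

            import java.io.IOException;

            import org.apache.hadoop.fs.Path;
            import org.apache.hadoop.hdfs.DistributedFileSystem;

            public class LeaseRecoverySketch {
                // Operator-side remedy for a file held open by a dead client:
                // recoverLease() asks the NameNode to begin lease recovery and
                // returns true once the file is closed and writable again.
                static boolean forceRecover(DistributedFileSystem dfs, String file)
                        throws IOException {
                    return dfs.recoverLease(new Path(file));
                }
            }
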
        Tsz Wo Nicholas Sze made changes -
        Description shortened: the stack traces were moved out of the Description into the comment below
        Component/s hdfs client [ 12312928 ]
        Component/s name-node [ 12312926 ]
        Tsz Wo Nicholas Sze made changes -
        Resolution 'Not A Problem' cleared; Status changed from Resolved to Reopened
        Tsz Wo Nicholas Sze added a comment -

        If this is still a problem, let's reopen it. Thanks Dave and Eli for pointing it out.

        Copying the stack traces from the original description here, in order to make the description shorter:

        2008-10-26 11:54:17,282 INFO org.apache.hadoop.dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
         org.apache.hadoop.dfs.LeaseExpiredException:
         No lease on /xxx/_temporary/_task_200810232126_0001_m_000033_0/part-00033
         File does not exist. [Lease. Holder: 44 46 53 43 6c 69 65 6e 74 5f 74 61 73 6b 5f 32 30 30 38
         31 30 32 33 32 31 32 36 5f 30 30 30 31 5f 6d 5f 30 30 30 30 33 33 5f 30, heldlocks: 0, pendingcreates: 1]
        at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1194)
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1125)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
        
        at org.apache.hadoop.ipc.Client.call(Client.java:557)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
        at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)
        
        2008-10-26 11:54:17,282 WARN org.apache.hadoop.dfs.DFSClient: NotReplicatedYetException
         sleeping /xxx/_temporary/_task_200810232126_0001_m_000033_0/part-00033 retries left 2
        2008-10-26 11:54:18,886 INFO org.apache.hadoop.dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
         org.apache.hadoop.dfs.LeaseExpiredException:
         No lease on /xxx/_temporary/_task_200810232126_0001_m_000033_0/part-00033
         File does not exist. [Lease. Holder: 44 46 53 43 6c 69 65 6e 74 5f 74 61 73 6b 5f 32 30 30 38
         31 30 32 33 32 31 32 36 5f 30 30 30 31 5f 6d 5f 30 30 30 30 33 33 5f 30, heldlocks: 0, pendingcreates: 1]
        at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1194)
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1125)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
        
        at org.apache.hadoop.ipc.Client.call(Client.java:557)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
        at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)
        
        2008-10-26 11:54:18,886 WARN org.apache.hadoop.dfs.DFSClient: NotReplicatedYetException
         sleeping /xxx/_temporary/_task_200810232126_0001_m_000033_0/part-00033 retries left 1
        2008-10-26 11:54:22,090 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception:
         org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.LeaseExpiredException:
         No lease on /xxx/_temporary/_task_200810232126_0001_m_000033_0/part-00033
         File does not exist. [Lease. Holder: 44 46 53 43 6c 69 65 6e 74 5f 74 61 73 6b 5f 32 30 30 38
         31 30 32 33 32 31 32 36 5f 30 30 30 31 5f 6d 5f 30 30 30 30 33 33 5f 30, heldlocks: 0, pendingcreates: 1]
        at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1194)
        at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1125)
        at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
        
        at org.apache.hadoop.ipc.Client.call(Client.java:557)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
        at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)
        
        2008-10-26 11:54:22,090 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block null bad datanode[0]
        2008-10-26 11:54:22,219 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
        java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2081)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
        
        Eli Collins added a comment -

        Not sure this issue is actually stale. I think it can still occur: if a host running a TT gets heavily loaded, it can take a while between when the task is killed and when it actually dies, so a file can disappear out from under the client. I think this can also be hit if a single key takes so long to sort and merge that the task doesn't check in for over 10 minutes.

        Dave Brondsema added a comment -

        I'm getting this error with a new job we're working on (a Hive query with lots of dynamic partitions). We're using Hadoop 0.18.3-14.cloudera.CH0_3. Do you think a newer version will fix this?

        Tsz Wo Nicholas Sze made changes -
        Status changed from Open to Resolved; Resolution set to Not A Problem
        Tsz Wo Nicholas Sze added a comment -

        I believe this issue went stale. Closing.

        Owen O'Malley made changes -
        Project changed from Hadoop Common to HDFS; Key changed from HADOOP-4524 to HDFS-198
        Affects Version/s changed from 0.19.0 to 0.17.2; Component/s dfs removed
        Rusty Burchfield made changes -
        Affects Version/s set to 0.19.0
        Chris K Wensel added a comment -

        Has a workaround for this been found?

        Runping Qi created this issue.

  People

    • Assignee: Unassigned
    • Reporter: Runping Qi
    • Votes: 4
    • Watchers: 15