Hadoop HDFS / HDFS-4936

Handle overflow condition for txid going over Long.MAX_VALUE

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not a Problem
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

      Description

      Hat tip to Fengdong Yu for the question that led to this (on the mailing lists).

      I hacked up my local NN's txids manually to go very large (close to max) and decided to try out whether this causes any harm. I basically bumped up the freshly formatted files' starting txid to 9223372036854775805 (and ensured the image references the same by hex-editing it):

      ➜  current  ls
      VERSION
      fsimage_9223372036854775805.md5
      fsimage_9223372036854775805
      seen_txid
      ➜  current  cat seen_txid
      9223372036854775805
      

      NameNode started up as expected.

      13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 seconds.
      13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 9223372036854775805 from /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
      13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 9223372036854775806
      

      I could create a bunch of files and do regular ops, with the txid count incrementing to well past the long max. I created over 10 files, just to make it go well over Long.MAX_VALUE.

      Quitting the NameNode and restarting it fails, though, with the following error:

      13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized segments in /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
      13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806 -> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
      13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
      java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 9223372036854775806 but unable to find any edit logs containing txid -9223372036854775808
      	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
      	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
      	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
      	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
      	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
      

      Looks like we also lose some edits when we restart, as the finalized edits filename shows:

      VERSION
      edits_9223372036854775806-9223372036854775807
      fsimage_9223372036854775805
      fsimage_9223372036854775805.md5
      seen_txid
      

      It seems like we won't be able to handle the case where the txid overflows. It's a very large number, so that's not an immediate concern, but it seemed worthy of a report.
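
      The negative txid in the "Gap in transactions" error above is plain Java long wraparound: incrementing a long one step past Long.MAX_VALUE silently wraps it to Long.MIN_VALUE, which is exactly the -9223372036854775808 the gap check reports. A minimal standalone sketch of the arithmetic (illustrative only, not HDFS code):

      public class TxidOverflow {
          public static void main(String[] args) {
              long txid = 9223372036854775805L;           // the doctored starting txid
              txid++;                                     // 9223372036854775806, where the segment starts
              txid++;                                     // 9223372036854775807 == Long.MAX_VALUE
              txid++;                                     // wraps silently, no exception thrown
              System.out.println(txid);                   // -9223372036854775808
              System.out.println(txid == Long.MIN_VALUE); // true
          }
      }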

        Activity

        Harsh J added a comment -

        Expected this response. Resolving.

        (From Todd Lipcon on hdfs-dev@)

        I did some back of the envelope math when implementing txids, and
        determined that overflow is not ever going to happen... A "busy" namenode
        does 1000 write transactions/second (2^10). MAX_LONG is 2^63. So, we can
        run for 2^63 seconds. A year is about 2^25 seconds. So, at 1k tps, you can
        run your namenode for 2^(63-10-25) = 268 million years.
        
        Hadoop is great software and I'm sure it will be around for years to come,
        but if it's still running in 268 million years, that will be a pretty
        depressing rate of technological progress!
        
        -Todd
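
        For reference, Todd's envelope math holds up with exact numbers too; a quick hypothetical sanity check (standalone Java, not project code):

        public class TxidLifetime {
            public static void main(String[] args) {
                double txids = Math.pow(2, 63);          // usable txid range
                double tps = 1000;                       // "busy" NN: ~2^10 write txns/sec
                double secsPerYear = 365.25 * 24 * 3600; // ~3.16e7, roughly 2^25
                double years = txids / tps / secsPerYear;
                // Todd's power-of-two rounding gives 2^28 = ~268 million years;
                // exact arithmetic lands around 292 million. Either way, no overflow risk.
                System.out.printf("~%.0f million years%n", years / 1e6);
            }
        }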
        

          People

          • Assignee: Unassigned
          • Reporter: Harsh J
          • Votes: 0
          • Watchers: 2
