Hadoop Common / HADOOP-3914

checksumOk implementation in DFSClient can break applications


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 0.18.2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      One of our non-map-reduce applications (written in C and using libhdfs to access DFS) stopped working after the switch from 0.16 to 0.17.
      The problem was eventually traced to failures in checksumOk.

      I assume the purpose of checksumOk is for a DFSClient to tell the sending Datanode that the checksum of the received block is okay. This is presumably useful in the replication pipeline.
      checksumOk is implemented so that any IOException is caught and ignored, probably because the client does not depend on the message being delivered.

      But this proved fatal for our application, because the application links in a third-party library that seems to catch socket exceptions before libhdfs does.

      Why was there an exception? In our case the application reads a file into the local buffer of the DFSInputStream, which is large enough to hold all the data. The application reads to the end, and checksumOk is sent successfully at that point. The application then does other work and comes back to re-read the file (still open). That is when it calls checksumOk again and crashes.

      This can easily be avoided by adding a boolean that ensures checksumOk is called exactly once, when EOS is first encountered. Redundant calls to checksumOk do not seem to make sense anyhow.
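
      A minimal sketch of the proposed guard, in Java. This is not the code from the attached patches: the class ChecksumAckGuard, the AckSender interface, and the method names are hypothetical stand-ins for the real DFSClient internals. It only illustrates the idea of a boolean flag that lets the "checksum OK" message go out on the first EOS and turns every later EOS into a no-op, while still ignoring an IOException from the send itself.

```java
import java.io.IOException;

/** Hypothetical illustration only; not the actual DFSClient implementation. */
class ChecksumAckGuard {
    /** Hypothetical callback that sends the "checksum OK" message to the datanode. */
    interface AckSender {
        void send() throws IOException;
    }

    private final AckSender sender;
    private boolean ackSent = false; // set on the first end-of-stream

    ChecksumAckGuard(AckSender sender) {
        this.sender = sender;
    }

    /** Call every time a read reaches end of stream, including re-reads. */
    void onEndOfStream() {
        if (ackSent) {
            return; // redundant call: do not touch the socket again
        }
        ackSent = true; // mark first, so a failed send is not retried
        try {
            sender.send();
        } catch (IOException e) {
            // Best effort, as in the behavior described above:
            // the client does not depend on this message being delivered.
        }
    }

    boolean isAckSent() {
        return ackSent;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        ChecksumAckGuard guard = new ChecksumAckGuard(() -> attempts[0]++);
        guard.onEndOfStream(); // first read to EOS: ack is attempted
        guard.onEndOfStream(); // re-read of the buffered file: skipped
        System.out.println("ack attempts: " + attempts[0]);
    }
}
```

      The flag is set before the send rather than after it, on the assumption that a failed acknowledgement should not be retried on a later re-read either.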

      Attachments

        1. checksumOk1-br18.patch (1 kB, Hairong Kuang)
        2. checksumOk1.patch (1 kB, Hairong Kuang)
        3. checksumOk.patch (1 kB, Hairong Kuang)
        4. patch.HADOOP-3914 (1 kB, Christian Kunz)


          People

            Assignee: Christian Kunz (ckunz)
            Reporter: Christian Kunz (ckunz)