Hadoop HDFS / HDFS-4710

SCR should honor dfs.client.read.shortcircuit.buffer.size even when checksums are off


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • 2.0.4-alpha
    • None
    • Component: hdfs-client
    • Environment: CentOS (EC2) + short-circuit reads on

    Description

      With short-circuit reads (SCR) enabled, the HDFS client gets slower when checksums are turned off.

      With checksums on, the query takes 45.341 seconds; with checksums off, it takes 56.345 seconds. That is slower than the times observed with short-circuit reads disabled.

      The cause appears to be that, with checksums off, each FSDataInputStream.readByte() call is passed straight through to the disk file descriptor as its own read.

      Even though all the columns are integers, the data is read through DataInputStream, whose readInt() does the following:

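      public final int readInt() throws IOException {
          // Four separate single-byte reads on the underlying stream
          int ch1 = in.read();
          int ch2 = in.read();
          int ch3 = in.read();
          int ch4 = in.read();
          if ((ch1 | ch2 | ch3 | ch4) < 0)
              throw new EOFException();
          return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
      }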
      

      To confirm, an strace of the YARN container shows one-byte read() calls:

      26690 read(154, "B", 1)                 = 1
      26690 read(154, "\250", 1)              = 1
      26690 read(154, ".", 1)                 = 1
      26690 read(154, "\24", 1)               = 1
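
The effect seen in the strace output can be reproduced in isolation. The following is a minimal sketch, not code from HDFS or Hive: `ReadCallCounter`, `CountingInputStream`, and `countCalls` are hypothetical names, and an in-memory byte array stands in for the disk fd. It counts how many read calls reach the underlying stream when integers are read through DataInputStream, with and without a BufferedInputStream in between. The exact unbuffered count depends on the JDK's DataInputStream implementation (four single-byte reads per int with the readInt() shown above); the buffered case should need only a single underlying read for this input size.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCallCounter {
    /** Hypothetical helper: counts every read call that reaches the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads `ints` integers and returns how many read calls hit the underlying stream. */
    static int countCalls(int ints, boolean buffered) throws IOException {
        // The byte array is a stand-in for the file descriptor (assumption for this sketch).
        CountingInputStream raw =
            new CountingInputStream(new ByteArrayInputStream(new byte[ints * 4]));
        InputStream src = buffered ? new BufferedInputStream(raw, 8192) : raw;
        DataInputStream in = new DataInputStream(src);
        for (int i = 0; i < ints; i++) {
            in.readInt();
        }
        return raw.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered underlying reads: " + countCalls(1000, false));
        System.out.println("buffered underlying reads:   " + countCalls(1000, true));
    }
}
```

When the underlying stream is a real file descriptor, each of those unbuffered calls becomes a system call, which is consistent with the one-byte read() lines in the strace.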
      

      To reproduce this without all of the Hive code, I wrote a simpler test app:

      https://github.com/t3rmin4t0r/shortcircuit-reader

      The jar reads a file in buffers of the size given by -bs <n>. Running it with 1-byte buffers gives results similar to the Hive test run.
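
For readers who want the general shape of such a test, here is a rough sketch of reading a file in fixed-size buffers and timing it. This is not the code from the repository above: `BlockSizeReader` and `readAll` are hypothetical names, and it reads a local file with plain java.io rather than going through the HDFS client, so it only illustrates how the buffer size drives the number of read calls.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockSizeReader {
    /**
     * Reads the stream to the end in chunks of at most bufSize bytes.
     * Returns {total bytes read, number of read calls that returned data}.
     */
    static long[] readAll(InputStream in, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long bytes = 0;
        long calls = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            bytes += n;
            calls++;
        }
        return new long[] { bytes, calls };
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            // Hypothetical command line for this sketch only.
            System.out.println("usage: java BlockSizeReader <file> <bufSize>");
            return;
        }
        int bufSize = Integer.parseInt(args[1]);
        long start = System.nanoTime();
        long[] result;
        try (InputStream in = new FileInputStream(args[0])) {
            result = readAll(in, bufSize);
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%d bytes in %d reads, %d ms%n", result[0], result[1], millis);
    }
}
```

Running it once with a buffer size of 1 and once with a larger value on the same file gives a quick comparison of how much time goes into per-call overhead.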

      Attachments

        1. HDFS-4710.001.patch
          41 kB
          Colin McCabe
        2. HDFS-4710.002.patch
          41 kB
          Colin McCabe


          People

            Assignee: Colin McCabe (cmccabe)
            Reporter: Gopal Vijayaraghavan (gopalv)
            Votes: 0
            Watchers: 10
