HBase / HBASE-2180

Bad random read performance from synchronizing on HFile's FSDataInputStream

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.4
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Deep in the HFile read path, there is this code:

          synchronized (in) {
            in.seek(pos);
            ret = in.read(b, off, n);
          }

      This means only one read per open file can be in progress at a time, no matter how many threads are reading that file: every reader waits on the same lock. That prevents the OS and the hardware from optimizing I/O scheduling across many concurrent reads.
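To see why the lock is there at all, here is a minimal, self-contained sketch of the same pattern. It uses the JDK's `RandomAccessFile` as a stand-in for the stream (an assumption: judging by the issue title, the real code reads through Hadoop's `FSDataInputStream`). Because `seek()` and `read()` share one file pointer, the pair has to be atomic, so every reader of the file queues on the same monitor. The class `LockedReader` and its method `readAt` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the HFile read path: one stream shared by all readers of a file.
public class LockedReader implements AutoCloseable {
    private final RandomAccessFile in; // single shared file pointer

    public LockedReader(Path file) throws IOException {
        this.in = new RandomAccessFile(file.toFile(), "r");
    }

    // seek() moves the shared file pointer and read() consumes from it, so the
    // pair must be atomic; the monitor on `in` serializes every read of this file.
    public int readAt(long pos, byte[] b, int off, int n) throws IOException {
        synchronized (in) {
            in.seek(pos);
            return in.read(b, off, n);
        }
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("hfile-sketch", ".bin");
        Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (LockedReader r = new LockedReader(tmp)) {
            byte[] buf = new byte[3];
            int got = r.readAt(4, buf, 0, 3);
            System.out.println(new String(buf, 0, got, StandardCharsets.US_ASCII)); // prints 456
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The result is correct, but with many scanner threads on one hot file, all of them wait on `in` and at most one request reaches the disk at a time.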

      We need to either use a reentrant API (according to Todd, pread may be partially reentrant) or use multiple stream objects, one per scanner/thread.
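A rough sketch of both options follows. It again uses JDK classes as stand-ins rather than the Hadoop API. `PreadReader` swaps in `FileChannel.read(ByteBuffer, long)`, a positional read that takes an explicit offset and does not move the channel's position, in place of HDFS pread. `PerThreadReader` keeps one `RandomAccessFile` per thread via `ThreadLocal`. All class and method names are hypothetical. Whether positional reads on one channel actually run in parallel is left to the platform by the `FileChannel` documentation, so treat option 1 as "no Java-level lock", not as a guarantee of parallel I/O.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class UnlockedReaders {
    // Option 1 (pread-style): the offset is passed with each call and the
    // channel's own position is never moved, so no seek/read lock is needed.
    static final class PreadReader implements AutoCloseable {
        private final FileChannel ch;
        PreadReader(Path file) throws IOException {
            ch = FileChannel.open(file, StandardOpenOption.READ);
        }
        int readAt(long pos, byte[] b, int off, int n) throws IOException {
            return ch.read(ByteBuffer.wrap(b, off, n), pos);
        }
        @Override public void close() throws IOException { ch.close(); }
    }

    // Option 2: one stream per thread, so threads never share a file pointer.
    static final class PerThreadReader implements AutoCloseable {
        private final Path file;
        private final List<RandomAccessFile> opened = new ArrayList<>(); // tracked for close()
        private final ThreadLocal<RandomAccessFile> stream;
        PerThreadReader(Path file) {
            this.file = file;
            this.stream = ThreadLocal.withInitial(this::open);
        }
        private RandomAccessFile open() {
            try {
                RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
                synchronized (opened) {
                    opened.add(raf); // brief lock once per thread at open time, not per read
                }
                return raf;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        int readAt(long pos, byte[] b, int off, int n) throws IOException {
            RandomAccessFile raf = stream.get(); // private to the calling thread
            raf.seek(pos);
            return raf.read(b, off, n);
        }
        @Override public void close() throws IOException {
            synchronized (opened) {
                for (RandomAccessFile raf : opened) raf.close();
                opened.clear();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("hfile-sketch", ".bin");
        Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (PreadReader pread = new PreadReader(tmp);
             PerThreadReader perThread = new PerThreadReader(tmp)) {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                final long pos = i;
                // Each task reads the same 2 bytes through both readers, with no shared lock.
                results.add(pool.submit(() -> {
                    byte[] a = new byte[2];
                    byte[] c = new byte[2];
                    int na = pread.readAt(pos, a, 0, 2);
                    int nc = perThread.readAt(pos, c, 0, 2);
                    return new String(a, 0, na, StandardCharsets.US_ASCII) + "/"
                            + new String(c, 0, nc, StandardCharsets.US_ASCII);
                }));
            }
            for (Future<String> f : results) System.out.println(f.get());
        } finally {
            pool.shutdown();
            Files.delete(tmp);
        }
    }
}
```

The trade-off: the per-thread variant removes the lock at the cost of one open handle per reader thread per file, while the pread-style variant keeps a single handle and leaves any parallelism to the underlying implementation.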

        Attachments

        1. 2180.patch (12 kB, stack)
        2. 2180-v2.patch (23 kB, stack)

              People

              • Assignee: stack
              • Reporter: Ryan Rawson (ryanobjc)
              • Votes: 1
              • Watchers: 10
