Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Environment: Ubuntu x86_64/java-1.6/hadoop-2.0.3
- Summary: Rely on previous sync-points when syncing within the same RCFile and avoid unnecessary I/O
- Labels: rcfile, hive
Description
The following function performs some bad I/O:

    public synchronized void sync(long position) throws IOException {
      ...
      try {
        seek(position + 4); // skip escape
        in.readFully(syncCheck);
        int syncLen = sync.length;
        for (int i = 0; in.getPos() < end; i++) {
          int j = 0;
          for (; j < syncLen; j++) {
            if (sync[j] != syncCheck[(i + j) % syncLen]) {
              break;
            }
          }
          if (j == syncLen) {
            in.seek(in.getPos() - SYNC_SIZE); // position before sync
            return;
          }
          syncCheck[i % syncLen] = in.readByte();
        }
      }
      ...
    }
This causes a very large number of readByte() calls, each of which is forwarded to a ByteBuffer through a single-byte array.
As a result, a large amount of CPU is burnt in the linear search for the sync pattern in the input RCFile (up to 92% for a skewed example - a trivial map-join + limit 100).
Ideally this behaviour should be avoided altogether, or at least the linear search should be replaced by a rolling hash for efficient comparison, since the sync marker has a known width of 16 bytes.
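A rolling (Rabin-Karp style) hash over the 16-byte window would update the hash in O(1) per byte instead of re-running the inner byte-by-byte comparison at every offset. The following is a minimal, self-contained sketch of that idea, not the Hive code: the class name, the base constant, and the in-memory buffer are illustrative assumptions.

```java
import java.util.Arrays;

public class SyncScan {
    static final int BASE = 257;  // hash base (illustrative choice)

    // Returns the offset of the first occurrence of the pattern
    // (e.g. a 16-byte sync marker) in data, or -1 if absent.
    // The window hash is rolled forward one byte at a time; full
    // byte comparison only happens on a hash match.
    static int find(byte[] data, byte[] pattern) {
        int m = pattern.length;
        if (data.length < m) {
            return -1;
        }
        int patHash = 0, winHash = 0, pow = 1;
        for (int k = 0; k < m - 1; k++) {
            pow *= BASE;  // BASE^(m-1), wrapping mod 2^32 is fine
        }
        for (int k = 0; k < m; k++) {
            patHash = patHash * BASE + (pattern[k] & 0xff);
            winHash = winHash * BASE + (data[k] & 0xff);
        }
        for (int i = 0; ; i++) {
            // verify on hash match to rule out collisions
            if (winHash == patHash
                    && Arrays.equals(Arrays.copyOfRange(data, i, i + m), pattern)) {
                return i;
            }
            if (i + m >= data.length) {
                return -1;
            }
            // roll: drop data[i], add data[i + m]
            winHash = (winHash - (data[i] & 0xff) * pow) * BASE
                    + (data[i + m] & 0xff);
        }
    }

    public static void main(String[] args) {
        byte[] sync = new byte[16];
        for (int k = 0; k < 16; k++) {
            sync[k] = (byte) (0xA0 + k);
        }
        byte[] buf = new byte[1024];
        System.arraycopy(sync, 0, buf, 500, 16);  // plant the marker
        System.out.println(find(buf, sync));      // prints 500
    }
}
```

In the RCFile case the data would arrive from the stream in bulk reads rather than a preloaded array, but the per-position cost drops the same way: one multiply-add per byte instead of up to 16 comparisons.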
Attached is the stack trace from a YourKit profile.
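The summary's suggestion to rely on previous sync-points within the same RCFile amounts to caching the last sync point found and short-circuiting repeat requests. The toy model below illustrates only that caching idea; all names are hypothetical and the expensive scan is stubbed out, so this is not the actual resolution of this issue.

```java
public class SyncPointCache {
    long lastSync = -1;  // last sync point found (hypothetical cache field)
    int scans = 0;       // counts how many expensive scans were needed

    // Stub for the expensive linear scan: in RCFile this reads bytes
    // until the 16-byte sync marker is found. Here we just pretend
    // the marker sits a fixed distance past the requested position.
    private long expensiveScan(long position) {
        scans++;
        return position + 7;
    }

    // If a previously found sync point already covers the requested
    // position, reuse it and skip the I/O; otherwise scan and cache.
    public long sync(long position) {
        if (lastSync >= position) {
            return lastSync;
        }
        lastSync = expensiveScan(position);
        return lastSync;
    }
}
```

Repeated sync() calls landing at or before a known sync point then cost nothing, which is exactly the "avoid unnecessary I/O" part of the summary.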
Attachments
Issue Links
- relates to: HIVE-4423 Improve RCFile::sync(long) 10x (Closed)