Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
-
Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
-
Speed up RCFile::sync() by searching with a larger buffer window
Description
RCFile::sync(long) takes approximately 1 second every time it is called because of the inner loops in the function.
From what was observed with HDFS-4710, single-byte reads are an order of magnitude slower than larger 512-byte buffer reads.
Even when disk I/O is buffered to this size, there is overhead due to the synchronized read() methods in BlockReaderLocal & RemoteBlockReader classes.
Replacing the readByte() calls in RCFile.sync(long) with a readFully() call on a 512-byte buffer should speed up this function by more than 10x.
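To illustrate the idea, here is a minimal sketch of a buffered marker search. It is not the actual patch: the class SyncScanSketch and the method findMarker are hypothetical names, and it uses plain InputStream.read(byte[], int, int) on an in-memory stream rather than RCFile's reader. It reads one chunk per call and carries the last marker.length - 1 bytes over to the next chunk, so a marker that straddles a chunk boundary is still found.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not code from RCFile or from the patch.
final class SyncScanSketch {

    /**
     * Returns the stream offset of the first occurrence of marker, or -1 if
     * the stream ends first. Issues one read call per chunk of up to
     * bufferSize bytes instead of one call per byte.
     */
    static long findMarker(InputStream in, byte[] marker, int bufferSize) throws IOException {
        if (marker.length == 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("marker and bufferSize must be non-empty/positive");
        }
        int overlap = marker.length - 1;          // tail bytes carried between chunks
        byte[] window = new byte[overlap + bufferSize];
        int carried = 0;                          // bytes already at the start of window
        long windowStart = 0;                     // stream offset of window[0]

        while (true) {
            int n = in.read(window, carried, bufferSize);
            if (n < 0) {
                return -1;                        // end of stream, marker not found
            }
            int valid = carried + n;
            for (int i = 0; i + marker.length <= valid; i++) {
                if (matchesAt(window, i, marker)) {
                    return windowStart + i;
                }
            }
            // Keep the tail so a marker split across two chunks is not missed.
            int keep = Math.min(overlap, valid);
            System.arraycopy(window, valid - keep, window, 0, keep);
            windowStart += valid - keep;
            carried = keep;
        }
    }

    /** Convenience overload for in-memory data; wraps the checked exception. */
    static long findMarker(byte[] data, byte[] marker, int bufferSize) {
        try {
            return findMarker(new ByteArrayInputStream(data), marker, bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }
}
```

With bufferSize set to 512, the number of read calls drops by roughly that factor compared with a byte-at-a-time loop, which is where the synchronized read() overhead described above would be saved. The inner comparison is still a simple linear scan; the sketch only changes how the bytes are fetched.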
Attachments
Issue Links
- is related to
-
HIVE-3992 Hive RCFile::sync(long) does a sub-sequence linear search for sync blocks
- Closed