[HDFS-4710] SCR should honor dfs.client.read.shortcircuit.buffer.size even when checksums are off - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.0.4-alpha
Fix Version/s: None
Component/s: hdfs-client
Labels:
- perfomance
Environment: CentOS (EC2) + short-circuit reads on

Description

When short-circuit reads are on, the HDFS client slows down when checksums are turned off.

With checksums on, the query takes 45.341 seconds; with checksums off, it takes 56.345 seconds. That is slower than the speeds observed when short-circuit reads are turned off.

The issue seems to be that, when checksums are turned off, FSDataInputStream.readByte() calls are passed straight through to the disk fd.

Even though all the columns are integers, the data is read through DataInputStream, whose readInt() does

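public final int readInt() throws IOException {
    // each in.read() is a separate single-byte read on the underlying stream
    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    if ((ch1 | ch2 | ch3 | ch4) < 0)
        throw new EOFException();
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}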

To confirm, an strace of the YARN container shows one-byte reads on the file descriptor:

26690 read(154, "B", 1)                 = 1
26690 read(154, "\250", 1)              = 1
26690 read(154, ".", 1)                 = 1
26690 read(154, "\24", 1)               = 1
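
As an illustration only (not taken from the Hive or HDFS code), the sketch below reproduces the effect in plain java.io. CountingInputStream and readIntBytewise are hypothetical names: readIntBytewise mirrors the four in.read() calls quoted above, and an in-memory ByteArrayInputStream stands in for the block file descriptor. It counts how many calls reach the underlying stream with and without a BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadIntSyscallDemo {

    /** Hypothetical stand-in for an unbuffered fd: counts every call that reaches it. */
    static final class CountingInputStream extends InputStream {
        private final InputStream delegate;
        int calls = 0;

        CountingInputStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return delegate.read(b, off, len);
        }
    }

    /** Mirrors the readInt() body quoted in the description: four single-byte reads. */
    static int readIntBytewise(InputStream in) throws IOException {
        int ch1 = in.read();
        int ch2 = in.read();
        int ch3 = in.read();
        int ch4 = in.read();
        if ((ch1 | ch2 | ch3 | ch4) < 0) {
            throw new EOFException();
        }
        return (ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4;
    }

    /**
     * Reads `ints` integers and returns how many calls reached the underlying stream.
     * A bufferSize of 0 means no buffering layer is inserted.
     */
    static int countUnderlyingReads(int ints, int bufferSize) throws IOException {
        CountingInputStream raw =
                new CountingInputStream(new ByteArrayInputStream(new byte[ints * 4]));
        InputStream source = bufferSize > 0 ? new BufferedInputStream(raw, bufferSize) : raw;
        for (int i = 0; i < ints; i++) {
            readIntBytewise(source);
        }
        return raw.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + countUnderlyingReads(1024, 0));
        System.out.println("buffered:   " + countUnderlyingReads(1024, 8192));
    }
}
```

Under these assumptions, reading 1024 integers makes 4096 calls on the underlying stream without buffering and a single call with an 8192-byte buffer. That matches the one-byte read() pattern in the strace output and suggests why a client-side buffer should help even when checksums are skipped.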

To reproduce this without all of the Hive code, I have written a simpler test app:

https://github.com/t3rmin4t0r/shortcircuit-reader

The jar reads a file in buffers of the size given by -bs <n>. Running it with 1-byte buffers gives results similar to the Hive test run.

Attachments

- HDFS-4710.002.patch (27/Jun/13 17:40, 41 kB, Colin McCabe)
- HDFS-4710.001.patch (20/Jun/13 18:42, 41 kB, Colin McCabe)

Issue Links

blocks: HDFS-4922 Improve the short-circuit document (Open)
duplicates: HDFS-5634 allow BlockReaderLocal to switch between checksumming and not (Closed)
is depended upon by: HDFS-4922 Improve the short-circuit document (Open)
is related to: HDFS-5634 allow BlockReaderLocal to switch between checksumming and not (Closed)
relates to: HDFS-4960 Unnecessary .meta seeks even when skip checksum is true (Patch Available)

People

Assignee: Colin McCabe
Reporter: Gopal Vijayaraghavan
Votes: 0
Watchers: 10

Dates

Created: 18/Apr/13 08:47
Updated: 17/Dec/13 20:58
Resolved: 17/Dec/13 20:58