Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.6.0, 2.7.1, 2.8.0, 3.0.0
- Fix Version/s: None
- Component/s: None
Description
When a seek() followed by a forward readFully() is issued from a remote DFSClient, HDFS opens a new remote block reader even if the seek lands within the same HDFS block.
(analysis from Rajesh Balamohan)
This happens because a simple read operation assumes that the user is going to read to the end of the block:

    try {
      blockReader = getBlockReader(targetBlock, offsetIntoBlock,
          targetBlock.getBlockSize() - offsetIntoBlock, targetAddr,
          storageType, chosenNode);
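The accounting above can be illustrated with a self-contained simulation (plain Java, not HDFS code; the block size, seek offset, and read length are made-up example values). Because the reader is opened for `blockSize - offsetIntoBlock` bytes, a short readFully leaves the requested range unfinished:

```java
// Simulation of the read-length accounting described above (not HDFS code).
public class ReadAccounting {

    // Length the client requests when opening a reader at offsetIntoBlock,
    // mirroring getBlockReader(..., blockSize - offsetIntoBlock, ...).
    static long requestedLength(long blockSize, long offsetIntoBlock) {
        return blockSize - offsetIntoBlock;
    }

    // Remaining bytes of the requested range after the client has actually
    // consumed bytesRead; mirrors bytesNeededToFinish in the reader.
    static long bytesNeededToFinish(long requestedLength, long bytesRead) {
        return requestedLength - bytesRead;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MB block (example value)
        long offset    = 4L * 1024 * 1024;   // seek to 4 MB  (example value)
        long read      = 1L * 1024 * 1024;   // readFully of 1 MB (example value)

        long requested = requestedLength(blockSize, offset);
        long remaining = bytesNeededToFinish(requested, read);

        // remaining > 0, so readTrailingEmptyPacket()/sendReadResult(SUCCESS)
        // are never reached and the success status is never sent.
        System.out.println("requested=" + requested + " remaining=" + remaining);
    }
}
```

With these numbers the client asks for 124 MB but reads only 1 MB, so 123 MB remain outstanding when the next seek arrives.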
Since the user has not read to the end of the block when the next seek happens, the BlockReader treats this as an aborted read and throws away the TCP peer it holds:

    // If we've now satisfied the whole client read, read one last packet
    // header, which should be empty
    if (bytesNeededToFinish <= 0) {
      readTrailingEmptyPacket();
      ...
      sendReadResult(Status.SUCCESS);
Since that condition is not satisfied, sentStatusCode remains false and the peer is closed instead of being returned to the cache:

    if (peerCache != null && sentStatusCode) {
      peerCache.put(datanodeID, peer);
    } else {
      peer.close();
    }
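The peer-reuse decision quoted above can be sketched as a self-contained simulation (plain Java, not HDFS code; the class, method, and peer names are hypothetical). The connection goes back to the cache only when a success status was sent; otherwise it is discarded, forcing the next read to open a new remote block reader:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simulation of the peer-cache decision described above (not HDFS code).
public class PeerCacheSim {
    static final Deque<String> cache = new ArrayDeque<>();

    // Mirrors: if (peerCache != null && sentStatusCode) put; else close.
    // Returns true when the peer was cached for reuse.
    static boolean finishRead(String peer, boolean sentStatusCode) {
        if (sentStatusCode) {
            cache.push(peer); // connection is reusable by the next read
            return true;
        }
        // peer.close() in the real code: the connection is thrown away,
        // so the next read must open a new remote block reader.
        return false;
    }

    public static void main(String[] args) {
        // Read satisfied to the end of the block: status sent, peer cached.
        System.out.println(finishRead("dn1:50010", true));
        // Seek before the end of the block: status never sent, peer discarded.
        System.out.println(finishRead("dn1:50010", false));
    }
}
```

This is why a forward seek within the same block still pays the cost of a fresh TCP connection and block-reader setup.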
Attachments
Issue Links
- is related to
  - HDFS-6607 Improve DFSInputStream forward seek performance (Open)
- relates to
  - HADOOP-15292 Distcp's use of pread is slowing it down. (Resolved)
  - HIVE-11945 ORC with non-local reads may not be reusing connection to DN (Closed)