[HIVE-16671] LLAP IO: BufferUnderflowException may happen in very rare(?) cases due to ORC end-of-CB estimation - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.4.0, 3.0.0
Component/s: None
Labels:
None

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-16671.patch
16/May/17 00:27
0.9 kB
Sergey Shelukhin
HIVE-16671.01.patch
16/May/17 01:17
4 kB
Sergey Shelukhin
HIVE-16671.02.patch
18/May/17 21:37
9 kB
Sergey Shelukhin

Activity

Descending order - Click to sort in ascending order

Vineet Garg added a comment - 22/May/18 23:59

Hive 3.0.0 has been released so closing this jira.

Vineet Garg added a comment - 22/May/18 23:59 Hive 3.0.0 has been released so closing this jira.

Sergey Shelukhin added a comment - 20/May/17 01:19

Thanks, committed to branches

Sergey Shelukhin added a comment - 20/May/17 01:19 Thanks, committed to branches

Hive QA added a comment - 18/May/17 23:06

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12868825/HIVE-16671.02.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 5 failed/errored test(s), 10732 tests executed
Failed tests:

org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=236)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] (batchId=236)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=97)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=97)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query24] (batchId=231)

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5334/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5334/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5334/

Messages:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed

This message is automatically generated.

ATTACHMENT ID: 12868825 - PreCommit-HIVE-Build

Hive QA added a comment - 18/May/17 23:06 Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12868825/HIVE-16671.02.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 5 failed/errored test(s), 10732 tests executed Failed tests: org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=236) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] (batchId=236) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=97) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=97) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query24] (batchId=231) Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5334/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5334/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5334/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed This message is automatically generated. ATTACHMENT ID: 12868825 - PreCommit-HIVE-Build

Prasanth Jayachandran added a comment - 18/May/17 21:42

Prasanth Jayachandran added a comment - 18/May/17 21:42 +1

Sergey Shelukhin added a comment - 18/May/17 21:37

Updated the method, added the test.

Sergey Shelukhin added a comment - 18/May/17 21:37 Updated the method, added the test.

Sergey Shelukhin added a comment - 18/May/17 20:21

"next" in this case is the current element that we are reading. WIll refactor to make it more clear

Sergey Shelukhin added a comment - 18/May/17 20:21 "next" in this case is the current element that we are reading. WIll refactor to make it more clear

Prasanth Jayachandran added a comment - 18/May/17 08:39

if (ix == 3) break; // Done, we have 3 bytes.

After this why are we moving on to next DiskRangeList? DiskRangeList will not be 1 or 2 byte long right? If we can't read complete header in current DiskRangeList, worst case it will be in next contiguous DiskRangeList. Am I missing something here?

Will be good to make it a separate method with tests (fake InStreams) if possible.

Prasanth Jayachandran added a comment - 18/May/17 08:39 if (ix == 3) break ; // Done, we have 3 bytes. After this why are we moving on to next DiskRangeList? DiskRangeList will not be 1 or 2 byte long right? If we can't read complete header in current DiskRangeList, worst case it will be in next contiguous DiskRangeList. Am I missing something here? Will be good to make it a separate method with tests (fake InStreams) if possible.

Sergey Shelukhin added a comment - 18/May/17 00:56

prasanth_j ping?

Sergey Shelukhin added a comment - 18/May/17 00:56 prasanth_j ping?

Sergey Shelukhin added a comment - 16/May/17 21:41

No, ix is across multiple buffers. So, if we have 3 buffers of one bytes, we'd increment ix by 1 in each. Unless I'm missing something.

Sergey Shelukhin added a comment - 16/May/17 21:41 No, ix is across multiple buffers. So, if we have 3 buffers of one bytes, we'd increment ix by 1 in each. Unless I'm missing something.

Prasanth Jayachandran added a comment - 16/May/17 21:35 - edited

ix = readLengthBytes(compressed, bytes, ix);
if (ix == 3) break; // Done, we have 3 bytes.

reset ix=0 inside the while loop? ix could be non-zero value outside while.

Prasanth Jayachandran added a comment - 16/May/17 21:35 - edited ix = readLengthBytes(compressed, bytes, ix); if (ix == 3) break ; // Done, we have 3 bytes. reset ix=0 inside the while loop? ix could be non-zero value outside while.

Hive QA added a comment - 16/May/17 04:06

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12868200/HIVE-16671.01.patch

ERROR: -1 due to no test(s) being added or modified.

ERROR: -1 due to 3 failed/errored test(s), 10717 tests executed
Failed tests:

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] (batchId=81)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144)

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5273/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5273/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5273/

Messages:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed

This message is automatically generated.

ATTACHMENT ID: 12868200 - PreCommit-HIVE-Build

Hive QA added a comment - 16/May/17 04:06 Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12868200/HIVE-16671.01.patch ERROR: -1 due to no test(s) being added or modified. ERROR: -1 due to 3 failed/errored test(s), 10717 tests executed Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] (batchId=81) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144) Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5273/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5273/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5273/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed This message is automatically generated. ATTACHMENT ID: 12868200 - PreCommit-HIVE-Build

Sergey Shelukhin added a comment - 16/May/17 02:33

prasanth_j btw, the first patch fixed the original issue. For that case, the new patch would behave like the original. The corner case to the corner case, where 3 bytes of length are present over multiple buffers, is handled in 02 patch. Unfortunately we don't have repro for that and it might be impossible (perhaps if length falls on ZCR boundary with ZCR enabled?)

Sergey Shelukhin added a comment - 16/May/17 02:33 prasanth_j btw, the first patch fixed the original issue. For that case, the new patch would behave like the original. The corner case to the corner case, where 3 bytes of length are present over multiple buffers, is handled in 02 patch. Unfortunately we don't have repro for that and it might be impossible (perhaps if length falls on ZCR boundary with ZCR enabled?)

Sergey Shelukhin added a comment - 16/May/17 01:17

A patch to handle all corner cases..

Sergey Shelukhin added a comment - 16/May/17 01:17 A patch to handle all corner cases..

Sergey Shelukhin added a comment - 16/May/17 00:45

The 3-byte check is related to the 3-byte CB header. I dbl checked, we hit BufferUnderflow (no next byte) on the 3rd get.
I don't think it's related to the file header, that would have happened much earlier. I may have more details tomorrow about the failure.

Sergey Shelukhin added a comment - 16/May/17 00:45 The 3-byte check is related to the 3-byte CB header. I dbl checked, we hit BufferUnderflow (no next byte) on the 3rd get. I don't think it's related to the file header, that would have happened much earlier. I may have more details tomorrow about the failure.

Prasanth Jayachandran added a comment - 16/May/17 00:36

the 3 bytes check makes me think if this is somehow related to splits starting at 0 vs 3?
When BI split strategy is chosen, entire file/block could become a split? Say if a file is 1000 bytes. Split offset will be 0 and length will be 1000.
Whereas for the same file, if ETL split strategy is chosen, split offset will be 3 and length will be 997. First 3 bytes are ignored as that is part of ORC magic header.

Do you have a repro for this issue? If so could you check the split boundaries to make sure if this is the case.

Prasanth Jayachandran added a comment - 16/May/17 00:36 the 3 bytes check makes me think if this is somehow related to splits starting at 0 vs 3? When BI split strategy is chosen, entire file/block could become a split? Say if a file is 1000 bytes. Split offset will be 0 and length will be 1000. Whereas for the same file, if ETL split strategy is chosen, split offset will be 3 and length will be 997. First 3 bytes are ignored as that is part of ORC magic header. Do you have a repro for this issue? If so could you check the split boundaries to make sure if this is the case.

Sergey Shelukhin added a comment - 16/May/17 00:32

Actually, I guess this patch should be more involved, if there's a contiguous chunk with more byes, however unlikely that is

Sergey Shelukhin added a comment - 16/May/17 00:32 Actually, I guess this patch should be more involved, if there's a contiguous chunk with more byes, however unlikely that is

Sergey Shelukhin added a comment - 16/May/17 00:27 - edited

prasanth_j can you take a look? and also perhaps comment if you have input on ORC estimates. This is basically the same logic as just below where we find out that BufferChunk-s from disk contain a cut off CB, after finding the CB length. What we see is a stable repro of BufferUnderflow in one of the 3 get-s. So I'm assuming that ORC estimate can overshoot a CB boundary by exactly 1-2 bytes, causing the failure to read length; that needs to be handled similarly to a partial CB. Does this make sense?

Sergey Shelukhin added a comment - 16/May/17 00:27 - edited prasanth_j can you take a look? and also perhaps comment if you have input on ORC estimates. This is basically the same logic as just below where we find out that BufferChunk-s from disk contain a cut off CB, after finding the CB length. What we see is a stable repro of BufferUnderflow in one of the 3 get-s. So I'm assuming that ORC estimate can overshoot a CB boundary by exactly 1-2 bytes, causing the failure to read length; that needs to be handled similarly to a partial CB. Does this make sense?

People

Assignee:: Sergey Shelukhin

Reporter:: Ravi Mutyala

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 16/May/17 00:22

Updated:: 22/May/18 23:59

Resolved:: 20/May/17 01:19