Attachments
- HIVE-16671.patch (0.9 kB, Sergey Shelukhin)
- HIVE-16671.01.patch (4 kB, Sergey Shelukhin)
- HIVE-16671.02.patch (9 kB, Sergey Shelukhin)
Activity
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12868825/HIVE-16671.02.patch
SUCCESS: +1 due to 1 test(s) being added or modified.
ERROR: -1 due to 5 failed/errored test(s), 10732 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=236)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] (batchId=236)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=97)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=97)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query24] (batchId=231)
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5334/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5334/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5334/
Messages:
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
This message is automatically generated.
ATTACHMENT ID: 12868825 - PreCommit-HIVE-Build
"next" in this case is the current element that we are reading. Will refactor to make it clearer.
if (ix == 3) break; // Done, we have 3 bytes.
After this, why are we moving on to the next DiskRangeList? A DiskRangeList will not be 1 or 2 bytes long, right? If we can't read the complete header in the current DiskRangeList, in the worst case the rest will be in the next contiguous DiskRangeList. Am I missing something here?
Will be good to make it a separate method with tests (fake InStreams) if possible.
No, ix is tracked across multiple buffers. So, if we have 3 buffers of one byte each, we'd increment ix by 1 in each. Unless I'm missing something.
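To illustrate the point about ix spanning buffers, here is a minimal sketch. It is not the code from the patch: the class name, the `readHeader` helper, and the body of `readLengthBytes` are assumptions made for illustration, and plain `ByteBuffer`s stand in for the DiskRangeList entries.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical sketch, not the HIVE-16671 patch itself.
class HeaderBytesSketch {
  static final int HEADER_SIZE = 3; // the CB header is 3 bytes

  // Copies bytes from buf into header starting at index ix, stopping when
  // the header is full or buf is exhausted. Returns the updated index.
  static int readLengthBytes(ByteBuffer buf, byte[] header, int ix) {
    while (ix < HEADER_SIZE && buf.hasRemaining()) {
      header[ix++] = buf.get();
    }
    return ix;
  }

  // Walks the buffers in order; ix is declared once, outside the loop,
  // so progress made in one buffer carries over to the next.
  static int readHeader(List<ByteBuffer> buffers, byte[] header) {
    int ix = 0;
    for (ByteBuffer buf : buffers) {
      ix = readLengthBytes(buf, header, ix);
      if (ix == HEADER_SIZE) break; // Done, we have 3 bytes.
    }
    return ix; // less than 3 means the header is still incomplete
  }
}
```

With three one-byte buffers, ix goes 1, 2, 3 and the loop exits on the third buffer; with a single buffer of three or more bytes it exits on the first.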
ix = readLengthBytes(compressed, bytes, ix); if (ix == 3) break; // Done, we have 3 bytes.
Should ix be reset to 0 inside the while loop? ix could have a non-zero value outside the while loop.
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12868200/HIVE-16671.01.patch
ERROR: -1 due to no test(s) being added or modified.
ERROR: -1 due to 3 failed/errored test(s), 10717 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] (batchId=81)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable] (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144)
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5273/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5273/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5273/
Messages:
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
This message is automatically generated.
ATTACHMENT ID: 12868200 - PreCommit-HIVE-Build
prasanth_j btw, the first patch fixed the original issue. For that case, the new patch would behave like the original. The corner case of the corner case, where the 3 bytes of length are spread over multiple buffers, is handled in the 02 patch. Unfortunately we don't have a repro for that, and one might be impossible to construct (perhaps if the length falls on a ZCR boundary with ZCR enabled?).
The 3-byte check is related to the 3-byte CB header. I double-checked; we hit BufferUnderflow (no next byte) on the 3rd get.
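For reference, a minimal sketch of how the 3-byte CB header decodes, following the layout in the ORC format specification as I understand it: a little-endian 3-byte value whose lowest bit is the "original" (uncompressed) flag and whose remaining 23 bits are the chunk length. The class and method names are hypothetical; this is not the reader's actual code.

```java
// Hypothetical sketch of decoding a 3-byte ORC compression block (CB) header.
class CbHeaderSketch {
  // Chunk length: the 24-bit little-endian header value shifted right by one,
  // which drops the "original" flag stored in the lowest bit.
  static int chunkLength(byte[] h) {
    return ((h[2] & 0xff) << 15) | ((h[1] & 0xff) << 7) | ((h[0] & 0xff) >> 1);
  }

  // The lowest bit of the first byte is set when the chunk is stored uncompressed.
  static boolean isOriginal(byte[] h) {
    return (h[0] & 0x01) == 1;
  }
}
```

For example, the header bytes [0x40, 0x0d, 0x03] decode to a compressed chunk of 100,000 bytes, and [0x0b, 0x00, 0x00] to an uncompressed chunk of 5 bytes. All three bytes are needed before the length is known, which is why a missing third byte fails on the 3rd get.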
I don't think it's related to the file header, that would have happened much earlier. I may have more details tomorrow about the failure.
The 3-byte check makes me wonder whether this is somehow related to splits starting at 0 vs 3.
When BI split strategy is chosen, entire file/block could become a split? Say if a file is 1000 bytes. Split offset will be 0 and length will be 1000.
Whereas for the same file, if ETL split strategy is chosen, split offset will be 3 and length will be 997. First 3 bytes are ignored as that is part of ORC magic header.
Do you have a repro for this issue? If so, could you check the split boundaries to see whether this is the case?
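The split arithmetic in the question above can be written out as a small sketch. It takes the comment's description of the two strategies at face value and does not reflect Hive's actual split generation code; the class and method names are hypothetical.

```java
// Hypothetical sketch of the split boundaries described in the comment above.
class SplitBoundsSketch {
  static final int MAGIC_LENGTH = 3; // "ORC" magic at the start of the file

  // BI strategy as described: the whole file becomes one split.
  // Returns {offset, length}.
  static long[] biSplit(long fileLength) {
    return new long[] {0L, fileLength};
  }

  // ETL strategy as described: the split starts after the 3-byte magic.
  // Returns {offset, length}.
  static long[] etlSplit(long fileLength) {
    return new long[] {MAGIC_LENGTH, fileLength - MAGIC_LENGTH};
  }
}
```

For a 1000-byte file this gives offset 0 and length 1000 for BI, and offset 3 and length 997 for ETL, so both splits end at the same byte and differ only in where they start.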
Actually, I guess this patch should be more involved, in case there's a contiguous chunk with more bytes, however unlikely that is.
prasanth_j can you take a look? and also perhaps comment if you have input on ORC estimates. This is basically the same logic as just below where we find out that BufferChunk-s from disk contain a cut off CB, after finding the CB length. What we see is a stable repro of BufferUnderflow in one of the 3 get-s. So I'm assuming that ORC estimate can overshoot a CB boundary by exactly 1-2 bytes, causing the failure to read length; that needs to be handled similarly to a partial CB. Does this make sense?
Hive 3.0.0 has been released so closing this jira.