Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- None
Description
VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls OrcAcidUtils.getLastFlushLength for every delete delta file.
Even the comment says:
// NOTE: Calling last flush length below is more for future-proofing when we have
// streaming deletes. But currently we don't support streaming deletes, and this can
// be removed if this becomes a performance issue.
Consider a table with 5 updates (1 base + 5 delta + 5 delete_delta directories). For each of the 6 base and delta directories, the reader checks every delete_delta directory and calls getLastFlushLength on it, which results in 6 * 5 = 30 unnecessary NameNode (NN)/S3 calls.
We should remove the check as already proposed in the comment.
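The call count above can be reproduced with a small model. This is only a sketch of the access pattern described in this issue, not Hive code: the class and method names are hypothetical, and it assumes one reader per base/delta directory and one delete delta file per delete_delta directory.

```java
// Hypothetical model of the call pattern described in this issue; not Hive code.
final class FlushLengthCallCount {

    /**
     * Counts the side-file lookups made when every base/delta reader
     * checks every delete_delta directory.
     * Assumption: one delete delta file per delete_delta directory.
     */
    static int callsWithCheck(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
        int calls = 0;
        int readers = baseDirs + deltaDirs; // one reader per base or delta dir
        for (int r = 0; r < readers; r++) {
            for (int d = 0; d < deleteDeltaDirs; d++) {
                calls++; // stands in for one getLastFlushLength lookup
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // 1 base + 5 delta + 5 delete_delta, as in the description
        System.out.println(callsWithCheck(1, 5, 5)); // prints 30
    }
}
```

With the check removed, the same loops would issue no such lookups at all, which is the saving this issue proposes.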
Issue Links
- relates to: HIVE-23597 VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times (Open)