Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
3.1.1, 4.0.0
-
None
Description
This affects current master as well. Created an orc file such that it spans multiple stripes and ran a simple select *, and got incorrect row counts (when comparing with select count. The problem seems to be that after split generation and creating min/max rowId for each row (note that since the loaded file is not written by Hive ACID, it does not have ROW_ID in the file; but the ROWID is applied on read by discovering min/max bounds which are used for calculating ROW_ID.rowId for each row of a split), Hive is only reading the last split.