- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 3.1.1, 4.0.0
- Component/s: Transactions
- Labels: None
This affects current master as well. I created an ORC file that spans multiple stripes, ran a simple select *, and got an incorrect row count (compared with select count(*)). The problem seems to occur after split generation, once the min/max rowId bounds have been computed. Note that since the loaded file was not written by Hive ACID, it does not contain ROW_ID; instead, ROW_ID is applied on read by discovering the min/max bounds, which are used to calculate ROW_ID.rowId for each row of a split. At that point, Hive reads only the last split.
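To make the symptom concrete, here is a minimal sketch of how per-split min/max rowId bounds could be derived and then used to assign a rowId to every row on read. It is not Hive's actual implementation: the function names and the per-split row counts are hypothetical, and it assumes that rowIds are simply consecutive offsets across splits in file order.

```python
from itertools import accumulate


def split_row_id_bounds(rows_per_split):
    """Return inclusive (min_row_id, max_row_id) bounds for each split.

    Assumption: rowIds are consecutive across splits, in file order.
    """
    ends = list(accumulate(rows_per_split))
    return [(end - n, end - 1) for n, end in zip(rows_per_split, ends)]


def synthetic_row_ids(bounds):
    """Yield the rowId assigned on read to every row of the given splits."""
    for lo, hi in bounds:
        yield from range(lo, hi + 1)


# Hypothetical file whose three splits hold 1000, 1000 and 500 rows.
rows_per_split = [1000, 1000, 500]
bounds = split_row_id_bounds(rows_per_split)

# Expected behaviour: every split is read, so all rows come back.
all_ids = list(synthetic_row_ids(bounds))

# Behaviour described in this report: only the last split is read.
last_only = list(synthetic_row_ids(bounds[-1:]))

print(len(all_ids), len(last_only))
```

Under these assumptions, the gap between the two lengths is the kind of mismatch reported here between select * and select count(*).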