Hive / HIVE-21460

ACID: Load data followed by a select * query results in incorrect results

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1.1, 4.0.0
    • Fix Version/s: 3.1.1, 4.0.0-alpha-1
    • Component/s: Transactions
    • Labels: None

    Description

      This affects current master as well. I created an ORC file that spans multiple stripes, ran a simple select *, and got an incorrect row count (compared with select count(*)). The problem seems to be that after split generation and creating the min/max rowId bounds, Hive reads only the last split. Note that since the loaded file was not written by Hive ACID, it has no ROW_ID in the file; instead, ROW_ID is applied on read by discovering min/max bounds, which are used to calculate ROW_ID.rowId for each row of a split.
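
      The ROW_ID-on-read idea described above can be illustrated with a small model. This is only a sketch under stated assumptions, not Hive's actual code: the Split class, plan_splits, read_row_ids, the one-split-per-stripe layout, and the stripe sizes are all hypothetical. It shows how contiguous min/max bounds give every row a rowId, and how reading only the last split yields a row count that disagrees with the total.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for one split of an ORC file not written by Hive ACID."""
    min_row_id: int  # synthetic rowId of the first row in the split
    max_row_id: int  # synthetic rowId of the last row in the split


def plan_splits(stripe_row_counts: List[int]) -> List[Split]:
    """Give each stripe a contiguous [min, max] rowId range (assumes one split per stripe)."""
    splits = []
    next_row_id = 0
    for count in stripe_row_counts:
        splits.append(Split(next_row_id, next_row_id + count - 1))
        next_row_id += count
    return splits


def read_row_ids(splits: List[Split]) -> List[int]:
    """Synthesize rowIds on read: each row gets its split's min_row_id plus its offset."""
    return [
        row_id
        for split in splits
        for row_id in range(split.min_row_id, split.max_row_id + 1)
    ]


stripe_row_counts = [5000, 5000, 1200]  # made-up stripe sizes for illustration
splits = plan_splits(stripe_row_counts)

all_rows = read_row_ids(splits)        # rows a correct select * would return
last_only = read_row_ids(splits[-1:])  # rows returned if only the last split is read

print(len(all_rows), len(last_only))   # prints "11200 1200"
```

      In this model, a reader that consumes every split returns as many rows as the stripes contain in total, while a reader that consumes only the final split returns just that stripe's rows, which matches the mismatch between select * and the count query reported here.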

      Attachments

        1. HIVE-21460.1.patch
          0.9 kB
          Vaibhav Gumashta

          People

            Assignee: vgumashta Vaibhav Gumashta
            Reporter: bgoerlitz Brian Goerlitz
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved: