IMPALA / IMPALA-9042 Support reading full-ACID ORC tables / IMPALA-9484

Milestone 1: properly scan files that have the full ACID schema

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: Impala 4.0.0

    Description

       

      Full ACID row format looks like this:

      { "operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"i": 1} }

      User columns are nested under "row". The frontend should create proper tuples and slot descriptors for the scan nodes to read the files correctly.
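To illustrate the nesting described above, here is a minimal Python sketch that separates the ACID metadata fields from the user columns in one record. The field names come from the sample row; the helper name `split_acid_row` is hypothetical and is not part of Impala, whose frontend does this at the tuple/slot descriptor level rather than per record.

```python
import json

# Top-level ACID metadata fields, as they appear in the sample row above.
ACID_META_FIELDS = (
    "operation",
    "originalTransaction",
    "bucket",
    "rowId",
    "currentTransaction",
)


def split_acid_row(record):
    """Split a full-ACID record into (acid_metadata, user_columns).

    Hypothetical helper for illustration only.
    """
    missing = [f for f in ACID_META_FIELDS + ("row",) if f not in record]
    if missing:
        raise ValueError(f"not a full-ACID record, missing fields: {missing}")
    meta = {f: record[f] for f in ACID_META_FIELDS}
    # User columns live one level down, under "row".
    return meta, record["row"]


raw = (
    '{"operation": 0, "originalTransaction": 1, "bucket": 536870912, '
    '"rowId": 0, "currentTransaction": 1, "row": {"i": 1}}'
)
meta, user_cols = split_acid_row(json.loads(raw))
```

Here `user_cols` holds the columns a regular query would see, while `meta` holds the fields that would only be exposed for debugging or testing.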

      We should be able to query the ACID columns, at least for debugging/testing. Hive uses the special “row__id” identifier for that.

      Impala should raise an error if there are delete deltas. Directory filtering should filter out minor compacted directories since the records from those need validation.
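The directory handling described above could look roughly like the following Python sketch. It rests on assumptions that this issue does not spell out: Hive-style directory names (`base_<writeId>`, `delta_<min>_<max>[_<stmtId>]`, `delete_delta_<min>_<max>[_<stmtId>]`, optionally with a `_v<txnId>` suffix), and the heuristic that a delta directory spanning more than one write ID was produced by minor compaction. The names `filter_acid_dirs` and `DeleteDeltaError` are hypothetical.

```python
import re

# Assumed Hive-style delta directory naming; not taken from this issue.
_DELTA_RE = re.compile(
    r"^(delete_delta|delta)_(\d+)_(\d+)(?:_(\d+))?(?:_v\d+)?$"
)


class DeleteDeltaError(Exception):
    """Raised when a delete delta directory is found (unsupported here)."""


def filter_acid_dirs(dir_names):
    """Return the directories that are safe to scan in this milestone."""
    kept = []
    for name in dir_names:
        m = _DELTA_RE.match(name)
        if m is None:
            # Base directories and unrecognized names pass through unchanged.
            kept.append(name)
            continue
        kind, min_id, max_id = m.group(1), int(m.group(2)), int(m.group(3))
        if kind == "delete_delta":
            raise DeleteDeltaError(f"delete deltas are not supported: {name}")
        if min_id != max_id:
            # Assumed sign of minor compaction: records here would need
            # validation, which is out of scope, so skip the directory.
            continue
        kept.append(name)
    return kept


dirs = [
    "base_0000005",
    "delta_0000006_0000006_0000",
    "delta_0000007_0000009_v0000123",
]
scannable = filter_acid_dirs(dirs)
```

Raising on the first delete delta, instead of silently skipping it, matches the stated requirement that such tables produce an error rather than incorrect results.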

      Non-goals in this sub-task:

      • row validation against validWriteIdList
      • reading "original files" (files in non-ACID format)
      • reading delete deltas

          People

            Assignee: Zoltán Borók-Nagy (boroknagyz)
            Reporter: Zoltán Borók-Nagy (boroknagyz)