Hive / HIVE-28266

Iceberg: select count(*) from data_files metadata tables gives wrong result


Details

    Description

      In Hive Iceberg, every table has a corresponding metadata table "*.data_files" that contains information about the files that hold the table's data.

      Running select count(*) against a data_files metadata table returns the number of rows in the data table instead of the number of data files listed in the metadata table.

       

      CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by iceberg stored as orc TBLPROPERTIES ('external.table.purge'='true','format-version'='2');
      insert into x values 
      ('amy', 35, 123412344),
      ('adxfvy', 36, 123412534),
      ('amsdfyy', 37, 123417234),
      ('asafmy', 38, 123412534);
      insert into x values 
      ('amerqwy', 39, 123441234),
      ('amyxzcv', 40, 123341234),
      ('erweramy', 45, 122341234);
      Select * from default.x.data_files;
      -- Returns 2 records in the output
      Select count(*) from default.x.data_files;
      -- Returns 7 instead of 2
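
      For comparison, a minimal cross-check against the base table, assuming the two inserts above are the only writes to x. The expected count follows from the 4 + 3 rows inserted, not from a separate run:

      ```sql
      -- Row count of the data table itself; the inserts above add 4 + 3 rows.
      Select count(*) from default.x;
      -- Expected: 7, the same value the data_files query wrongly returns
      ```

      A matching value here would suggest the count on the metadata table is being answered from the data table's row count rather than from the metadata table's own rows.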
      

       


            People

              difin Dmitriy Fingerman
