IMPALA-9779

Unnecessarily reloading file metadata in some DDLs

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0, Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Catalog
    • Labels:
      None

      Description

      Some DDLs do not modify the actual table data, so there is no need to reload file metadata for them. These DDLs include:

      • Compute (incremental) stats
      • Drop stats
      • Alter table set row format
      • Alter table set file format

      The code paths of all of them call CatalogOpExecutor.bulkAlterPartitions(), and the related partitions are marked as "dirty" regardless of whether the data changed. Dirty partitions are dropped and reloaded at the end of CatalogOpExecutor.alterTable(TAlterTableParams, TDdlExecResponse), which is where the unnecessary file metadata reload happens. See HdfsTable.updatePartitionsFromHms() for details.

      We can consider not marking related partitions as "dirty" in these DDLs.
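
      The idea can be illustrated with a small standalone model. This is only a sketch and not Impala's actual code: DirtyPartitionSketch, DdlKind, Table, Partition, applyDdl() and the skipDirtyForMetadataOnly flag are all hypothetical names, and DATA_CHANGING stands in for any DDL that really does modify files. It assumes that reloading a dirty partition is what triggers the file metadata load.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the dirty-partition flow; not Impala's real classes.
public class DirtyPartitionSketch {
  // DATA_CHANGING is an assumed placeholder for DDLs that do touch files.
  enum DdlKind { COMPUTE_STATS, DROP_STATS, SET_ROW_FORMAT, SET_FILE_FORMAT, DATA_CHANGING }

  // The four DDLs the issue lists as not modifying table data.
  static final Set<DdlKind> METADATA_ONLY = EnumSet.of(
      DdlKind.COMPUTE_STATS, DdlKind.DROP_STATS,
      DdlKind.SET_ROW_FORMAT, DdlKind.SET_FILE_FORMAT);

  static class Partition {
    boolean dirty = false;
    int fileMetadataLoads = 0;  // counts how often file metadata was reloaded
  }

  static class Table {
    final Map<String, Partition> partitions = new LinkedHashMap<>();

    void addPartition(String name) { partitions.put(name, new Partition()); }

    // Applies a DDL to the named partitions, then reloads whatever is dirty.
    // With skipDirtyForMetadataOnly == false this mimics the behavior the
    // issue describes: every altered partition is marked dirty.
    void applyDdl(DdlKind kind, List<String> names, boolean skipDirtyForMetadataOnly) {
      boolean skip = skipDirtyForMetadataOnly && METADATA_ONLY.contains(kind);
      for (String name : names) {
        Partition p = partitions.get(name);
        if (p != null && !skip) p.dirty = true;
      }
      reloadDirtyPartitions();
    }

    // Stand-in for the drop-and-reload step at the end of the alter.
    void reloadDirtyPartitions() {
      for (Partition p : partitions.values()) {
        if (p.dirty) {
          p.fileMetadataLoads++;
          p.dirty = false;
        }
      }
    }

    int totalFileMetadataLoads() {
      int total = 0;
      for (Partition p : partitions.values()) total += p.fileMetadataLoads;
      return total;
    }
  }

  static Table newTable() {
    Table t = new Table();
    t.addPartition("p=1");
    t.addPartition("p=2");
    return t;
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList("p=1", "p=2");

    Table current = newTable();
    current.applyDdl(DdlKind.COMPUTE_STATS, all, false);
    System.out.println("reloads, always dirty: " + current.totalFileMetadataLoads());

    Table proposed = newTable();
    proposed.applyDdl(DdlKind.COMPUTE_STATS, all, true);
    System.out.println("reloads, skip for metadata-only DDLs: "
        + proposed.totalFileMetadataLoads());
  }
}
```

      In this model a data-changing DDL still marks its partitions dirty and reloads them, so only the four DDLs listed above would skip the reload.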

            People

            • Assignee:
              Unassigned
              Reporter:
              stigahuang Quanlong Huang