IMPALA-7047

REFRESH on unpartitioned tables calls getBlockLocations on every file

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.13.0
    • Fix Version/s: Impala 3.2.0
    • Component/s: Catalog

      Description

      In HdfsTable.updateUnpartitionedTableFileMd(), the existing default Partition object is reset and a new, empty one is created. The method then calls refreshPartitionFileMetadata() with this new partition, which has an empty list of file descriptors. That call lists the directory and, because no file can be found in the empty descriptor list, makes a separate RPC to HDFS for every file to get its block locations.

      This is quite wasteful compared with just using the API that returns the located statuses for the whole directory in one listing.
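
      To make the cost difference concrete, here is a minimal, self-contained sketch. It does not use Impala or Hadoop code: FakeFs, ListingCostDemo, and all of their methods are hypothetical names that only model "one round trip per call". The located listing is presumably something like Hadoop's FileSystem.listLocatedStatus(), but that mapping is an assumption and is not stated in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a remote file system.
// Every public call counts as one RPC.
class FakeFs {
    private final Map<String, List<String>> hostsByFile = new LinkedHashMap<>();
    int rpcCount = 0;

    void addFile(String name, List<String> hosts) {
        hostsByFile.put(name, hosts);
    }

    // One RPC: file names only, no block locations.
    List<String> listNames() {
        rpcCount++;
        return new ArrayList<>(hostsByFile.keySet());
    }

    // One RPC per file: block hosts for a single file.
    List<String> getBlockHosts(String name) {
        rpcCount++;
        return hostsByFile.get(name);
    }

    // One RPC: names and block hosts for the whole directory.
    Map<String, List<String>> listLocated() {
        rpcCount++;
        return new LinkedHashMap<>(hostsByFile);
    }
}

class ListingCostDemo {
    // Models the behavior described above: list, then look up each file.
    static Map<String, List<String>> perFileLoad(FakeFs fs) {
        Map<String, List<String>> loaded = new LinkedHashMap<>();
        for (String name : fs.listNames()) {
            loaded.put(name, fs.getBlockHosts(name));
        }
        return loaded;
    }

    // Models the suggested alternative: a single located listing.
    static Map<String, List<String>> batchLoad(FakeFs fs) {
        return fs.listLocated();
    }

    static FakeFs sampleFs(int numFiles) {
        FakeFs fs = new FakeFs();
        for (int i = 0; i < numFiles; i++) {
            fs.addFile("file-" + i, Arrays.asList("host-" + (i % 3)));
        }
        return fs;
    }

    public static void main(String[] args) {
        FakeFs fs = sampleFs(1000);

        int before = fs.rpcCount;
        perFileLoad(fs);
        System.out.println("per-file path RPCs: " + (fs.rpcCount - before)); // 1001

        before = fs.rpcCount;
        batchLoad(fs);
        System.out.println("batch path RPCs: " + (fs.rpcCount - before)); // 1
    }
}
```

      In this model both paths load identical metadata; only the number of round trips differs, growing with the file count in the per-file path and staying constant in the batch path.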

      Alternatively, it seems like the new Partition object should probably keep the old file descriptor list so that the incremental refresh path can work.
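
      The incremental alternative can be sketched the same way. This is an illustration under stated assumptions, not Impala's implementation: Status, Descriptor, fetchHosts, and refresh are hypothetical names, and "unchanged" is assumed to mean same length and same modification time. A file whose previous descriptor still matches is reused without a location lookup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IncrementalRefreshDemo {
    // Hypothetical entry from a plain directory listing (no locations).
    static final class Status {
        final String name;
        final long length;
        final long mtime;
        Status(String name, long length, long mtime) {
            this.name = name;
            this.length = length;
            this.mtime = mtime;
        }
    }

    // Hypothetical cached descriptor, including block hosts.
    static final class Descriptor {
        final long length;
        final long mtime;
        final List<String> hosts;
        Descriptor(long length, long mtime, List<String> hosts) {
            this.length = length;
            this.mtime = mtime;
            this.hosts = hosts;
        }
    }

    // Counts simulated per-file location RPCs.
    static int locationRpcs = 0;

    static List<String> fetchHosts(String name) {
        locationRpcs++;
        return Arrays.asList("host-for-" + name);
    }

    // Reuse the old descriptor when length and mtime are unchanged;
    // otherwise fetch locations for that one file.
    static Map<String, Descriptor> refresh(List<Status> listing,
                                           Map<String, Descriptor> old) {
        Map<String, Descriptor> fresh = new LinkedHashMap<>();
        for (Status s : listing) {
            Descriptor prev = old.get(s.name);
            if (prev != null && prev.length == s.length && prev.mtime == s.mtime) {
                fresh.put(s.name, prev);
            } else {
                fresh.put(s.name, new Descriptor(s.length, s.mtime, fetchHosts(s.name)));
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        List<Status> first = Arrays.asList(
            new Status("a", 10, 100), new Status("b", 20, 100), new Status("c", 30, 100));
        // Starting from an empty descriptor list forces a lookup per file.
        Map<String, Descriptor> loaded =
            refresh(first, Collections.<String, Descriptor>emptyMap());
        System.out.println("initial lookups: " + locationRpcs); // 3

        int before = locationRpcs;
        List<Status> second = Arrays.asList(
            new Status("a", 10, 100), new Status("b", 25, 200),   // b was rewritten
            new Status("c", 30, 100), new Status("d", 5, 200));   // d is new
        refresh(second, loaded);
        System.out.println("incremental lookups: " + (locationRpcs - before)); // 2
    }
}
```

      Passing an empty map as the old descriptor list, as in the first call, reproduces the behavior this issue describes: every file looks new, so every file triggers a lookup.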

            People

            • Assignee: Todd Lipcon (tlipcon)
            • Reporter: Todd Lipcon (tlipcon)