IMPALA-5431

Calling FileSystem.exists() twice in a row for the same partition adds unnecessary latency to metadata loading

      Description

      FileSystem.exists() is called in loadPartitionFileMetadata() and then again in refreshFileMetadata() for the same partition directory, which is redundant.

      Each call is a separate request to the file system, so with a large number of partitions the extra calls can become a bottleneck.

      private void loadPartitionFileMetadata(StorageDescriptor storageDescriptor,
          HdfsPartition partition) throws Exception {
        Preconditions.checkNotNull(storageDescriptor);
        Preconditions.checkNotNull(partition);
        Path partDirPath = new Path(storageDescriptor.getLocation());
        FileSystem fs = partDirPath.getFileSystem(CONF);
        // First exists() call for this partition's directory.
        if (!fs.exists(partDirPath)) return;
        refreshFileMetadata(partition);
      }

      private void refreshFileMetadata(HdfsPartition partition) throws CatalogException {
        Path partDir = partition.getLocationPath();
        Preconditions.checkNotNull(partDir);
        try {
          FileSystem fs = partDir.getFileSystem(CONF);
          // Second exists() call, repeated when reached from loadPartitionFileMetadata().
          if (!fs.exists(partDir)) {
            partition.setFileDescriptors(new ArrayList<FileDescriptor>());
            return;
          }
          if (!FileSystemUtil.supportsStorageIds(fs)) {
            synthesizeBlockMetadata(fs, partition);
            return;
          }
          // Index the partition file descriptors by their file names for O(1) look ups.
          ImmutableMap<String, FileDescriptor> fileDescsByName = Maps.uniqueIndex(
              partition.getFileDescriptors(), new Function<FileDescriptor, String>() {
                public String apply(FileDescriptor desc) {
                  return desc.getFileName();
                }
              });
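
      The call pattern in the excerpt above can be illustrated with a small, self-contained sketch. It does not use the Hadoop or Impala APIs: CountingFs, loadWithDoubleCheck, loadWithSingleCheck, and countExistsCalls are hypothetical names invented for this example, and the "single check" variant is only one assumed way to remove the redundancy, not necessarily the change that was committed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RedundantExistsSketch {
  // Hypothetical stand-in for a remote file system; it only counts exists() calls.
  static class CountingFs {
    private final Set<String> dirs = new HashSet<>();
    int existsCalls = 0;

    CountingFs(List<String> existingDirs) {
      dirs.addAll(existingDirs);
    }

    boolean exists(String path) {
      existsCalls++;  // each call models one request to the file system
      return dirs.contains(path);
    }
  }

  // Same shape as the reported code: the caller checks, then the callee checks again.
  static void loadWithDoubleCheck(CountingFs fs, String partDir) {
    if (!fs.exists(partDir)) return;
    refresh(fs, partDir);
  }

  // Assumed alternative: rely only on the check inside refresh().
  static void loadWithSingleCheck(CountingFs fs, String partDir) {
    refresh(fs, partDir);
  }

  static void refresh(CountingFs fs, String partDir) {
    if (!fs.exists(partDir)) {
      return;  // the real method resets the partition's file descriptors here
    }
    // File metadata loading would happen here; omitted in this sketch.
  }

  // Loads numPartitions existing partition directories and returns the exists() count.
  static int countExistsCalls(int numPartitions, boolean doubleCheck) {
    List<String> dirs = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      dirs.add("/warehouse/t/p=" + i);  // made-up paths for illustration
    }
    CountingFs fs = new CountingFs(dirs);
    for (String dir : dirs) {
      if (doubleCheck) {
        loadWithDoubleCheck(fs, dir);
      } else {
        loadWithSingleCheck(fs, dir);
      }
    }
    return fs.existsCalls;
  }

  public static void main(String[] args) {
    System.out.println("double check: " + countExistsCalls(80, true));   // 160
    System.out.println("single check: " + countExistsCalls(80, false));  // 80
  }
}
```

      For partitions whose directory exists, the double-check variant issues two exists() calls per partition and the single-check variant issues one. One caveat when applying this idea to the real code: in the excerpt, a missing directory makes loadPartitionFileMetadata() return without touching the partition, whereas refreshFileMetadata() replaces the file descriptors with an empty list, so dropping the outer check changes behavior for that case.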
      

      Before-and-after Java Flight Recorder profiles are attached. After removing the redundant fs.exists() call, the number of socket reads drops from 1,639 to 1,046. For a table with 80 partitions and 250K files, this gave a 15-20% speedup.
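
      As a quick check on the socket-read figures quoted above, the relative reduction can be computed directly. This is plain arithmetic on the two reported numbers; the class and method names are made up for the example.

```java
import java.util.Locale;

class SocketReadReduction {
  // Percentage by which `after` is lower than `before`.
  static double percentReduction(long before, long after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) {
    long before = 1639;  // socket reads in the baseline profile
    long after = 1046;   // socket reads after removing the redundant call
    System.out.println("reads avoided: " + (before - after));
    System.out.println(String.format(Locale.ROOT, "reduction: %.1f%%",
        percentReduction(before, after)));
  }
}
```

      This works out to 593 fewer socket reads, a reduction of roughly 36%. That figure and the 15-20% speedup measure different things (request count versus elapsed time), so they need not match.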

        Attachments

        1. Baseline.jfr
          181 kB
          Mostafa Mokhtar
        2. After removing redundant fs.exists().jfr
          214 kB
          Mostafa Mokhtar

              People

              • Assignee:
                bharath v (bharathv)
                Reporter:
                Mostafa Mokhtar (mmokhtar)