HIVE-1149

Optimize CombineHiveFileInputFormat execution speed


Details

    Description

      When there are many files and many pools, CombineHiveFileInputFormat is slow.
      One of the culprits is the "new URI" call in the following function, which re-parses every key of pathToPartitionInfo on each lookup. We should try to get rid of it.

        protected static PartitionDesc getPartitionDescFromPath(
            Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
          // The format of the keys in pathToPartitionInfo sometimes contains a port
          // and sometimes doesn't, so we just compare paths.
          for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
              .entrySet()) {
            try {
              if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
                return entry.getValue();
              }
            } catch (URISyntaxException e2) {
            }
          }
          throw new IOException("cannot find dir = " + dir.toString()
              + " in partToPartitionInfo!");
        }
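
      One possible direction, shown only as a sketch and not as the fix adopted for this issue: parse each key once, build a map keyed by the path component alone, and answer later lookups from that map. The class name PathOnlyIndex and its generic value type V (standing in for PartitionDesc) are hypothetical, and the directory is passed as a plain path string (what dir.toUri().getPath() would return) so the example has no Hadoop dependency.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hive): parses each key of
 * pathToPartitionInfo once and indexes the values by path component only,
 * so keys with and without a port resolve to the same entry.
 */
final class PathOnlyIndex<V> {
  private final Map<String, V> byPath = new HashMap<>();

  PathOnlyIndex(Map<String, V> pathToPartitionInfo) {
    for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
      try {
        // One URI parse per key, done up front instead of on every lookup.
        String path = new URI(entry.getKey()).getPath();
        if (path != null) {
          // Keep the first entry seen for a given path.
          byPath.putIfAbsent(path, entry.getValue());
        }
      } catch (URISyntaxException e) {
        // Unparseable keys are skipped, as in the original loop.
      }
    }
  }

  /** Looks up by path only; dirPath is assumed to be dir.toUri().getPath(). */
  V get(String dirPath) throws IOException {
    V value = byPath.get(dirPath);
    if (value == null) {
      throw new IOException("cannot find dir = " + dirPath
          + " in pathToPartitionInfo!");
    }
    return value;
  }
}
```

      With such an index built once per job, each lookup becomes a single hash probe rather than a scan that constructs one URI per entry. Whether the index can be safely cached depends on how pathToPartitionInfo is populated and mutated, which this sketch does not address.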
      

          People

            Assignee: Unassigned
            Reporter: Zheng Shao (zshao)