IMPALA / IMPALA-4169

N^2 loop in HdfsTable.getPartitionFromThriftPartitionSpec when running "compute stats" causes slowdowns on tables with a large number of partitions


Details

    Description

      While running COMPUTE STATS on a table with 500K+ partitions, I found that the catalog spends a lot of time in toLowerCase(). It looks like linearly searching through every partition's values on each lookup is suboptimal.
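A rough estimate of why this shows up as N^2, assuming COMPUTE STATS resolves each of the N partitions once through this method and each call scans, on average, half of the partition map before finding its match (each comparison lower-casing at least one partition value):

$$
C \approx N \cdot \frac{N}{2} = \frac{N^2}{2}, \qquad N = 5\times10^{5} \;\Rightarrow\; C \approx \frac{2.5\times10^{11}}{2} = 1.25\times10^{11}\ \text{partition comparisons}
$$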

      The code below should use a hash-based lookup (e.g. a HashSet) instead of scanning every partition.

          // Search through all the partitions and check if their partition key values
          // match the values being searched for.
          for (HdfsPartition partition: partitionMap_.values()) {
            if (partition.isDefaultPartition()) continue;
            List<LiteralExpr> partitionValues = partition.getPartitionValues();
            Preconditions.checkState(partitionValues.size() == targetValues.size());
            boolean matchFound = true;
            for (int i = 0; i < targetValues.size(); ++i) {
              String value;
              if (partitionValues.get(i) instanceof NullLiteral) {
                value = getNullPartitionKeyValue();
              } else {
                value = partitionValues.get(i).getStringValue();
                Preconditions.checkNotNull(value);
                // See IMPALA-252: we deliberately map empty strings on to
                // NULL when they're in partition columns. This is for
                // backwards compatibility with Hive, and is clearly broken.
                if (value.isEmpty()) value = getNullPartitionKeyValue();
              }
              if (!targetValues.get(i).equals(value.toLowerCase())) {
                matchFound = false;
                break;
              }
            }
            if (matchFound) {
              return partition;
            }
          }
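A minimal sketch of the hash-based lookup suggested above, under stated assumptions: `PartitionIndexSketch`, `add` and `find` are hypothetical names (not Impala's API), partition values are modeled as plain strings instead of `LiteralExpr`, and the type parameter `P` stands in for `HdfsPartition`. It uses a `HashMap` rather than a `HashSet` because the lookup has to return the matching partition, and it applies the same normalization as the loop (null/empty to the null partition key value, then lower-casing) once per partition at index time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Impala's actual code): index partitions by their
 * normalized key values so each lookup is one hash probe instead of a scan
 * over every partition. P stands in for HdfsPartition.
 */
final class PartitionIndexSketch<P> {
  // Stand-in for HdfsTable.getNullPartitionKeyValue().
  private final String nullPartitionKeyValue;
  private final Map<List<String>, P> partitionsByKey = new HashMap<>();

  PartitionIndexSketch(String nullPartitionKeyValue) {
    this.nullPartitionKeyValue = nullPartitionKeyValue;
  }

  // Same normalization as the loop in the description: null and empty values
  // become the null partition key value (IMPALA-252), then every value is
  // lower-cased. This now runs once per partition instead of once per lookup.
  private List<String> normalize(List<String> partitionValues) {
    List<String> key = new ArrayList<>(partitionValues.size());
    for (String value : partitionValues) {
      if (value == null || value.isEmpty()) value = nullPartitionKeyValue;
      key.add(value.toLowerCase());
    }
    return key;
  }

  // Register a non-default partition. putIfAbsent keeps the first partition
  // added for a key, mirroring the loop, which returns the first match it finds.
  void add(List<String> partitionValues, P partition) {
    partitionsByKey.putIfAbsent(normalize(partitionValues), partition);
  }

  // targetValues are compared as-is, like targetValues.get(i) in the original
  // loop (i.e. they are expected to be lower-cased already).
  // Returns null when no partition matches.
  P find(List<String> targetValues) {
    return partitionsByKey.get(targetValues);
  }

  int size() {
    return partitionsByKey.size();
  }
}
```

With this shape the toLowerCase() cost moves to index construction, so resolving all N partitions costs O(N·k) for k partition columns instead of O(N²·k). A real implementation would also have to keep the index in sync whenever partitions are added or dropped.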
      

      Attachments

        1. compute_stats_store_sales_100kp3.jfr (1.18 MB, Mostafa Mokhtar)


          People

            Assignee: Unassigned
            Reporter: Mostafa Mokhtar (mmokhtar)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated: