Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-12393

DictEncoder uses inconsistent hash function for TimestampValue

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 4.3.0
    • Impala 4.3.0
    • Backend
    • None
    • ghx-label-3

    Description

      DictEncoder currently uses this hash function for TimestampValue:

      template<typename T>
      inline uint32_t DictEncoder<T>::Hash(const T& value) const {
        return HashUtil::Hash(&value, sizeof(value), 0);
      }

      TimestampValue has some padding, and nothing ensures that the padding is cleared. This means that identical TimestampValue objects can hash to different values.

      This came up when fixing a Clang-Tidy performance check. This line in dict-test.cc changed from iterating over values to iterating over const references.

        DictEncoder<InternalType> encoder(&pool, fixed_buffer_byte_size, &track_encoder);
        encoder.UsedbyTest();
      <<<<<<
        for (InternalType i: values) encoder.Put(i);
      =====
        for (const InternalType& i: values) encoder.Put(i);
      >>>>>
        bytes_alloc = encoder.DictByteSize();
        EXPECT_EQ(track_encoder.consumption(), bytes_alloc);
        EXPECT_EQ(encoder.num_entries(), values_set.size()); <--------

      The test became flaky, with the encoder.num_entries() being larger than the values_set.size() for TimestampValue. This happened because the hash values didn't match even for identical entries and the dictionary would have multiple copies of the same value. When iterating over a plain non-reference TimestampValue, each TimestampValue is being copied to a temporary value. Maybe in this circumstance the padding stays the same between iterations.

      It's possible this would come up when writing Parquet data files.

      One fix would be to use TimestampValue's Hash function, which ignores the padding:

      template<>
      inline uint32_t DictEncoder<TimestampValue>::Hash(const TimestampValue& value) const {
        return value.Hash();
      }

       

      Attachments

        Issue Links

          Activity

            People

              joemcdonnell Joe McDonnell
              joemcdonnell Joe McDonnell
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: