IMPALA-1254

Parquet writer incorrectly adds a value twice when the maximum dictionary size is reached


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.4
    • Fix Version/s: Impala 2.1
    • Component/s: None
    • Labels: None

    Description

      EncodeValue() returns false when the dictionary's entry limit has been reached, even though the value has already been added to the dictionary-encoded page. AppendRow() then loops back around and adds that same value a second time.

         if (current_encoding_ == Encoding::PLAIN_DICTIONARY) {
            *bytes_needed = dict_encoder_->Put(*reinterpret_cast<T*>(value));
            *bytes_added += *bytes_needed;
      
            // If the dictionary contains the maximum number of values, switch to plain
            // encoding.  The current dictionary encoded page is written out.
            if (dict_encoder_->num_entries() == MAX_DICTIONARY_ENTRIES) {
              *bytes_added += FinalizeCurrentPage();
              current_encoding_ = Encoding::PLAIN;
              return false;  // <==== HERE: the value has already been added
            } else {
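
      The control flow described above can be reproduced with a small, self-contained toy model. This is only a sketch: ToyColumnWriter, kMaxDictEntries, fix_enabled and WriteAll are hypothetical names invented for illustration, the dictionary limit is shrunk to 3, and returning true after the switch to plain encoding is one possible fix, not necessarily the change that was committed to Impala.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy model of the retry loop from the description. None of these names
// are Impala code; they are assumptions made for this illustration.
constexpr std::size_t kMaxDictEntries = 3;  // hypothetical, tiny limit

enum class Encoding { PLAIN_DICTIONARY, PLAIN };

struct ToyColumnWriter {
  explicit ToyColumnWriter(bool fix) : fix_enabled(fix) {}

  bool fix_enabled;
  Encoding current_encoding = Encoding::PLAIN_DICTIONARY;
  std::set<int> dict;        // stands in for the dictionary encoder
  std::vector<int> written;  // every value that ended up in some page

  // Returns true if the value was consumed, false to ask the caller to retry.
  bool EncodeValue(int value) {
    if (current_encoding == Encoding::PLAIN_DICTIONARY) {
      dict.insert(value);
      written.push_back(value);  // the value is now in the dictionary page
      if (dict.size() == kMaxDictEntries) {
        // Dictionary is full: switch to plain encoding for later values.
        current_encoding = Encoding::PLAIN;
        // Buggy variant reports "not added" although the value was added.
        return fix_enabled;
      }
      return true;
    }
    written.push_back(value);  // plain encoding path
    return true;
  }

  // Retries until EncodeValue() reports that the value was consumed.
  void AppendRow(int value) {
    while (!EncodeValue(value)) {
    }
  }
};

// Writes all values through a fresh writer and returns what was stored.
std::vector<int> WriteAll(bool fix_enabled, const std::vector<int>& values) {
  ToyColumnWriter writer(fix_enabled);
  for (int v : values) writer.AppendRow(v);
  return writer.written;
}
```

      With the input {10, 20, 30, 40}, WriteAll(false, ...) stores 30 twice, because 30 is the value that fills the dictionary and is then retried with plain encoding. WriteAll(true, ...) stores each value exactly once.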
      
      


          People

            Assignee: Daniel Hecht (dhecht)
            Reporter: Daniel Hecht (dhecht)