IMPALA-10186

Parquet writer emits invalid PageLocations when the table is sorted by some columns

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Backend

    Description

      The current Parquet writer writes -1 for PageLocation.offset and PageLocation.first_row_index when it encounters an empty page.

       hdfs-parquet-file-writer.cc, lines 808-819:

        // Write data pages
        for (const DataPage& page : pages_) {
          if (page.header.data_page_header.num_values == 0) {
            // Skip empty pages
            location.offset = -1;
            location.compressed_page_size = 0;
            location.first_row_index = -1;
            AddLocationToOffsetIndex(location);
            continue;
          }
      

      However, these -1 values can drive the ComputeCandidatePages function into an unexpected state, because it begins by validating the page locations:

      bool ComputeCandidatePages(
          const vector<parquet::PageLocation>& page_locations,
          const vector<RowRange>& candidate_ranges,
          const int64_t num_rows, vector<int>* candidate_pages) {
        if (!ValidatePageLocations(page_locations, num_rows)) return false;
      

      This in turn causes IMPALA-9952.
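
      The interaction described above can be illustrated with a minimal, self-contained sketch. It is not Impala's code: the PageLocation struct is a simplified stand-in for the Thrift-generated parquet::PageLocation, and ValidateLocationsSketch and BuildLocationsSketch are hypothetical helpers whose rules (non-negative offsets, strictly increasing first_row_index within [0, num_rows), one value per row) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for the Thrift-generated parquet::PageLocation.
struct PageLocation {
  int64_t offset = 0;
  int32_t compressed_page_size = 0;
  int64_t first_row_index = 0;
};

// Hypothetical validator (NOT Impala's ValidatePageLocations). Assumed rules:
// offsets are non-negative, and first_row_index is strictly increasing and
// lies in [0, num_rows).
bool ValidateLocationsSketch(const std::vector<PageLocation>& locs,
                             int64_t num_rows) {
  int64_t prev_first_row = -1;
  for (const PageLocation& loc : locs) {
    if (loc.offset < 0) return false;
    if (loc.first_row_index < 0 || loc.first_row_index >= num_rows) return false;
    if (loc.first_row_index <= prev_first_row) return false;
    prev_first_row = loc.first_row_index;
  }
  return true;
}

// Hypothetical writer loop mimicking the behaviour quoted from
// hdfs-parquet-file-writer.cc: an empty page gets -1 placeholders.
// Assumes a flat column, so each value corresponds to one row.
std::vector<PageLocation> BuildLocationsSketch(
    const std::vector<int>& values_per_page) {
  std::vector<PageLocation> locs;
  int64_t offset = 4;  // arbitrary starting file offset for the sketch
  int64_t row = 0;
  for (int n : values_per_page) {
    assert(n >= 0);
    PageLocation loc;
    if (n == 0) {
      // Empty page: same placeholder values as in the quoted writer code.
      loc.offset = -1;
      loc.compressed_page_size = 0;
      loc.first_row_index = -1;
    } else {
      loc.offset = offset;
      loc.compressed_page_size = n * 8;  // pretend each value takes 8 bytes
      loc.first_row_index = row;
      offset += loc.compressed_page_size;
      row += n;
    }
    locs.push_back(loc);
  }
  return locs;
}
```

      Under these assumptions, BuildLocationsSketch({3, 2}) passes ValidateLocationsSketch with num_rows = 5, while BuildLocationsSketch({3, 0, 2}) fails it because of the -1 placeholders written for the empty middle page.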

       

            People

              boroknagyz Zoltán Borók-Nagy
              guojingfeng guojingfeng