IMPALA-7330

Make the table metadata refresh after "LOAD" commands incremental

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Catalog

      Description

      Currently, any LOAD command refreshes the entire table metadata after loading the required files.

      case TStmtType::LOAD: {
        DCHECK(exec_request_.__isset.load_data_request);
        TLoadDataResp response;
        RETURN_IF_ERROR(
            frontend_->LoadData(exec_request_.load_data_request, &response));
        request_result_set_.reset(new vector<TResultRow>);
        request_result_set_->push_back(response.load_summary);
      
        // Now refresh the table metadata.
        TCatalogOpRequest reset_req;
        reset_req.__set_sync_ddl(exec_request_.query_options.sync_ddl);
        reset_req.__set_op_type(TCatalogOpType::RESET_METADATA);
        reset_req.__set_reset_metadata_params(TResetMetadataRequest());
        reset_req.reset_metadata_params.__set_header(TCatalogServiceRequestHeader());
        reset_req.reset_metadata_params.__set_is_refresh(true);
        reset_req.reset_metadata_params.__set_table_name(
            exec_request_.load_data_request.table_name);
        reset_req.reset_metadata_params.__set_sync_ddl(
            exec_request_.query_options.sync_ddl);
        catalog_op_executor_.reset(
            new CatalogOpExecutor(exec_env_, frontend_, server_profile_));
        RETURN_IF_ERROR(catalog_op_executor_->Exec(reset_req));
        RETURN_IF_ERROR(parent_server_->ProcessCatalogUpdateResult(
            *catalog_op_executor_->update_catalog_result(),
            exec_request_.query_options.sync_ddl));
        break;
      }

      Refreshing the entire table is not always necessary, especially when the load targets only a single partition, for example:

      LOAD DATA INPATH '...path...' INTO TABLE t PARTITION (...);
      

      The idea is to make the post-load refresh incremental, so that only newly created or updated partitions are refreshed.
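
A minimal sketch of the proposed decision, not the actual fix: if the LOAD statement named a partition, the follow-up refresh is scoped to that partition; otherwise it falls back to a whole-table refresh. The types `LoadRequest` and `RefreshRequest` and the function `BuildPostLoadRefresh` are hypothetical stand-ins invented for illustration. They are not Impala's Thrift structures or catalog API.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT Impala's Thrift types.
// A partition spec is a list of (partition column, value) pairs.
using PartitionSpec = std::vector<std::pair<std::string, std::string>>;

struct LoadRequest {
  std::string table_name;
  PartitionSpec partition_spec;  // empty when LOAD has no PARTITION clause
};

struct RefreshRequest {
  std::string table_name;
  bool whole_table = true;       // default mirrors the current behavior
  PartitionSpec partition_spec;  // only meaningful when whole_table is false
};

// Builds the refresh that should follow a LOAD. When the load targeted a
// specific partition, only that partition is marked for refresh; otherwise
// the whole table is refreshed, as the existing code does.
RefreshRequest BuildPostLoadRefresh(const LoadRequest& load) {
  RefreshRequest req;
  req.table_name = load.table_name;
  if (!load.partition_spec.empty()) {
    req.whole_table = false;
    req.partition_spec = load.partition_spec;
  }
  return req;
}
```

Under this assumption, a load into `PARTITION (year=2018)` would yield a request with `whole_table == false` and a one-entry partition spec, while a load without a PARTITION clause would keep the full-table refresh.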

              People

              • Assignee: Todd Lipcon (tlipcon)
              • Reporter: bharath v (bharathv)