IMPALA-7330: Make the table metadata refresh after "LOAD" commands incremental


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Catalog

    Description

      Currently, any LOAD command refreshes the entire table metadata after loading the required files.

      case TStmtType::LOAD: {
        DCHECK(exec_request_.__isset.load_data_request);
        TLoadDataResp response;
        RETURN_IF_ERROR(
            frontend_->LoadData(exec_request_.load_data_request, &response));
        request_result_set_.reset(new vector<TResultRow>);
        request_result_set_->push_back(response.load_summary);
      
        // Now refresh the table metadata.
        TCatalogOpRequest reset_req;
        reset_req.__set_sync_ddl(exec_request_.query_options.sync_ddl);
        reset_req.__set_op_type(TCatalogOpType::RESET_METADATA);
        reset_req.__set_reset_metadata_params(TResetMetadataRequest());
        reset_req.reset_metadata_params.__set_header(TCatalogServiceRequestHeader());
        reset_req.reset_metadata_params.__set_is_refresh(true);
        reset_req.reset_metadata_params.__set_table_name(
            exec_request_.load_data_request.table_name);
        reset_req.reset_metadata_params.__set_sync_ddl(
            exec_request_.query_options.sync_ddl);
        catalog_op_executor_.reset(
            new CatalogOpExecutor(exec_env_, frontend_, server_profile_));
        RETURN_IF_ERROR(catalog_op_executor_->Exec(reset_req));
        RETURN_IF_ERROR(parent_server_->ProcessCatalogUpdateResult(
            *catalog_op_executor_->update_catalog_result(),
            exec_request_.query_options.sync_ddl));
        break;
      }

      Refreshing the entire table is not always required, especially when the data is loaded into a single partition, for example:

      LOAD DATA INPATH '...path...' INTO TABLE t PARTITION (...);
      

      The idea is to make the post-load refresh incremental, so that it refreshes only the partitions that the load created or updated.
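
      A minimal sketch of the idea, not the actual patch. The types and the function here (PartitionKeyValue, LoadRequest, RefreshRequest, BuildPostLoadRefresh, IsFullTableRefresh) are hypothetical stand-ins for the Thrift-generated structs and are assumptions made for illustration only. The point is that the refresh request carries the partition spec of the LOAD statement when there is one, and falls back to a full-table refresh otherwise.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the Thrift-generated request types. The real
// Impala structs differ and are not reproduced here.
struct PartitionKeyValue {
  std::string name;
  std::string value;
};

struct LoadRequest {
  std::string table_name;
  // Empty when the LOAD statement has no PARTITION clause.
  std::vector<PartitionKeyValue> partition_spec;
};

struct RefreshRequest {
  std::string table_name;
  bool is_refresh = false;
  // Empty means "refresh the whole table" in this sketch.
  std::vector<PartitionKeyValue> partition_spec;
};

// Returns true if the request would refresh every partition of the table.
bool IsFullTableRefresh(const RefreshRequest& req) {
  return req.partition_spec.empty();
}

// Builds the post-load refresh request. If the load targeted a single
// partition, the refresh is scoped to that partition; otherwise the
// request falls back to a full-table refresh.
RefreshRequest BuildPostLoadRefresh(const LoadRequest& load) {
  RefreshRequest req;
  req.table_name = load.table_name;
  req.is_refresh = true;
  req.partition_spec = load.partition_spec;
  return req;
}
```

      With this shape, a LOAD with a PARTITION clause yields a request scoped to that one partition, while an unpartitioned LOAD keeps today's behavior.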


          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Bharath Vissapragada (bharathv)