Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- Impala 3.1.0
Description
Currently, any LOAD command refreshes the entire table metadata after loading the required files.
    case TStmtType::LOAD: {
      DCHECK(exec_request_.__isset.load_data_request);
      TLoadDataResp response;
      RETURN_IF_ERROR(
          frontend_->LoadData(exec_request_.load_data_request, &response));
      request_result_set_.reset(new vector<TResultRow>);
      request_result_set_->push_back(response.load_summary);

      // Now refresh the table metadata.
      TCatalogOpRequest reset_req;
      reset_req.__set_sync_ddl(exec_request_.query_options.sync_ddl);
      reset_req.__set_op_type(TCatalogOpType::RESET_METADATA);
      reset_req.__set_reset_metadata_params(TResetMetadataRequest());
      reset_req.reset_metadata_params.__set_header(TCatalogServiceRequestHeader());
      reset_req.reset_metadata_params.__set_is_refresh(true);
      reset_req.reset_metadata_params.__set_table_name(
          exec_request_.load_data_request.table_name);
      reset_req.reset_metadata_params.__set_sync_ddl(
          exec_request_.query_options.sync_ddl);
      catalog_op_executor_.reset(
          new CatalogOpExecutor(exec_env_, frontend_, server_profile_));
      RETURN_IF_ERROR(catalog_op_executor_->Exec(reset_req));
      RETURN_IF_ERROR(parent_server_->ProcessCatalogUpdateResult(
          *catalog_op_executor_->update_catalog_result(),
          exec_request_.query_options.sync_ddl));
      break;
    }
Refreshing the entire table is not always required, especially when the load targets a single partition, for example:
LOAD DATA INPATH '...path...' INTO TABLE t PARTITION (...);
The idea is to make the post-load refresh incremental, so that only the newly created or updated partitions are refreshed.
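A minimal sketch of the proposed behavior follows, assuming the refresh request can carry an optional partition spec. The types `PartitionKeyValue` and `RefreshRequest` and the helpers `BuildPostLoadRefresh` and `IsPartitionScoped` are hypothetical stand-ins for illustration only; they are not Impala's generated Thrift classes, and the actual fix may be structured differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one partition key/value pair, e.g. year=2018.
struct PartitionKeyValue {
  std::string name;
  std::string value;
};

// Hypothetical stand-in for the refresh request sent after a LOAD.
struct RefreshRequest {
  std::string table_name;
  bool is_refresh = true;
  // Empty spec means "refresh the whole table" (the current behavior).
  std::vector<PartitionKeyValue> partition_spec;
};

// Builds the post-load refresh request. If the LOAD statement named a
// partition, the request is scoped to that partition; otherwise it falls
// back to a full-table refresh.
RefreshRequest BuildPostLoadRefresh(
    const std::string& table_name,
    const std::vector<PartitionKeyValue>& load_partition_spec) {
  assert(!table_name.empty());
  RefreshRequest req;
  req.table_name = table_name;
  req.partition_spec = load_partition_spec;
  return req;
}

// True when the request refreshes only a single partition.
bool IsPartitionScoped(const RefreshRequest& req) {
  return !req.partition_spec.empty();
}
```

In this sketch, a LOAD with a PARTITION clause yields a partition-scoped request, while a LOAD without one keeps the existing full-table refresh.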
Issue Links
- relates to: IMPALA-7313 LOAD DATA does not create partitions on-demand (inconsistent with INSERT) (Open)