Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 3.0
-
None
Description
Impala/bin/load-data.py is most commonly used to load test data onto a simulated standalone cluster running on the local host. However, with the correct inputs, it can also be used to load data onto an actual cluster running on remote hosts.
A recent enhancement to the load-data.py script that parallelizes parts of the data loading process (https://github.com/apache/impala/commit/d481cd48) has introduced a regression in the latter use case:
From $IMPALA_HOME/logs/data_loading/data-load-functional-exhaustive.log:
Created table functional_hbase.widetable_1000_cols
Took 0.7121 seconds
09:48:01 Beginning execution of hive SQL: /home/systest/Impala-auxiliary-tests/tests/cdh_cluster/../../../Impala-cdh-cluster-test-runner/logs/data_loading/sql/functional/load-functional-query-exhaustive-hive-generated-text-none-none.sql
Traceback (most recent call last):
  File "/home/systest/Impala-auxiliary-tests/tests/cdh_cluster/../../../Impala-cdh-cluster-test-runner/bin/load-data.py", line 494, in <module>
    if __name__ == "__main__": main()
  File "/home/systest/Impala-auxiliary-tests/tests/cdh_cluster/../../../Impala-cdh-cluster-test-runner/bin/load-data.py", line 468, in main
    hive_exec_query_files_parallel(thread_pool, hive_load_text_files)
  File "/home/systest/Impala-auxiliary-tests/tests/cdh_cluster/../../../Impala-cdh-cluster-test-runner/bin/load-data.py", line 299, in hive_exec_query_files_parallel
    exec_query_files_parallel(thread_pool, query_files, 'hive')
  File "/home/systest/Impala-auxiliary-tests/tests/cdh_cluster/../../../Impala-cdh-cluster-test-runner/bin/load-data.py", line 290, in exec_query_files_parallel
    for result in thread_pool.imap_unordered(execution_function, query_files):
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 659, in next
    raise value
TypeError: coercing to Unicode: need string or buffer, NoneType found
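The traceback ends in pool.py's "raise value", which means the TypeError was raised inside a worker and re-raised in the parent while iterating over imap_unordered; the worker-side frame that actually received None is not shown. The following is a minimal sketch of that propagation pattern, not the load-data.py code: run_query_file is a hypothetical worker, the None entry in the input list is an assumed stand-in for whatever value was missing in the remote-cluster case, and the pool is assumed to be a multiprocessing.pool.ThreadPool. It targets Python 3, where the message text differs from the Python 2.7 "coercing to Unicode" wording.

```python
import os
from multiprocessing.pool import ThreadPool


def run_query_file(path):
    # Hypothetical worker: any path operation on None raises TypeError.
    # In the real script the failing call is unknown; this is a stand-in.
    return os.path.basename(path)


def exec_query_files_parallel(thread_pool, query_files):
    # Exceptions raised inside a worker are stored by the pool and
    # re-raised here, in the parent, when the iterator is advanced.
    results = []
    for result in thread_pool.imap_unordered(run_query_file, query_files):
        results.append(result)
    return results
```

Calling exec_query_files_parallel with a list that contains None raises TypeError from the for loop in the parent, which matches the shape of the traceback above. Logging the exception inside the worker, before it propagates, is one way to recover the frame that the pool hides.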