Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Cannot Reproduce
- Affects Version: Impala 2.3.0
- Fix Version: None
Description
Impala write performance is bottlenecked by the single-threaded HDFSTableSink. Consider the following use case:
- For a bare-metal cluster with 3 managers & 7 datanodes, 11 drives/datanode
- Each drive gave me a respectable ~100 MB/s read and write throughput
```
[root@c3kuhdpnode1 ~]# hdparm -t -T /dev/sdk1
/dev/sdk1:
Timing cached reads: 18650 MB in 1.99 seconds = 9349.32 MB/sec
Timing buffered disk reads: 348 MB in 3.00 seconds = 115.88 MB/sec
[root@c3kuhdpnode1 ~]# time dd if=/dev/sdk1 of=/$drive/sdk1.zero bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB) copied, 90.6001 s, 113 MB/s
real    1m30.602s
user    0m0.929s
sys     0m35.295s
```
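The per-drive numbers above imply a large aggregate write budget for the cluster. A back-of-envelope sketch (ignoring HDFS replication, which at the typical factor of 3 would divide the effective figure by ~3):

```shell
# Aggregate raw write bandwidth: 7 datanodes x 11 drives x ~100 MB/s per drive.
# HDFS replication (typically 3x) would reduce the effective client-visible rate.
aggregate=$((7 * 11 * 100))
echo "${aggregate} MB/s aggregate raw write bandwidth"
```

A single-threaded sink writing one file cannot come anywhere near this figure, which is the core of the complaint below.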
- For a single CTAS query:
  - The query running by itself on the cluster took 37 min. See Profile_Before_Hashing.txt for the explain plan.
  - 10 modified versions of the query, each performing 1/10 of the writes, took ~22 min running concurrently. See Profile_After_Hashing.txt for one of the 10 explain plans.
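The 10-way fan-out can be generated mechanically by bucketing rows on a hash of some column. A minimal sketch of that generation step, assuming a hypothetical source table `src`, target table `dst_part`, and key column `id` (Impala's built-in `fnv_hash` is used for bucketing; `abs()` guards against negative hash values):

```shell
# Emit 10 INSERT statements, one per hash bucket, so that 10 impala-shell
# sessions can run them concurrently instead of one single-threaded sink.
# Table/column names (src, dst_part, id) are placeholders, not from the ticket.
queries=""
for i in $(seq 0 9); do
  q=$(printf 'INSERT INTO dst_part SELECT * FROM src WHERE abs(fnv_hash(id)) %% 10 = %d;' "$i")
  queries="${queries}${q}
"
  echo "$q"
done
```

Each emitted statement would then be launched in its own `impala-shell -q '...'` session to run the writes in parallel.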
In both cases, I found the majority of the time is spent in HDFSTableSink. I'd expect that portion of the query to write nearly as fast as a Teragen on the cluster could, which is not what we observe.
This is painful for a few reasons, primarily that we can't fall back to Hive to generate the Parquet, because Hive might produce files spanning multiple HDFS blocks.