Details
Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 1.1.1
None
None
Description
We should look into the right batch size to send to hdfsWrite. We previously called it once per row, which led to very poor performance. Now we batch based on the input batch size.
This is not effective for partitioned tables, where the input batch is split across partitions and each write therefore covers only a fraction of the batch. We should also check whether the input batch size is the best size to pass to HDFS in general.
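
One possible direction, sketched below in Python rather than in Impala's own code: buffer the encoded rows per partition and only issue a write once a byte threshold is reached, so the write size no longer depends on how the input batch happens to be split. All names here (BufferedPartitionWriter, open_sink, flush_bytes) are hypothetical and for illustration only, the sink is any object with a write() method standing in for an HDFS file, and the 64 KiB default is a placeholder, not a measured optimum.

```python
from collections import defaultdict


class BufferedPartitionWriter:
    """Accumulates encoded rows per partition and writes them in large chunks.

    Hypothetical sketch: `open_sink(partition)` must return an object with a
    write(bytes) method, standing in for an open HDFS file.
    """

    def __init__(self, open_sink, flush_bytes=64 * 1024):
        self._open_sink = open_sink
        self._flush_bytes = flush_bytes          # assumed threshold, needs tuning
        self._sinks = {}                         # partition -> open sink
        self._buffers = defaultdict(bytearray)   # partition -> pending bytes
        self.write_calls = 0                     # counts calls to sink.write()

    def append(self, partition, row):
        """Buffer one encoded row; flush that partition once it is large enough."""
        buf = self._buffers[partition]
        buf += row
        if len(buf) >= self._flush_bytes:
            self._flush(partition)

    def _flush(self, partition):
        buf = self._buffers[partition]
        if not buf:
            return
        if partition not in self._sinks:
            self._sinks[partition] = self._open_sink(partition)
        self._sinks[partition].write(bytes(buf))
        self.write_calls += 1
        buf.clear()

    def close(self):
        """Write out whatever is still buffered for every partition."""
        for partition in list(self._buffers):
            self._flush(partition)
```

With this shape, the number of writes per partition depends on the bytes written to it and the chosen threshold, not on the row count or the input batch size. The cost is one buffer of up to flush_bytes per open partition, which matters when many partitions are written at once.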