Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Impala 2.10.0
Description
functional.complextypes_fileformat is a text table containing some nested data.
Data load is supposed to generate functional.complextypes_fileformat in this order:
1. Create table functional.complextypes_fileformat
2. Populate functional.complextypes_fileformat using:
INSERT OVERWRITE TABLE {db_name}{db_suffix}.{table_name}
SELECT id, named_struct("f1",string_col,"f2",int_col), array(1, 2, 3), map("k", cast(0 as bigint))
FROM functional.alltypestiny;
3. Create tables functional_*.complextypes_fileformat
4. Populate each table using:
INSERT OVERWRITE TABLE {table_name}
SELECT * FROM functional.{table_name};
However, dataload runs these steps in the wrong order: #1, #3, #4, and only then #2. This means that #4 operates on zero rows, so all the functional_*.complextypes_fileformat tables end up with zero rows. Oddly enough, dataload also generates #4 for functional.complextypes_fileformat itself, so that table is overwritten using its own rows. Dataload should run the steps in the correct order and skip this self-overwrite.
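The effect of the ordering can be reproduced with a toy model. The sketch below is not the dataload code: tables are plain Python lists, `functional_parquet` is a hypothetical stand-in for one of the functional_* databases, and the source row count is arbitrary. It only shows why running #4 before #2 leaves the derived table empty.

```python
# Toy model of the four steps; every name below is illustrative.
BASE = "functional.complextypes_fileformat"
# Hypothetical stand-in for one functional_* database.
DERIVED = "functional_parquet.complextypes_fileformat"
# Stand-in for functional.alltypestiny; the row count is arbitrary.
SOURCE_ROWS = [{"id": i} for i in range(8)]


def create(tables, name):
    """Steps 1 and 3: create an empty table."""
    tables[name] = []


def populate_base(tables):
    """Step 2: INSERT OVERWRITE the base table from the source rows."""
    tables[BASE] = list(SOURCE_ROWS)


def populate_derived(tables):
    """Step 4: INSERT OVERWRITE the derived table from the base table."""
    tables[DERIVED] = list(tables[BASE])


STEPS = {
    1: lambda tables: create(tables, BASE),
    2: populate_base,
    3: lambda tables: create(tables, DERIVED),
    4: populate_derived,
}


def run(order):
    """Run the steps in the given order and return row counts per table."""
    tables = {}
    for step in order:
        STEPS[step](tables)
    return {name: len(rows) for name, rows in tables.items()}


buggy = run([1, 3, 4, 2])  # order reported in this issue
fixed = run([1, 2, 3, 4])  # intended order
print("buggy:", buggy)
print("fixed:", fixed)
```

In this model the buggy order leaves the derived table with zero rows while the base table is populated, and the intended order gives both tables the same row count.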
This table is only used for frontend tests, but the empty tables can cause issues with recent versions of Hive, because Hive seems to skip creating a file when it would be writing zero rows. That can alter the number of files listed in the plan.
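A rough way to picture the file-count difference, under two assumptions that are not confirmed here: that a single writer produces one file per insert, and that older Hive versions still wrote an (empty) file for zero rows. The function name and the `skip_empty` flag are hypothetical.

```python
def files_written(row_count, skip_empty):
    """Data files left behind by one INSERT OVERWRITE in this toy model."""
    if row_count == 0 and skip_empty:
        # Behavior the report attributes to recent Hive versions.
        return 0
    # Assumption: a single writer produces exactly one file.
    return 1


# Zero-row insert, as happens to the functional_* tables in this bug.
older = files_written(0, skip_empty=False)
newer = files_written(0, skip_empty=True)
print("files with assumed older behavior:", older)
print("files with reported newer behavior:", newer)
```

A test that checks the number of files shown in a plan would see different values for the same zero-row table depending on which behavior applies. Once the table is populated in the right order, both behaviors give the same count in this model.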
Issue Links
- breaks: IMPALA-6239 Remote data load breaks with "LOAD DATA LOCAL INPATH": Invalid path (Resolved)