Resolution: Cannot Reproduce
I have a CSV table with 14 columns and about 44 million rows. With SELECT * and certain conditions in the WHERE clause, the shell appears to output all the results (I see the line at the bottom of the result table) but then hangs and never returns to the impala-shell prompt.
If I SELECT COUNT with the same WHERE clause, it works.
If I SELECT DISTINCT <one of the columns> with the same WHERE clause, it works.
If I do a CTAS into a Parquet table with SELECT * from the original table (no WHERE clause), it works.
However, if I run the SELECT * query non-interactively via 'impala-shell -q', or with delimited results via 'impala-shell -B -q', it hangs the same way.
When I hit Ctrl-C, the resulting message is:
^C Cancelling Query
Failed to reconnect and close: ERROR: Cancelled
which is different from what I saw when cancelling other SELECT or INSERT statements that were still in progress. It looks as though the query has finished but the shell doesn't close it properly. The equivalent COUNT and DISTINCT queries finish in 1-2 seconds. I let the SELECT * run for 3 minutes or more and it stayed stuck.
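For anyone trying to reproduce this, the comparison above can be scripted. The following is only a minimal sketch: it builds the three query variants, runs each through 'impala-shell -q' (optionally with -B), and reports whether the shell exited within a time limit. The table name, column name, predicate, and helper function names are all hypothetical placeholders, not the real schema, and the script assumes impala-shell is on the PATH and connects to the right impalad by default.

```python
import subprocess


def variants(table, predicate, column):
    """Build the three query shapes compared in this report.

    table, predicate and column are hypothetical placeholders.
    """
    return {
        "select_star": f"SELECT * FROM {table} WHERE {predicate}",
        "count": f"SELECT COUNT(*) FROM {table} WHERE {predicate}",
        "distinct": f"SELECT DISTINCT {column} FROM {table} WHERE {predicate}",
    }


def build_cmd(query, delimited=False):
    """Build a non-interactive impala-shell command line for one query."""
    cmd = ["impala-shell"]
    if delimited:
        cmd.append("-B")  # delimited output instead of a pretty-printed table
    cmd += ["-q", query]
    return cmd


def run_with_timeout(cmd, timeout_s):
    """Return 'finished' if cmd exits within timeout_s seconds, else 'stalled'.

    The exit code is not inspected; this only detects a shell that never returns.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child process is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "stalled"
    return "finished"


def compare(table, predicate, column, timeout_s=180):
    """Run every variant, with and without -B, and collect the outcomes."""
    results = {}
    for name, query in variants(table, predicate, column).items():
        for delimited in (False, True):
            key = (name, "delimited" if delimited else "pretty")
            results[key] = run_with_timeout(build_cmd(query, delimited), timeout_s)
    return results
```

Calling something like compare("csv_table", "c1 = 'x'", "c1") with the real table and predicate substituted in should show 'stalled' only for the select_star entries if the behavior matches what I saw. The 180-second default mirrors the 3 minutes I waited by hand.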
It's possible there is some anomaly somewhere in the CSV files, but as I said, I can CTAS the whole contents of the table. Only outputting the results in the shell stalls.
Schema, stalled query, actual data, profiles, and logs are all available on the Cloudera network for diagnosis. (Ping me for the location.)