Impala 2.10 introduced the max_row_size query option. It is intended to replace the --read_size startup option from earlier releases; both control the maximum width of a row that Impala can process. The defaults differ: --read_size defaulted to 8 MB, while max_row_size defaults to 512 KB. There is a tradeoff between being able to read large rows and memory usage. The 512 KB default is expected to work well for most use cases, whereas an 8 MB default would reserve unnecessarily large chunks of memory in the new buffer pool even when the rows are smaller.
We should advise users that, when upgrading to 2.10, they may have to add the max_row_size query option to their workflows or change the 512 KB default.
Also, if users raised --read_size above 8 MB to process larger rows, we should recommend that they revert --read_size to its default and set the max_row_size query option to the size of the largest row they need to process. This should greatly reduce the memory consumption of HDFS scans. The query option also gives them the flexibility to override the value per query, per pool, or globally.
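
For the upgrade note, a small helper like the one below could illustrate how a user might turn "size of the largest row" into a per-query setting. This is only a sketch: the function name max_row_size_statement and the round-up-to-whole-megabytes policy are assumptions made for illustration, not part of Impala. It relies only on the 512 KB default described above and on the SET statement accepting a size with an mb suffix.

```python
# Illustrative helper (hypothetical, not part of Impala): build a
# SET MAX_ROW_SIZE statement from the largest row a workload must process.

DEFAULT_MAX_ROW_SIZE = 512 * 1024  # 512 KB default in Impala 2.10
MB = 1024 * 1024


def max_row_size_statement(largest_row_bytes):
    """Return a SET statement, or None if the 512 KB default is enough."""
    if largest_row_bytes <= 0:
        raise ValueError("largest_row_bytes must be positive")
    if largest_row_bytes <= DEFAULT_MAX_ROW_SIZE:
        return None  # no override needed
    # Assumption: round up to a whole number of megabytes for readability.
    size_mb = -(-largest_row_bytes // MB)  # ceiling division
    return "SET MAX_ROW_SIZE={}mb;".format(size_mb)


if __name__ == "__main__":
    # A workload whose widest row is about 3.5 MB.
    print(max_row_size_statement(3_670_016))
```

The resulting statement can be issued at the start of a session before the queries that need it; queries with rows at or below 512 KB need no change.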