bin/load-data.py uses sqlparse to read SQL files and split them into individual SQL statements. Recently, some remote cluster tests have hit errors during dataload because sqlparse fails to split the statements correctly. Specifically, it does not detect the end of a SQL statement, so dozens of SQL statements are run together as one, and Impala's parser rejects this. The SQL file is identical to the SQL file generated during our normal dataload, so something about the remote system or its environment is triggering the sqlparse failure.
The sqlparse in our environment is 0.1.15, which is quite old; the latest release is 0.2.4. Running the same tests with sqlparse 0.2.4 does not hit the error, so sqlparse should be upgraded.
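To confirm which sqlparse a given machine picks up, and whether it splits a file into the expected number of statements, a small check along these lines may help. This is only a sketch and not part of bin/load-data.py: the helper names (parse_version, is_new_enough, count_statements) and the MIN_VERSION constant are hypothetical. The only sqlparse calls it relies on are sqlparse.split() and sqlparse.__version__.

```python
# Hypothetical diagnostic helpers; not taken from bin/load-data.py.
try:
    import sqlparse  # third-party; may be missing on some machines
except ImportError:
    sqlparse = None

# Assumption: 0.2.4 is the version reported above as not hitting the error.
MIN_VERSION = (0, 2, 4)


def parse_version(text):
    """Turn '0.1.15' into (0, 1, 15), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_new_enough(version_text, minimum=MIN_VERSION):
    """Compare numerically, so that 0.1.15 sorts below 0.2.4."""
    return parse_version(version_text) >= minimum


def count_statements(sql_text):
    """Count the non-empty statements that sqlparse.split() produces."""
    if sqlparse is None:
        raise RuntimeError("sqlparse is not installed")
    return len([stmt for stmt in sqlparse.split(sql_text) if stmt.strip()])
```

On an affected machine, is_new_enough(sqlparse.__version__) shows whether the old version is in use, and running count_statements() on the contents of the generated SQL file there and on a working machine should show whether the two split it differently.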