Some test setups have data already in place and only need to run the DDLs to sync up the metadata. This corresponds to running testdata/bin/create-load-data.sh using a data snapshot but without skip_metadata_load.
Right now, for partitioned tables where the partitions are created dynamically as part of the insert, generate-schema-statements.py forces a reload:
# Force reloading of the table if the user specified the --force option or
# if the table is partitioned and there was no ALTER section specified. This is to
# ensure the partition metadata is always properly created. The ALTER section is
# used to create partitions, so if that section exists there is no need to force
# reload.
# IMPALA-6579: Also force reload all Kudu tables. The Kudu entity referenced
# by the table may or may not exist, so requiring a force reload guarantees
# that the Kudu entity is always created correctly.
# TODO: Rename the ALTER section to ALTER_TABLE_ADD_PARTITION
force_reload = options.force_reload or (partition_columns and not alter) or \
    file_format == 'kudu'
When the data is already in place, this drops that data and reloads it, which defeats the purpose of using a snapshot. Instead, we should just run "ALTER TABLE ... RECOVER PARTITIONS" on that table so the partition metadata is picked up from the existing directories.
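A rough sketch of what the changed decision could look like follows. It is not the actual patch: the function name plan_table_load, the data_in_place flag, and the returned tuple are all hypothetical, and how generate-schema-statements.py would learn that the data is already in place is left open. The sketch keeps the existing behavior for --force and for Kudu tables, and only swaps the forced reload for a RECOVER PARTITIONS statement in the partitioned, no-ALTER case.

```python
def plan_table_load(force_option, partition_columns, alter, file_format,
                    data_in_place, table_name):
    """Decide how to bring one table's data and metadata in sync.

    Returns (force_reload, extra_statements). All names here are
    hypothetical; 'data_in_place' stands in for whatever signal the
    script would use to know a snapshot has already been loaded.
    """
    # Existing behavior: an explicit --force and all Kudu tables
    # (IMPALA-6579) always trigger a reload.
    if force_option or file_format == 'kudu':
        return True, []

    # Partitioned table with no ALTER section: partitions would normally
    # be created dynamically by the insert, so a reload was forced.
    if partition_columns and not alter:
        if data_in_place:
            # Data already exists on disk: keep it and let Impala
            # discover the partition directories instead.
            stmt = "ALTER TABLE {0} RECOVER PARTITIONS".format(table_name)
            return False, [stmt]
        return True, []

    # Unpartitioned tables, or tables whose ALTER section already adds
    # the partitions, need neither a reload nor a recover step.
    return False, []
```

With this shape, a partitioned table such as a hypothetical "functional.alltypes_dyn" with data already loaded would yield no reload plus a single RECOVER PARTITIONS statement, while the same table without data in place would still be force reloaded as it is today.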