I tried to load a tpcds_3000_iceberg database from the existing Parquet database. The source database is tpcds_3000_parquet (3 TB scale), and the cluster has 10 nodes.
After loading the catalog_sales table, I found that the row count of tpcds_3000_iceberg.catalog_sales is lower than the row count of tpcds_3000_parquet.catalog_sales. Further debugging revealed that the CTAS query does finish writing all of the Parquet files, but only one Parquet file per partition gets recorded in the Iceberg Avro metadata.
For example, inspecting partition 2451120 shows that it contains 2 Parquet files.
However, the Avro metadata files reference only ec48089182b28ba9-b2910c2d00000011_1004901849_data.0.parq.
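The check above (data files present on storage versus data files referenced in the Avro metadata) can be repeated across all partitions with a small diff. The following is a minimal sketch, assuming you have already collected two plain lists of data-file paths: one from a storage listing of the table's data directory, and one with the file paths referenced by the Iceberg manifests. Both lists are assumed to use the same path form (for example, both with or both without an `hdfs://` prefix). The helper names `files_by_partition` and `unreferenced_files` are hypothetical, and the sketch treats a file's parent directory as its partition.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def files_by_partition(paths):
    """Group data-file paths by their parent directory (assumed to be the partition dir)."""
    grouped = defaultdict(set)
    for p in paths:
        path = PurePosixPath(p)
        grouped[str(path.parent)].add(path.name)
    return grouped


def unreferenced_files(on_storage, in_manifests):
    """Return {partition_dir: sorted file names} for files on storage but missing from manifests."""
    stored = files_by_partition(on_storage)
    referenced = files_by_partition(in_manifests)
    missing = {}
    for part, names in stored.items():
        diff = names - referenced.get(part, set())
        if diff:
            missing[part] = sorted(diff)
    return missing


if __name__ == "__main__":
    # Hypothetical paths; the directory layout and the second file name are illustrative only.
    part_dir = "/warehouse/catalog_sales/data/cs_sold_date_sk=2451120"
    stored = [f"{part_dir}/a_data.0.parq", f"{part_dir}/b_data.0.parq"]
    referenced = [f"{part_dir}/a_data.0.parq"]
    print(unreferenced_files(stored, referenced))
```

A non-empty result for many partitions would be consistent with the "one file per partition" pattern described above.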
The CTAS query that I used for debugging is the following: