Description
It seems that for non-catalog tables (e.g. spark.read.parquet(...)), we scan the filesystem twice: once for schema inference, and again to create the FileIndex for the relation.
It would be better to combine these scans somehow, since this is the most costly step of creating a table. This is a follow-up ticket to https://github.com/apache/spark/pull/16090.
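To make the idea concrete, here is a toy sketch in plain Python with no Spark dependency. It is not Spark's actual code path: `CountingLister`, `infer_schema`, `build_index`, and both `load_*` functions are hypothetical stand-ins, and the "schema" is just the set of file extensions. It only shows that listing once and passing the result to both steps halves the number of filesystem scans without changing the outcome.

```python
import os
import tempfile


class CountingLister:
    """Hypothetical helper: lists files under a root and counts full scans."""

    def __init__(self, root):
        self.root = root
        self.scans = 0

    def list_files(self):
        self.scans += 1  # each call walks the whole directory tree
        found = []
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                found.append(os.path.join(dirpath, name))
        return sorted(found)


def infer_schema(files):
    # Stand-in for schema inference: collect the distinct file extensions.
    return sorted({os.path.splitext(f)[1] for f in files})


def build_index(files):
    # Stand-in for a file index: map each directory to the files it holds.
    index = {}
    for f in files:
        index.setdefault(os.path.dirname(f), []).append(f)
    return index


def load_with_two_scans(lister):
    # The behaviour described above: each step triggers its own listing.
    schema = infer_schema(lister.list_files())
    index = build_index(lister.list_files())
    return schema, index


def load_with_shared_scan(lister):
    # Proposed shape: list once, reuse the result for both steps.
    files = lister.list_files()
    return infer_schema(files), build_index(files)


def demo():
    with tempfile.TemporaryDirectory() as root:
        for part in ("a=1", "a=2"):
            os.makedirs(os.path.join(root, part))
            open(os.path.join(root, part, "part-0.parquet"), "w").close()
        twice = CountingLister(root)
        once = CountingLister(root)
        two_scan_result = load_with_two_scans(twice)
        shared_result = load_with_shared_scan(once)
        return twice.scans, once.scans, two_scan_result == shared_result
```

Calling `demo()` returns the scan count for each approach plus a flag saying whether both produced the same schema and index. In Spark the real cost is presumably dominated by remote listing calls, so the saving would be larger than this local toy suggests.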
cc cloud_fan