Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
3.4.0
None
None
Description
In the current implementation of the CSV/JSON data sources, schema inference relies on methods that throw an exception when a field cannot be converted to a given data type.
Throwing and catching exceptions can be slow, and during inference the failure path is hit often because each field is tried against several candidate types. We can improve this by adding methods that return optional results instead of throwing. A good example is https://github.com/apache/spark/pull/36562, which reduces the schema inference time by 90%.
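
To illustrate the idea, here is a minimal sketch in Java that contrasts the two styles for a single type check. It is not Spark's actual inference code: the class `InferSketch` and the methods `isLongViaException` and `tryParseLong` are hypothetical names chosen for this example, and the hand-written parser only handles ASCII digits with an optional sign.

```java
import java.util.Optional;

public class InferSketch {

    // Exception-based check: every non-numeric field pays for constructing
    // and unwinding a NumberFormatException.
    static boolean isLongViaException(String field) {
        try {
            Long.parseLong(field);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Optional-based check (hypothetical helper): returns Optional.empty()
    // on failure instead of throwing, so the failure path stays cheap.
    static Optional<Long> tryParseLong(String field) {
        if (field == null || field.isEmpty()) {
            return Optional.empty();
        }
        int i = 0;
        boolean negative = false;
        char first = field.charAt(0);
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i = 1;
            if (field.length() == 1) {
                return Optional.empty(); // a lone sign is not a number
            }
        }
        // Accumulate as a negative value so Long.MIN_VALUE is representable.
        long limit = negative ? Long.MIN_VALUE : -Long.MAX_VALUE;
        long multMin = limit / 10;
        long result = 0;
        for (; i < field.length(); i++) {
            int digit = field.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                return Optional.empty(); // non-digit character
            }
            if (result < multMin) {
                return Optional.empty(); // multiplying by 10 would overflow
            }
            result *= 10;
            if (result < limit + digit) {
                return Optional.empty(); // subtracting the digit would overflow
            }
            result -= digit;
        }
        return Optional.of(negative ? result : -result);
    }

    public static void main(String[] args) {
        System.out.println(tryParseLong("42"));      // a present value
        System.out.println(tryParseLong("2022-06")); // empty, no exception thrown
        System.out.println(isLongViaException("2022-06"));
    }
}
```

An inference loop written against `tryParseLong` can fall through to the next candidate type with a plain `isPresent()` check, so fields that fail a conversion never trigger exception handling.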