Description
This is related to SPARK-19918 and SPARK-18362.
There are three points here:
- The main advantage of this change was removing the file-listing bottleneck on the driver side. I guess this does not apply to the LibSVM datasource in its current state, as it requires the input to be a single file.
As a side note, supporting multiple files looks possible (SPARK-21066), but it is still under discussion.
- Another advantage comes from using FileScanRDD. For example, I guess we could use the spark.sql.files.ignoreCorruptFiles option if we allow multiple files in schema inference.
- We can unify the schema inference code path across text-based data sources. This is also a preparation for SPARK-21289.
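
To illustrate the second point, here is a minimal sketch of how a user might combine the existing spark.sql.files.ignoreCorruptFiles option with the LibSVM datasource. It assumes multi-file input is allowed and that schema inference goes through FileScanRDD, neither of which is confirmed for the current state. The input path and object name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object LibSvmIgnoreCorruptFilesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("libsvm-ignore-corrupt-files-sketch")
      .master("local[*]")
      .getOrCreate()

    // Existing Spark SQL option: skip files that cannot be read
    // instead of failing the whole job.
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

    // Hypothetical directory holding several LibSVM files.
    // "numFeatures" is deliberately left unset so that the datasource
    // has to scan the input to infer the feature vector size.
    val df = spark.read.format("libsvm").load("/path/to/libsvm-dir")

    df.printSchema()
    spark.stop()
  }
}
```

Whether the option is actually honored during inference depends on the code path the datasource uses, which is what this issue is about.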