[SPARK-21326] Use TextFileFormat in implementation of LibSVMFileFormat - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: ML, SQL
Labels: None

Description

This is related to SPARK-19918 and SPARK-18362.

There are three points here:

1. The main advantage of this change was removing file-listing bottlenecks on the driver side. I guess this does not apply to the LibSVM data source in its current state, because it requires the input to be a single file. As a side note, supporting multiple files looks possible (SPARK-21066), but that is still under discussion.

2. Another advantage comes from using FileScanRDD. For example, I guess we could use the spark.sql.files.ignoreCorruptFiles option if we allow multiple files in schema inference.

3. We can unify the schema inference code path across text-based data sources. This is also preparation for SPARK-21289.
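To make the schema inference point more concrete, here is a minimal sketch in plain Python (not Spark code) of what inferring the feature count from text lines could look like: each line is read as text, parsed as a LibSVM record, and the largest feature index determines the vector size. The function names `parse_libsvm_line` and `infer_num_features` are hypothetical, and skipping blank lines and `#` comment lines is an assumption about the input rather than something stated in this issue.

```python
from typing import Iterable, List, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, List[int], List[float]]:
    """Parse one 'label index:value ...' record (hypothetical helper).

    Indices are 1-based in the LibSVM text format and are converted
    to 0-based positions here.
    """
    parts = line.split()
    label = float(parts[0])
    indices: List[int] = []
    values: List[float] = []
    for item in parts[1:]:
        index, value = item.split(":", 1)
        indices.append(int(index) - 1)  # 1-based in the file, 0-based here
        values.append(float(value))
    return label, indices, values


def infer_num_features(lines: Iterable[str]) -> int:
    """Scan text lines and return the vector size implied by the largest index."""
    num_features = 0
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines carry no record.
        if not line or line.startswith("#"):
            continue
        _, indices, _ = parse_libsvm_line(line)
        if indices:
            num_features = max(num_features, max(indices) + 1)
    return num_features


sample = ["1 1:0.5 3:2.0", "", "# comment", "0 2:1.5 6:4.0"]
print(infer_num_features(sample))  # prints 6
```

Because the input is only an iterable of text lines, the same inference routine works regardless of how the lines were loaded, which is the kind of sharing a common text-based read path would allow.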

Issue Links

links to

[Github] Pull Request #18556 (HyukjinKwon)

People

Assignee: Hyukjin Kwon

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 4

Dates

Created: 06/Jul/17 09:02

Updated: 12/Dec/22 18:10

Resolved: 07/Jul/17 04:24