  1. Spark
  2. SPARK-21326

Use TextFileFormat in implementation of LibSVMFileFormat


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: ML, SQL
    • Labels: None

    Description

      This is related to SPARK-19918 and SPARK-18362.

      There are three points here:

      • The main advantage of this change was removing the file-listing bottleneck on the driver side. I guess this does not apply to the LibSVM datasource in its current state, as it requires the input to be a single file.
        As a side note, supporting multiple files looks possible (SPARK-21066), but it is still under discussion.
      • Another advantage comes from using FileScanRDD. For example, I guess we could use the spark.sql.files.ignoreCorruptFiles option if we allowed multiple files in schema inference.
      • We can unify the schema inference code path in text-based data sources. This is also a preparation for SPARK-21289.
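
To make the schema inference point concrete, here is a minimal sketch in plain Python (not Spark code) of what inferring the feature count from text lines could look like. It assumes the usual LibSVM layout of `<label> <index>:<value> ...` with 1-based indices; the function names `parse_libsvm_line` and `infer_num_features` are hypothetical and are not taken from the Spark code base.

```python
from typing import Iterable, List, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, List[int], List[float]]:
    """Parse one LibSVM record: '<label> <index>:<value> ...' (1-based indices)."""
    tokens = line.strip().split()
    label = float(tokens[0])
    indices: List[int] = []
    values: List[float] = []
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        indices.append(int(index_text) - 1)  # convert to 0-based
        values.append(float(value_text))
    return label, indices, values


def infer_num_features(lines: Iterable[str]) -> int:
    """Return the largest 0-based feature index seen plus one (0 if no features)."""
    max_index = -1
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines (an assumption of this sketch).
        if not stripped or stripped.startswith("#"):
            continue
        _, indices, _ = parse_libsvm_line(stripped)
        if indices:
            max_index = max(max_index, max(indices))
    return max_index + 1
```

Because the input is just an iterable of text lines, the same function would work whether the lines come from one file or from several, which is the kind of unification a shared text-based read path allows.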

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)
            Votes: 0
            Watchers: 4
