Spark / SPARK-39279

Speed up the schema inference of CSV/JSON data sources


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      In the current implementation of the CSV/JSON data sources, schema inference relies on methods that throw an exception whenever a field cannot be converted to the data type being tried.

      Throwing and catching exceptions can be slow. We can speed up inference by adding methods that return an optional result instead of throwing. A good example is https://github.com/apache/spark/pull/36562, which reduces the schema inference time by 90%.
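
      The pattern described above can be sketched as follows. This is a language-neutral illustration written in Python, not Spark's actual Scala code: the names `parse_int_or_raise`, `try_parse_int`, and `infer_type` are hypothetical, and the integer-only check is an assumption made to keep the example small. It contrasts a parser that signals failure by raising with one that returns an optional result, and shows how an inference loop might use the latter.

```python
from typing import List, Optional


def parse_int_or_raise(field: str) -> int:
    """Exception-based style: raises ValueError when the field is not an integer."""
    return int(field)


def try_parse_int(field: str) -> Optional[int]:
    """Optional-result style: returns None instead of raising on a non-integer field."""
    text = field.strip()
    # Drop a single leading sign so the remaining characters can be checked as digits.
    digits = text[1:] if text[:1] in ("+", "-") else text
    # Validate up front so the common "not an integer" case never raises.
    if not (digits.isascii() and digits.isdigit()):
        return None
    return int(text)


def infer_type(fields: List[str]) -> str:
    """Return "integer" if every field parses as an integer, otherwise "string"."""
    for field in fields:
        if try_parse_int(field) is None:
            return "string"
    return "integer"
```

      With this shape, a column such as `["1", "2", "x"]` falls back to `"string"` without any exception being raised or caught along the way. How much time this saves depends on the runtime and the data; the 90% figure above comes from the linked pull request, not from this sketch.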


          People

            Assignee: Unassigned
            Reporter: Gengliang Wang
            Votes: 0
            Watchers: 1
