SPARK-12420: Have a built-in CSV data source implementation

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      CSV is the most common data format in the "small data" world, and it is often the first format people want to try when they first use Spark on a single node. Building in support for this most common source would give first-time users a better experience.

      We should consider inlining https://github.com/databricks/spark-csv

      Attachments

        1. Built-in CSV datasource in Spark.pdf
          133 kB
          Hossein Falaki

        Issue Links

          1. Initial import of databricks/spark-csv (Sub-task, Resolved, Hossein Falaki)
          2. Renaming CSV options to be similar to Pandas and R (Sub-task, Resolved, Hyukjin Kwon)
          3. Organize options for default values (Sub-task, Closed, Unassigned)
          4. Use Spark internal utilities wherever possible (Sub-task, Closed, Unassigned)
          5. Improve tests for better coverage (Sub-task, Closed, Unassigned)
          6. Populate statistics for DataFrame when reading CSV (Sub-task, Closed, Unassigned)
          7. Support to specify the option for compression codec (Sub-task, Resolved, Hyukjin Kwon)
          8. Refactor options to be correctly formed in a case class (Sub-task, Resolved, Hyukjin Kwon)
          9. CSVRelation should be based on HadoopFsRelation (Sub-task, Closed, Unassigned)
          10. Use cast expression to perform type cast in CSV (Sub-task, Closed, Unassigned)
          11. Encoding not working with non-ASCII-compatible encodings (UTF-16/32, etc.) (Sub-task, Closed, Unassigned)
          12. Expose maxCharactersPerColumn as a user configurable option (Sub-task, Resolved, Hossein Falaki)
          13. Documentation for CSV datasource options (Sub-task, Resolved, Hyukjin Kwon)
          14. Support for loading CSV with a single function call (Sub-task, Resolved, Hyukjin Kwon)
          15. NullPointerException in schema inference for CSV when the first line is empty (Sub-task, Resolved, Hyukjin Kwon)
          16. java.lang.NegativeArraySizeException in CSV (Sub-task, Resolved, Hyukjin Kwon)
          17. Make type inference recognize boolean types (Sub-task, Resolved, Hyukjin Kwon)
          18. Support for writing CSV with a single function call (Sub-task, Resolved, Hyukjin Kwon)
          19. Support for saving with a quote mode (Sub-task, Resolved, Jurriaan Pruis)
          20. Support for specifying custom date format for date and timestamp types (Sub-task, Resolved, Hyukjin Kwon)
          21. Keep old data source name for backwards compatibility (Sub-task, Resolved, Hossein Falaki)
          22. Limit logging of bad records (Sub-task, Resolved, Reynold Xin)
          23. Handle decimal type in CSV inference (Sub-task, Resolved, Hyukjin Kwon)
          24. Produce InternalRow instead of external Row (Sub-task, Resolved, Hyukjin Kwon)
          25. Options for parsing NaNs, Infinity and nulls for numeric types (Sub-task, Resolved, Hossein Falaki)
          26. Increase default value for maxCharsPerColumn (Sub-task, Resolved, Unassigned)
          27. rowSeparator does not work for both reading and writing (Sub-task, Resolved, Unassigned)
          28. Put CSV options as Python csv function parameters (Sub-task, Resolved, Hyukjin Kwon)
          29. Upgrade Univocity library from 2.0.2 to 2.1.0 (Sub-task, Resolved, Hyukjin Kwon)
          30. Allow setting the quoteEscapingEnabled flag when writing CSV (Sub-task, Resolved, Jurriaan Pruis)

            People

              Assignee: Unassigned
              Reporter: Reynold Xin (rxin)
              Votes: 4
              Watchers: 12
