[SPARK-14260] Increase default value for maxCharsPerColumn - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Trivial
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: SQL
Labels: None

Description

The default value of the maxCharsPerColumn option seems relatively small: 1,000,000 characters, which is roughly 976 KB (assuming one byte per character).

It seems that some users have run into this limit and ended up setting the value manually:

https://github.com/databricks/spark-csv/issues/295
https://issues.apache.org/jira/browse/SPARK-14103
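The manual workaround those users applied looks roughly like the sketch below. It assumes the built-in CSV data source of Spark 2.x (`SparkSession`, `DataFrameReader.option`, `DataFrameReader.csv`). The object name `ReadWideCsv`, the default path `data/wide.csv`, and the 10,000,000 limit are illustrative assumptions, not values from this issue.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: raise maxCharsPerColumn for a single CSV read.
// ReadWideCsv, the default path, and the chosen limit are hypothetical.
object ReadWideCsv {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-wide-csv")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      .option("header", "true")
      // Allow fields of up to 10 million characters instead of the
      // 1,000,000-character default discussed in this issue.
      .option("maxCharsPerColumn", "10000000")
      .csv(args.headOption.getOrElse("data/wide.csv"))

    df.printSchema()
    spark.stop()
  }
}
```

Users of the external spark-csv package would pass the same option key through `format("com.databricks.spark.csv")` instead of calling `.csv(...)`.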

According to the univocity parser API, this limit exists to avoid OutOfMemoryErrors.

If this does not hurt performance, I think it would be better to make the default much bigger (e.g. 10 MB or 100 MB) so that users do not have to worry about the length of each field in a CSV file.

Apparently, the Apache Commons CSV parser does not have such a limit.

People

Assignee: Unassigned

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 2

Dates

Created: 30/Mar/16 02:18

Updated: 12/Dec/22 18:11

Resolved: 31/Mar/16 19:20