[SPARK-36610] Add `thousands` argument to `ps.read_csv`. - ASF JIRA

Details

Type: Sub-task
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.2.0
Fix Version/s: None
Component/s: PySpark
Labels: None

Description

When reading a CSV file, pandas strips the thousands separator from numeric fields if the `thousands` argument is specified, so a value such as `1,000,000` is parsed as a number rather than kept as a string.

>>> pd.read_csv(path, sep=";")
    name  age        job      money
0  Jorge   30  Developer  1,000,000
1    Bob   32  Developer    1000000

>>> pd.read_csv(path, sep=";", thousands=",")
    name  age        job    money
0  Jorge   30  Developer  1000000
1    Bob   32  Developer  1000000

However, `ps.read_csv` in pandas-on-Spark does not support the `thousands` argument.
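
To make the requested behavior concrete, here is a minimal standard-library sketch of what stripping a thousands separator amounts to. It is only an illustration, not the pandas or pandas-on-Spark implementation: the helper names `strip_thousands` and `read_rows` are hypothetical, and the inline sample data stands in for the file at `path`.

```python
import csv
import io

# Sample data mirroring the table in the description (semicolon-separated).
CSV_TEXT = """name;age;job;money
Jorge;30;Developer;1,000,000
Bob;32;Developer;1000000
"""


def strip_thousands(value, thousands=","):
    """Remove the thousands separator and return an int when possible.

    Hypothetical helper: values that are not integers after stripping
    are returned unchanged as strings.
    """
    cleaned = value.replace(thousands, "")
    try:
        return int(cleaned)
    except ValueError:
        return value


def read_rows(text, sep=";", thousands=","):
    """Parse CSV text into dicts, applying strip_thousands to every field."""
    reader = csv.DictReader(io.StringIO(text), delimiter=sep)
    return [
        {key: strip_thousands(field, thousands) for key, field in row.items()}
        for row in reader
    ]


rows = read_rows(CSV_TEXT)
print([row["money"] for row in rows])  # both rows yield the integer 1000000
```

Until the argument is supported, a similar effect could presumably be had in pandas-on-Spark by reading the column as strings, removing the separator, and casting to a numeric type. That is an untested assumption, not something this issue confirms.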

Issue Links

links to

[Github] Pull Request #33907 (itholic)

People

Assignee: Unassigned

Reporter: Haejoon Lee

Votes: 0

Watchers: 2

Dates

Created: 30/Aug/21 07:51

Updated: 03/Sep/21 07:23