[SPARK-40922] pyspark.pandas.read_csv supports reading multiple files, but that is undocumented - ASF JIRA

Details

Type: Documentation
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.3.1
Fix Version/s: 3.4.0
Component/s: Pandas API on Spark, PySpark
Labels: None

Description

The path argument of pyspark.pandas.read_csv(path, ...) is currently annotated as str and documented as:

    path : str
        The path string storing the CSV file to be read.

The implementation, however, calls pyspark.sql.DataFrameReader.csv(path, ...), which does support multiple paths:

        path : str or list
            string, or list of strings, for input path(s),
            or RDD of Strings storing CSV rows.

Loading multiple CSV files at once is a useful feature and should be documented and covered by tests.
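
For illustration, here is a minimal sketch of the usage this issue asks to document: passing a list of paths to pyspark.pandas.read_csv. The helper names write_sample_csvs and read_all, the file names, and the sample rows are hypothetical and not taken from the Spark code base. The sketch assumes a working PySpark installation and relies on the behaviour described above, which is undocumented in 3.3.1.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csvs(directory):
    """Write two small CSV files with the same header and return their paths.

    Hypothetical helper: file names and rows are made up for this example.
    """
    rows_per_file = {
        "part1.csv": [("a", 1), ("b", 2)],
        "part2.csv": [("c", 3), ("d", 4)],
    }
    paths = []
    for name, rows in rows_per_file.items():
        path = Path(directory) / name
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["key", "value"])
            writer.writerows(rows)
        paths.append(str(path))
    return paths


def read_all(paths):
    """Read every CSV in `paths` into one pandas-on-Spark DataFrame.

    Assumption (per this issue): read_csv forwards `path` to
    DataFrameReader.csv, which accepts a list of strings.
    """
    # Imported lazily so the helper above stays usable without Spark.
    import pyspark.pandas as ps

    return ps.read_csv(paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        psdf = read_all(write_sample_csvs(tmp))
        # Evaluate while the temporary files still exist; Spark reads lazily.
        print(len(psdf))
```

Both files share the same header, so the rows should end up in a single DataFrame. Row order across files is not something this sketch relies on.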

Issue Links

links to

[Github] Pull Request #38399 (soxofaan)

People

Assignee: Stefaan Lippens

Reporter: Stefaan Lippens

Votes: 0

Watchers: 2

Dates

Created: 26/Oct/22 13:46

Updated: 12/Dec/22 18:11

Resolved: 27/Oct/22 10:56
