[ARROW-6231] [C++][Python] Consider assigning default column names when reading CSV file and header_rows=0 - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.15.0
Component/s: C++, Python
Labels:
- csv
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/22618

Description

This is a minor usability rough edge. Assigning default names (such as "f0", "f1", ...) would probably be better, since you could at least see how many columns there are and what they contain.
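The proposal can be illustrated in plain Python, independent of pyarrow. This is a minimal sketch that assumes every row has the same number of fields as the first one. `default_column_names` and `read_headerless` are hypothetical helpers, not part of the Arrow API, and the sample data is made up.

```python
import csv
import io


def default_column_names(num_columns):
    """Return placeholder names "f0", "f1", ... for a header-less file."""
    return [f"f{i}" for i in range(num_columns)]


def read_headerless(text, delimiter="|"):
    """Parse delimited text that has no header row into a dict of columns.

    The column count is taken from the first row (assumes rectangular data);
    names come from default_column_names().
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return {}
    names = default_column_names(len(rows[0]))
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


sample = "100|2016-10-01|4.5\n101|2016-11-01|3.9\n"
table = read_headerless(sample)
print(list(table))  # ['f0', 'f1', 'f2']
```

Even without a header, the generated names immediately reveal the number of columns and let you inspect each column's contents.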

In [10]: parse_options = csv.ParseOptions(delimiter='|', header_rows=0)

In [11]: %time table = csv.read_csv('Performance_2016Q4.txt', parse_options=parse_options)
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<timed exec> in <module>

~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/_csv.pyx in pyarrow._csv.read_csv()

~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: header_rows == 0 needs explicit column names

pandas uses integer column labels in this case. Arrow field names are strings, so some default string naming scheme would have to be defined.

In [18]: df = pd.read_csv('Performance_2016Q4.txt', sep='|', header=None, low_memory=False)

In [19]: df.columns
Out[19]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
           dtype='int64')
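Because a default name only needs to encode the column's position, an "f<index>" scheme maps one-to-one onto the integer labels pandas produces. The sketch below illustrates this assumption. `default_name` and `position_of` are hypothetical helpers, not Arrow or pandas APIs, and the 31 columns mirror the pandas output above.

```python
import re

# Hypothetical default-name pattern: the letter "f" followed by a 0-based index.
_DEFAULT_NAME = re.compile(r"f(\d+)")


def default_name(index):
    """Map a 0-based column position to a string name, e.g. 3 -> "f3"."""
    return f"f{index}"


def position_of(name):
    """Recover the 0-based position from a default name.

    Returns None if the name does not follow the "f<digits>" scheme.
    """
    match = _DEFAULT_NAME.fullmatch(name)
    return int(match.group(1)) if match else None


# pandas labels the 31 columns in the example above 0..30.
pandas_labels = list(range(31))
arrow_style = [default_name(i) for i in pandas_labels]
print(arrow_style[:3], arrow_style[-1])  # ['f0', 'f1', 'f2'] f30

# The string names round-trip back to pandas-style integer positions.
assert [position_of(n) for n in arrow_style] == pandas_labels
```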

Issue Links

links to

GitHub Pull Request #5206

People

Assignee: Antoine Pitrou

Reporter: Wes McKinney

Votes: 0

Watchers: 5

Dates

Created: 14/Aug/19 03:02

Updated: 11/Jan/23 07:45

Resolved: 30/Aug/19 16:33
