Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
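This is a slight usability rough edge: reading a CSV file with header_rows=0 currently fails unless column names are passed explicitly. Assigning default names (like "f0", "f1", ...) would probably be better, since then you can at least see how many columns there are and what is in them.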
In [10]: parse_options = csv.ParseOptions(delimiter='|', header_rows=0)

In [11]: %time table = csv.read_csv('Performance_2016Q4.txt', parse_options=parse_options)
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<timed exec> in <module>

~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/_csv.pyx in pyarrow._csv.read_csv()

~/miniconda/envs/pyarrow-14-1/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: header_rows == 0 needs explicit column names
In pandas, integers are used as the default column labels, so some kind of default string would have to be defined here:
In [18]: df = pd.read_csv('Performance_2016Q4.txt', sep='|', header=None, low_memory=False)

In [19]: df.columns
Out[19]:
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
           dtype='int64')
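
To make the proposal concrete, here is a minimal sketch of the requested behavior using only the Python standard library. It is not Arrow's implementation: the helpers default_column_names and read_headerless, the "f" prefix argument, and the sample data are all hypothetical and exist only for illustration. The sketch also assumes every row has the same number of fields.

```python
import csv
import io


def default_column_names(num_columns, prefix="f"):
    """Return placeholder names such as f0, f1, ... for a headerless table.

    Hypothetical helper: the "f" prefix follows the suggestion in this issue.
    """
    return [f"{prefix}{i}" for i in range(num_columns)]


def read_headerless(text, delimiter="|"):
    """Parse delimited text with no header row into a dict of columns.

    Assumes every row has as many fields as the first row.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return {}
    # The width of the first row decides how many names to generate.
    names = default_column_names(len(rows[0]))
    # Transpose the rows so each generated name maps to one column of values.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Made-up sample data standing in for a headerless, pipe-delimited file.
sample = "100|2016-10-01|3.5\n101|2016-11-01|3.75\n"
table = read_headerless(sample)
print(list(table))  # ['f0', 'f1', 'f2']
```

With names generated this way, a headerless file can be inspected right away (column count and contents) instead of failing with ArrowInvalid.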