Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.15.1
Fix Version/s: 8.0.0
Component/s: Python
Labels:
Environment: Windows, Python 3.6.7

External issue URL:
https://github.com/apache/arrow/issues/24135

Description

Sorry in advance if I mess anything up. This is my first issue.

I have hourly data for 3 years with a pandas DatetimeIndex as the index. Pandas lets me load/save .csv files with the following code (only one month with 2 variables shown):
```
import pandas as pd

# Write data to .csv
jan90.to_csv('PEC fine course 1 grid 199001.csv', index=True)

# Load data from .csv
jan90 = pd.read_csv('PEC fine course 1 grid 199001.csv', index_col=0, parse_dates=True)
```
Using .csv works, but it is slow for the full dataset of 26k+ rows and 21.6k+ columns (and more columns may come if I have to add lags to my data). So a more efficient load/save routine is very desirable. I was excited when I found Feather, but it loses the DataFrame index, which is a no-go for my use.

Thanks for your consideration.

Attachments


PEC fine course 1 grid 199001.csv (35 kB), Samuel Jones, 21/Feb/20 17:58
PEC fine course 1 grid 199001.feather (12 kB), Samuel Jones, 21/Feb/20 17:58

Issue Links

duplicates: ARROW-15018 [Python] DataFrame Index modified during Feather serialization round trip (Closed)

links to: GitHub Pull Request #12821


People

Assignee: saloni jain

Reporter: Samuel Jones

Votes: 0

Watchers: 4

Dates

Created: 21/Feb/20 18:00

Updated: 11/Jan/23 07:56

Resolved: 21/Apr/22 16:48

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 10m