Details
Type: Improvement
Status: Resolved
Priority: P2
Resolution: Fixed
Description
DataFrame reads try to match user expectations by giving every element across all
shards a unique index. This is done by embedding the file path
itself in the index, which duplicates the (often quite long) path
for every element, so the index can end up larger than the
data itself.
We should instead generate a numeric index that is guaranteed to be unique.
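
One possible shape for such an index, as a minimal sketch and not the fix that was actually merged: give each shard a small integer and pack it into the high bits of the row index, so the path is never stored per element. The names `ROW_BITS`, `numeric_index` and `assign_indices` are hypothetical, and the sketch assumes every reader sees the same list of shard paths so that sorting them yields the same shard numbers everywhere.

```python
from typing import Dict

# Hypothetical layout: the low ROW_BITS bits hold the row offset within a
# shard; the bits above them hold the shard number.
ROW_BITS = 40


def numeric_index(shard_id: int, row: int) -> int:
    """Pack (shard_id, row) into a single integer unique across shards."""
    if shard_id < 0:
        raise ValueError("shard_id must be non-negative")
    if not 0 <= row < (1 << ROW_BITS):
        raise ValueError("row offset does not fit in ROW_BITS bits")
    return (shard_id << ROW_BITS) | row


def assign_indices(row_counts: Dict[str, int]) -> Dict[str, range]:
    """Map each shard path to the range of numeric indices for its rows.

    Shard numbers come from the sorted order of the paths, so the result
    does not depend on the order in which shards happen to be read.
    """
    result: Dict[str, range] = {}
    for shard_id, path in enumerate(sorted(row_counts)):
        n_rows = row_counts[path]
        if not 0 <= n_rows <= (1 << ROW_BITS):
            raise ValueError(f"shard {path!r} has too many rows")
        start = shard_id << ROW_BITS
        result[path] = range(start, start + n_rows)
    return result
```

With this layout each element carries one integer instead of a copy of the path, and the shard number can still be recovered with `index >> ROW_BITS` if the originating file is needed later.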