Details
Type: Task
Status: Resolved
Priority: Blocker
Resolution: Unresolved
Description
The existing bootstrap code lists all files in the dataset and caches a FileStatus object for each file found. Each FileStatus object carries many fields that consume memory, and most of them are never used later in the bootstrap.
For a very large production table, the bootstrap fails with an OOM error and also times out, because a very large number of executors are spawned.
The dataset has 1,299 partitions and 12 million files.
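One possible direction, sketched here under assumptions, is to keep only a slim per-file record instead of the full status object. The class names `FullStatus`, `SlimFileInfo`, and `SlimListingSketch` are hypothetical, `FullStatus` is a simplified stand-in rather than the real FileStatus class, and the choice of path and length as the only fields needed later is an assumption, not something this ticket states.

```java
import java.util.ArrayList;
import java.util.List;

public class SlimListingSketch {

    /** Simplified stand-in for a full file status (illustrative field list). */
    static final class FullStatus {
        final String path;
        final long length;
        final boolean isDirectory;
        final long modificationTime;
        final String owner;
        final String group;

        FullStatus(String path, long length, boolean isDirectory,
                   long modificationTime, String owner, String group) {
            this.path = path;
            this.length = length;
            this.isDirectory = isDirectory;
            this.modificationTime = modificationTime;
            this.owner = owner;
            this.group = group;
        }
    }

    /** Hypothetical slim record: only the fields assumed to be used later. */
    static final class SlimFileInfo {
        final String path;
        final long length;

        SlimFileInfo(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Drops directories and copies only path and length from each entry. */
    static List<SlimFileInfo> slim(List<FullStatus> listed) {
        List<SlimFileInfo> out = new ArrayList<>(listed.size());
        for (FullStatus s : listed) {
            if (!s.isDirectory) {
                out.add(new SlimFileInfo(s.path, s.length));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FullStatus> listed = new ArrayList<>();
        listed.add(new FullStatus("part=1", 0L, true, 1L, "etl", "data"));
        listed.add(new FullStatus("part=1/a.parquet", 1024L, false, 2L, "etl", "data"));
        List<SlimFileInfo> cached = slim(listed);
        System.out.println("cached entries: " + cached.size());
    }
}
```

With 12 million files over 1,299 partitions (about 9,200 files per partition on average), any per-file saving is multiplied across the whole listing, so converting each entry as it is listed, rather than after caching the full objects, would matter most.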