[DAFFODIL-2468] Unparsing an infoset for an 800mb csv file runs out of memory - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: 3.1.0
Component/s: Back End
Labels: None

Description

While verifying DAFFODIL-2455 (Large CSV file causes "Attempting to backtrack too far" exception), found that unparsing the infoset of the successfully parsed 800mb CSV file ran out of memory.

Increased the DAFFODIL_JAVA_OPTS memory setting several times, up to 32gb, and retried unparsing the infoset; every attempt ran out of memory. Ran on a test platform with 90+ GB of memory.

Parsed and unparsed using the schema from the dfdl-schemas/dfdl-csv repo.

Attached: the 800mb CSV file (csv_data800m.csv), gzipped.

Attachments

csv_data800m.csv.gz (3.54 MB), 03/Feb/21 13:49, Dave Thompson

People

Assignee: Steve Lawrence

Reporter: Dave Thompson

Votes: 0

Watchers: 2

Dates

Created: 03/Feb/21 13:52

Updated: 12/May/21 18:40

Resolved: 12/May/21 18:40