  1. Daffodil
  2. DAFFODIL-2468

Unparsing an infoset for an 800 MB CSV file runs out of memory

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Back End
    • Labels: None

    Description

      While verifying DAFFODIL-2455 (Large CSV file causes "Attempting to backtrack too far" exception), found that unparsing the infoset produced by successfully parsing the 800 MB CSV file ran out of memory.

      Increased the DAFFODIL_JAVA_OPTS memory setting several times, up to 32 GB, and retried unparsing the infoset; it ran out of memory each time. Ran on a test platform that has 90+ GB of memory.

      Parsed and unparsed using the schema from the dfdl-schemas/dfdl-csv repo.

      The 800 MB CSV file (csv_data800m.csv) is attached, gzipped.
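
      A rough sketch of the reproduction described above, not the exact commands that were run. It assumes the daffodil CLI is on the PATH, that the memory setting was a JVM -Xmx heap option, and that SCHEMA points at the CSV schema; the schema file name and the output file names are hypothetical placeholders.

```shell
# Assumption: the "memory setting" is the JVM max heap size; 32g was the largest value tried.
export DAFFODIL_JAVA_OPTS="-Xmx32g"

# Hypothetical schema location; point this at the CSV schema from the schema repo.
SCHEMA="${SCHEMA:-csv.dfdl.xsd}"

repro() {
  # Decompress the attached data file, keeping the .gz intact.
  gunzip -c csv_data800m.csv.gz > csv_data800m.csv &&
  # Parse the CSV into an XML infoset (this step succeeded).
  daffodil parse -s "$SCHEMA" -o csv_data800m.xml csv_data800m.csv &&
  # Unparse the infoset back to CSV (this step ran out of memory).
  daffodil unparse -s "$SCHEMA" -o csv_data800m.unparsed.csv csv_data800m.xml
}
```

      Running repro from the directory that holds the attachment should exercise the parse and then the failing unparse.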

      Attachments

        1. csv_data800m.csv.gz
          3.54 MB
          Dave Thompson

          People

            Assignee: Steve Lawrence (slawrence)
            Reporter: Dave Thompson (dfthompson)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: