  Apache Arrow / ARROW-9603

[C++][Parquet] Write Arrow relies on unspecified behavior for nested types


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: C++

    Description

      At certain points, the WriteArrow implementations in parquet/column_writer.cc check null counts and whether the data is required, and pass the null bitmap through for encoding. For nested data types, this only works if a null slot on a parent implies a null slot on the leaf. The specifications do not require this relationship; for example, in the Arrow format a struct slot can be null while its child field still marks the same slot as valid.
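To illustrate the mismatch, here is a minimal sketch using a hypothetical, simplified model rather than Arrow's API: a nullable struct with one nullable int32 field, with validity kept as one bool per slot instead of packed bitmaps. The names `NullableStructOfInt`, `ComputeDefLevels`, `LeafNullCount`, and `DefLevelNullCount` are illustrative only. It compares the null count taken from the leaf's own validity with the count implied by the Parquet definition levels (max definition level 2 for this schema).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a nullable struct<a: int32 (nullable)>.
// Validity is one bool per slot here, not Arrow's packed bitmaps.
struct NullableStructOfInt {
  std::vector<bool> parent_valid;     // struct-level validity
  std::vector<bool> child_valid;      // validity of the leaf field "a"
  std::vector<int32_t> child_values;  // leaf values (ignored when null)
};

// Definition levels for this schema (max definition level = 2):
//   0 -> struct slot is null
//   1 -> struct present, leaf null
//   2 -> struct present, leaf present
std::vector<int16_t> ComputeDefLevels(const NullableStructOfInt& arr) {
  assert(arr.child_valid.size() == arr.parent_valid.size());
  std::vector<int16_t> levels;
  levels.reserve(arr.parent_valid.size());
  for (std::size_t i = 0; i < arr.parent_valid.size(); ++i) {
    if (!arr.parent_valid[i]) {
      levels.push_back(0);  // a null parent hides whatever the child holds
    } else if (!arr.child_valid[i]) {
      levels.push_back(1);
    } else {
      levels.push_back(2);
    }
  }
  return levels;
}

// The unsafe shortcut: count nulls using only the leaf's own validity.
int64_t LeafNullCount(const NullableStructOfInt& arr) {
  int64_t nulls = 0;
  for (bool valid : arr.child_valid) {
    if (!valid) ++nulls;
  }
  return nulls;
}

// Leaf nulls as the definition levels see them: any level below the max.
int64_t DefLevelNullCount(const std::vector<int16_t>& levels,
                          int16_t max_def_level) {
  int64_t nulls = 0;
  for (int16_t level : levels) {
    if (level < max_def_level) ++nulls;
  }
  return nulls;
}
```

With parent_valid = {true, false, true} and child_valid = {true, true, false}, the levels are {2, 0, 1}. LeafNullCount reports 1 null while DefLevelNullCount(levels, 2) reports 2, so code that trusted the leaf bitmap would treat slot 1 as a present value even though its parent is null.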


      Most code paths that create arrays follow this pattern, so hitting this bug in practice is unlikely, but we should still fix it.


      If the column is nested, every branch that relies on nullness should instead generate a new null bitmap from the definition levels and base its decisions on that bitmap.
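A minimal sketch of that approach, assuming a hypothetical helper `ValidityFromDefLevels` (not the Arrow or Parquet API): each definition level at or above `repeated_ancestor_def_level` corresponds to one leaf slot, and that slot is valid only when its level reaches `max_def_level`. Levels below `repeated_ancestor_def_level` describe null or empty lists and produce no leaf slot. Validity is again kept as one bool per slot for readability.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: leaf validity rebuilt from definition levels.
struct LeafValidity {
  std::vector<bool> valid;  // one entry per leaf slot
  int64_t null_count = 0;
};

// Hypothetical helper, not the Arrow/Parquet API.
//   max_def_level: level at which the leaf value is present.
//   repeated_ancestor_def_level: level at which the closest repeated ancestor
//     (list) has an element; use 0 when there is no repeated ancestor.
LeafValidity ValidityFromDefLevels(const std::vector<int16_t>& def_levels,
                                   int16_t max_def_level,
                                   int16_t repeated_ancestor_def_level) {
  assert(repeated_ancestor_def_level <= max_def_level);
  LeafValidity out;
  for (int16_t level : def_levels) {
    // Null or empty list above the leaf: no leaf slot exists for this level.
    if (level < repeated_ancestor_def_level) continue;
    bool present = level >= max_def_level;
    out.valid.push_back(present);
    if (!present) ++out.null_count;
  }
  return out;
}
```

For the struct case (levels {2, 0, 1}, max level 2, no repeated ancestor) this yields validity {true, false, false} with a null count of 2. For a nullable list of nullable int32 (max level 3, repeated ancestor level 2), levels {3, 0, 1, 2, 3} yield three leaf slots {true, false, true} with one null. Decisions such as whether any nulls need encoding can then use this null count instead of the leaf array's own.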


            People

              Assignee: Micah Kornfield (emkornfield@gmail.com)
              Reporter: Micah Kornfield (emkornfield@gmail.com)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 7h 40m