  1. Apache Arrow
  2. ARROW-17524

The ORC reader method ReadStripe does not work when the fields to select are specified as a list of integers

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.0.1
    • Fix Version/s: 10.0.0
    • Component/s: C++

    Description

      I think there is a bug in the ORC reader: when we specify the indexes of the fields that we want to keep, it does not work correctly. Looking at the code, this seems to be because we call "includeTypes" instead of "include" when setting the ORC options.
      This is a problem when we want to import an ORC table containing Union types, because the import raises an error even if we try not to import those specific fields.

      The definitions of the corresponding ORC methods are here:
      https://github.com/apache/orc/blob/72220851cbde164a22706f8d47741fd1ad3db190/c%2B%2B/src/Options.hh#L185-L191

      and
      https://github.com/apache/orc/blob/72220851cbde164a22706f8d47741fd1ad3db190/c%2B%2B/src/Options.hh#L201-L207
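
      To illustrate why the two methods are not interchangeable, here is a minimal, self-contained sketch. It is not ORC or Arrow code: `Node`, `ExampleSchema`, `NamesByTypeId` and `TypeIdOfField` are hypothetical names made up for this example. It assumes that ORC numbers types in pre-order starting at 0 for the root struct, whereas field indexes count only the top-level fields starting at 0.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy schema node used only for illustration; this is NOT the orc::Type class.
struct Node {
  std::string name;
  std::vector<Node> children;
};

// Hypothetical schema: struct<a, b:struct<c, d>, u:uniontype<u0, u1>>.
Node ExampleSchema() {
  return Node{"root",
              {Node{"a", {}},
               Node{"b", {Node{"c", {}}, Node{"d", {}}}},
               Node{"u", {Node{"u0", {}}, Node{"u1", {}}}}}};
}

// Number of nodes in the subtree rooted at `node`, including `node` itself.
uint64_t SubtreeSize(const Node& node) {
  uint64_t size = 1;
  for (const Node& child : node.children) size += SubtreeSize(child);
  return size;
}

// Appends node names in pre-order, so the position in `out` is the type id
// under the pre-order numbering assumed above.
void CollectPreOrder(const Node& node, std::vector<std::string>& out) {
  out.push_back(node.name);
  for (const Node& child : node.children) CollectPreOrder(child, out);
}

std::vector<std::string> NamesByTypeId(const Node& root) {
  std::vector<std::string> names;
  CollectPreOrder(root, names);
  return names;
}

// Type id of the top-level field at `field_index`: 1 for the first field,
// plus the sizes of all subtrees that precede the requested field.
uint64_t TypeIdOfField(const Node& root, size_t field_index) {
  assert(field_index < root.children.size());
  uint64_t id = 1;
  for (size_t i = 0; i < field_index; ++i) id += SubtreeSize(root.children[i]);
  return id;
}
```

      With this schema, field index 1 is "b", but type id 1 is "a" ("b" has type id 2), and field index 0 read as a type id names the root struct itself. If selecting a type also selects its children, as I believe ORC does, type id 0 would pull in every column, including the union, which would be consistent with the error described above.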

            People

              Assignee: Louis Calot (louisc)
              Reporter: Louis Calot (louisc)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 10m