IMPALA-11489

Async IO cannot handle >2GB ORC files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.2.0, Impala 4.1.1
    • Component/s: Backend
    • Labels: None

    Description

      We assume that the size fits in an int:
      https://github.com/apache/impala/blob/308fda110758b0fc58e5b1f477d635aac29aea75/be/src/exec/hdfs-orc-scanner.cc#L253

      If the size overflows, we can incorrectly hit the following error check (the check is meant to avoid crashing on corrupt metadata). I see no other way this could cause problems: if the check still passes (because the overflow produced a valid-looking length), the data will be read correctly.
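To make the two outcomes concrete, here is a minimal sketch of how narrowing a 64-bit length to a 32-bit int can either trip a sanity check or slip past it. This is not the scanner code: `LengthLooksValid`, `NarrowToInt32`, and `DemoOverflow` are hypothetical names made up for illustration, and the check is only an assumed stand-in for the real one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a metadata sanity check: rejects non-positive
// lengths and lengths that run past the end of the file.
inline bool LengthLooksValid(int64_t length, int64_t offset, int64_t file_length) {
  return length > 0 && offset >= 0 && length <= file_length - offset;
}

// Models the bug: keeps only the low 32 bits of a 64-bit length.
// Note: the unsigned-to-signed conversion wraps modulo 2^32 on mainstream
// compilers (guaranteed since C++20, implementation-defined before that).
inline int32_t NarrowToInt32(int64_t length) {
  return static_cast<int32_t>(static_cast<uint32_t>(length));
}

// Walks through both cases described in the issue.
inline void DemoOverflow() {
  const int64_t kGiB = int64_t{1} << 30;

  // A 3 GiB length wraps to a negative int, so a valid read is rejected.
  assert(NarrowToInt32(3 * kGiB) < 0);
  assert(!LengthLooksValid(NarrowToInt32(3 * kGiB), 0, 4 * kGiB));

  // A 5 GiB length wraps to 1 GiB, which still looks valid to the check.
  assert(NarrowToInt32(5 * kGiB) == kGiB);
  assert(LengthLooksValid(NarrowToInt32(5 * kGiB), 0, 6 * kGiB));

  // Keeping the length in 64 bits lets the check see the real value.
  assert(LengthLooksValid(3 * kGiB, 0, 4 * kGiB));
}
```

Calling `DemoOverflow()` runs all three cases. The presumed fix is simply to carry the length as a 64-bit value through to the check.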

      This looks like a trivial fix, but I am concerned about the lack of testing with >2GB files.

          People

            Assignee: Csaba Ringhofer (csringhofer)
            Reporter: Csaba Ringhofer (csringhofer)
            Votes: 0
            Watchers: 4
