IMPALA-8109

Impala cannot read gzip files larger than 2 GB

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Backend
    • Labels:
      None

      Description

      When querying a partition containing gzip files, the query fails with the error below:
      WARNINGS: Disk I/O error: Error seeking to -2147483648 in file: hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXXXXXX.gz:
      Error(255): Unknown error 255
      Root cause: EOFException: Cannot seek to negative offset

      The file hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXXXXXX.gz is a gzip-compressed delimited text file larger than 2 GB (approximately 2.4 GB). Its uncompressed size is about 13 GB.
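
The negative offset in the error, -2147483648, equals -2^31, the smallest value of a signed 32-bit integer. That would be consistent with a file offset of 2^31 bytes (2 GB) being narrowed to 32 bits somewhere along the read path, although this report does not confirm where or whether that happens. The sketch below only illustrates the arithmetic of such a wraparound. The helper name `as_int32` is hypothetical and is not taken from Impala's code.

```python
# Illustration only: how a byte offset behaves if it is stored in a
# signed 32-bit integer. This is an assumed explanation for the negative
# seek offset in the error message, not code from Impala.

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def as_int32(offset: int) -> int:
    """Reinterpret the low 32 bits of `offset` as a signed 32-bit value."""
    return ((offset + 2**31) % 2**32) - 2**31


# The last offset that still fits in 32 signed bits is unchanged.
last_safe = as_int32(INT32_MAX)

# One byte further (exactly 2 GB) wraps around to the most negative value,
# which matches the offset reported in the error message.
wrapped = as_int32(2**31)

print(last_safe)
print(wrapped)
```

Offsets for files of this size need a 64-bit type, which is why only files past the 2 GB mark would be affected under this assumption.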

      The impalad version is 2.12.0-cdh5.15.0.

              People

              • Assignee:
                Unassigned
                Reporter:
                hakkibc hakki
              • Votes:
                0
                Watchers:
                5
