IMPALA-8549

Add support for scanning DEFLATE text files


Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • Impala 3.3.0
    • Backend

    Description

      Several Hadoop tools (e.g. Hive, MapReduce) support reading and writing text files compressed with zlib / deflate, which produces files such as 000000_0.deflate. Impala currently cannot read .deflate text files and returns errors such as: ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'.

      Moreover, the default compression codec in Hadoop is zlib / deflate (see o.a.h.io.compress.DefaultCodec). So when writing to a text table in Hive, if users set hive.exec.compress.output to true, then .deflate files will be written by default.
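To illustrate what scanning such a file involves, here is a minimal Python sketch that streams rows out of a .deflate text file. It assumes the file is a zlib-wrapped stream (which is what DefaultCodec is expected to write) and that rows are newline-delimited; the function name iter_deflate_lines is hypothetical and this is not Impala's scanner code.

```python
import zlib
from typing import Iterable, Iterator


def iter_deflate_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield rows (without the trailing newline) from zlib-compressed chunks.

    Assumption: the input is a single zlib-wrapped stream, so the default
    decompressobj() settings (which expect a zlib header) are sufficient.
    """
    decompressor = zlib.decompressobj()
    pending = b""  # bytes of a row that has not been terminated yet
    for chunk in chunks:
        pending += decompressor.decompress(chunk)
        # Everything before the last newline is a complete row; the
        # remainder is carried over to the next chunk.
        *rows, pending = pending.split(b"\n")
        yield from rows
    pending += decompressor.flush()
    rows = pending.split(b"\n")
    if rows[-1] == b"":
        rows.pop()  # no partial row left after the final newline
    yield from rows
```

A caller would feed it fixed-size reads from the file, for example `iter_deflate_lines(iter(lambda: f.read(65536), b""))` for a file object `f` opened in binary mode.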

      Impala does support zlib / deflate with other file formats though: Avro, RCFiles, SequenceFiles (see https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).

      Currently, the frontend assigns a compression type to a file based on its extension. For instance, the functional_text_def database is stored as a file with a .deflate extension and is assigned the compression type DEFLATE. The HdfsTextScanner class receives this value and uses it directly to create a decompressor. The functional_{avro,seq,rc}_databases are stored as files without extensions, so the frontend interprets their compression type as NONE. In the backend, however, each of the corresponding scanners implements its own logic to read the file header and override the NONE compression type with a new value, such as DEFAULT or DEFLATE, so that the appropriate decompressor can be instantiated.
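The two-step resolution described above can be sketched roughly as follows. This is an illustrative Python model, not Impala's frontend or backend code: the suffix table, the function names, and the string constants are all assumptions made for the example.

```python
import os
from typing import Optional

# Hypothetical suffix table; the real mapping lives in the Impala frontend.
SUFFIX_TO_COMPRESSION = {
    ".deflate": "DEFLATE",
    ".gz": "GZIP",
    ".bz2": "BZIP2",
    ".snappy": "SNAPPY",
    ".lzo": "LZO",
}


def compression_from_filename(filename: str) -> str:
    """Frontend step: derive the compression type from the file extension."""
    _, ext = os.path.splitext(filename)
    return SUFFIX_TO_COMPRESSION.get(ext.lower(), "NONE")


def effective_compression(frontend_type: str,
                          header_codec: Optional[str] = None) -> str:
    """Backend step: a codec found in the file header wins over the
    extension-based guess; text files have no header, so they keep the
    frontend's value."""
    if header_codec is not None:
        return header_codec
    return frontend_type
```

For a text file such as 000000_0.deflate, the first function alone yields DEFLATE and the scanner uses it directly. For an extension-less Avro, SequenceFile, or RCFile, the first function yields NONE and the header codec supplies the real value.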

            People

              ethan.xue Ethan
              stakiar Sahil Takiar