IMPALA / IMPALA-580

Inconsistent or blank fileFormats values passed to CM

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 1.1
    • Fix Version/s: None
    • Component/s: Backend
    • Labels: None
    • Environment: Impala 1.1.0 and CM 4.6.2

    Description

      On the Cloudera Manager (CM) "Query Details" page, one of the fields is "File Formats". If I query a table created with STORED AS SEQUENCEFILE and the BZip2 compression codec, CM shows a line like:

      File Formats: SEQUENCE_FILE/BZIP2

      That seems intuitive. However, for other combinations of file format and compression codec, the "File Formats" value is blank or seems misleading.

      select * from seqfile_snappy limit 5 -> File Formats in CM is blank
      select * from rcfile_snappy limit 5 -> File Formats in CM is blank
      select count(*) from seqfile_deflate -> File Formats in CM = SEQUENCE_FILE/DEFAULT
      select count(*) from rcfile_deflate -> File Formats in CM = RC_FILE/DEFAULT (is DEFAULT a typo for DEFLATE, given that it appears for both SEQFILE and RCFILE tables?)
      select count(*) from parquet_snappy -> File Formats in CM = PARQUET/NONE

      I also see PARQUET/NONE for a Parquet table compressed with GZip.
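
      The observations so far can be restated as a small table-driven check. This is a hypothetical sketch, not part of Impala or CM: it assumes the expected value is FORMAT/CODEC with the codec spelled as in the table name. It therefore flags DEFAULT as a mismatch only because it assumes DEFLATE is the intended spelling, which is exactly the open question in the rcfile_deflate case. The names seqfile_bzip2 and parquet_gzip are made up for the two tables the report does not name.

```python
# Hypothetical table names following a <format>_<codec> convention. The report names
# seqfile_snappy, rcfile_snappy, seqfile_deflate, rcfile_deflate and parquet_snappy;
# seqfile_bzip2 and parquet_gzip are made-up names for the other two cases described.
OBSERVED = {
    "seqfile_bzip2": "SEQUENCE_FILE/BZIP2",
    "seqfile_snappy": "",  # CM showed a blank field
    "rcfile_snappy": "",  # CM showed a blank field
    "seqfile_deflate": "SEQUENCE_FILE/DEFAULT",
    "rcfile_deflate": "RC_FILE/DEFAULT",
    "parquet_snappy": "PARQUET/NONE",
    "parquet_gzip": "PARQUET/NONE",
}

# Assumed display names for each format prefix, taken from the values CM did show.
FORMAT_NAMES = {"seqfile": "SEQUENCE_FILE", "rcfile": "RC_FILE", "parquet": "PARQUET"}


def expected_value(table: str) -> str:
    """Build the FORMAT/CODEC string a reader would expect for a <format>_<codec> table."""
    fmt, codec = table.split("_", 1)
    return f"{FORMAT_NAMES[fmt]}/{codec.upper()}"


def mismatches(observed: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each table whose CM value differs from the expected one to (expected, observed)."""
    return {
        table: (expected_value(table), shown)
        for table, shown in observed.items()
        if shown != expected_value(table)
    }


if __name__ == "__main__":
    for table, (want, got) in sorted(mismatches(OBSERVED).items()):
        print(f"{table}: expected {want!r}, CM showed {got!r}")
```

      Under these assumptions only the BZip2 SEQUENCEFILE case matches; every other reported combination is flagged.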

      I also see PARQUET/NONE for a Parquet table whose data directory contains data files compressed with different codecs. I understand that CM can, in some cases, display multiple values in the "File Formats" field, and that is what I'd expect here. (Likewise, I'd expect multiple "File Formats" values for a join of tables with different file formats, or for a query against a partitioned table whose partitions have different file formats.)
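
      The expectation described for mixed codecs, joins, and mixed-format partitions amounts to collecting every distinct FORMAT/CODEC pair seen by the query. The sketch below is a hypothetical illustration of that expected behavior, not how Impala reports the value or how CM computes the field; ScanInput and file_formats_field are made-up names, and sorting is an assumed choice to keep the output stable regardless of scan order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanInput:
    """Hypothetical record for one unit of scanned data (a file, partition, or table)."""
    file_format: str
    codec: str


def file_formats_field(inputs: list[ScanInput]) -> str:
    """Return every distinct FORMAT/CODEC pair, sorted and comma-separated.

    Duplicate pairs collapse to one entry; an empty input yields an empty string.
    """
    pairs = {f"{item.file_format}/{item.codec}" for item in inputs}
    return ", ".join(sorted(pairs))


if __name__ == "__main__":
    # A Parquet table whose directory mixes Snappy- and GZip-compressed files.
    print(file_formats_field([
        ScanInput("PARQUET", "SNAPPY"),
        ScanInput("PARQUET", "GZIP"),
        ScanInput("PARQUET", "SNAPPY"),
    ]))
    # A join of a SEQUENCEFILE table and an RCFILE table.
    print(file_formats_field([
        ScanInput("SEQUENCE_FILE", "BZIP2"),
        ScanInput("RC_FILE", "SNAPPY"),
    ]))
```

      With this approach, the mixed-codec Parquet directory would show "PARQUET/GZIP, PARQUET/SNAPPY" rather than a single PARQUET/NONE.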

      I did not have an LZO-compressed text table, so I didn't check whether that case would produce TEXT/LZO as expected.
      I did not have an Avro table, so I didn't check those combinations.
      I did not check Avro, SEQFILE, or RCFILE tables whose data directory contains files compressed with more than one codec.

      Other than the above cases, I think I checked every combination of file format and codec, and the only issues I saw were those I listed.

      The impala-shell PROFILE output or the CM profile text is available if desired.

    People

    • Assignee: Unassigned
    • Reporter: John Russell (jrussell)
    • Votes: 0
    • Watchers: 2