IMPALA / IMPALA-482

"block size is too big" error with Snappy-compressed RCFile containing null

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.0.1, Impala 1.1
    • Fix Version/s: Impala 1.1.1
    • None

    Description

      Impala seems to have trouble with Snappy-compressed RCFiles containing null values. I'm using Hive 0.11.0; it's possible that the problem is on Hive's side, but Hive can read its own output just fine.

      The query produces the following error:

      Decompressor: block size is too big.  Data is likely corrupt. Size: 0
      

      Example reproduction instructions (with long outputs left out as [...]):

      $ echo $'\x1'foo > data.txt
      $ cat > script.hql <<EOF
      CREATE TABLE text(a STRING, b STRING) STORED AS TEXTFILE;
      CREATE TABLE rc(a STRING, b STRING) STORED AS RCFILE;
      LOAD DATA LOCAL INPATH "data.txt" INTO TABLE text;
      SET hive.exec.compress.output=true;
      SET mapred.max.split.size=256000000;
      SET mapred.output.compression.type=BLOCK;
      SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
      INSERT INTO TABLE rc SELECT * FROM text;
      EOF
      $ hive -f script.hql
      [...]
      $ impala-shell -ri $insertImpalaNodeHere -q 'SELECT * FROM text'
      [...]
      Returned 1 row(s) in 0.28s
      $ impala-shell -ri $insertImpalaNodeHere -q 'SELECT * FROM rc'
      [...]
      Returned 0 row(s) in 0.27s
      

      I expect one row with a = NULL and b = "foo", but get 0 rows instead.
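To make the expected result concrete, the input line can be split by hand. This is only a sketch, assuming the text table uses Hive's default field delimiter (Ctrl-A, `\x01`) and a bash-compatible shell; the variable names `line`, `first`, and `second` are made up for illustration and do not appear in the report. It shows that the single line in data.txt has an empty first field (the value the report expects to read back as NULL) and "foo" as the second field.

```shell
# Recreate the one-line input from the report: a Ctrl-A byte followed by "foo".
line=$(printf '\001foo')

# Split on Ctrl-A, assumed here to be the field delimiter of the text table.
first=${line%%$'\001'*}    # everything before the first Ctrl-A
second=${line#*$'\001'}    # everything after the first Ctrl-A

# Show both fields in brackets so the empty first field is visible.
printf 'a=[%s] b=[%s]\n' "$first" "$second"
```

The first field is zero-length, so a single row should still come back from both tables.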

      impalad's debug output for the query is attached.

      Attachments

        1. impalad-debug.log
          30 kB
          Matti Niemenmaa

          People

            Assignee: Skye Wanderman-Milne (skye)
            Reporter: Matti Niemenmaa (deewiant_impala_2f8f)