Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-6370

Crash when querying nested data in partitioned Parquet table

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 2.11.0
    • Impala 2.12.0
    • Backend

    Description

      Header of hs_err file:

      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007f908452d782, pid=1223, tid=0x00007f8fc0d1e700
      #
      # JRE version: Java(TM) SE Runtime Environment (8.0_111-b14) (build 1.8.0_111-b14)
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.111-b14 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # C  [libc.so.6+0x89782]  memcpy+0xd2
      #
      # Core dump written. Default location: /data/1/cores/core or core.1223
      #
      # If you would like to submit a bug report, please visit:
      #   http://bugreport.java.com/bugreport/crash.jsp
      #
      
      ---------------  T H R E A D  ---------------
      
      Current thread is native thread
      
      siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000000
      
      Registers:
      RAX=0x00000000276a9000, RBX=0x000000001ae72f00, RCX=0x0000000000000019, RDX=0x0000000000000341
      RSP=0x00007f8fc0d1d058, RBP=0x00007f8fc0d1d150, RSI=0x0000000000000000, RDI=0x00000000276a9000
      R8 =0x00007f8fc0d1d108, R9 =0x00007f8fc0d1d110, R10=0x0000000000000000, R11=0x00007f9084527e5d
      R12=0x000000001b06e418, R13=0x000000002485d2e8, R14=0x0000000000000001, R15=0x000000002485d2a0
      RIP=0x00007f908452d782, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000004
        TRAPNO=0x000000000000000e
      
      

      Backtrace from core:

      (gdb) bt
      #0  0x00007f90844d6495 in raise () from sysroot/lib64/libc.so.6
      #1  0x00007f90844d7c75 in abort () from sysroot/lib64/libc.so.6
      #2  0x00007f90867419d5 in os::abort(bool) ()
         from sysroot/usr/java/jdk1.8.0_111/jre/lib/amd64/server/libjvm.so
      #3  0x00007f90868e4033 in VMError::report_and_die() ()
         from sysroot/usr/java/jdk1.8.0_111/jre/lib/amd64/server/libjvm.so
      #4  0x00007f90867471ff in JVM_handle_linux_signal ()
         from sysroot/usr/java/jdk1.8.0_111/jre/lib/amd64/server/libjvm.so
      #5  0x00007f908673d993 in signalHandler(int, siginfo*, void*) ()
         from sysroot/usr/java/jdk1.8.0_111/jre/lib/amd64/server/libjvm.so
      #6  <signal handler called>
      #7  0x00007f908452d782 in memcpy () from sysroot/lib64/libc.so.6
      #8  0x0000000000dbb61b in InitTupleFromTemplate (this=<optimized out>, 
          tuple_byte_size=<optimized out>, tuple=<optimized out>, template_tuple=0x0)
          at /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/exec/hdfs-scanner.h:446
      #9  InitTuple (tuple=<optimized out>, template_tuple=0x0, 
          desc=<optimized out>, this=0x1ae72f00)
          at /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/exec/hdfs-scanner.h:432
      #10 impala::HdfsParquetScanner::AssembleCollection (this=<optimized out>, 
          column_readers=..., new_collection_rep_level=<optimized out>, 
          coll_value_builder=coll_value_builder@entry=0x7f8fc0d1d160)
          at /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/exec/hdfs-parquet-scanner.cc:1249
      #11 0x0000000000de3ba1 in ReadSlot (pool=0x1b06e418, tuple=<optimized out>, 
          this=0x1ae72f00)
          at /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/exec/parquet-column-readers.cc:1280
      #12 impala::CollectionColumnReader::ReadValue (this=0x1ae72f00, 
          pool=0x1b06e418, tuple=<optimized out>)
          at /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/exec/parquet-column-readers.cc:1256
      #13 0x0000000040bcc682 in ?? ()
      #14 0x0000000027b91c20 in ?? ()
      #15 0x000000001b06e418 in ?? ()
      #16 0x000000003f565000 in ?? ()
      #17 0x0000000000000044 in ?? ()
      #18 0x00007f8fc0d1d2e0 in ?? ()
      #19 0x000000003f565000 in ?? ()
      #20 0x0000000000000003 in ?? ()
      #21 0x0000000000000400 in ?? ()
      #22 0x000000001b06e3d0 in ?? ()
      #23 0x000000001b06e418 in ?? ()
      #24 0x0000000000de36b1 in impala::ParquetColumnReader::ReadValueBatch (
          this=0x2485d2a0, pool=0x2485d2e8, max_values=1086113410, tuple_size=1, 
          tuple_mem=0x40bcc000 "", num_values=0x1b06e418)
          at /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/exec/parquet-column-readers.cc:802
      #25 0x0000000002de8c50 in vtable for parquet::Statistics ()
      #26 0x0000000040bcc000 in ?? ()
      #27 0x000000001ae72f00 in ?? ()
      #28 0x000000001ae72f00 in ?? ()
      #29 0x00000000ffffffff in ?? ()
      #30 0x0000000000000000 in ?? ()
      

      I have a repro:

      impala-shell.sh -q "
      DROP TABLE IF EXISTS complextypes_partitioned;
      CREATE TABLE complextypes_partitioned (
        id BIGINT,
        int_array ARRAY<INT>,
        int_array_array ARRAY<ARRAY<INT>>,
        int_map MAP<STRING,INT>,
        int_map_array ARRAY<MAP<STRING,INT>>,
        nested_struct STRUCT<a:INT,b:ARRAY<INT>,c:STRUCT<d:ARRAY<ARRAY<STRUCT<e:INT,f:STRING>>>>,g:MAP<STRING,STRUCT<h:STRUCT<i:ARRAY<DOUBLE>>>>>
      )
      PARTITIONED BY (
        part int
      )
      STORED AS PARQUET;
      alter table complextypes_partitioned add partition (part=1);
      "
      
      hdfs dfs -cp "/test-warehouse/complextypestbl_parquet/*" /test-warehouse/complextypes_partitioned/part=1
      
      impala-shell.sh -q "
      REFRESH complextypes_partitioned;
      select id, part, a.item from complextypes_partitioned t, t.int_array a
      "
      

      Attachments

        Activity

          People

            tarmstrong Tim Armstrong
            tarmstrong Tim Armstrong
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: