Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3079

Fix Sequence file writer (crashes or produces invalid files)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:
      None

      Description

      Stack

      #0  0x0000003f38832625 in raise () from /lib64/libc.so.6
      #1  0x0000003f38833e05 in abort () from /lib64/libc.so.6
      #2  0x00007f956d814c55 in os::abort(bool) () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
      #3  0x00007f956d996cd7 in VMError::report_and_die() () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
      #4  0x00007f956d819b6f in JVM_handle_linux_signal () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
      #5  <signal handler called>
      #6  0x0000000000cf1992 in snappy::internal::CompressFragment(char const*, unsigned long, char*, unsigned short*, int) ()
      #7  0x0000000000cf2087 in snappy::Compress(snappy::Source*, snappy::Sink*) ()
      #8  0x0000000000cf3051 in snappy::RawCompress(char const*, unsigned long, char*, unsigned long*) ()
      #9  0x0000000000a8c158 in impala::SnappyBlockCompressor::ProcessBlock(bool, long, unsigned char const*, long*, unsigned char**) ()
      #10 0x0000000000cc9177 in impala::HdfsSequenceTableWriter::WriteCompressedBlock() ()
      #11 0x0000000000cc97d0 in impala::HdfsSequenceTableWriter::Flush() ()
      #12 0x0000000000cc9ac3 in impala::HdfsSequenceTableWriter::AppendRowBatch(impala::RowBatch*, std::vector<int, std::allocator<int> > const&, bool*) ()
      #13 0x0000000000cc37e9 in impala::HdfsTableSink::Send(impala::RuntimeState*, impala::RowBatch*, bool) ()
      #14 0x0000000000ca9364 in impala::PlanFragmentExecutor::OpenInternal() ()
      #15 0x0000000000ca98e7 in impala::PlanFragmentExecutor::Open() ()
      #16 0x0000000000a60df8 in impala::FragmentMgr::FragmentExecState::Exec() ()
      #17 0x0000000000a589d2 in impala::FragmentMgr::FragmentThread(impala::TUniqueId) ()
      #18 0x0000000000a59bea in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >, void>::invoke(boost::detail::function::function_buffer&) ()
      #19 0x0000000000af7da7 in impala::Thread::SuperviseThread(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()()>, impala::Promise<long>*) ()
      #20 0x0000000000af86c4 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
      #21 0x0000000000d31c4a in ?? ()
      #22 0x0000003f38c079d1 in start_thread () from /lib64/libpthread.so.0
      #23 0x0000003f388e88fd in clone () from /lib64/libc.so.6
      

      DDL

      set ALLOW_UNSUPPORTED_FORMATS=1;
      CREATE TABLE tpcds_1000_parquet.store_sales_sequencefile (                                                                  |
        ss_sold_time_sk INT,                                                                                                      |
        ss_item_sk BIGINT,                                                                                                        |
        ss_customer_sk INT,                                                                                                       |
        ss_cdemo_sk INT,                                                                                                          |
        ss_hdemo_sk INT,                                                                                                          |
        ss_addr_sk INT,                                                                                                           |
        ss_store_sk INT,                                                                                                          |
        ss_promo_sk INT,                                                                                                          |
        ss_ticket_number BIGINT,                                                                                                  |
        ss_quantity INT,                                                                                                          |
        ss_wholesale_cost DOUBLE,                                                                                                 |
        ss_list_price DOUBLE,                                                                                                     |
        ss_sales_price DOUBLE,                                                                                                    |
        ss_ext_discount_amt DOUBLE,                                                                                               |
        ss_ext_sales_price DOUBLE,                                                                                                |
        ss_ext_wholesale_cost DOUBLE,                                                                                             |
        ss_ext_list_price DOUBLE,                                                                                                 |
        ss_ext_tax DOUBLE,                                                                                                        |
        ss_coupon_amt DOUBLE,                                                                                                     |
        ss_net_paid DOUBLE,                                                                                                       |
        ss_net_paid_inc_tax DOUBLE,                                                                                               |
        ss_net_profit DOUBLE                                                                                                      |
      )                                                                                                                           |
      PARTITIONED BY (                                                                                                            |
        ss_sold_date_sk INT                                                                                                       |
      )                                                                                                                           |
      STORED AS SEQUENCEFILE                                                                                                      |
      

      Query

      insert into store_sales_sequencefile partition(ss_sold_date_sk) select * from store_sales where ss_sold_date_sk between 2450816 and 2451200;
      

        Issue Links

          Activity

          Hide
          dhecht Dan Hecht added a comment -

          Skye Wanderman-Milne, do you know the history with ALLOW_UNSUPPORTED_FORMATS=1? How "unsupported" is writing a sequence file? Why do we have this option?

          Show
          dhecht Dan Hecht added a comment - Skye Wanderman-Milne , do you know the history with ALLOW_UNSUPPORTED_FORMATS=1? How "unsupported" is writing a sequence file? Why do we have this option?
          Hide
          dhecht Dan Hecht added a comment -

          Interestingly, when I run locally in a minicluster on a debug build, I get:

          #5  0x00000000022381ad in google::LogMessage::Flush() ()
          #6  0x000000000223ba5e in google::LogMessageFatal::~LogMessageFatal() ()
          #7  0x0000000001247c9d in impala::MemTracker::~MemTracker (this=0x7f4b080, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/runtime/mem-tracker.cc:219
          #8  0x0000000001230015 in boost::checked_delete<impala::MemTracker> (x=0x7f4b080) at /home/dhecht/toolchain/boost-1.57.0/include/boost/core/checked_delete.hpp:34
          #9  0x000000000122e41b in boost::scoped_ptr<impala::MemTracker>::~scoped_ptr (this=0x7f4b2f8, __in_chrg=<optimized out>) at /home/dhecht/toolchain/boost-1.57.0/include/boost/smart_ptr/scoped_ptr.hpp:82
          #10 0x000000000183424c in impala::HdfsTableSink::~HdfsTableSink (this=0x7f4b1e0, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/exec/hdfs-table-sink.h:122
          #11 0x0000000001834302 in impala::HdfsTableSink::~HdfsTableSink (this=0x7f4b1e0, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/exec/hdfs-table-sink.h:122
          #12 0x0000000001809e97 in boost::checked_delete<impala::DataSink> (x=0x7f4b1e0) at /home/dhecht/toolchain/boost-1.57.0/include/boost/core/checked_delete.hpp:34
          #13 0x0000000001809363 in boost::scoped_ptr<impala::DataSink>::~scoped_ptr (this=0x8bff0c8, __in_chrg=<optimized out>) at /home/dhecht/toolchain/boost-1.57.0/include/boost/smart_ptr/scoped_ptr.hpp:82
          #14 0x0000000001802a3e in impala::PlanFragmentExecutor::~PlanFragmentExecutor (this=0x8bfef60, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/runtime/plan-fragment-executor.cc:72
          #15 0x000000000141b2a4 in impala::FragmentMgr::FragmentExecState::~FragmentExecState (this=0x8bfed00, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/service/fragment-exec-state.h:41
          
          (gdb) f 7
          #7  0x0000000001247c9d in impala::MemTracker::~MemTracker (this=0x7f4b080, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/runtime/mem-tracker.cc:219
          219       DCHECK_EQ(consumption_->current_value(), 0) << label_ << "\n" << GetStackTrace();
          

          The assert is new, but probably exposing a different problem. Michael Ho, I think you are readying a patch for this DCHECK, right?

          Show
          dhecht Dan Hecht added a comment - Interestingly, when I run locally in a minicluster on a debug build, I get: #5 0x00000000022381ad in google::LogMessage::Flush() () #6 0x000000000223ba5e in google::LogMessageFatal::~LogMessageFatal() () #7 0x0000000001247c9d in impala::MemTracker::~MemTracker ( this =0x7f4b080, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/runtime/mem-tracker.cc:219 #8 0x0000000001230015 in boost::checked_delete<impala::MemTracker> (x=0x7f4b080) at /home/dhecht/toolchain/boost-1.57.0/include/boost/core/checked_delete.hpp:34 #9 0x000000000122e41b in boost::scoped_ptr<impala::MemTracker>::~scoped_ptr ( this =0x7f4b2f8, __in_chrg=<optimized out>) at /home/dhecht/toolchain/boost-1.57.0/include/boost/smart_ptr/scoped_ptr.hpp:82 #10 0x000000000183424c in impala::HdfsTableSink::~HdfsTableSink ( this =0x7f4b1e0, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/exec/hdfs-table-sink.h:122 #11 0x0000000001834302 in impala::HdfsTableSink::~HdfsTableSink ( this =0x7f4b1e0, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/exec/hdfs-table-sink.h:122 #12 0x0000000001809e97 in boost::checked_delete<impala::DataSink> (x=0x7f4b1e0) at /home/dhecht/toolchain/boost-1.57.0/include/boost/core/checked_delete.hpp:34 #13 0x0000000001809363 in boost::scoped_ptr<impala::DataSink>::~scoped_ptr ( this =0x8bff0c8, __in_chrg=<optimized out>) at /home/dhecht/toolchain/boost-1.57.0/include/boost/smart_ptr/scoped_ptr.hpp:82 #14 0x0000000001802a3e in impala::PlanFragmentExecutor::~PlanFragmentExecutor ( this =0x8bfef60, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/runtime/plan-fragment-executor.cc:72 #15 0x000000000141b2a4 in impala::FragmentMgr::FragmentExecState::~FragmentExecState ( this =0x8bfed00, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/service/fragment-exec-state.h:41 (gdb) f 7 #7 0x0000000001247c9d in impala::MemTracker::~MemTracker ( this =0x7f4b080, __in_chrg=<optimized out>) at /home/dhecht/src/Impala/be/src/runtime/mem-tracker.cc:219 219 DCHECK_EQ(consumption_->current_value(), 0) << label_ << "\n" << GetStackTrace(); The assert is new, but probably exposing a different problem. Michael Ho , I think you are readying a patch for this DCHECK, right?
          Hide
          dhecht Dan Hecht added a comment -

          Removing this assert does reproduce the same crash Mostafa saw. Next, I'm going to try Michael Ho patch: http://github.mtv.cloudera.com/kwho/Impala/commit/4aee3f729d545378a9612d595d75607ef2d4b32e
          I suspect this is another case of memory corruption with same root cause of IMPALA-3068. (Unfortunately my desktop seems to have crashed, though, so will have to wait until the morning to check.)

          Show
          dhecht Dan Hecht added a comment - Removing this assert does reproduce the same crash Mostafa saw. Next, I'm going to try Michael Ho patch: http://github.mtv.cloudera.com/kwho/Impala/commit/4aee3f729d545378a9612d595d75607ef2d4b32e I suspect this is another case of memory corruption with same root cause of IMPALA-3068 . (Unfortunately my desktop seems to have crashed, though, so will have to wait until the morning to check.)
          Hide
          kwho Michael Ho added a comment -

          I think the DCHECK you hit has to do with HdfsTableSink not calling Release() on its mem_tracker_. I am going through various table writers to see why they aren't releasing the memory they consume. Again, it either is a memory leak or accounting problem. I already tried with my own patch and the same problem still happened.

          Show
          kwho Michael Ho added a comment - I think the DCHECK you hit has to do with HdfsTableSink not calling Release() on its mem_tracker_. I am going through various table writers to see why they aren't releasing the memory they consume. Again, it either is a memory leak or accounting problem. I already tried with my own patch and the same problem still happened.
          Hide
          skye Skye Wanderman-Milne added a comment -

          This work was started but never completed to the point where we could consider it supported. I think we determined this by trying to use Impala to write our test data, but it wasn't working (I don't know any further details on this). I wouldn't consider this a blocker considering it's unsupported functionality.

          Show
          skye Skye Wanderman-Milne added a comment - This work was started but never completed to the point where we could consider it supported. I think we determined this by trying to use Impala to write our test data, but it wasn't working (I don't know any further details on this). I wouldn't consider this a blocker considering it's unsupported functionality.
          Hide
          skye Skye Wanderman-Milne added a comment -

          Mostafa Mokhtar, did you hit this running that query manually or as part of our existing tests?

          Show
          skye Skye Wanderman-Milne added a comment - Mostafa Mokhtar , did you hit this running that query manually or as part of our existing tests?
          Hide
          mmokhtar Mostafa Mokhtar added a comment -

          Skye Wanderman-Milne
          No, this was ad-hoc.

          Show
          mmokhtar Mostafa Mokhtar added a comment - Skye Wanderman-Milne No, this was ad-hoc.
          Hide
          kwho Michael Ho added a comment -

          I ran with the patch at http://gerrit.cloudera.org:8080/2314 and I was able to finish the insert statement without crashing.
          The file looks corrupted and I could not read it back.

          Also tried the same test for parquet, avro and text and I was able to read them back successfully.

          Show
          kwho Michael Ho added a comment - I ran with the patch at http://gerrit.cloudera.org:8080/2314 and I was able to finish the insert statement without crashing. The file looks corrupted and I could not read it back. Also tried the same test for parquet, avro and text and I was able to read them back successfully.
          Hide
          dhecht Dan Hecht added a comment -

          Michael Ho, I can still reproduce this crash even with http://gerrit.cloudera.org:8080/2314 (though I no longer hit the consumption() dcheck so that one is fixed).

          Show
          dhecht Dan Hecht added a comment - Michael Ho , I can still reproduce this crash even with http://gerrit.cloudera.org:8080/2314 (though I no longer hit the consumption() dcheck so that one is fixed).
          Hide
          kwho Michael Ho added a comment -

          So, this bug is unrelated to IMPALA-3085 then.

          Show
          kwho Michael Ho added a comment - So, this bug is unrelated to IMPALA-3085 then.
          Hide
          dhecht Dan Hecht added a comment -

          Correct. And I'm pretty convinced that it is not a general corruption issue but instead a bug in the way we use the snappy compressor, but since this is under ALLOW_UNSUPPORTED_FORMATS=1, it is also not a blocker.

          Show
          dhecht Dan Hecht added a comment - Correct. And I'm pretty convinced that it is not a general corruption issue but instead a bug in the way we use the snappy compressor, but since this is under ALLOW_UNSUPPORTED_FORMATS=1, it is also not a blocker.
          Hide
          dhecht Dan Hecht added a comment -

          I think the logic in SnappyBlockCompressor::ProcessBlock() is not right for odd values of 'input_length'. This patch seemes to fix the crash, but still leaves us with files that can't be read. I'm just recording this here for when we pick this bug up again later:

          diff --git a/be/src/util/compress.cc b/be/src/util/compress.cc
          index 22bb32d..1e70391 100644
          --- a/be/src/util/compress.cc
          +++ b/be/src/util/compress.cc
          @@ -201,9 +201,9 @@ Status SnappyBlockCompressor::ProcessBlock(bool output_preallocated,
             // an integer which is the size of the decompressed data followed by a
             // sequence of compressed blocks each preceded with an integer size.
             // For testing purposes we are going to generate two blocks.
          -  int64_t block_size = input_length / 2;
          -  size_t length = snappy::MaxCompressedLength(block_size) * 2;
          -  length += 3 * sizeof (int32_t);
          +  int64_t block_size = input_length;
          +  size_t length = snappy::MaxCompressedLength(block_size);
          +  length += 2 * sizeof (int32_t);
             DCHECK(!output_preallocated || length <= *output_length);
          
             if (output_preallocated) {
          @@ -231,6 +231,7 @@ Status SnappyBlockCompressor::ProcessBlock(bool output_preallocated,
               input += block_size;
               input_length -= block_size;
               outp += size;
          +    DCHECK_EQ(input_length, 0);
             }
          
             *output = out_buffer_;
          
          Show
          dhecht Dan Hecht added a comment - I think the logic in SnappyBlockCompressor::ProcessBlock() is not right for odd values of 'input_length'. This patch seemes to fix the crash, but still leaves us with files that can't be read. I'm just recording this here for when we pick this bug up again later: diff --git a/be/src/util/compress.cc b/be/src/util/compress.cc index 22bb32d..1e70391 100644 --- a/be/src/util/compress.cc +++ b/be/src/util/compress.cc @@ -201,9 +201,9 @@ Status SnappyBlockCompressor::ProcessBlock(bool output_preallocated, // an integer which is the size of the decompressed data followed by a // sequence of compressed blocks each preceded with an integer size. // For testing purposes we are going to generate two blocks. - int64_t block_size = input_length / 2; - size_t length = snappy::MaxCompressedLength(block_size) * 2; - length += 3 * sizeof (int32_t); + int64_t block_size = input_length; + size_t length = snappy::MaxCompressedLength(block_size); + length += 2 * sizeof (int32_t); DCHECK(!output_preallocated || length <= *output_length); if (output_preallocated) { @@ -231,6 +231,7 @@ Status SnappyBlockCompressor::ProcessBlock(bool output_preallocated, input += block_size; input_length -= block_size; outp += size; + DCHECK_EQ(input_length, 0); } *output = out_buffer_;
          Hide
          attilaj Attila Jeges added a comment -

          commit 59b2db6ba722e5bef297bb4603519e06333ce5cb
          Author: Attila Jeges <attilaj@cloudera.com>
          Date: Mon Feb 20 18:07:25 2017 +0100

          IMPALA-3079: Fix sequence file writer

          This change fixes the following issues in the Sequence File Writer:
          1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong()
          were broken. As a result, Impala created corrupt uncompressed
          sequence files.

          2. KEY_CLASS_NAME was missing from the sequence file header. As a
          result, Hive could not read back uncompressed sequence files
          created by Impala.

          3. Impala created record-compressed sequence files with empty keys
          block. As a result, Hive could not read back record-compressed
          sequence files created by Impala.

          4. Impala created block-compressed files with:

          • empty key-lengths block
          • empty keys block
          • empty value-lengths block
            This resulted in invalid block-compressed sequence files that Hive could
            not read back.

          5. In some cases the wrong Record-compression flag was written to the
          sequence file header. As a result, Hive could not read back record-
          compressed sequence files created by Impala.

          6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the
          beginning of blocks in block-compressed sequence files. Hive could
          not read these files back.

          7. The calculation of block sizes in SnappyBlockCompressor class was
          incorrect for odd-length buffers.

          Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487
          Reviewed-on: http://gerrit.cloudera.org:8080/6107
          Reviewed-by: Michael Ho <kwho@cloudera.com>
          Reviewed-by: Attila Jeges <attilaj@cloudera.com>
          Reviewed-by: Dan Hecht <dhecht@cloudera.com>
          Tested-by: Impala Public Jenkins

          Show
          attilaj Attila Jeges added a comment - commit 59b2db6ba722e5bef297bb4603519e06333ce5cb Author: Attila Jeges <attilaj@cloudera.com> Date: Mon Feb 20 18:07:25 2017 +0100 IMPALA-3079 : Fix sequence file writer This change fixes the following issues in the Sequence File Writer: 1. ReadWriteUtil::VLongRequiredBytes() and ReadWriteUtil::PutVLong() were broken. As a result, Impala created corrupt uncompressed sequence files. 2. KEY_CLASS_NAME was missing from the sequence file header. As a result, Hive could not read back uncompressed sequence files created by Impala. 3. Impala created record-compressed sequence files with empty keys block. As a result, Hive could not read back record-compressed sequence files created by Impala. 4. Impala created block-compressed files with: empty key-lengths block empty keys block empty value-lengths block This resulted in invalid block-compressed sequence files that Hive could not read back. 5. In some cases the wrong Record-compression flag was written to the sequence file header. As a result, Hive could not read back record- compressed sequence files created by Impala. 6. Impala added 'sync_marker' instead of 'neg1_sync_marker' to the beginning of blocks in block-compressed sequence files. Hive could not read these files back. 7. The calculation of block sizes in SnappyBlockCompressor class was incorrect for odd-length buffers. Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Reviewed-on: http://gerrit.cloudera.org:8080/6107 Reviewed-by: Michael Ho <kwho@cloudera.com> Reviewed-by: Attila Jeges <attilaj@cloudera.com> Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Impala Public Jenkins

            People

            • Assignee:
              attilaj Attila Jeges
              Reporter:
              mmokhtar Mostafa Mokhtar
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development