Apache HAWQ / HAWQ-42

Corrupt disk file makes HAWQ core dump when read-shortcircuit is enabled in hdfs-client.xml


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Resolved
    • Affects Version/s: None
    • Fix Version/s: 2.0.0.0-incubating
    • Component/s: libhdfs
    • Labels: None

    Description

      While running the tpch_row_10g_nocompression_no_partition workload on a 128-node cluster, queries q1, q3, q4, q5, q6, q7, q8, q9, q10, q12, q14, q15, q17, q18, q19, q20 and q21 failed with a query executor error and dumped core. Backtrace from the core file:

      (gdb) bt
      #0  0x000000350b40f5db in raise () from /lib64/libpthread.so.0
      #1  0x0000000000ac77fa in SafeHandlerForSegvBusIll (processName=<value optimized out>, postgres_signal_arg=7) at elog.c:4497
      #2  <signal handler called>
      #3  0x00007f1b445690c2 in _mm_crc32_u64 (this=0x261fcd0, b=0x7f1b0d6d7000, len=512) at /opt/gcc-4.4.2/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/include/smmintrin.h:716
      #4  Hdfs::Internal::HWCrc32c::update (this=0x261fcd0, b=0x7f1b0d6d7000, len=512) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/common/HWCrc32c.cpp:114
      #5  0x00007f1b44549692 in Hdfs::Internal::LocalBlockReader::readAndVerify (this=0x26075a0, bufferSize=2097152) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/LocalBlockReader.cpp:174
      #6  0x00007f1b4454996f in Hdfs::Internal::LocalBlockReader::readInternal (this=0x26075a0, buf=0x3057b20 "Pb\370\003V\246X", len=<value optimized out>)
          at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/LocalBlockReader.cpp:227
      #7  0x00007f1b44549a13 in Hdfs::Internal::LocalBlockReader::read (this=0xffffffff, buf=0x7f1b0d6d7000 <Address 0x7f1b0d6d7000 out of bounds>, size=64)
          at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/LocalBlockReader.cpp:240
      #8  0x00007f1b4453bc3a in Hdfs::Internal::InputStreamImpl::readOneBlock (this=0x2768f20, buf=0x3057b20 "Pb\370\003V\246X", size=65536, shouldUpdateMetadataOnFailure=<value optimized out>)
          at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/InputStreamImpl.cpp:563
      #9  0x00007f1b4453c163 in Hdfs::Internal::InputStreamImpl::readInternal (this=0x2768f20, buf=0x3057b20 "Pb\370\003V\246X", size=65536) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/InputStreamImpl.cpp:666
      #10 0x00007f1b4453c5bb in Hdfs::Internal::InputStreamImpl::read (this=0x2768f20, buf=0x3057b20 "Pb\370\003V\246X", size=65536) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/InputStreamImpl.cpp:507
      #11 0x00007f1b44530e8c in hdfsRead (fs=<value optimized out>, file=<value optimized out>, buffer=0xffffffff, length=225275904) at /data/pulse2-agent/agents/agent1/work/LIBHDFS3-2.0-stash/rhel5_x86_64/src/client/Hdfs.cpp:800
      #12 0x00007f1b2138ab7d in gpfs_hdfs_read (fcinfo=<value optimized out>) at gpfshdfs.c:492
      #13 0x000000000092b48b in HdfsRead (protocol=<value optimized out>, fileSystem=<value optimized out>, file=<value optimized out>, buffer=<value optimized out>, length=<value optimized out>) at filesystem.c:533
      #14 0x000000000091c385 in HdfsFileRead (file=6, buffer=0x3057b20 "Pb\370\003V\246X", amount=65536) at fd.c:2722
      #15 FileRead (file=6, buffer=0x3057b20 "Pb\370\003V\246X", amount=65536) at fd.c:3133
      #16 0x0000000000bcc416 in BufferedReadIo (bufferedRead=0x3009f08, newMaxReadAheadLen=<value optimized out>, growBufferLen=<value optimized out>, isUseSplitLen=<value optimized out>) at cdbbufferedread.c:198
      #17 BufferedReadUseBeforeBuffer (bufferedRead=0x3009f08, newMaxReadAheadLen=<value optimized out>, growBufferLen=<value optimized out>, isUseSplitLen=<value optimized out>) at cdbbufferedread.c:317
      #18 BufferedReadGrowBuffer (bufferedRead=0x3009f08, newMaxReadAheadLen=<value optimized out>, growBufferLen=<value optimized out>, isUseSplitLen=<value optimized out>) at cdbbufferedread.c:647
      #19 0x0000000000bc6b79 in AppendOnlyStorageRead_InternalGetBuffer (storageRead=0x3009eb8, isUseSplitLen=0 '\000') at cdbappendonlystorageread.c:1223
      #20 AppendOnlyStorageRead_GetBuffer (storageRead=0x3009eb8, isUseSplitLen=0 '\000') at cdbappendonlystorageread.c:1289
      #21 0x0000000000599a1e in AppendOnlyExecutorReadBlock_GetContents (scan=0x3009d98, direction=<value optimized out>, slot=0x2fdfed8) at appendonlyam.c:628
      #22 getNextBlock (scan=0x3009d98, direction=<value optimized out>, slot=0x2fdfed8) at appendonlyam.c:1243
      #23 appendonlygettup (scan=0x3009d98, direction=<value optimized out>, slot=0x2fdfed8) at appendonlyam.c:1283
      #24 appendonly_getnext (scan=0x3009d98, direction=<value optimized out>, slot=0x2fdfed8) at appendonlyam.c:1673
      #25 0x000000000075de16 in AppendOnlyScanNext (scanState=<value optimized out>) at execAOScan.c:39
      #26 0x0000000000751f1b in ExecScan (scanState=0x2ffea70) at execScan.c:129
      #27 ExecTableScanRelation (scanState=0x2ffea70) at execScan.c:441
      #28 0x0000000000788a73 in ExecTableScan (node=0x2ffea70) at nodeTableScan.c:42
      #29 0x00000000007469dd in ExecProcNode (node=0x2ffea70) at execProcnode.c:904
      #30 0x000000000077efe6 in execMotionSender (node=0x2ffd2d0) at nodeMotion.c:348
      #31 ExecMotion (node=0x2ffd2d0) at nodeMotion.c:315
      #32 0x0000000000746b71 in ExecProcNode (node=0x2ffd2d0) at execProcnode.c:999
      #33 0x000000000073a8ac in ExecutePlan (estate=0x274bb60, planstate=<value optimized out>, operation=<value optimized out>, numberTuples=<value optimized out>, direction=<value optimized out>, dest=<value optimized out>) at execMain.c:3181
      #34 0x000000000073b1f2 in ExecutorRun (queryDesc=<value optimized out>, direction=<value optimized out>, count=<value optimized out>) at execMain.c:1166
      #35 0x0000000000976ec9 in PortalRunSelect (portal=<value optimized out>, count=0, isTopLevel=<value optimized out>, dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:1641
      #36 PortalRun (portal=<value optimized out>, count=0, isTopLevel=<value optimized out>, dest=<value optimized out>, altdest=<value optimized out>, completionTag=<value optimized out>) at pquery.c:1463
      #37 0x000000000096f488 in exec_mpp_query (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:1378
      #38 PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:4866
      #39 0x00000000008cf51b in BackendRun (port=0x260d420) at postmaster.c:5844
      #40 BackendStartup (port=0x260d420) at postmaster.c:5437
      #41 0x00000000008d4fef in ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:2139
      #42 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1431
      #43 0x00000000007d6aea in main (argc=9, argv=0x2609d20) at main.c:226
      (gdb) 
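Frame #1 shows the crash handler receiving signal 7, which is SIGBUS on Linux x86-64, and frame #3 faulted while reading the page-aligned buffer at 0x7f1b0d6d7000. One common way an ordinary memory read raises SIGBUS is touching a page of a memory-mapped file that the file no longer backs, for example after the file was truncated. The report does not say how libhdfs3 reads local block files, so the C sketch below only illustrates that mechanism under the assumption that the block file is mapped; it is not a diagnosis of this bug. It uses standard POSIX calls (mkstemp, mmap, ftruncate, sigaction, sigsetjmp/siglongjmp), and touch_truncated_mapping is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump_env;

/* SIGBUS handler: jump back to the sigsetjmp point instead of dying. */
static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(jump_env, 1);
}

/* Hypothetical helper (assumption: the reader accesses the file via mmap).
   Maps one page of a temporary file, truncates the file to zero bytes,
   then reads the mapped page.
   Returns 1 if the read raised SIGBUS, 0 if it did not, -1 on setup error. */
int touch_truncated_mapping(void) {
    char path[] = "/tmp/blockXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path); /* the file stays alive until fd is closed */

    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) { close(fd); return -1; }
    size_t page = (size_t)ps;

    if (ftruncate(fd, (off_t)page) != 0) { close(fd); return -1; }
    unsigned char *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    /* Simulate a file that shrinks after it was mapped. */
    if (ftruncate(fd, 0) != 0) { munmap(p, page); close(fd); return -1; }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) { munmap(p, page); close(fd); return -1; }

    volatile int got_sigbus = 0;
    if (sigsetjmp(jump_env, 1) == 0) {
        /* The page is no longer backed by file data, so this read faults. */
        volatile unsigned char c = ((volatile unsigned char *)p)[0];
        (void)c;
    } else {
        got_sigbus = 1;
    }

    sigaction(SIGBUS, &old, NULL); /* restore the previous disposition */
    munmap(p, page);
    close(fd);
    return got_sigbus;
}
```

On Linux the function is expected to return 1. Without the temporary handler, the same read terminates the process with SIGBUS, so a checksum routine that walks such a page never gets the chance to report a mismatch.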
      
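Frames #3 to #5 show LocalBlockReader::readAndVerify handing 512-byte pieces of the block to HWCrc32c::update, which uses the SSE4.2 `_mm_crc32_u64` instruction to compute CRC-32C. The C sketch below shows the general technique of per-chunk CRC-32C verification with a portable bitwise CRC-32C in place of the hardware instruction. It assumes 512 bytes per checksum (matching len=512 in the backtrace) and a plain in-memory array of expected checksums; `crc32c` and `verify_chunks` are hypothetical names, not libhdfs3 functions, and the on-disk checksum file layout is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78).
   Computes the same checksum as the SSE4.2 crc32 instruction, only slower. */
uint32_t crc32c(const unsigned char *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper. Verifies `data` in chunks of `bytes_per_checksum`
   bytes (the last chunk may be shorter) against `sums`, one CRC per chunk.
   Returns -1 if every chunk matches, otherwise the index of the first
   chunk whose CRC differs or that has no stored checksum. */
long verify_chunks(const unsigned char *data, size_t len,
                   const uint32_t *sums, size_t nsums,
                   size_t bytes_per_checksum) {
    assert(bytes_per_checksum > 0);
    size_t nchunks = (len + bytes_per_checksum - 1) / bytes_per_checksum;
    for (size_t i = 0; i < nchunks; i++) {
        if (i >= nsums)
            return (long)i; /* checksum data ends before the block data */
        size_t off = i * bytes_per_checksum;
        size_t n = len - off < bytes_per_checksum ? len - off : bytes_per_checksum;
        if (crc32c(data + off, n) != sums[i])
            return (long)i;
    }
    return -1;
}
```

A mismatch or a missing checksum comes back as a chunk index that the caller can turn into an ordinary read error. In this report, however, the signal was raised inside the CRC computation itself (frame #3), before any comparison against the stored checksum could take place.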



          People

            Zhanwei Wang (wangzw)
            Xiang Sheng (xsheng)
            Votes: 0
            Watchers: 4
