  1. Kudu
  2. KUDU-3151

segfault when repairing log block container with a missing LBM container data file

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: fs

    Description

      We upgraded a cluster from 1.7 to 1.12 and saw the following segfault on one node:

      *** SIGSEGV (@0x20f2008) received by PID 35899 (TID 0x7ff7e40cc700) from PID 34545672; stack trace: ***
          @     0x7ff7f2a395d0 (unknown)
          @           0x9fe02e std::_Sp_counted_base<>::_M_release()
          @          0x2049f77 kudu::fs::LogBlockManager::Repair()
          @          0x204ae45 kudu::fs::LogBlockManager::RepairTask()
          @          0x228e67e kudu::ThreadPool::DispatchThread()
          @          0x228778f kudu::Thread::SuperviseThread()
          @     0x7ff7f2a31dd5 start_thread
          @     0x7ff7f0d0902d __clone
      

      When running kudu fs check we saw the following logs:

      I0617 09:17:37.681373 147811 fs_manager.cc:433] Time spent opening block manager: real 10.871s	user 0.215s	sys 0.162s
      Not found: Could not open container 74e7b95f8ccb4c7b98e52dc48049e967: /data/5/kudu/tablet/data/data/74e7b95f8ccb4c7b98e52dc48049e967.data: No such file or directory (error 2)
      

      and upon inspecting the files, we found that 74e7b95f8ccb4c7b98e52dc48049e967.data was indeed missing, while the metadata file 74e7b95f8ccb4c7b98e52dc48049e967.metadata was present and non-empty (it contained more creates than deletes; see attached).
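
      The "more creates than deletes" observation can be checked with a short script. The following is a minimal sketch, not part of the original report. It assumes the attached dump is plain text in which each block record has a line such as `op_type: CREATE` or `op_type: DELETE`; the attachment's format is not shown here, so the pattern may need adjusting. The function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: each record in the text dump carries a line such as
# "op_type: CREATE" or "op_type: DELETE". Adjust the pattern if the
# dump is formatted differently.
OP_TYPE_RE = re.compile(r"\bop_type:\s*(CREATE|DELETE)\b")


def count_block_records(lines):
    """Return a Counter of CREATE and DELETE records found in the lines."""
    counts = Counter()
    for line in lines:
        match = OP_TYPE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def unmatched_creates(counts):
    """Number of CREATE records beyond the number of DELETE records."""
    return counts["CREATE"] - counts["DELETE"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python count_records.py metadump.txt
    with open(sys.argv[1]) as dump:
        totals = count_block_records(dump)
    print("creates:", totals["CREATE"])
    print("deletes:", totals["DELETE"])
    print("creates minus deletes:", unmatched_creates(totals))
```

      A positive difference suggests the metadata still refers to blocks that were never deleted. The sketch compares counts only; it does not pair each delete with the block ID it targets.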

      We were able to delete the metadata file, and I don't think we saw any failed tablets upon doing so (failures could surface if a tablet were unable to find blocks it needs at startup, e.g. PK blocks when reading min/max keys).

      The metadata may be left over from an LBM compaction, but the exact cause isn't clear so far. It's also unclear whether the data file went missing before or after the upgrade, as we didn't run kudu fs check before upgrading.
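
      Independently of kudu fs check, a data directory can be scanned for metadata files that have no matching data file. The following is a minimal sketch under stated assumptions: containers are stored as `<id>.metadata` and `<id>.data` side by side in one directory, as the paths in the log above suggest, and `find_orphaned_metadata` is a hypothetical helper, not a Kudu tool. It only reads directory entries and does not replace kudu fs check.

```python
from pathlib import Path


def find_orphaned_metadata(data_dir):
    """Return sorted container IDs that have a .metadata file but no .data file.

    Assumption: both files of a container live directly in data_dir and
    share the same base name.
    """
    root = Path(data_dir)
    orphans = []
    for meta in root.glob("*.metadata"):
        # "abc.metadata" -> "abc.data" in the same directory.
        if not meta.with_suffix(".data").exists():
            orphans.append(meta.stem)
    return sorted(orphans)
```

      For example, calling `find_orphaned_metadata("/data/5/kudu/tablet/data/data")` on the affected node would be expected to list the container from this report, assuming the directory layout matches the assumption above.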

      Attachments

        1. metadump.txt
          1.72 MB
          Andrew Wong

          People

            Assignee: Unassigned
            Reporter: Andrew Wong (awong)