Kudu / KUDU-2892

tserver crashed while dropping range partition

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: tablet
    • Labels: None

    Description

      On one of our production clusters, a tserver crashed yesterday morning while dropping a range partition. The error log is below:

      Log file created at: 2019/07/11 01:51:30
      Running on machine: kudu31.jd.163.org
      Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
      E0711 01:51:30.331185 11840 env_posix.cc:316] I/O error, context: /mnt/dfs/0/kudu/tserver/data/data/9305dce18e6f4100b486b605617122b3.data
      E0711 01:51:30.337604 11840 data_dirs.cc:1120] Directory /mnt/dfs/0/kudu/tserver/data/data marked as failed
      F0711 04:00:51.835958 68948 ts_tablet_manager.cc:940] Failed to delete tablet data for 2278f736bf6548e2b773003c1ba7ed66: Invalid argument: Unable to delete on-disk data from tablet 2278f736bf6548e2b773003c1ba7ed66: The metadata for tablet 2278f736bf6548e2b773003c1ba7ed66 still references orphaned blocks. Call DeleteTabletData() first
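
To pull the error (E) and fatal (F) entries out of a longer log, the line format stated in the log header above can be matched with a regular expression. The following is a minimal sketch, not part of Kudu: the names `GLOG_LINE`, `LogEntry`, and `parse_glog_line` are made up for illustration, and the pattern assumes only the `[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg` layout shown in the header.

```python
import re
from typing import NamedTuple, Optional

# Pattern for "[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg".
# Hypothetical helper; derived only from the format line in the log header.
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    level: str   # one of I, W, E, F
    month: int
    day: int
    time: str    # hh:mm:ss.uuuuuu, kept as text
    thread: int
    file: str
    line: int
    msg: str


def parse_glog_line(text: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for header lines that do not match."""
    m = GLOG_LINE.match(text.strip())
    if m is None:
        return None
    return LogEntry(
        level=m["level"],
        month=int(m["month"]),
        day=int(m["day"]),
        time=m["time"],
        thread=int(m["thread"]),
        file=m["file"],
        line=int(m["line"]),
        msg=m["msg"],
    )


def errors_and_fatals(lines):
    """Keep only the E and F entries from an iterable of raw log lines."""
    entries = (parse_glog_line(raw) for raw in lines)
    return [e for e in entries if e is not None and e.level in "EF"]
```

Header lines such as "Log file created at: ..." do not match the pattern and are skipped, so the function can be fed the whole file.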
      

      It seems that, after a disk was marked as failed, newly orphaned blocks that were never deleted caused this crash. I attached the INFO log for tablet '2278f736bf6548e2b773003c1ba7ed66'. Our Kudu version is 1.9.x (6a9cf4).

      For brevity, here is a quick summary of the timeline:

      1. 01:51:30.331185: bad disk /mnt/dfs/0 was detected
      2. 01:51:30.344581: failing tablet
      3. 01:51:30.870059: Initiating tablet copy
      4. 04:00:51.820354: Processing DeleteTablet
      5. 04:00:51.835958: Crashed.
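
The gaps between these events can be checked with a few lines of Python. This is only a sketch: the `TIMELINE` list and the `gaps` helper are illustrative names, and the calculation assumes all five events fall on the same day, as the 0711 prefix in the log lines suggests.

```python
from datetime import datetime, timedelta

# Timestamps and labels copied from the timeline above.
TIMELINE = [
    ("01:51:30.331185", "bad disk /mnt/dfs/0 was detected"),
    ("01:51:30.344581", "failing tablet"),
    ("01:51:30.870059", "Initiating tablet copy"),
    ("04:00:51.820354", "Processing DeleteTablet"),
    ("04:00:51.835958", "Crashed"),
]


def gaps(timeline):
    """Return the elapsed time between each pair of consecutive events."""
    times = [datetime.strptime(stamp, "%H:%M:%S.%f") for stamp, _ in timeline]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for (_, label), delta in zip(TIMELINE[1:], gaps(TIMELINE)):
        print(f"{delta.total_seconds():>12.6f} s before: {label}")
```

The tablet copy started about half a second after the disk failure, the DeleteTablet request arrived roughly 2 hours 9 minutes later, and the crash followed about 16 ms after that.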

       

      Attachments

        1. tserver-INFO.log
          134 kB
          LiFu He

          People

            Assignee: Unassigned
            Reporter: LiFu He (helifu)
            Votes: 0
            Watchers: 3
