HBASE-15837

Memstore size accounting is wrong if postBatchMutate() throws exception

Details

    • Reviewed

    Description

      Over in PHOENIX-2883, I've been trying to track down the root cause of an issue we were seeing where a negative memstoreSize was ultimately causing an RS to abort. The tl;dr version is:

      • Something causes memstoreSize to be negative (not sure what is doing this yet)
      • All subsequent flushes short-circuit and don't run because they think there is no data to flush
      • The region is eventually closed (commonly, for a move).
      • A final flush is attempted on each store before closing (which also short-circuits for the same reason), leaving unflushed data in each store.
      • The sanity check that each store's size is zero fails and the RS aborts.
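The sequence above can be reproduced in a toy model. This is only a sketch, not HBase's actual region code: the class `ToyRegion` and its methods `write`, `misaccount`, `flush`, and `closeIsClean` are hypothetical names, and the `-250` delta stands in for the still-unknown bad update.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the failure mode; all names here are hypothetical.
class ToyRegion {
    // Region-level counter, tracked separately from what the store holds.
    final AtomicLong memstoreSize = new AtomicLong();
    // Bytes actually buffered in the (single) store.
    long storeBytes = 0;

    void write(long bytes) {
        storeBytes += bytes;
        memstoreSize.addAndGet(bytes);
    }

    // Stand-in for whatever update drives the counter negative.
    void misaccount(long delta) {
        memstoreSize.addAndGet(delta);
    }

    // Flush that short-circuits when the counter claims there is no data.
    boolean flush() {
        if (memstoreSize.get() <= 0) {
            return false; // skipped: "nothing to flush"
        }
        memstoreSize.addAndGet(-storeBytes);
        storeBytes = 0;
        return true;
    }

    // Close-time sanity check: final flush, then the store must be empty.
    boolean closeIsClean() {
        flush();
        return storeBytes == 0;
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.write(100);
        region.misaccount(-250); // counter is now -150
        System.out.println("flushed=" + region.flush());
        System.out.println("closeIsClean=" + region.closeIsClean());
    }
}
```

In this model the flush is skipped and the close-time check fails while 100 bytes are still buffered, which corresponds to the point where the RS aborts.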

      I have a little patch which I think should improve our failure case around this, preventing the RS abort safely (by forcing a flush when memstoreSize is negative) and logging a call trace when an update to memstoreSize makes it negative (to find culprits in the future).
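A minimal sketch of those two ideas, under stated assumptions: this is not the attached patch, and `GuardedMemstoreSize`, `add`, `get`, and `shouldFlush` are hypothetical names. It assumes the flush decision can simply treat any non-zero counter, including a negative one, as "there may be data".

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper around the size counter; not the actual HBase patch.
class GuardedMemstoreSize {
    private static final Logger LOG =
        Logger.getLogger(GuardedMemstoreSize.class.getName());

    private final AtomicLong size = new AtomicLong();

    // Apply a delta and log the caller's stack if the result is negative.
    long add(long delta) {
        long updated = size.addAndGet(delta);
        if (updated < 0) {
            // The Throwable is created only to capture the call trace.
            LOG.log(Level.SEVERE,
                "memstoreSize went negative: " + updated + " (delta=" + delta + ")",
                new Exception("call trace"));
        }
        return updated;
    }

    long get() {
        return size.get();
    }

    // Skip the flush only when the counter is exactly zero, so a negative
    // value forces a flush instead of short-circuiting.
    boolean shouldFlush() {
        return size.get() != 0;
    }

    public static void main(String[] args) {
        GuardedMemstoreSize guarded = new GuardedMemstoreSize();
        guarded.add(100);
        guarded.add(-250); // logs a call trace
        System.out.println("shouldFlush=" + guarded.shouldFlush());
    }
}
```

Logging at the point of the update, rather than at flush time, is what ties the negative value to the caller that produced it.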

      Attachments

        1. HBASE-15837.001.patch
          4 kB
          Josh Elser
        2. hbase-15837.branch-1.patch
          5 kB
          Enis Soztutar
        3. hbase-15837-v1.patch
          6 kB
          Enis Soztutar
        4. hbase-memstore-size-accounting.patch
          5 kB
          Enis Soztutar

          People

            elserj Josh Elser