  1. HBase
  2. HBASE-14680

Two configs for snapshot timeout and better defaults

Details

    • Reviewed

    Description

      One of our clusters timed out while taking a snapshot of a disabled table. The table is big enough that the master operation takes more than 1 min to complete. However, while trying to increase the timeout, we noticed that two parameters with very similar names configure different things:

      hbase.snapshot.master.timeout.millis is defined in SnapshotDescriptionUtils, is sent to the client side, and is used for disabled table snapshots.

      hbase.snapshot.master.timeoutMillis is defined in SnapshotManager and used as the timeout for the procedure execution.

      So there are a few improvements we can make:

      • 1 min is too low for big tables. We should raise the default to 5 min or 10 min. Even a 6 TB table, which is medium sized, fails.
      • Unify the two timeouts into one. Pick one of the two keys and deprecate the other. For backward compatibility, use the larger of the two configured values.
      • Add the timeout to hbase-default.xml.
      • Why do we even have a timeout for disabled table snapshots? The master is doing the work, so we should not time out in any case.
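
      The "use the biggest one" idea above could look roughly like the sketch below. This is only an illustration, not the patch: the class name, the helper methods, and the plain Map standing in for the HBase configuration are hypothetical, and the 5 min default is just one of the two values proposed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve one effective snapshot timeout from the two
// similarly named keys by taking the larger value.
public class SnapshotTimeoutSketch {
    // The two keys described in this issue.
    static final String KEY_DESCRIPTION_UTILS = "hbase.snapshot.master.timeout.millis";
    static final String KEY_SNAPSHOT_MANAGER = "hbase.snapshot.master.timeoutMillis";

    // Assumed default of 5 min (the issue proposes 5 min or 10 min).
    static final long DEFAULT_TIMEOUT_MILLIS = 5 * 60 * 1000L;

    // Read a long value from the map, falling back when the key is unset.
    static long readMillis(Map<String, String> conf, String key, long fallback) {
        String raw = conf.get(key);
        return raw == null ? fallback : Long.parseLong(raw.trim());
    }

    // Take the larger of the two settings so that whichever key an operator
    // already raised keeps working after the keys are unified.
    static long resolveTimeoutMillis(Map<String, String> conf) {
        long fromDescriptionUtils = readMillis(conf, KEY_DESCRIPTION_UTILS, DEFAULT_TIMEOUT_MILLIS);
        long fromSnapshotManager = readMillis(conf, KEY_SNAPSHOT_MANAGER, DEFAULT_TIMEOUT_MILLIS);
        return Math.max(fromDescriptionUtils, fromSnapshotManager);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Only the SnapshotManager key was raised, to 10 min.
        conf.put(KEY_SNAPSHOT_MANAGER, "600000");
        System.out.println("effective timeout (ms): " + resolveTimeoutMillis(conf));
    }
}
```

      One consequence of taking the maximum with a shared default: setting only one key below the default has no effect, because the unset key still contributes the default. That seems acceptable if the goal is only to avoid timeouts that are too short.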

      Attachments

        1. hbase-14680_v3.patch (10 kB, Enis Soztutar)
        2. HBASE-14680_v2.patch (9 kB, Heng Chen)
        3. HBASE-14680_v1.patch (8 kB, Heng Chen)
        4. HBASE-14680.patch (8 kB, Heng Chen)

          People

            chenheng Heng Chen
            enis Enis Soztutar