HBase / HBASE-12728

buffered writes substantially less useful after removal of HTablePool

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.98.0
    • Fix Version/s: 1.0.0, 1.1.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      In our pre-1.0 API, HTable was considered a lightweight object consumed by a single thread at a time. The HTablePool class provided a means of sharing multiple HTable instances across a number of threads. As an optimization, HTable managed a "write buffer", accumulating edits and sending a "batch" all at once. By default the batch was sent as the last step in invocations of put(Put) and put(List<Put>). The user could disable the automatic flushing of the write buffer, retaining edits locally and only sending the whole "batch" once the write buffer had filled or when the flushCommits() method was invoked explicitly. Explicit or implicit batch writing was controlled by the setAutoFlushTo(boolean) method. A value of true (the default) had the write buffer flushed at the completion of a call to put(Put) or put(List<Put>). A value of false allowed for explicit buffer management. HTable also exposed the buffer to consumers via getWriteBuffer().

      The combination of HTable with setAutoFlushTo(false) and the HTablePool provided a convenient mechanism by which multiple "Put-producing" threads could share a common write buffer. Both HTablePool and HTable are deprecated, and they are officially replaced in the new 1.0 API by Table and BufferedMutator. Table, which replaces HTable, no longer exposes explicit write-buffer management. Instead, explicit buffer management is exposed via BufferedMutator. BufferedMutator is safe for concurrent use. Where code would previously retrieve HTables from an HTablePool and return them to it, that code now creates and shares a single BufferedMutator instance across all threads.

    Description

      In previous versions of HBase, when use of HTablePool was encouraged, HTable instances were long-lived in that pool, and for that reason, if autoFlush was set to false, the table instance could accumulate a full buffer of writes before a flush was triggered. Writes from the client to the cluster could then be substantially larger and less frequent than without buffering.

      However, when HTablePool was deprecated, the primary justification seems to have been that creating HTable instances is cheap, so long as the connection and executor service being passed to it are pre-provided. A use pattern was encouraged where users should create a new HTable instance for every operation, using an existing connection and executor service, and then close the table. In this pattern, buffered writes are substantially less useful; writes are as small and as frequent as they would have been with autoflush=true, except the synchronous write is moved from the operation itself to the table close call which immediately follows.

      More concretely:
      ```
      // Given these two helpers ...
      // (storedConnection, executorService and tableName are assumed to be
      // fields of the enclosing class)

      private HTableInterface getAutoFlushTable(String tableName) throws IOException {
        // autoflush is true by default
        return storedConnection.getTable(tableName, executorService);
      }

      private HTableInterface getBufferedTable(String tableName) throws IOException {
        HTableInterface table = getAutoFlushTable(tableName);
        table.setAutoFlush(false);
        return table;
      }

      // It's my contention that these two methods would behave almost identically,
      // except the first will hit a synchronous flush during the put call,
      // and the second will flush during the (hidden) close call on table.

      private void writeAutoFlushed(Put somePut) throws IOException {
        try (HTableInterface table = getAutoFlushTable(tableName)) {
          table.put(somePut); // will do synchronous flush
        }
      }

      private void writeBuffered(Put somePut) throws IOException {
        try (HTableInterface table = getBufferedTable(tableName)) {
          table.put(somePut);
        } // auto-close will trigger synchronous flush
      }
      ```
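
      The contention in the snippet above can be illustrated with a toy model that uses no HBase classes at all. Everything in it is hypothetical and invented for this sketch: ToyBufferedTable, FlushCounter, and the capacity parameter (counted in edits rather than bytes). A "flush" here only increments a counter that stands in for one round trip to the cluster, so the numbers show the shape of the problem, not real performance.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a table with a client-side write buffer. NOT the HBase API. */
final class ToyBufferedTable implements AutoCloseable {

    /** Counts simulated round trips to the cluster. */
    static final class FlushCounter {
        int flushes = 0;
    }

    private final List<String> buffer = new ArrayList<>();
    private final FlushCounter counter;
    private final boolean autoFlush;
    private final int capacity; // assumption: buffer size measured in edits, not bytes

    ToyBufferedTable(FlushCounter counter, boolean autoFlush, int capacity) {
        this.counter = counter;
        this.autoFlush = autoFlush;
        this.capacity = capacity;
    }

    /** Buffers the edit; flushes right away if autoFlush is on or the buffer is full. */
    void put(String edit) {
        buffer.add(edit);
        if (autoFlush || buffer.size() >= capacity) {
            flush();
        }
    }

    /** Sends whatever is buffered as one batch; an empty buffer costs nothing. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        counter.flushes++;
        buffer.clear();
    }

    /** Closing the table flushes any remaining edits, as in the snippet above. */
    @Override
    public void close() {
        flush();
    }

    /** The encouraged pattern: a fresh table per put, closed immediately. */
    static int flushesPerOperationTables(int puts, boolean autoFlush, int capacity) {
        FlushCounter counter = new FlushCounter();
        for (int i = 0; i < puts; i++) {
            try (ToyBufferedTable table = new ToyBufferedTable(counter, autoFlush, capacity)) {
                table.put("edit-" + i);
            }
        }
        return counter.flushes;
    }

    /** The older pattern: one long-lived table with autoFlush off. */
    static int flushesLongLivedTable(int puts, int capacity) {
        FlushCounter counter = new FlushCounter();
        try (ToyBufferedTable table = new ToyBufferedTable(counter, false, capacity)) {
            for (int i = 0; i < puts; i++) {
                table.put("edit-" + i);
            }
        }
        return counter.flushes;
    }

    public static void main(String[] args) {
        System.out.println("per-op, autoflush: " + flushesPerOperationTables(1000, true, 100));
        System.out.println("per-op, buffered:  " + flushesPerOperationTables(1000, false, 100));
        System.out.println("long-lived:        " + flushesLongLivedTable(1000, 100));
    }
}
```

      In this model, 1000 puts through per-operation tables cost 1000 flushes whether or not autoflush is enabled, while a single long-lived buffered table with a capacity of 100 edits needs only 10.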

      For buffered writes to actually provide a performance benefit to users, one of two things must happen:

      • The writeBuffer itself shouldn't live, flush and die with the lifecycle of its HTable instance. If the writeBuffer were managed elsewhere and had a long lifespan, this could cease to be an issue. However, if the same writeBuffer is appended to by multiple tables, then some additional concurrency control will be needed around it.
      • Alternatively, there should be some pattern for having long-lived HTable instances. However, since HTable is not thread-safe, we'd need multiple instances, and a mechanism for leasing them out safely, which sure sounds a lot like the old HTablePool to me.
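
      A minimal sketch of the first option, assuming a buffer that outlives any single table and guards its state with a lock. It uses only the Java standard library; SharedWriteBuffer, its mutate/flush methods, and runProducers are hypothetical names for illustration and are not the HBase implementation. The "send" step only updates counters where a real client would issue an RPC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical long-lived write buffer shared by many producer threads. Not HBase code. */
final class SharedWriteBuffer implements AutoCloseable {
    private final Object lock = new Object();
    private final int capacity; // assumption: capacity counted in edits, not bytes
    private List<String> pending = new ArrayList<>();
    private final AtomicInteger flushCount = new AtomicInteger();
    private final AtomicInteger editsSent = new AtomicInteger();

    SharedWriteBuffer(int capacity) {
        this.capacity = capacity;
    }

    /** Adds an edit; the thread that fills the buffer swaps it out and sends the batch. */
    void mutate(String edit) {
        List<String> batch = null;
        synchronized (lock) {
            pending.add(edit);
            if (pending.size() >= capacity) {
                batch = pending;
                pending = new ArrayList<>();
            }
        }
        if (batch != null) {
            send(batch); // outside the lock, so other producers keep buffering
        }
    }

    /** Sends whatever is currently buffered, if anything. */
    void flush() {
        List<String> batch;
        synchronized (lock) {
            batch = pending;
            pending = new ArrayList<>();
        }
        if (!batch.isEmpty()) {
            send(batch);
        }
    }

    /** Stand-in for one batched round trip to the cluster. */
    private void send(List<String> batch) {
        flushCount.incrementAndGet();
        editsSent.addAndGet(batch.size());
    }

    @Override
    public void close() {
        flush();
    }

    int flushCount() {
        return flushCount.get();
    }

    int editsSent() {
        return editsSent.get();
    }

    /** Runs several producer threads against one shared buffer and returns it closed. */
    static SharedWriteBuffer runProducers(int threads, int putsPerThread, int capacity) {
        SharedWriteBuffer buffer = new SharedWriteBuffer(capacity);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    buffer.mutate("thread" + id + "-edit" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("producers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for producers", e);
        }
        buffer.close();
        return buffer;
    }
}
```

      Because the buffer is swapped under the lock at exactly the capacity, four threads writing 250 edits each into a buffer of 100 produce ten batches of 100 regardless of interleaving. The release note on this issue describes BufferedMutator as filling this role in the 1.0 API.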

      See the discussion on the mailing list: http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E

      Attachments

        1. HBASE-12728-6.patch
          161 kB
          Nick Dimiduk
        2. HBASE-12728-6.patch
          161 kB
          Nick Dimiduk
        3. HBASE-12728-5.patch
          157 kB
          Nick Dimiduk
        4. HBASE-12728-4.patch
          148 kB
          Nick Dimiduk
        5. HBASE-12728-3.patch
          147 kB
          Nick Dimiduk
        6. HBASE-12728-2.patch
          150 kB
          Solomon Duskis
        7. hbase-12728-1.0-addendum-3.patch
          2 kB
          Enis Soztutar
        8. HBASE-12728.patch
          161 kB
          Solomon Duskis
        9. HBASE-12728.addendum.patch
          1 kB
          Nick Dimiduk
        10. HBASE-12728.06-branch-1.patch
          163 kB
          Nick Dimiduk
        11. HBASE-12728.06-branch-1.0.patch
          163 kB
          Nick Dimiduk
        12. HBASE-12728.05-branch-1.patch
          164 kB
          Nick Dimiduk
        13. HBASE-12728.05-branch-1.0.patch
          165 kB
          Nick Dimiduk
        14. bulk-mutator.patch
          116 kB
          Solomon Duskis
        15. 12728-1.0-addendum-2.txt
          0.8 kB
          Ted Yu
        16. 12728.connection-owns-buffers.example.branch-1.0.patch
          31 kB
          Nick Dimiduk

            People

              Assignee: Nick Dimiduk (ndimiduk)
              Reporter: Aaron Beppu (abeppu)
              Votes: 0
              Watchers: 19
