Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

      Description

      Keith Turner showed me a scenario with Continuous Ingest where the system was configured to use "flush" for its durability, but the metadata table was configured to use "sync" (the default for metadata). When the system grew large enough to have many tablets, lots of tablets were constantly writing to the metadata table on a relatively small number of tablet servers. This caused all ingest to drop to 0 very frequently, because tables waiting on a "flush" had to wait for the WAL to "sync".
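
      (For context, per-table durability is controlled by the table.durability property. The sketch below shows that configuration through the Java client API; the Connector named conn and the continuous-ingest table name "ci" are illustrative, not from this issue.)

        import org.apache.accumulo.core.client.AccumuloException;
        import org.apache.accumulo.core.client.AccumuloSecurityException;
        import org.apache.accumulo.core.client.Connector;

        public class DurabilityConfigSketch {
          // Put the ingest table on the weaker "flush" durability; the metadata
          // table is left at its default of "sync", matching the scenario above.
          public static void configure(Connector conn)
              throws AccumuloException, AccumuloSecurityException {
            conn.tableOperations().setProperty("ci", "table.durability", "flush");
          }
        }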

      This problem is primarily caused by the fact that we have only one WAL per tablet server, so tablet writes needing only a flush have to wait on a sync whenever any concurrent write to the same tablet server requires a sync.
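
      (To make the coupling concrete, here is a minimal sketch of group commit against a single shared log: a queued batch can only be committed at the strictest durability any member requested. The class, the rank() ordering, and the treatment of DEFAULT are assumptions for illustration, not the tablet server's actual code.)

        import java.util.List;
        import org.apache.accumulo.core.client.Durability;

        public class SingleWalSketch {
          // With one WAL, flush-only writers end up paying for a concurrent
          // writer's sync, because the whole batch commits at the strictest level.
          static Durability strictest(List<Durability> queued) {
            Durability result = Durability.NONE;
            for (Durability d : queued) {
              if (rank(d) > rank(result)) {
                result = d;
              }
            }
            return result;
          }

          // Assumed ordering, weakest to strongest.
          static int rank(Durability d) {
            switch (d) {
              case NONE:  return 0;
              case LOG:   return 1;
              case FLUSH: return 2;
              default:    return 3; // SYNC and DEFAULT treated as strongest here
            }
          }
        }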

      We could possibly alleviate this problem by permitting tablet servers to have multiple WALs open. One potentially good way to manage these multiple WALs is to group them by durability, so there would be one WAL for sync and another for flush. That way, writes requiring a flush would not wait on syncs.
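
      (A rough sketch of the grouping idea, one log per durability level. WalWriter is a hypothetical stand-in for a per-log writer and is not an existing Accumulo class.)

        import java.util.EnumMap;
        import java.util.Map;
        import org.apache.accumulo.core.client.Durability;

        public class GroupedWalSketch {
          // Hypothetical per-log writer; commit() would hsync or hflush
          // depending on which group the log backs.
          interface WalWriter {
            void append(byte[] mutation);
            void commit();
          }

          private final Map<Durability, WalWriter> walsByDurability =
              new EnumMap<>(Durability.class);

          public GroupedWalSketch(WalWriter syncWal, WalWriter flushWal, WalWriter logWal) {
            walsByDurability.put(Durability.SYNC, syncWal);
            walsByDurability.put(Durability.FLUSH, flushWal);
            walsByDurability.put(Durability.LOG, logWal);
          }

          public void write(Durability d, byte[] mutation) {
            // DEFAULT (or anything unmapped) falls back to the sync log, the safest choice.
            WalWriter wal = walsByDurability.getOrDefault(d, walsByDurability.get(Durability.SYNC));
            wal.append(mutation);
            wal.commit(); // a sync on one log no longer blocks flush-only writers
          }
        }

      (Writes with Durability.NONE skip the log entirely, so they would not need a group of their own.)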

      Attachments

      1. screenshot-1.png (54 kB) - Keith Turner

        Issue Links

          Activity

          Christopher Tubbs (ctubbsii) added a comment - edited

          And one WAL for Durability.LOG (I forgot about that one).

          Keith Turner (kturner) added a comment -

          In screenshot-1, Continuous Ingest (CI) was running on 8 d2.xlarge EC2 tservers. The CI table had 2K tablets and was constantly minor compacting. Each minor compaction wrote an entry to the metadata table, which caused an hsync on that tserver, so at least one of the tservers was constantly calling hsync. While the test was running, I reconfigured the Accumulo metadata table to use flush, and performance improved.
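
          (For reference, the reconfiguration described above expressed through the Java client; the exact mechanism used is not recorded in this issue, and the class and method names here are illustrative.)

            import org.apache.accumulo.core.client.AccumuloException;
            import org.apache.accumulo.core.client.AccumuloSecurityException;
            import org.apache.accumulo.core.client.Connector;

            public class MetadataDurabilitySketch {
              // The mid-test change: the metadata table goes from its default
              // "sync" to "flush"; setting the value back to "sync" reverts it.
              static void useFlushForMetadata(Connector conn)
                  throws AccumuloException, AccumuloSecurityException {
                conn.tableOperations().setProperty("accumulo.metadata", "table.durability", "flush");
              }
            }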

          Keith Turner (kturner) added a comment -

          Note: I was running the test for 1.7.1-rc1 (with a patch for ACCUMULO-4141).


            People

            • Assignee:
              Unassigned
            • Reporter:
              Christopher Tubbs (ctubbsii)
            • Votes:
              0
            • Watchers:
              2

              Dates

              • Created:
              • Updated:

                Development