Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: master, tserver
    • Labels:
      None

      Description

      After reading the proposal on HBASE-10278, I realized there are many ways to make the Accumulo WAL roll-over faster.

      1. Open two WALogs, but use only one until it reaches the WALog roll-over size
      2. Rollover consists only of swapping the writers
      3. WALog roll consists of the final close, which can happen in parallel
      4. Don't mark the tablets with log entries: they are already marked with the tserver
      5. The tserver can make notes about the logs-in-use in the metadata table(s) as part of opening the log.
      6. The master can copy the log entries to tablets while unassigning them, piggybacking on the unassignment mutation.
      7. Tablet servers can remove their current log entries from the metadata tables when they have no tablets using them.
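      Steps 1-3 above can be sketched roughly as follows. This is an illustrative sketch, not Accumulo's actual implementation: all class and method names here are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Sketch of steps 1-3: keep a spare writer pre-opened, swap to it at the
// rollover size, and push the final close of the old writer into the
// background. Names are illustrative, not Accumulo's real API.
class DoubleWalSketch {
    static class Writer {
        final String name;
        long bytes = 0;
        volatile boolean closed = false;
        Writer(String name) { this.name = name; }
        void append(byte[] entry) { bytes += entry.length; }
        void close() { closed = true; }
    }

    private final long rolloverSize;
    private int seq = 0;
    private Writer current;                   // the writer receiving appends
    private CompletableFuture<Writer> spare;  // step 1: opened ahead of time

    DoubleWalSketch(long rolloverSize) {
        this.rolloverSize = rolloverSize;
        this.current = openNext();
        this.spare = CompletableFuture.supplyAsync(this::openNext);
    }

    private synchronized Writer openNext() {
        return new Writer("wal-" + (seq++));
    }

    void append(byte[] entry) {
        current.append(entry);
        if (current.bytes >= rolloverSize) {
            Writer old = current;
            current = spare.join();                  // step 2: just swap writers
            spare = CompletableFuture.supplyAsync(this::openNext);
            CompletableFuture.runAsync(old::close);  // step 3: final close in parallel
        }
    }

    Writer current() { return current; }
}
```

      Because the spare is already open, the rollover itself is a pointer swap; the only blocking case is a rollover arriving before the async open has finished.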

      There are two issues:

      1. Tablets will nearly always have an empty file in recovery, but the recovery code already handles that case.
      2. Presently, a tablet has no marker for a log it did not use, so many more tablets will attempt recovery when it is unnecessary.
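      Issue 2 can be illustrated with a small sketch: with per-tserver markers instead of per-tablet markers, every tablet hosted by a dead tserver becomes a recovery candidate, including tablets that never wrote to the logs. The names and data shapes here are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of issue 2: without per-tablet log markers, recovery cannot
// narrow the candidate set below "every tablet the dead tserver hosted".
class RecoveryScopeSketch {
    static Set<String> recoveryCandidates(Map<String, Set<String>> tabletsByTserver,
                                          Set<String> deadTservers) {
        Set<String> candidates = new HashSet<>();
        for (String tserver : deadTservers) {
            // No per-tablet marker means we must include all of them.
            candidates.addAll(tabletsByTserver.getOrDefault(tserver, Set.of()));
        }
        return candidates;
    }
}
```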

      This would also address ACCUMULO-2889.

        Issue Links

          Activity

          ecn Eric Newton added a comment -
          The master could also clean up old tserver log markers, but it might be safer to let the GC do it.
          ecn Eric Newton added a comment -

          Need to write the WAL entry for each tablet level: normal tablets to the metadata table, metadata tablets to the root table, and the root tablet to ZooKeeper.
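          The per-level routing above can be sketched as a simple mapping. The enum and method names here are illustrative, not Accumulo's actual types: each tablet's WAL marker is written one level up the metadata hierarchy.

```java
// Sketch of the routing: each level's WAL marker goes one level up.
class WalMarkerRoutingSketch {
    enum TabletLevel { ROOT, METADATA, USER }
    enum MarkerDestination { ZOOKEEPER, ROOT_TABLE, METADATA_TABLE }

    static MarkerDestination destinationFor(TabletLevel level) {
        switch (level) {
            case USER:     return MarkerDestination.METADATA_TABLE; // normal tablets
            case METADATA: return MarkerDestination.ROOT_TABLE;     // metadata tablets
            case ROOT:     return MarkerDestination.ZOOKEEPER;      // root tablet
            default:       throw new IllegalArgumentException("unknown level: " + level);
        }
    }
}
```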

          ecn Eric Newton added a comment - edited

          I have prototyped the approach.

          • eliminated the lazy update of log entries for every tablet
          • open the next WAL file asynchronously
          • pre-write the meta entries for the new log

          I wrote a test that compares the performance of continuous ingest using a very small WAL rollover size against one large enough that it will not roll over at all.

          The methodology of the test:

          1. Set the WAL size to 10M
          2. Create a table with 50 splits per tablet server
          3. Wait for balance
          4. Time 2M continuous ingest entries
          5. Drop the table
          6. Take the average of three attempts
          7. Reset the WAL size to 1G, restart the tablet servers
          8. Perform the same ingest tests

          There are some minor configuration adjustments for the test (otherwise, just standard MAC):

          tserver.wal.replication=1
          table.minc.logs.max=100
          gc.file.archive=false
          

          Before the changes, WAL roll-over caused the small-WAL test to take 130% of the large-WAL test's time.

          Afterward, it takes 108-117%.

          I haven't written the recovery code or the file GC modifications, nor dealt with backwards compatibility.

          The extremely small WAL ensures lots of rollovers, while the number of tablets per tserver stays reasonable.

          ecn Eric Newton added a comment -

          All the ITs are working now, except those related to replication. I talked to Josh Elser about the changes necessary:

          • the tablet server or master should inform replication when a WAL is closed
          • the GC should defer deleting WALs in use by replication
          • the tablet server should just let the GC clean up WALs
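          The division of labor above amounts to a simple eligibility rule for the GC: a WAL may be deleted only when no tserver still has it open and replication no longer needs it. A minimal sketch, with hypothetical names rather than Accumulo's real API:

```java
import java.util.Set;

// Sketch: the GC, not the tablet server, deletes WALs, and only when
// neither a tserver nor replication still references the file.
class WalGcSketch {
    static boolean eligibleForDeletion(String wal,
                                       Set<String> inUseByTservers,
                                       Set<String> neededByReplication) {
        return !inUseByTservers.contains(wal) && !neededByReplication.contains(wal);
    }
}
```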
          ecn Eric Newton added a comment -

          Comparison of continuous ingest on an AWS cluster.

          elserj Josh Elser added a comment -

          Woo! Great work, Eric Newton!!

          elserj Josh Elser added a comment -

          I just want to state: these changes make me really nervous for inclusion in 1.7.0. I don't believe enough rigor has gone into actually flushing out bugs. I'll continue to work on this over the weekend, but I want everyone else to be aware that I am considering reverting these changes for 1.7.0.

          elserj Josh Elser added a comment -

          After finding another issue with WALs being deleted prematurely, I'm working on reverting these changes for 1.7. I'll leave them in place for 1.8/master.

          elserj Josh Elser added a comment -

          Just pushed the revert to 1.7 and a merge -s ours to master. Fixing the bugs can proceed against 1.8.

          elserj Josh Elser added a comment -

          Also, once 1.7.0 gets out, I'll try to start spending more time on the breakages, but anyone should feel free to snipe them from me before then.


            People

            • Assignee:
              ecn Eric Newton
            • Reporter:
              ecn Eric Newton
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
              • Updated:
              • Resolved:

                Time Tracking

                • Estimated: Not Specified
                • Remaining: 0h
                • Logged: 3h 50m

                  Development