Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels:
      None

      Description

      I was experimenting with write-ahead log write performance and noticed that when I wrote 1G of data, the walog was 1.5G.

        Activity

        hudson Hudson added a comment -

        Integrated in Accumulo-Trunk #586 (See https://builds.apache.org/job/Accumulo-Trunk/586/)
        ACCUMULO-786 added a delete mutation to unit test (Revision 1424147)
        ACCUMULO-786 added timestamp setting to unit test (Revision 1424105)

        Result = ABORTED
        ecn :
        Files :

        • /accumulo/trunk/server/src/test/java/org/apache/accumulo/server/logger/LogFileTest.java

        ecn :
        Files :

        • /accumulo/trunk/server/src/test/java/org/apache/accumulo/server/logger/LogFileTest.java
        kturner Keith Turner added a comment -

        I think keeping the old mutation around is good; it makes it easier to write tests.

        Speaking of tests, I think the unit test should explicitly set the timestamp (in addition to not setting it). Also, should probably test delete.

        ecn Eric Newton added a comment -

        All good ideas. I'll make these changes. The use of OldMutation as a test was just me being lazy. I'll use it to generate some test data, and then eliminate the class.

        kturner Keith Turner added a comment -

        Looking at the commit just made, I think we should restructure the code to have a mutation class in the server code. The server-side Mutation could extend the client-side Mutation and add setSystemTimestamp(). The client code should not have setSystemTimestamp(). Also, there is no reason for Thrift to transfer the system timestamp, which the user should never set.

        Should probably make the mutation members private instead of package private. I see this was done for testing, but can this be accomplished with the public API?
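
        The restructuring suggested above could look roughly like the following. This is only a sketch: ClientMutationSketch and ServerMutationSketch are hypothetical stand-ins, not the actual Accumulo classes, and the row is the only member modeled. The members are private and reached through public accessors, and only the server-side subclass exposes setSystemTimestamp().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the client-side mutation.
// It deliberately has no way to set a system timestamp.
class ClientMutationSketch {
  private final byte[] row; // private rather than package-private

  ClientMutationSketch(String row) {
    this.row = row.getBytes(StandardCharsets.UTF_8);
  }

  // Tests can read the row through the public API instead of the field.
  public byte[] getRow() {
    return row.clone();
  }
}

// Hypothetical stand-in for a server-side subclass.
// Only server code would see this type and its extra setter.
class ServerMutationSketch extends ClientMutationSketch {
  private long systemTimestamp;

  ServerMutationSketch(String row) {
    super(row);
  }

  public void setSystemTimestamp(long systemTimestamp) {
    this.systemTimestamp = systemTimestamp;
  }

  public long getSystemTimestamp() {
    return systemTimestamp;
  }
}
```

        With this split, anything that crosses the client boundary could be typed as the client-side class, so the system timestamp never has to be part of what the client sends.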

        ecn Eric Newton added a comment -

        I got the 36 bytes down to 11, for small entries, and performed some tests writing write-ahead logs. I'm seeing a 10-20% improvement in write speed. I am not seeing much of an improvement in ContinuousIngest, however.

        In addition, I'm using a system timestamp in Mutation, which is separate from the timestamp in the serialized ColumnUpdate. That means the tablet server isn't deserializing the ColumnUpdates just to set the timestamps. That will save the work of creating lots of small objects, just to throw them all away.
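
        To illustrate the idea of a system timestamp kept apart from the serialized updates, here is a minimal sketch. The class LazyTimestampSketch and its methods are hypothetical and do not reflect the real Mutation layout. It assumes each serialized update records whether the user set a timestamp, so a reader can fall back to the system timestamp at decode time.

```java
// Hypothetical stand-in for a mutation; not Accumulo's actual class.
class LazyTimestampSketch {
  private final byte[] serializedUpdates; // column updates kept in serialized form
  private long systemTimestamp;           // stored once, outside the serialized bytes

  LazyTimestampSketch(byte[] serializedUpdates) {
    this.serializedUpdates = serializedUpdates;
  }

  // Constant time: no column update is decoded or rewritten.
  void setSystemTimestamp(long systemTimestamp) {
    this.systemTimestamp = systemTimestamp;
  }

  // A reader chooses the timestamp only when it actually decodes an update.
  long effectiveTimestamp(boolean userSetTimestamp, long userTimestamp) {
    return userSetTimestamp ? userTimestamp : systemTimestamp;
  }

  // The serialized bytes are handed back untouched.
  byte[] serializedUpdates() {
    return serializedUpdates;
  }
}
```

        Setting the timestamp this way touches one field, regardless of how many column updates the mutation carries.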

        kturner Keith Turner added a comment -

        When I ran this experiment I was generating mutations w/ 100 byte rows and empty columns.

        ecn Eric Newton added a comment -

        The mutation serialization code is pretty simple: lengths are encoded as 4-byte signed integers, and timestamps, which are normally not set by the user, are encoded as 8-byte signed longs. There's roughly 36 bytes of overhead for an empty mutation.

        We could encode the lengths and timestamps as variable-length integers. It might not be worth the extra encoding/decoding time, though. We'll need to experiment.
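
        As a rough illustration of the variable-length idea, the sketch below encodes a long 7 bits at a time, using the high bit of each byte as a "more bytes follow" flag. VarIntSketch is a hypothetical helper written for this comment, not the encoding Accumulo uses. With this scheme a length of 100 takes 1 byte instead of 4, while negative values take the full 10 bytes.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper; one common variable-length scheme, chosen for illustration.
class VarIntSketch {

  // Emits the value low-order 7 bits first; a set high bit means another byte follows.
  static byte[] encode(long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7; // unsigned shift, so negative values terminate after 10 bytes
    }
    out.write((int) value);
    return out.toByteArray();
  }

  // Reverses encode(); rejects input that ends early or runs past 10 bytes.
  static long decode(byte[] bytes) {
    long result = 0;
    int shift = 0;
    for (byte b : bytes) {
      if (shift > 63) {
        throw new IllegalArgumentException("varint too long");
      }
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
    throw new IllegalArgumentException("truncated varint");
  }
}
```

        The per-byte loop and branch are the extra encoding and decoding cost mentioned above, which is why the size savings would need to be measured against write speed.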


          People

          • Assignee:
            ecn Eric Newton
            Reporter:
            kturner Keith Turner
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
