Description
We have observed an issue in SLES clusters where the system timestamp regularly goes back in time. This happens frequently enough to cause test failures when LTT is used with updater.
Everytime an mutation is performed, the updater creates a string in the form "#column:mutation_type" and appends it to the column "mutate_info".
It seems that when the test fails, it is always the case that the mutate_info information for that particular column reported is not there in the column mutate_info. However, according to the MultiThreadedUpdater source code, if a row gets updated, all the columns will be mutated. So if a row contains 15 columns, all 15 should appear in mutate_info.
When the test fails though, we get an exception like:
2014-11-02 04:31:12,018 ERROR [HBaseReaderThread_7] util.MultiThreadedAction: Error checking data for key [b0485292cde20d8a76cca37410a9f115-23787], column family [test_cf], column [8], mutation [null]; value of length 818
For the same row, the mutate info DOES NOT contain columns 8 (and 9) while it should:
test_cf:mutate_info timestamp=1414902651388, value=#increment:1#0:0#1:0#10:3#11:0#12:3#13:0#14:0#15:0#16:2#2:3#3:0#4:2#5:3#6:0#7:0
Further debugging led to finding the root cause where It seems that on SUSE System.currentTimeMillis() can go back in time freely (especially when run in a virtualized env like EC2), and actually happens very frequently.
This is from a debug log that was put in place:
2014-11-04 01:16:05,025 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765025/Put/mvcc=8239/#increment:1 2014-11-04 01:16:05,038 INFO [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765038/Put/mvcc=8255/#increment:1#0:3 2014-11-04 01:16:05,047 INFO [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765047/Put/mvcc=8265/#increment:1#0:3#1:3 2014-11-04 01:16:05,057 INFO [B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765056/Put/mvcc=8274/#increment:1#0:3#1:3#10:2 2014-11-04 01:16:05,061 INFO [B.DefaultRpcServer.handler=6,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765061/Put/mvcc=8278/#increment:1#0:3#1:3#10:2#11:0 2014-11-04 01:16:05,070 INFO [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765070/Put/mvcc=8285/#increment:1#0:3#1:3#10:2#11:0#12:3 2014-11-04 01:16:05,076 INFO [B.DefaultRpcServer.handler=3,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765076/Put/mvcc=8289/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0 2014-11-04 01:16:05,084 INFO [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765084/Put/mvcc=8293/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0 2014-11-04 01:16:05,090 INFO [B.DefaultRpcServer.handler=7,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765090/Put/mvcc=8297/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0 2014-11-04 01:16:05,097 INFO [B.DefaultRpcServer.handler=0,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765097/Put/mvcc=8301/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0 2014-11-04 01:16:05,100 INFO [B.DefaultRpcServer.handler=14,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765100/Put/mvcc=8303/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0 2014-11-04 01:16:05,103 INFO [B.DefaultRpcServer.handler=11,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765103/Put/mvcc=8305/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0 2014-11-04 01:16:05,110 INFO [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765110/Put/mvcc=8309/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2 2014-11-04 01:16:05,114 INFO [B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765114/Put/mvcc=8312/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2 2014-11-04 01:16:05,119 INFO [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765119/Put/mvcc=8316/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0 2014-11-04 01:16:04,941 INFO [B.DefaultRpcServer.handler=22,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764941/Put/mvcc=8320/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#5:0 2014-11-04 01:16:04,945 INFO [B.DefaultRpcServer.handler=25,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764945/Put/mvcc=8324/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#6:0 2014-11-04 01:16:04,988 INFO [B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764988/Put/mvcc=8329/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#7:3 2014-11-04 01:16:04,998 INFO [B.DefaultRpcServer.handler=2,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764998/Put/mvcc=8337/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#8:2 2014-11-04 01:16:05,005 INFO [B.DefaultRpcServer.handler=17,queue=2,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765005/Put/mvcc=8343/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#9:3
Notice that we keep appending entries like #13:0 to the end. Up until
2014-11-04 01:16:05,119 INFO [B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765119/Put/mvcc=8316/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0
the timestamps are always increasing. Then the next line has a smaller timestamp. It still does the append though:
2014-11-04 01:16:04,941 INFO [B.DefaultRpcServer.handler=22,queue=1,port=60020] regionserver.MemStore: upserting: 193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764941/Put/mvcc=8320/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#5:0
The timestamp went from 1415063765119 to 1415063764941 (a 178ms difference). Since append code always checks for latest value (latest value in hbase sort order), then appends to the latest value, it does not look like an HBase level issue. You can even notice that the log4j timestamp is going back.
Apart from SUSE timestamp issues (and NTP setup etc), I think our append code should ensure that the timestamp of the new key value that is appended should not have a lesser timestamp than the old keyvalue. We can think about implementing an EnvEdgeManager to do monotonically increment the ts within the process, but that may be expensive.
I think we can ensure to use the max timestamp in append anyway since that will ensure that even if system ts goes back, the new kv always sorts before the previous value.