Description
As reported by Bruce Song Zhang, there is a crasher in debug mode in tablet compaction under load.
Here is a snippet of the log file he sent:
I0219 10:03:27.503398 31721 delta_tracker.cc:456] Flushing 8 deltas from DMS 114... I0219 10:03:27.504659 31721 delta_tracker.cc:410] Flushed delta block: 2109978395164279242 I0219 10:03:27.504781 31721 delta_tracker.cc:416] Reopened delta block for read: 2109978395164279242 I0219 10:03:28.150115 31721 maintenance_manager.cc:359] Time spent running FlushDeltaMemStoresOp(c6069618c9864c1d8422912bf69efd44): real 0.667s user 0.647s sys 0.020s I0219 10:03:28.665895 31721 log_reader.cc:415] T c6069618c9864c1d8422912bf69efd44: removed 1 log segments from log reader I0219 10:03:28.665935 31721 log.cc:736] Deleting log segment in path: /export/servers/kudu/tserver_wal_data/wals/c6069618c9864c1d8422912bf69efd44/wal-000000223 (GCed ops < 61476431) I0219 10:03:28.704988 31721 maintenance_manager.cc:359] Time spent running LogGCOp(c6069618c9864c1d8422912bf69efd44): real 0.039s user 0.000s sys 0.039s I0219 10:03:29.456583 31721 tablet.cc:653] Flush: entering stage 1 (old memrowset already frozen for inserts) I0219 10:03:29.456614 31721 compaction.cc:591] Selected 1 rowsets to compact: I0219 10:03:29.456630 31721 compaction.cc:594] memrowset(current size on disk: ~0 bytes) I0219 10:03:29.456686 31721 tablet.cc:655] Memstore in-memory size: 33095447 bytes I0219 10:03:29.456742 31721 tablet.cc:1132] Flush: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}] I0219 10:03:29.457515 31721 multi_column_writer.cc:82] Opened CFile writer for column country_id[int32 NOT NULL] I0219 10:03:29.457700 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_id[int64 NOT NULL] I0219 10:03:29.457865 31721 multi_column_writer.cc:82] Opened CFile writer for column cur_day[int32 NOT NULL] I0219 10:03:29.458027 31721 multi_column_writer.cc:82] Opened CFile writer for column country_name[string NOT NULL] I0219 10:03:29.458173 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_name[string NOT NULL] I0219 10:03:29.458333 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_id[int32 NOT NULL] I0219 10:03:29.458493 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_name[string NOT NULL] I0219 10:03:29.458636 31721 multi_column_writer.cc:82] Opened CFile writer for column product_id[int64 NOT NULL] I0219 10:03:29.458777 31721 multi_column_writer.cc:82] Opened CFile writer for column product_name[string NOT NULL] I0219 10:03:29.458927 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1[int32 NOT NULL] I0219 10:03:29.459066 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1_name[string NOT NULL] I0219 10:03:29.459198 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2[int32 NOT NULL] I0219 10:03:29.459347 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2_name[string NOT NULL] I0219 10:03:29.459491 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3[int32 NOT NULL] I0219 10:03:29.459630 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3_name[string NOT NULL] I0219 10:03:29.459774 31721 multi_column_writer.cc:82] Opened CFile writer for column data_type[int8 NOT NULL] I0219 10:03:29.459910 31721 multi_column_writer.cc:82] Opened CFile writer for column price[double NOT NULL] I0219 10:03:29.460060 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_date[timestamp NOT NULL] I0219 10:03:29.460197 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_number[int64 NOT NULL] I0219 10:03:29.460340 31721 multi_column_writer.cc:82] Opened CFile writer for column gmv[double NOT NULL] W0219 10:03:29.572443 31721 compaction.cc:714] Found REINSERT REDO truncating row history for Source Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233) Redo Mutations: [@5963145475847274496(DELETE), @5963145476833153024(REINSERT (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233))] Undo Mutations: [@5963144433666650112(DELETE)] Dest Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233) Redo Mutations: [] Undo Mutations: [@5963145476833153024(DELETE)] Note: this warning will appear only for the first truncated row W0219 10:03:32.302188 31721 compaction.cc:834] Total 7 rows lost some history due to REINSERT after DELETE I0219 10:03:32.346256 31721 tablet.cc:1222] Flush: entering phase 2 (starting to duplicate updates in new rowsets) I0219 10:03:32.846774 31721 tablet.cc:1277] Flush Phase 2: carrying over any updates which arrived during Phase 1 I0219 10:03:32.846839 31721 tablet.cc:1278] Phase 2 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}] I0219 10:03:34.485435 31721 tablet.cc:1311] Flush successful on 52051 rows (19462877 bytes) I0219 10:03:34.501569 31721 maintenance_manager.cc:359] Time spent running FlushMRSOp(c6069618c9864c1d8422912bf69efd44): real 5.541s user 5.454s sys 0.081s I0219 10:05:45.657285 31721 tablet.cc:1333] Compaction: stage 1 complete, picked 6 rowsets to compact I0219 10:05:45.657618 31721 compaction.cc:591] Selected 6 rowsets to compact: I0219 10:05:45.657727 31721 compaction.cc:594] RowSet(15418)(current size on disk: ~30348706 bytes) I0219 10:05:45.657793 31721 compaction.cc:594] RowSet(15544)(current size on disk: ~1123338 bytes) I0219 10:05:45.657855 31721 compaction.cc:594] RowSet(15551)(current size on disk: ~26151132 bytes) I0219 10:05:45.657886 31721 compaction.cc:594] RowSet(15610)(current size on disk: ~30315579 bytes) I0219 10:05:45.657913 31721 compaction.cc:594] RowSet(15611)(current size on disk: ~32233272 bytes) I0219 10:05:45.657934 31721 compaction.cc:594] RowSet(15617)(current size on disk: ~18863054 bytes) I0219 10:05:45.657984 31721 tablet.cc:1132] Compaction: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}] I0219 10:05:45.661224 31721 multi_column_writer.cc:82] Opened CFile writer for column country_id[int32 NOT NULL] I0219 10:05:45.661432 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_id[int64 NOT NULL] I0219 10:05:45.661598 31721 multi_column_writer.cc:82] Opened CFile writer for column cur_day[int32 NOT NULL] I0219 10:05:45.661761 31721 multi_column_writer.cc:82] Opened CFile writer for column country_name[string NOT NULL] I0219 10:05:45.661933 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_name[string NOT NULL] I0219 10:05:45.662081 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_id[int32 NOT NULL] I0219 10:05:45.662228 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_name[string NOT NULL] I0219 10:05:45.662384 31721 multi_column_writer.cc:82] Opened CFile writer for column product_id[int64 NOT NULL] I0219 10:05:45.662542 31721 multi_column_writer.cc:82] Opened CFile writer for column product_name[string NOT NULL] I0219 10:05:45.662695 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1[int32 NOT NULL] I0219 10:05:45.662832 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1_name[string NOT NULL] I0219 10:05:45.662962 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2[int32 NOT NULL] I0219 10:05:45.663100 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2_name[string NOT NULL] I0219 10:05:45.663239 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3[int32 NOT NULL] I0219 10:05:45.663393 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3_name[string NOT NULL] I0219 10:05:45.663538 31721 multi_column_writer.cc:82] Opened CFile writer for column data_type[int8 NOT NULL] I0219 10:05:45.663679 31721 multi_column_writer.cc:82] Opened CFile writer for column price[double NOT NULL] I0219 10:05:45.663830 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_date[timestamp NOT NULL] I0219 10:05:45.663964 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_number[int64 NOT NULL] I0219 10:05:45.664091 31721 multi_column_writer.cc:82] Opened CFile writer for column gmv[double NOT NULL] F0219 10:06:01.038265 31721 deltafile.cc:136] Check failed: last_key_.CompareTo<UNDO>(key) <= 0 must insert undo deltas in sorted order (ascending key, then descending ts): got key (row 278573@tx5962430420551352320) after (row 278573@tx5962430179277082624)
It appears that the timestamps are misordered (in ascending order instead of descending order), triggering a DCHECK. We don't normally see this in load tests because we run with RELEASE mode in most of our long running stress tests.
Attachments
Issue Links
- is related to
-
KUDU-1354 MVCC Snapshots chosen during flush can contain out-of-order transactions
- Resolved