Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-1341

Out of order UNDO in delta file, possibly related to REINSERT

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.6.0
    • 0.7.1, 0.8.0
    • tablet
    • None

    Description

      As reported by Bruce Song Zhang, there is a crasher in debug mode in tablet compaction under load.

      Here is a snippet of the log file he sent:

      I0219 10:03:27.503398 31721 delta_tracker.cc:456] Flushing 8 deltas from DMS 114...
      I0219 10:03:27.504659 31721 delta_tracker.cc:410] Flushed delta block: 2109978395164279242
      I0219 10:03:27.504781 31721 delta_tracker.cc:416] Reopened delta block for read: 2109978395164279242
      I0219 10:03:28.150115 31721 maintenance_manager.cc:359] Time spent running FlushDeltaMemStoresOp(c6069618c9864c1d8422912bf69efd44): real 0.667s	user 0.647s	sys 0.020s
      I0219 10:03:28.665895 31721 log_reader.cc:415] T c6069618c9864c1d8422912bf69efd44: removed 1 log segments from log reader
      I0219 10:03:28.665935 31721 log.cc:736] Deleting log segment in path: /export/servers/kudu/tserver_wal_data/wals/c6069618c9864c1d8422912bf69efd44/wal-000000223 (GCed ops < 61476431)
      I0219 10:03:28.704988 31721 maintenance_manager.cc:359] Time spent running LogGCOp(c6069618c9864c1d8422912bf69efd44): real 0.039s	user 0.000s	sys 0.039s
      I0219 10:03:29.456583 31721 tablet.cc:653] Flush: entering stage 1 (old memrowset already frozen for inserts)
      I0219 10:03:29.456614 31721 compaction.cc:591] Selected 1 rowsets to compact:
      I0219 10:03:29.456630 31721 compaction.cc:594] memrowset(current size on disk: ~0 bytes)
      I0219 10:03:29.456686 31721 tablet.cc:655] Memstore in-memory size: 33095447 bytes
      I0219 10:03:29.456742 31721 tablet.cc:1132] Flush: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
      I0219 10:03:29.457515 31721 multi_column_writer.cc:82] Opened CFile writer for column country_id[int32 NOT NULL]
      I0219 10:03:29.457700 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_id[int64 NOT NULL]
      I0219 10:03:29.457865 31721 multi_column_writer.cc:82] Opened CFile writer for column cur_day[int32 NOT NULL]
      I0219 10:03:29.458027 31721 multi_column_writer.cc:82] Opened CFile writer for column country_name[string NOT NULL]
      I0219 10:03:29.458173 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_name[string NOT NULL]
      I0219 10:03:29.458333 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_id[int32 NOT NULL]
      I0219 10:03:29.458493 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_name[string NOT NULL]
      I0219 10:03:29.458636 31721 multi_column_writer.cc:82] Opened CFile writer for column product_id[int64 NOT NULL]
      I0219 10:03:29.458777 31721 multi_column_writer.cc:82] Opened CFile writer for column product_name[string NOT NULL]
      I0219 10:03:29.458927 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1[int32 NOT NULL]
      I0219 10:03:29.459066 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1_name[string NOT NULL]
      I0219 10:03:29.459198 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2[int32 NOT NULL]
      I0219 10:03:29.459347 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2_name[string NOT NULL]
      I0219 10:03:29.459491 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3[int32 NOT NULL]
      I0219 10:03:29.459630 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3_name[string NOT NULL]
      I0219 10:03:29.459774 31721 multi_column_writer.cc:82] Opened CFile writer for column data_type[int8 NOT NULL]
      I0219 10:03:29.459910 31721 multi_column_writer.cc:82] Opened CFile writer for column price[double NOT NULL]
      I0219 10:03:29.460060 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_date[timestamp NOT NULL]
      I0219 10:03:29.460197 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_number[int64 NOT NULL]
      I0219 10:03:29.460340 31721 multi_column_writer.cc:82] Opened CFile writer for column gmv[double NOT NULL]
      W0219 10:03:29.572443 31721 compaction.cc:714] Found REINSERT REDO truncating row history for Source Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233) Redo Mutations: [@5963145475847274496(DELETE), @5963145476833153024(REINSERT (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233))] Undo Mutations: [@5963144433666650112(DELETE)]
      Dest Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233) Redo Mutations: [] Undo Mutations: [@5963145476833153024(DELETE)] Note: this warning will appear only for the first truncated row
      W0219 10:03:32.302188 31721 compaction.cc:834] Total 7 rows lost some history due to REINSERT after DELETE
      I0219 10:03:32.346256 31721 tablet.cc:1222] Flush: entering phase 2 (starting to duplicate updates in new rowsets)
      I0219 10:03:32.846774 31721 tablet.cc:1277] Flush Phase 2: carrying over any updates which arrived during Phase 1
      I0219 10:03:32.846839 31721 tablet.cc:1278] Phase 2 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
      I0219 10:03:34.485435 31721 tablet.cc:1311] Flush successful on 52051 rows (19462877 bytes)
      I0219 10:03:34.501569 31721 maintenance_manager.cc:359] Time spent running FlushMRSOp(c6069618c9864c1d8422912bf69efd44): real 5.541s	user 5.454s	sys 0.081s
      I0219 10:05:45.657285 31721 tablet.cc:1333] Compaction: stage 1 complete, picked 6 rowsets to compact
      I0219 10:05:45.657618 31721 compaction.cc:591] Selected 6 rowsets to compact:
      I0219 10:05:45.657727 31721 compaction.cc:594] RowSet(15418)(current size on disk: ~30348706 bytes)
      I0219 10:05:45.657793 31721 compaction.cc:594] RowSet(15544)(current size on disk: ~1123338 bytes)
      I0219 10:05:45.657855 31721 compaction.cc:594] RowSet(15551)(current size on disk: ~26151132 bytes)
      I0219 10:05:45.657886 31721 compaction.cc:594] RowSet(15610)(current size on disk: ~30315579 bytes)
      I0219 10:05:45.657913 31721 compaction.cc:594] RowSet(15611)(current size on disk: ~32233272 bytes)
      I0219 10:05:45.657934 31721 compaction.cc:594] RowSet(15617)(current size on disk: ~18863054 bytes)
      I0219 10:05:45.657984 31721 tablet.cc:1132] Compaction: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
      I0219 10:05:45.661224 31721 multi_column_writer.cc:82] Opened CFile writer for column country_id[int32 NOT NULL]
      I0219 10:05:45.661432 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_id[int64 NOT NULL]
      I0219 10:05:45.661598 31721 multi_column_writer.cc:82] Opened CFile writer for column cur_day[int32 NOT NULL]
      I0219 10:05:45.661761 31721 multi_column_writer.cc:82] Opened CFile writer for column country_name[string NOT NULL]
      I0219 10:05:45.661933 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_name[string NOT NULL]
      I0219 10:05:45.662081 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_id[int32 NOT NULL]
      I0219 10:05:45.662228 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_name[string NOT NULL]
      I0219 10:05:45.662384 31721 multi_column_writer.cc:82] Opened CFile writer for column product_id[int64 NOT NULL]
      I0219 10:05:45.662542 31721 multi_column_writer.cc:82] Opened CFile writer for column product_name[string NOT NULL]
      I0219 10:05:45.662695 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1[int32 NOT NULL]
      I0219 10:05:45.662832 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1_name[string NOT NULL]
      I0219 10:05:45.662962 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2[int32 NOT NULL]
      I0219 10:05:45.663100 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2_name[string NOT NULL]
      I0219 10:05:45.663239 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3[int32 NOT NULL]
      I0219 10:05:45.663393 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3_name[string NOT NULL]
      I0219 10:05:45.663538 31721 multi_column_writer.cc:82] Opened CFile writer for column data_type[int8 NOT NULL]
      I0219 10:05:45.663679 31721 multi_column_writer.cc:82] Opened CFile writer for column price[double NOT NULL]
      I0219 10:05:45.663830 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_date[timestamp NOT NULL]
      I0219 10:05:45.663964 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_number[int64 NOT NULL]
      I0219 10:05:45.664091 31721 multi_column_writer.cc:82] Opened CFile writer for column gmv[double NOT NULL]
      F0219 10:06:01.038265 31721 deltafile.cc:136] Check failed: last_key_.CompareTo<UNDO>(key) <= 0 must insert undo deltas in sorted order (ascending key, then descending ts): got key (row 278573@tx5962430420551352320) after (row 278573@tx5962430179277082624)
      

      It appears that the timestamps are misordered (in ascending order instead of descending order), triggering a DCHECK. We don't normally see this in load tests because we run with RELEASE mode in most of our long running stress tests.

      Attachments

        Issue Links

          Activity

            People

              tlipcon Todd Lipcon
              mpercy Mike Percy
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: