Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-1341

Out of order UNDO in delta file, possibly related to REINSERT

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.1, 0.8.0
    • Component/s: tablet
    • Labels:
      None
    • Target Version/s:

      Description

      As reported by Bruce Song Zhang, there is a crasher in debug mode in tablet compaction under load.

      Here is a snippet of the log file he sent:

      I0219 10:03:27.503398 31721 delta_tracker.cc:456] Flushing 8 deltas from DMS 114...
      I0219 10:03:27.504659 31721 delta_tracker.cc:410] Flushed delta block: 2109978395164279242
      I0219 10:03:27.504781 31721 delta_tracker.cc:416] Reopened delta block for read: 2109978395164279242
      I0219 10:03:28.150115 31721 maintenance_manager.cc:359] Time spent running FlushDeltaMemStoresOp(c6069618c9864c1d8422912bf69efd44): real 0.667s	user 0.647s	sys 0.020s
      I0219 10:03:28.665895 31721 log_reader.cc:415] T c6069618c9864c1d8422912bf69efd44: removed 1 log segments from log reader
      I0219 10:03:28.665935 31721 log.cc:736] Deleting log segment in path: /export/servers/kudu/tserver_wal_data/wals/c6069618c9864c1d8422912bf69efd44/wal-000000223 (GCed ops < 61476431)
      I0219 10:03:28.704988 31721 maintenance_manager.cc:359] Time spent running LogGCOp(c6069618c9864c1d8422912bf69efd44): real 0.039s	user 0.000s	sys 0.039s
      I0219 10:03:29.456583 31721 tablet.cc:653] Flush: entering stage 1 (old memrowset already frozen for inserts)
      I0219 10:03:29.456614 31721 compaction.cc:591] Selected 1 rowsets to compact:
      I0219 10:03:29.456630 31721 compaction.cc:594] memrowset(current size on disk: ~0 bytes)
      I0219 10:03:29.456686 31721 tablet.cc:655] Memstore in-memory size: 33095447 bytes
      I0219 10:03:29.456742 31721 tablet.cc:1132] Flush: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
      I0219 10:03:29.457515 31721 multi_column_writer.cc:82] Opened CFile writer for column country_id[int32 NOT NULL]
      I0219 10:03:29.457700 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_id[int64 NOT NULL]
      I0219 10:03:29.457865 31721 multi_column_writer.cc:82] Opened CFile writer for column cur_day[int32 NOT NULL]
      I0219 10:03:29.458027 31721 multi_column_writer.cc:82] Opened CFile writer for column country_name[string NOT NULL]
      I0219 10:03:29.458173 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_name[string NOT NULL]
      I0219 10:03:29.458333 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_id[int32 NOT NULL]
      I0219 10:03:29.458493 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_name[string NOT NULL]
      I0219 10:03:29.458636 31721 multi_column_writer.cc:82] Opened CFile writer for column product_id[int64 NOT NULL]
      I0219 10:03:29.458777 31721 multi_column_writer.cc:82] Opened CFile writer for column product_name[string NOT NULL]
      I0219 10:03:29.458927 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1[int32 NOT NULL]
      I0219 10:03:29.459066 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1_name[string NOT NULL]
      I0219 10:03:29.459198 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2[int32 NOT NULL]
      I0219 10:03:29.459347 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2_name[string NOT NULL]
      I0219 10:03:29.459491 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3[int32 NOT NULL]
      I0219 10:03:29.459630 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3_name[string NOT NULL]
      I0219 10:03:29.459774 31721 multi_column_writer.cc:82] Opened CFile writer for column data_type[int8 NOT NULL]
      I0219 10:03:29.459910 31721 multi_column_writer.cc:82] Opened CFile writer for column price[double NOT NULL]
      I0219 10:03:29.460060 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_date[timestamp NOT NULL]
      I0219 10:03:29.460197 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_number[int64 NOT NULL]
      I0219 10:03:29.460340 31721 multi_column_writer.cc:82] Opened CFile writer for column gmv[double NOT NULL]
      W0219 10:03:29.572443 31721 compaction.cc:714] Found REINSERT REDO truncating row history for Source Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233) Redo Mutations: [@5963145475847274496(DELETE), @5963145476833153024(REINSERT (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233))] Undo Mutations: [@5963144433666650112(DELETE)]
      Dest Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double gmv=12233) Redo Mutations: [] Undo Mutations: [@5963145476833153024(DELETE)] Note: this warning will appear only for the first truncated row
      W0219 10:03:32.302188 31721 compaction.cc:834] Total 7 rows lost some history due to REINSERT after DELETE
      I0219 10:03:32.346256 31721 tablet.cc:1222] Flush: entering phase 2 (starting to duplicate updates in new rowsets)
      I0219 10:03:32.846774 31721 tablet.cc:1277] Flush Phase 2: carrying over any updates which arrived during Phase 1
      I0219 10:03:32.846839 31721 tablet.cc:1278] Phase 2 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
      I0219 10:03:34.485435 31721 tablet.cc:1311] Flush successful on 52051 rows (19462877 bytes)
      I0219 10:03:34.501569 31721 maintenance_manager.cc:359] Time spent running FlushMRSOp(c6069618c9864c1d8422912bf69efd44): real 5.541s	user 5.454s	sys 0.081s
      I0219 10:05:45.657285 31721 tablet.cc:1333] Compaction: stage 1 complete, picked 6 rowsets to compact
      I0219 10:05:45.657618 31721 compaction.cc:591] Selected 6 rowsets to compact:
      I0219 10:05:45.657727 31721 compaction.cc:594] RowSet(15418)(current size on disk: ~30348706 bytes)
      I0219 10:05:45.657793 31721 compaction.cc:594] RowSet(15544)(current size on disk: ~1123338 bytes)
      I0219 10:05:45.657855 31721 compaction.cc:594] RowSet(15551)(current size on disk: ~26151132 bytes)
      I0219 10:05:45.657886 31721 compaction.cc:594] RowSet(15610)(current size on disk: ~30315579 bytes)
      I0219 10:05:45.657913 31721 compaction.cc:594] RowSet(15611)(current size on disk: ~32233272 bytes)
      I0219 10:05:45.657934 31721 compaction.cc:594] RowSet(15617)(current size on disk: ~18863054 bytes)
      I0219 10:05:45.657984 31721 tablet.cc:1132] Compaction: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 5963147244092280832 or (T in {5963147244092280832})}]
      I0219 10:05:45.661224 31721 multi_column_writer.cc:82] Opened CFile writer for column country_id[int32 NOT NULL]
      I0219 10:05:45.661432 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_id[int64 NOT NULL]
      I0219 10:05:45.661598 31721 multi_column_writer.cc:82] Opened CFile writer for column cur_day[int32 NOT NULL]
      I0219 10:05:45.661761 31721 multi_column_writer.cc:82] Opened CFile writer for column country_name[string NOT NULL]
      I0219 10:05:45.661933 31721 multi_column_writer.cc:82] Opened CFile writer for column sku_name[string NOT NULL]
      I0219 10:05:45.662081 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_id[int32 NOT NULL]
      I0219 10:05:45.662228 31721 multi_column_writer.cc:82] Opened CFile writer for column brand_name[string NOT NULL]
      I0219 10:05:45.662384 31721 multi_column_writer.cc:82] Opened CFile writer for column product_id[int64 NOT NULL]
      I0219 10:05:45.662542 31721 multi_column_writer.cc:82] Opened CFile writer for column product_name[string NOT NULL]
      I0219 10:05:45.662695 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1[int32 NOT NULL]
      I0219 10:05:45.662832 31721 multi_column_writer.cc:82] Opened CFile writer for column cid1_name[string NOT NULL]
      I0219 10:05:45.662962 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2[int32 NOT NULL]
      I0219 10:05:45.663100 31721 multi_column_writer.cc:82] Opened CFile writer for column cid2_name[string NOT NULL]
      I0219 10:05:45.663239 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3[int32 NOT NULL]
      I0219 10:05:45.663393 31721 multi_column_writer.cc:82] Opened CFile writer for column cid3_name[string NOT NULL]
      I0219 10:05:45.663538 31721 multi_column_writer.cc:82] Opened CFile writer for column data_type[int8 NOT NULL]
      I0219 10:05:45.663679 31721 multi_column_writer.cc:82] Opened CFile writer for column price[double NOT NULL]
      I0219 10:05:45.663830 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_date[timestamp NOT NULL]
      I0219 10:05:45.663964 31721 multi_column_writer.cc:82] Opened CFile writer for column sale_number[int64 NOT NULL]
      I0219 10:05:45.664091 31721 multi_column_writer.cc:82] Opened CFile writer for column gmv[double NOT NULL]
      F0219 10:06:01.038265 31721 deltafile.cc:136] Check failed: last_key_.CompareTo<UNDO>(key) <= 0 must insert undo deltas in sorted order (ascending key, then descending ts): got key (row 278573@tx5962430420551352320) after (row 278573@tx5962430179277082624)
      

      It appears that the timestamps are misordered (in ascending order instead of descending order), triggering a DCHECK. We don't normally see this in load tests because we run with RELEASE mode in most of our long running stress tests.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                tlipcon Todd Lipcon
                Reporter:
                mpercy Mike Percy
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: