  1. Apache AsterixDB
  2. ASTERIXDB-1776

Data loss in many multi-partitions

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Environment:
      MAC/Linux

      Description

      Summary: If we configure more than 24 partitions on each NC, we always lose almost half of the partitions once the instance is restarted, without any error message or log entry.
      Schema:

      drop dataverse tpch if exists;
      create dataverse tpch;
      use dataverse tpch;
      
      create type LineItemType as closed {
        l_orderkey: int32,
        l_partkey: int32,
        l_suppkey: int32,
        l_linenumber: int32,
        l_quantity: int32,
        l_extendedprice: double,
        l_discount: double,
        l_tax: double,
        l_returnflag: string,
        l_linestatus: string,
        l_shipdate: string,
        l_commitdate: string,
        l_receiptdate: string,
        l_shipinstruct: string,
        l_shipmode: string,
        l_comment: string
      }
      
      create dataset LineItem(LineItemType)
        primary key l_orderkey, l_linenumber;
      load dataset LineItem 
      using localfs
      (("path"="127.0.0.1:///path-to-tpch-data/tpch0.001/lineitem.tbl"),("format"="delimited-text"),("delimiter"="|"));
      

      Query:

      use dataverse tpch;
      let $s := count(
      for $d in dataset LineItem
      return $d
      )
      return $s
      

      Return:

      6005
      

      Command:

      managix stop -n tpch
      managix start -n tpch
      

      Query:

      use dataverse tpch;
      let $s := count(
      for $d in dataset LineItem
      return $d
      )
      return $s
      

      Return:

      4521
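
The count / restart / count sequence above can be wrapped in a small check so the regression is easy to re-run. This is only a sketch under stated assumptions: the helper names `restart_instance` and `count_survives_restart` are hypothetical, and `run_count` is left as a caller-supplied function because how the count query is submitted depends on the deployment. The only external commands used are the two `managix` calls shown in this report.

```python
import subprocess
from typing import Callable, Tuple


def restart_instance(name: str = "tpch") -> None:
    """Stop and start the managix instance, mirroring the Command step."""
    subprocess.run(["managix", "stop", "-n", name], check=True)
    subprocess.run(["managix", "start", "-n", name], check=True)


def count_survives_restart(
    run_count: Callable[[], int],
    restart: Callable[[], None],
) -> Tuple[bool, int, int]:
    """Run the count query, restart, run it again, and compare.

    run_count: hypothetical caller-supplied function that submits the
               count query over LineItem and returns the integer result.
    restart:   function that restarts the instance, e.g. restart_instance.
    Returns (counts_match, count_before, count_after).
    """
    before = run_count()
    restart()
    after = run_count()
    return before == after, before, after
```

With the numbers from this report, the check would return `(False, 6005, 4521)`, which flags the loss even though no error is logged.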
      

      We lose about a quarter of the records (1,484 of 6,005) in this tiny test. When we scale the TPC-H data up to 200 GB across 192 partitions, distributed as 8 X 24, we should get 1.2 billion records, but the query returned only 0.45 billion!
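
The size of the loss can be checked directly from the counts reported here. A minimal sketch: the helper name `loss_fraction` is hypothetical, and the large-scale inputs are the rounded figures quoted in this report (1.2 billion expected, 0.45 billion returned), not exact counts.

```python
def loss_fraction(expected: float, observed: float) -> float:
    """Fraction of records missing relative to the expected count."""
    return (expected - observed) / expected


# Small test: counts before and after the restart.
small = loss_fraction(6005, 4521)
# 200 GB test: rounded figures from the report.
large = loss_fraction(1.2e9, 0.45e9)

print(f"small test: {small:.1%} lost")    # small test: 24.7% lost
print(f"200 GB test: {large:.1%} lost")   # 200 GB test: 62.5% lost
```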

        Attachments

        1. cc.log
          151 kB
          Wenhai Li
        2. demo.xml
          3 kB
          Wenhai Li
        3. execute.log
          51 kB
          Wenhai Li
        4. tpch_node1.log
          106 kB
          Wenhai Li
        5. tpch_node2.log
          115 kB
          Wenhai Li

            People

            • Assignee:
              Ian Maxon (imaxon)
            • Reporter:
              Wenhai Li (lwhay)