Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Environment: Mac/Linux
Description
Summary: If we configure more than 24 partitions on each NC (node controller), we consistently lose almost half of the partitions, with no error message or log output.
Schema:
drop dataverse tpch if exists;
create dataverse tpch;
use dataverse tpch;

create type LineItemType as closed {
  l_orderkey: int32,
  l_partkey: int32,
  l_suppkey: int32,
  l_linenumber: int32,
  l_quantity: int32,
  l_extendedprice: double,
  l_discount: double,
  l_tax: double,
  l_returnflag: string,
  l_linestatus: string,
  l_shipdate: string,
  l_commitdate: string,
  l_receiptdate: string,
  l_shipinstruct: string,
  l_shipmode: string,
  l_comment: string
};

create dataset LineItem(LineItemType)
  primary key l_orderkey, l_linenumber;

load dataset LineItem using localfs
  (("path"="127.0.0.1:///path-to-tpch-data/tpch0.001/lineitem.tbl"),
   ("format"="delimited-text"),
   ("delimiter"="|"));
Query:
use dataverse tpch;
let $s := count(
  for $d in dataset LineItem
  return $d
)
return $s
Return:
6005
Commands:
managix stop -n tpch
managix start -n tpch
Query:
use dataverse tpch;
let $s := count(
  for $d in dataset LineItem
  return $d
)
return $s
Return:
4521
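The reproduction reduces to one invariant: the count query should return the same number before and after `managix stop` / `managix start`. A minimal sketch of a regression check for that invariant, in Python, assuming two hypothetical hooks that are not part of any AsterixDB API: `count_records` (would run the AQL count query) and `restart_instance` (would run the two managix commands).

```python
from typing import Callable, Tuple


def check_restart_preserves_count(
    count_records: Callable[[], int],
    restart_instance: Callable[[], None],
) -> Tuple[int, int]:
    """Count records, restart the instance, count again, and fail on any change.

    count_records and restart_instance are hypothetical hooks supplied by
    the caller; this function only compares the two counts.
    """
    before = count_records()
    restart_instance()
    after = count_records()
    if after != before:
        raise AssertionError(
            f"record count changed across restart: {before} -> {after} "
            f"({before - after} records missing)"
        )
    return before, after


if __name__ == "__main__":
    # Simulate the counts from this report with in-memory stand-ins.
    counts = iter([6005, 4521])
    try:
        check_restart_preserves_count(lambda: next(counts), lambda: None)
    except AssertionError as exc:
        print(exc)
```

Injecting the query and restart steps as callables keeps the check independent of how the cluster is driven, so the same assertion could wrap a shell script or a test harness.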
In this tiny test we lose 1,484 of 6,005 records (about a quarter). When we scale TPC-H up to 200 GB across 196 partitions, distributed as 8 x 24, we should get about 1.2 billion records, but the query returns only 0.45 billion!
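The size of the loss in the two runs can be compared as the fraction of expected records that are missing. A small Python sketch of that arithmetic, using only the counts quoted in this report; `loss_fraction` is a hypothetical helper name.

```python
def loss_fraction(expected: float, observed: float) -> float:
    """Fraction of the expected record count missing from the observed count."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / expected


if __name__ == "__main__":
    # Tiny test: 6005 records before the restart, 4521 after.
    print(f"tiny test:  {loss_fraction(6005, 4521):.1%} lost")   # 24.7% lost
    # 200 GB run: about 1.2 billion expected, 0.45 billion returned.
    print(f"200 GB run: {loss_fraction(1.2e9, 0.45e9):.1%} lost")  # 62.5% lost
```

By this measure the larger, more heavily partitioned run loses a much bigger share of its records than the tiny test.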