Description
See the title.
Here are some examples related to this bug.
default> \dfs -ls /customer.tbl
Found 19 items
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000001
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000002
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000003
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000004
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000005
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000006
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000007
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000008
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000009
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000010
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000011
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000012
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000013
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000014
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000015
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:25 /customer.tbl/000016
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:26 /customer.tbl/000017
-rw-r--r-- 3 hadoop supergroup 134217728 2015-01-26 20:26 /customer.tbl/000018
-rw-r--r-- 3 hadoop supergroup 47571167 2015-01-26 20:26 /customer.tbl/000019
default> create external table test (C_CUSTKEY bigint, C_NAME text, C_ADDRESS text, C_NATIONKEY bigint, C_PHONE text, C_ACCTBAL double, C_MKTSEGMENT text, C_COMMENT text) using csv with ('csvfile.delimiter'='|') location 'hdfs://192.168.0.1:7020/customer.tbl';
OK
default> \d test
table name: tpch_swift.test
table path: hdfs://192.168.0.1:7020/customer.tbl
store type: CSV
number of rows: unknown
volume: 2.5 GB
Options:
'text.delimiter'='|'
schema:
c_custkey INT8
c_name TEXT
c_address TEXT
c_nationkey INT8
c_phone TEXT
c_acctbal FLOAT8
c_mktsegment TEXT
c_comment TEXT
default> select count(*) from test;
?count
-------------------------------
15000017
(1 rows, 3.2 sec, 9 B selected)
As you can see, the expected count is 15000000, but the query returned 15000017, i.e., 17 extra rows.
To find the erroneous tuples, I looked for c_custkey values that occur more than once and then selected the matching rows:
default> select c_custkey, count(*) as cnt from customer2 group by c_custkey having cnt > 1;
c_custkey, cnt
-------------------------------
, 14
114575, 2
14711665, 2
34, 2
(4 rows, 16.681 sec, 29 B selected)
default> select * from customer2 where c_custkey is null or c_custkey = 114575 or c_custkey = 14711665 or c_custkey = 34;
c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment
-------------------------------
34, Customer#000000034, Q6G9wZ6dnczmtOx509xgE,M2KV, 15, 25-344-968-5422, 8589.7, HOUSEHOLD, nder against the even, pending accounts. even
114575, Customer#000114575, xqLzTzY0,QvqwlSPI8OLxjRQ4s2W7pkSWwK, 16, 26-303-921-2836, 6663.68, AUTOMOBILE, le fluffily final deposits. furiously regu
, 21, 31-264-911-5053, , HOUSEHOLD, 0.0, ,
, IexCQQNp7tsMK63QKrGw37H3JJXGPaXBk, 18, , 4313.01, 0.0, the never pending accounts. slyly fluffy pinto beans run fluffily. furiously ,
, , , , , , ,
, 152.95, MACHINERY, , , , ,
, t the ironic, close accounts are careful, , , , , ,
, 20, 30-481-475-8163, , AUTOMOBILE, 0.0, ,
, , , , , , ,
, MACHINERY, ts use slyly even dependencie, , , , ,
, , , , , , ,
, 24, 34-639-456-9692, , FURNITURE, 0.0, ,
, , , , , , ,
114575, , , , , , ,
34, Customer#011457534, wFUkCU67OxuxvfQeSdvSMDtMB7DWt7jiw, 2, 12-145-168-8442, 145.78, MACHINERY, ic accounts. ironic, final ideas sleep qu
, XPP8pRDTDs4MFMP7SSlv, 17, , 5437.09, 0.0, egular requests cajole slyly after the ,
, blithely along the regular, daring deposits. ironic acco, , , , , ,
, 12, 22-656-233-3821, , HOUSEHOLD, 0.0, ,
14711665, Customer#0, , , , , ,
14711665, QKTarsTkX7, 19, , 7017.62, 0.0, ly after the carefully ironic theodolites. pending requests are slyly across the deposits. even accounts boost. fina,
(20 rows, 8.964 sec, 1.2 KiB selected)
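One possible reading of these rows, offered only as a hypothesis: every file except the last is exactly 134217728 bytes (128 MiB), so a line may be cut at a file boundary and each half read as its own row. For example, the row with c_custkey 114575 and all other columns empty, and the row with c_custkey 34 but c_name Customer#011457534, look like the two halves of a single record for customer 11457534. The Python sketch below reproduces that effect under stated assumptions: each piece is parsed independently, an unparseable INT8 value becomes NULL, and the record text is reassembled from the output above. The names RECORD, parse_piece and parse_files are hypothetical; this is not Tajo's reader.

```python
# Hypothetical sketch: what happens if each fixed-size file is parsed on its
# own and a line cut at a file boundary is NOT stitched back together.
# Not Tajo's actual reader; the NULL-on-bad-INT8 behavior is an assumption.

NUM_COLUMNS = 8  # c_custkey .. c_comment

# One customer record, reassembled (assumption) from rows 114575 and 34 above.
RECORD = ("11457534|Customer#011457534|wFUkCU67OxuxvfQeSdvSMDtMB7DWt7jiw|2|"
          "12-145-168-8442|145.78|MACHINERY|ic accounts. ironic, final ideas sleep qu|\n")


def to_int8(text):
    """Lenient INT8 cast: unparseable text becomes None (NULL)."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_piece(piece):
    """Parse one piece line by line; return (c_custkey, c_name) pairs."""
    rows = []
    for line in piece.split("\n"):
        if not line:
            continue  # skip the empty string after a trailing newline
        fields = line.split("|")[:NUM_COLUMNS]
        fields += [""] * (NUM_COLUMNS - len(fields))  # pad missing columns
        rows.append((to_int8(fields[0]), fields[1]))
    return rows


def parse_files(data, cut_offsets):
    """Cut `data` at the given offsets, as if it were several files, and parse each alone."""
    edges = [0, *cut_offsets, len(data)]
    rows = []
    for start, end in zip(edges, edges[1:]):
        rows.extend(parse_piece(data[start:end]))
    return rows


if __name__ == "__main__":
    print(parse_files(RECORD, []))   # [(11457534, 'Customer#011457534')]
    print(parse_files(RECORD, [6]))  # [(114575, ''), (34, 'Customer#011457534')]
```

With one cut, a single record turns into two rows, which would add one row per affected boundary; whether this is the actual mechanism behind the 17 extra rows is not established by the output above.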