Tajo (Retired) / TAJO-1315

Invalid results are returned when a source table consists of multiple csv files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Storage
    • Labels: None

    Description

      Queries return invalid results when the source table consists of multiple CSV files.
      The following session reproduces the problem.

      default> \dfs -ls /customer.tbl
      Found 19 items
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000001
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000002
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000003
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000004
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000005
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000006
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000007
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000008
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000009
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000010
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000011
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000012
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000013
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000014
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000015
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:25 /customer.tbl/000016
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:26 /customer.tbl/000017
      -rw-r--r--   3 hadoop supergroup  134217728 2015-01-26 20:26 /customer.tbl/000018
      -rw-r--r--   3 hadoop supergroup   47571167 2015-01-26 20:26 /customer.tbl/000019
      
      default> create external table test (C_CUSTKEY bigint, C_NAME text, C_ADDRESS text, C_NATIONKEY bigint, C_PHONE text, C_ACCTBAL double, C_MKTSEGMENT text, C_COMMENT text) using csv with ('csvfile.delimiter'='|') location 'hdfs://192.168.0.1:7020/customer.tbl';
      OK
      default> \d test
      
      table name: tpch_swift.test
      table path: hdfs://192.168.0.1:7020/customer.tbl
      store type: CSV
      number of rows: unknown
      volume: 2.5 GB
      Options: 
      	'text.delimiter'='|'
      
      schema: 
      c_custkey	INT8
      c_name	TEXT
      c_address	TEXT
      c_nationkey	INT8
      c_phone	TEXT
      c_acctbal	FLOAT8
      c_mktsegment	TEXT
      c_comment	TEXT
      
      default> select count(*) from test;
      ?count
      -------------------------------
      15000017
      (1 rows, 3.2 sec, 9 B selected)
      

      The expected result is 15000000, but the query returned 15000017, which is 17 rows too many.
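      The report does not state a cause. One possibility, consistent with the listing above (18 files of exactly 134217728 bytes, that is 128 MiB, followed by one shorter file), is that the data was cut at fixed byte offsets rather than at line boundaries, so that a record straddling two files is read as two rows. The following is a minimal sketch of that effect only, under that assumption. The helper names, the sample records, and the chunk size are all hypothetical and are not taken from Tajo.

```python
# Hypothetical illustration: none of these names come from Tajo.

def split_bytes(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut data into fixed-size pieces, ignoring line boundaries."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def count_rows(piece: bytes) -> int:
    """Count rows in one piece, treating the piece as a complete file."""
    return len(piece.splitlines())


# Six made-up records in a '|'-delimited layout, one per line (27 bytes each).
records = [f"{key}|Customer#{key:09d}|addr{key}\n".encode() for key in range(1, 7)]
data = b"".join(records)

# A 50-byte chunk size never lands on a line boundary for this data,
# so every cut splits one record into two fragments.
pieces = split_bytes(data, chunk_size=50)
rows_seen = sum(count_rows(piece) for piece in pieces)

print(len(records), len(pieces), rows_seen)  # 6 records, 4 pieces, 9 rows
```

      In this sketch each of the three cuts adds one surplus row. If this is what happened here, at most 18 surplus rows would be expected from 19 files, which is in line with the 17 observed.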

      I then looked for the erroneous tuples as follows (the queries below use a table named customer2, presumably defined over the same files).

      default> select c_custkey, count(*) as cnt from customer2 group by c_custkey having cnt > 1;
      c_custkey,  cnt
      -------------------------------
      ,  14
      114575,  2
      14711665,  2
      34,  2
      (4 rows, 16.681 sec, 29 B selected)
      
      default> select * from customer2 where c_custkey is null or c_custkey = 114575 or c_custkey = 14711665 or c_custkey = 34;
      c_custkey,  c_name,  c_address,  c_nationkey,  c_phone,  c_acctbal,  c_mktsegment,  c_comment
      -------------------------------
      34,  Customer#000000034,  Q6G9wZ6dnczmtOx509xgE,M2KV,  15,  25-344-968-5422,  8589.7,  HOUSEHOLD,  nder against the even, pending accounts. even
      114575,  Customer#000114575,  xqLzTzY0,QvqwlSPI8OLxjRQ4s2W7pkSWwK,  16,  26-303-921-2836,  6663.68,  AUTOMOBILE,  le fluffily final deposits. furiously regu
      ,  21,  31-264-911-5053,  ,  HOUSEHOLD,  0.0,  ,  
      ,  IexCQQNp7tsMK63QKrGw37H3JJXGPaXBk,  18,  ,  4313.01,  0.0,   the never pending accounts. slyly fluffy pinto beans run fluffily. furiously ,  
      ,  ,  ,  ,  ,  ,  ,  
      ,  152.95,  MACHINERY,  ,  ,  ,  ,  
      ,  t the ironic, close accounts are careful,  ,  ,  ,  ,  ,  
      ,  20,  30-481-475-8163,  ,  AUTOMOBILE,  0.0,  ,  
      ,  ,  ,  ,  ,  ,  ,  
      ,  MACHINERY,  ts use slyly even dependencie,  ,  ,  ,  ,  
      ,  ,  ,  ,  ,  ,  ,  
      ,  24,  34-639-456-9692,  ,  FURNITURE,  0.0,  ,  
      ,  ,  ,  ,  ,  ,  ,  
      114575,  ,  ,  ,  ,  ,  ,  
      34,  Customer#011457534,  wFUkCU67OxuxvfQeSdvSMDtMB7DWt7jiw,  2,  12-145-168-8442,  145.78,  MACHINERY,  ic accounts. ironic, final ideas sleep qu
      ,  XPP8pRDTDs4MFMP7SSlv,  17,  ,  5437.09,  0.0,  egular requests cajole slyly after the ,  
      ,  blithely along the regular, daring deposits. ironic acco,  ,  ,  ,  ,  ,  
      ,  12,  22-656-233-3821,  ,  HOUSEHOLD,  0.0,  ,  
      14711665,  Customer#0,  ,  ,  ,  ,  ,  
      14711665,  QKTarsTkX7,  19,  ,  7017.62,  0.0,  ly after the carefully ironic theodolites. pending requests are slyly across the deposits. even accounts boost. fina,  
      (20 rows, 8.964 sec, 1.2 KiB selected)
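      The duplicate counts above can be reconciled with the surplus in the count(*) result. The sketch below assumes that rows with an empty c_custkey are never legitimate, that each non-empty key has exactly one legitimate row, and that test and customer2 read the same files. The function name is hypothetical.

```python
# Duplicate counts copied from the GROUP BY ... HAVING cnt > 1 output above.
# None stands for the empty (null) c_custkey.
duplicate_counts = {None: 14, 114575: 2, 14711665: 2, 34: 2}


def surplus_rows(counts):
    """Rows beyond the legitimate ones, under the stated assumptions."""
    total = 0
    for key, cnt in counts.items():
        # Assumption: a null key is always spurious; any other key
        # has exactly one legitimate row, so cnt - 1 rows are surplus.
        total += cnt if key is None else cnt - 1
    return total


expected, actual = 15_000_000, 15_000_017
surplus = surplus_rows(duplicate_counts)

print(surplus, actual - expected)  # 17 17
```

      Under these assumptions the 14 null-key rows plus one surplus row for each of the three duplicated keys account for all 17 unexpected rows.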
      

          People

            Assignee: Unassigned
            Reporter: Jihoon Son (jihoonson)