Tajo / TAJO-1075

Unexpected join output from inputs of consecutive query results that include 0-size tuple segments


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      9 HDFS DataNodes (Tajo: 1 Master + 8 Workers), OpenJDK 1.7

      Description

      Tajo produces unexpected join output when its inputs are tables created by consecutive queries and those tables contain 0-size (empty) tuple segments, i.e., 0-byte output files. For example:

      0.
      External table t: 1.1 GB of data, about 24M rows
      External table u: 776 KB of data, about 64K rows

      1.
      tajo query 1> create table t1 as select * from t where t.a=xxx;
      tajo query 1a> create table t1_ext as select t1.*, u.b from t1 inner join u on t1.key=u.key;
      tajo query 2> create table t2 as select * from t where t.a=yyy;
      tajo query 2a> create table t2_ext as select t2.*, u.b from t2 inner join u on t2.key=u.key;

      Tables t1_ext and t2_ext are created correctly (Tajo executes all of the queries successfully). The output of the last query in the series is about 3 MB of data, about 64K rows, stored in HDFS as follows:

      Permission  Owner   Group       Size     Replication  Block Size  Name
      -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000000-000
      -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000001-000
      -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000002-000
      -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000003-000
      -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000004-000
      -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000005-000
      -rw-r--r--  hadoop  supergroup  0 B      2            32 MB       part-03-000006-000
      -rw-r--r--  hadoop  supergroup  1.03 MB  2            32 MB       part-03-000007-000
      -rw-r--r--  hadoop  supergroup  1.92 MB  2            32 MB       part-03-000008-000

      2.
      tajo query 3> select * from t1_ext inner join t2_ext on t1_ext.key=t2_ext.key;

      The join query produces abnormal output; for example, it returns no rows even though matching keys exist in both tables.
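      For reference, an inner join's result should not depend on how its inputs are split into files: an empty segment contributes no tuples, so the output with and without the 0-byte files should be identical. A minimal sketch of that expected behavior, assuming a simple in-memory hash join over segmented inputs. The class SegmentedHashJoin, its Row type, and innerJoin are hypothetical illustrations, not Tajo's API or its actual join implementation.

```java
import java.util.*;

public class SegmentedHashJoin {
    // Hypothetical row model: a join key plus a payload value.
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    // Inner hash join; each input is a list of segments (one segment per file).
    static List<String> innerJoin(List<List<Row>> leftSegments, List<List<Row>> rightSegments) {
        // Build phase: hash every left row, segment by segment.
        Map<Integer, List<Row>> table = new HashMap<>();
        for (List<Row> segment : leftSegments) {
            for (Row left : segment) { // an empty segment simply adds nothing
                table.computeIfAbsent(left.key, k -> new ArrayList<>()).add(left);
            }
        }
        // Probe phase: look up every right row and emit matching pairs.
        List<String> out = new ArrayList<>();
        for (List<Row> segment : rightSegments) {
            for (Row right : segment) {
                for (Row left : table.getOrDefault(right.key, Collections.<Row>emptyList())) {
                    out.add(left.value + "|" + right.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> t1 = Arrays.asList(new Row(1, "a"), new Row(2, "b"));
        List<Row> t2 = Arrays.asList(new Row(2, "x"), new Row(3, "y"));
        List<Row> empty = new ArrayList<>();
        // Same data, without and with empty (0-byte) segments mixed in.
        List<String> plain = innerJoin(Collections.singletonList(t1), Collections.singletonList(t2));
        List<String> withEmpty = innerJoin(Arrays.asList(empty, empty, t1), Arrays.asList(empty, t2, empty));
        System.out.println(plain);     // [b|x]
        System.out.println(withEmpty); // [b|x]
    }
}
```

      Both calls return the single match for key 2, which is the invariant the reported behavior violates.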

      3.
      Following advice from Jihoon Son, I manually removed the 0-byte files from the HDFS directory and re-executed query 3. With the empty files removed, Tajo returns the correct result.
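      A minimal sketch of the first half of that manual workaround (finding the 0-byte part files), assuming the table directory is reachable on a local file system via java.nio.file. On HDFS the same check would go through Hadoop's FileSystem.listStatus and FileStatus.getLen(), which this sketch does not use. EmptySegmentFinder and findEmptyFiles are hypothetical names; the sketch only lists files and deletes nothing.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class EmptySegmentFinder {
    // Returns the names of regular 0-byte files directly under dir, sorted by name.
    static List<String> findEmptyFiles(Path dir) throws IOException {
        List<String> empty = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && Files.size(p) == 0L) {
                    empty.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(empty);
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a table directory with one empty and one non-empty part file.
        Path dir = Files.createTempDirectory("table");
        Files.createFile(dir.resolve("part-03-000000-000"));
        Files.write(dir.resolve("part-03-000007-000"), "1|a\n".getBytes());
        System.out.println(findEmptyFiles(dir)); // [part-03-000000-000]
        // Applying the workaround would mean calling Files.delete on each listed file.
    }
}
```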

      Since I have not explored the Tajo source in depth, I cannot pinpoint where the fix belongs. This needs a committer's support.


    People

    • Assignee: Jinhang Choi (cepiross)
    • Reporter: Jinhang Choi (cepiross)
    • Votes: 0
    • Watchers: 1
