Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-1401

Wrong result with csv data when projecting columns not part of the Join or Filter

    XMLWordPrintableJSON

    Details

      Description

      #Wed Sep 10 13:35:05 PDT 2014
      git.commit.id.abbrev=686eb9e

      join (inner,full outer,left,right) does not work if the join is directly applied to csv files. It will produce either null values on the projected columns or NumberFormatException depends on joining condition applied on which columns (first or second columns). If you create views on the csv files, then join works on the views.

      For example, the following join give null values:

      0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as int), cast(`aggregate_100r.csv`.columns[1] as int), cast(`join_100r.csv`.columns[1] as int) from `aggregate_100r.csv` inner join `join_100r.csv` on cast(`aggregate_100r.csv`.columns[0] as int) = cast(`join_100r.csv`.columns[0] as int);
      --------------------------------

      EXPR$0 EXPR$1 EXPR$2

      --------------------------------

      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      0 null null
      1 null null
      1 null null
      1 null null
      1 null null

      The following give NumberFormatException:

      0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as int), cast(`aggregate_100r.csv`.columns[1] as int), cast(`join_100r.csv`.columns[1] as int) from `aggregate_100r.csv` inner join `join_100r.csv` on cast(`aggregate_100r.csv`.columns[1] as int) = cast(`join_100r.csv`.columns[1] as int);
      Query failed: Failure while running fragment. [3f67299e-f312-445b-8d6b-74984a820f0c]

      Error: exception while executing query: Failure while trying to get next result batch. (state=,code=0)

      The following with views works:

      0: jdbc:drill:schema=dfs> select `aggregate_100r_v`.c0, `aggregate_100r_v`.c1, `join_100r_v`.c1 from `aggregate_100r_v` inner join `join_100r_v` on `aggregate_100r_v`.c0 = `join_100r_v`.c0;
      --------------------------------

      c0 c1 c10

      --------------------------------

      0 0 0
      0 0 2
      0 0 1
      0 0 0
      0 0 0
      0 0 2
      0 0 1
      0 0 0
      0 1 0
      0 1 2
      0 1 1
      0 1 0
      0 2 0
      0 2 2
      0 2 1
      0 2 0
      1 1 1
      1 1 2
      1 1 1
      1 1 0

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                DrillCommitter DrillCommitter
                Reporter:
                cchang@maprtech.com Chun Chang
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: