  1. Apache Drill
  2. DRILL-4960

Wrong columns after scanning JSON files where some files have missing columns

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: Server
    • Labels: None
    • Environment: Mac

    Description

      (This problem may be more general than just JSON.)

      To recreate: Scan two small JSON files (e.g. two copies of contrib/storage-mongo/src/test/resources/emp.json) where a whole column (e.g. "last_name") has been removed from one of the files.
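
      A minimal sketch of preparing such a pair of files, in Python. The output file names and the missing_first switch are hypothetical and only control which file sorts first alphabetically; the sketch also assumes the source file holds a sequence of JSON objects (newline-delimited or simply concatenated), which has not been checked against emp.json.

```python
import json
from pathlib import Path


def read_json_objects(text):
    """Parse concatenated or newline-delimited JSON objects from a string."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return records
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)


def write_repro_files(source_text, out_dir, drop_column="last_name",
                      missing_first=False):
    """Write two files: a full copy and a copy without `drop_column`.

    File names are hypothetical; they only control which file sorts first.
    """
    records = read_json_objects(source_text)
    stripped = [{k: v for k, v in r.items() if k != drop_column}
                for r in records]
    full_name, stripped_name = (("b_full.json", "a_missing.json")
                                if missing_first
                                else ("a_full.json", "b_missing.json"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in ((full_name, records), (stripped_name, stripped)):
        (out / name).write_text("".join(json.dumps(r) + "\n" for r in rows))
    return out / full_name, out / stripped_name
```

      Calling write_repro_files with missing_first=True produces the ordering in which the file without the column is scanned first, so both orderings described in this report can be prepared from the same input.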

      A "normal" scan (the missing column shows up as nulls):

      0: jdbc:drill:zk=local> select * from `drill/data/emp`;
      +-------------+--------------------+------------+-------------+-------------+--------+-----------+-------+
      | employee_id | full_name          | first_name | last_name   | position_id | rating | position  | isFTE |
      +-------------+--------------------+------------+-------------+-------------+--------+-----------+-------+
      | 1101        | Steve Eurich       | Steve      | Eurich      | 16          | 23.0   | Store T   | true  |
      | 1102        | Mary Pierson       | Mary       | Pierson     | 16          | 45.6   | Store T   | true  |
      | 1103        | Leo Jones          | Leo        | Jones       | 16          | 85.94  | Store Tem | true  |
      | 1104        | Nancy Beatty       | Nancy      | Beatty      | 16          | 97.16  | Store T   | false |
      | 1105        | Clara McNight      | Clara      | McNight     | 16          | 81.25  | Store     | true  |
      | 1106        | null               | Marcella   | Isaacs      | 17          | 67.86  | Stor      | false |
      | 1107        | Charlotte Yonce    | Charlotte  | Yonce       | 17          | 52.17  | Stor      | true  |
      | 1108        | Benjamin Foster    | Benjamin   | Foster      | 17          | 89.8   | Stor      | false |
      | 1109        | John Reed          | John       | Reed        | 17          | 12.9   | Store Per | false |
      | 1110        | Lynn Kwiatkowski   | Lynn       | Kwiatkowski | 17          | 25.76  | St        | true  |
      | 1111        | Donald Vann        | Donald     | Vann        | 17          | 34.86  | Store Per | false |
      | 1112        | null               | William    | Smith       | null        | 79.06  | St        | true  |
      | 1113        | Amy Hensley        | Amy        | Hensley     | 17          | 82.96  | Store Pe  | false |
      | 1114        | Judy Owens         | Judy       | Owens       | 17          | 24.6   | Store Per | true  |
      | 1115        | Frederick Castillo | Frederick  | Castillo    | 17          | 82.36  | S         | false |
      | 1116        | Phil Munoz         | Phil       | Munoz       | 17          | 97.63  | Store Per | false |
      | 1117        | Lori Lightfoot     | Lori       | Lightfoot   | 17          | 39.16  | Store     | true  |
      | 1           | Kumar              | Anil       | B           | 19          | 45.45  | Store     | true  |
      | 2           | Kamesh             | Bh         | Venkata     | null        | 32.89  | Store     | true  |
      | 1101        | Steve Eurich       | Steve      | null        | 16          | 23.0   | Store T   | true  |
      | 1102        | Mary Pierson       | Mary       | null        | 16          | 45.6   | Store T   | true  |
      | 1103        | Leo Jones          | Leo        | null        | 16          | 85.94  | Store Tem | true  |
      | 1104        | Nancy Beatty       | Nancy      | null        | 16          | 97.16  | Store T   | false |
      | 1105        | Clara McNight      | Clara      | null        | 16          | 81.25  | Store     | true  |
      | 1106        | null               | Marcella   | null        | 17          | 67.86  | Stor      | false |
      | 1107        | Charlotte Yonce    | Charlotte  | null        | 17          | 52.17  | Stor      | true  |
      | 1108        | Benjamin Foster    | Benjamin   | null        | 17          | 89.8   | Stor      | false |
      | 1109        | John Reed          | John       | null        | 17          | 12.9   | Store Per | false |
      | 1110        | Lynn Kwiatkowski   | Lynn       | null        | 17          | 25.76  | St        | true  |
      | 1111        | Donald Vann        | Donald     | null        | 17          | 34.86  | Store Per | false |
      | 1112        | null               | William    | null        | null        | 79.06  | St        | true  |
      | 1113        | Amy Hensley        | Amy        | null        | 17          | 82.96  | Store Pe  | false |
      | 1114        | Judy Owens         | Judy       | null        | 17          | 24.6   | Store Per | true  |
      | 1115        | Frederick Castillo | Frederick  | null        | 17          | 82.36  | S         | false |
      | 1116        | Phil Munoz         | Phil       | null        | 17          | 97.63  | Store Per | false |
      | 1117        | Lori Lightfoot     | Lori       | null        | 17          | 39.16  | Store     | true  |
      | 1           | Kumar              | Anil       | null        | 19          | 45.45  | Store     | true  |
      | 2           | Kamesh             | Bh         | null        | null        | 32.89  | Store     | true  |
      +-------------+--------------------+------------+-------------+-------------+--------+-----------+-------+
      38 rows selected (0.16 seconds)

      But when the file that sorts first alphabetically is renamed so that it sorts second, that column ("last_name") does not show up:

      0: jdbc:drill:zk=local> select * from foo;
      +-------------+--------------------+------------+-------------+--------+-----------+-------+
      | employee_id | full_name          | first_name | position_id | rating | position  | isFTE |
      +-------------+--------------------+------------+-------------+--------+-----------+-------+
      | 1101        | Steve Eurich       | Steve      | 16          | 23.0   | Store T   | true  |
      | 1102        | Mary Pierson       | Mary       | 16          | 45.6   | Store T   | true  |
      | 1103        | Leo Jones          | Leo        | 16          | 85.94  | Store Tem | true  |
      | 1104        | Nancy Beatty       | Nancy      | 16          | 97.16  | Store T   | false |
      | 1105        | Clara McNight      | Clara      | 16          | 81.25  | Store     | true  |
      | 1106        | null               | Marcella   | 17          | 67.86  | Stor      | false |
      | 1107        | Charlotte Yonce    | Charlotte  | 17          | 52.17  | Stor      | true  |
      | 1108        | Benjamin Foster    | Benjamin   | 17          | 89.8   | Stor      | false |
      | 1109        | John Reed          | John       | 17          | 12.9   | Store Per | false |
      | 1110        | Lynn Kwiatkowski   | Lynn       | 17          | 25.76  | St        | true  |
      | 1111        | Donald Vann        | Donald     | 17          | 34.86  | Store Per | false |
      | 1112        | null               | William    | null        | 79.06  | St        | true  |
      | 1113        | Amy Hensley        | Amy        | 17          | 82.96  | Store Pe  | false |
      | 1114        | Judy Owens         | Judy       | 17          | 24.6   | Store Per | true  |
      | 1115        | Frederick Castillo | Frederick  | 17          | 82.36  | S         | false |
      | 1116        | Phil Munoz         | Phil       | 17          | 97.63  | Store Per | false |
      | 1117        | Lori Lightfoot     | Lori       | 17          | 39.16  | Store     | true  |
      | 1           | Kumar              | Anil       | 19          | 45.45  | Store     | true  |
      | 2           | Kamesh             | Bh         | null        | 32.89  | Store     | true  |
      | 1101        | Steve Eurich       | Steve      | 16          | 23.0   | Store T   | true  |
      | 1102        | Mary Pierson       | Mary       | 16          | 45.6   | Store T   | true  |
      | 1103        | Leo Jones          | Leo        | 16          | 85.94  | Store Tem | true  |
      | 1104        | Nancy Beatty       | Nancy      | 16          | 97.16  | Store T   | false |
      | 1105        | Clara McNight      | Clara      | 16          | 81.25  | Store     | true  |
      | 1106        | null               | Marcella   | 17          | 67.86  | Stor      | false |
      | 1107        | Charlotte Yonce    | Charlotte  | 17          | 52.17  | Stor      | true  |
      | 1108        | Benjamin Foster    | Benjamin   | 17          | 89.8   | Stor      | false |
      | 1109        | John Reed          | John       | 17          | 12.9   | Store Per | false |
      | 1110        | Lynn Kwiatkowski   | Lynn       | 17          | 25.76  | St        | true  |
      | 1111        | Donald Vann        | Donald     | 17          | 34.86  | Store Per | false |
      | 1112        | null               | William    | null        | 79.06  | St        | true  |
      | 1113        | Amy Hensley        | Amy        | 17          | 82.96  | Store Pe  | false |
      | 1114        | Judy Owens         | Judy       | 17          | 24.6   | Store Per | true  |
      | 1115        | Frederick Castillo | Frederick  | 17          | 82.36  | S         | false |
      | 1116        | Phil Munoz         | Phil       | 17          | 97.63  | Store Per | false |
      | 1117        | Lori Lightfoot     | Lori       | 17          | 39.16  | Store     | true  |
      | 1           | Kumar              | Anil       | 19          | 45.45  | Store     | true  |
      | 2           | Kamesh             | Bh         | null        | 32.89  | Store     | true  |
      +-------------+--------------------+------------+-------------+--------+-----------+-------+
      38 rows selected (0.261 seconds)

      But if requested explicitly, the column does show:

      0: jdbc:drill:zk=local> select last_name from `drill/data/emp`;
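      +-------------+
      | last_name   |
      +-------------+
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | null        |
      | Eurich      |
      | Pierson     |
      | Jones       |
      | Beatty      |
      | McNight     |
      | Isaacs      |
      | Yonce       |
      | Foster      |
      | Reed        |
      | Kwiatkowski |
      | Vann        |
      | Smith       |
      | Hensley     |
      | Owens       |
      | Castillo    |
      | Munoz       |
      | Lightfoot   |
      | B           |
      | Venkata     |
      +-------------+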
      38 rows selected (0.159 seconds)
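
      Since an explicitly requested column does show up, one possible workaround is to name every column instead of using "select *". The Python sketch below builds such a query from the union of keys found in the input records. Both helper names are hypothetical, and this report only confirms the behavior for the single-column query above; whether a full explicit column list also avoids the other symptoms described here has not been verified.

```python
def union_of_columns(*record_lists):
    """Return every key seen in any record, in first-seen order."""
    columns = {}
    for records in record_lists:
        for record in records:
            for key in record:
                columns.setdefault(key, None)
    return list(columns)


def explicit_select(table, columns):
    """Build a Drill-style query that names each column in backticks."""
    quoted = ", ".join(f"`{c}`" for c in columns)
    return f"select {quoted} from `{table}`"
```

      For example, passing the records of both files to union_of_columns and the result to explicit_select("drill/data/emp", ...) yields a query that lists "last_name" regardless of which file is scanned first.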

      Things get even WORSE when a parallel plan is chosen – some column data shows up under the wrong columns:

      0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
      +------+-------------------------------+
      | ok   | summary                       |
      +------+-------------------------------+
      | true | planner.slice_target updated. |
      +------+-------------------------------+
      1 row selected (0.084 seconds)
      0: jdbc:drill:zk=local> select * from `drill/data/emp`;
      +-------------+--------------------+------------+-------------+--------+-----------+-----------+
      | employee_id | full_name          | first_name | position_id | rating | position  | isFTE     |
      +-------------+--------------------+------------+-------------+--------+-----------+-----------+
      | 1101        | Steve Eurich       | Steve      | 16          | 23.0   | Store T   | true      |
      | 1102        | Mary Pierson       | Mary       | 16          | 45.6   | Store T   | true      |
      | 1103        | Leo Jones          | Leo        | 16          | 85.94  | Store Tem | true      |
      | 1104        | Nancy Beatty       | Nancy      | 16          | 97.16  | Store T   | false     |
      | 1105        | Clara McNight      | Clara      | 16          | 81.25  | Store     | true      |
      | 1106        | null               | Marcella   | 17          | 67.86  | Stor      | false     |
      | 1107        | Charlotte Yonce    | Charlotte  | 17          | 52.17  | Stor      | true      |
      | 1108        | Benjamin Foster    | Benjamin   | 17          | 89.8   | Stor      | false     |
      | 1109        | John Reed          | John       | 17          | 12.9   | Store Per | false     |
      | 1110        | Lynn Kwiatkowski   | Lynn       | 17          | 25.76  | St        | true      |
      | 1111        | Donald Vann        | Donald     | 17          | 34.86  | Store Per | false     |
      | 1112        | null               | William    | null        | 79.06  | St        | true      |
      | 1113        | Amy Hensley        | Amy        | 17          | 82.96  | Store Pe  | false     |
      | 1114        | Judy Owens         | Judy       | 17          | 24.6   | Store Per | true      |
      | 1115        | Frederick Castillo | Frederick  | 17          | 82.36  | S         | false     |
      | 1116        | Phil Munoz         | Phil       | 17          | 97.63  | Store Per | false     |
      | 1117        | Lori Lightfoot     | Lori       | 17          | 39.16  | Store     | true      |
      | 1           | Kumar              | Anil       | 19          | 45.45  | Store     | true      |
      | 2           | Kamesh             | Bh         | null        | 32.89  | Store     | true      |
      | 1101        | Steve Eurich       | Steve      | Eurich      | 16     | 23.0      | Store T   |
      | 1102        | Mary Pierson       | Mary       | Pierson     | 16     | 45.6      | Store T   |
      | 1103        | Leo Jones          | Leo        | Jones       | 16     | 85.94     | Store Tem |
      | 1104        | Nancy Beatty       | Nancy      | Beatty      | 16     | 97.16     | Store T   |
      | 1105        | Clara McNight      | Clara      | McNight     | 16     | 81.25     | Store     |
      | 1106        | null               | Marcella   | Isaacs      | 17     | 67.86     | Stor      |
      | 1107        | Charlotte Yonce    | Charlotte  | Yonce       | 17     | 52.17     | Stor      |
      | 1108        | Benjamin Foster    | Benjamin   | Foster      | 17     | 89.8      | Stor      |
      | 1109        | John Reed          | John       | Reed        | 17     | 12.9      | Store Per |
      | 1110        | Lynn Kwiatkowski   | Lynn       | Kwiatkowski | 17     | 25.76     | St        |
      | 1111        | Donald Vann        | Donald     | Vann        | 17     | 34.86     | Store Per |
      | 1112        | null               | William    | Smith       | null   | 79.06     | St        |
      | 1113        | Amy Hensley        | Amy        | Hensley     | 17     | 82.96     | Store Pe  |
      | 1114        | Judy Owens         | Judy       | Owens       | 17     | 24.6      | Store Per |
      | 1115        | Frederick Castillo | Frederick  | Castillo    | 17     | 82.36     | S         |
      | 1116        | Phil Munoz         | Phil       | Munoz       | 17     | 97.63     | Store Per |
      | 1117        | Lori Lightfoot     | Lori       | Lightfoot   | 17     | 39.16     | Store     |
      | 1           | Kumar              | Anil       | B           | 19     | 45.45     | Store     |
      | 2           | Kamesh             | Bh         | Venkata     | null   | 32.89     | Store     |
      +-------------+--------------------+------------+-------------+--------+-----------+-----------+
      38 rows selected (0.253 seconds)
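
      The shifted rows in the parallel-plan output look like what one would get if values from the eight-column file were matched to the seven-column header by position rather than by name. The Python sketch below is only a toy model of that symptom, with hypothetical helper names; it makes no claim about how Drill merges batches internally.

```python
def label_by_position(header, row):
    """Pair values with header names by position only, as a display would."""
    return dict(zip(header, row.values()))


def label_by_name(header, row):
    """Pair values with header names by column name; missing ones become None."""
    return {name: row.get(name) for name in header}


# Header taken from the parallel-plan output above (no last_name).
header = ["employee_id", "full_name", "first_name",
          "position_id", "rating", "position", "isFTE"]
# First row of the file that still has last_name; values as displayed above.
row = {"employee_id": 1101, "full_name": "Steve Eurich", "first_name": "Steve",
       "last_name": "Eurich", "position_id": 16, "rating": 23.0,
       "position": "Store T", "isFTE": True}

shifted = label_by_position(header, row)  # last_name lands under position_id
aligned = label_by_name(header, row)      # every value stays under its own name
```

      In this model the positional pairing puts "Eurich" under position_id, 16 under rating, 23.0 under position and "Store T" under isFTE, and drops the real isFTE value, which matches the second half of the output above.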

          People

            Assignee: Paul Rogers
            Reporter: Boaz Ben-Zvi
            Votes: 0
            Watchers: 1
