  Apache Drill
  DRILL-4255

SELECT DISTINCT query over JSON data returns UNSUPPORTED OPERATION

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.12.0
    • Component/s: Execution - Flow
    • Labels: None
    • Environment: CentOS

      Description

      SELECT DISTINCT over MapR-FS generated audit logs (JSON files) results in an UNSUPPORTED_OPERATION error. An identical query over another set of JSON data returns correct results.

      MapR Drill 1.4.0, commit ID : 9627a80f
      MapRBuildVersion : 5.1.0.36488.GA
      OS : CentOS x86_64 GNU/Linux

      0: jdbc:drill:schema=dfs.tmp> select distinct t.operation from `auditlogs` t;
      Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes
      
      Fragment 3:3
      
      [Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 on example.com:31010] (state=,code=0)
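
      The "schema changes" in the error come from Drill's schema-on-read model: each record batch carries a schema inferred from the JSON it scanned, and the hash aggregate refuses to continue when consecutive batches disagree (a field missing from some files, or the same field holding different JSON types). A minimal Python sketch of that kind of divergence, using hypothetical audit-log records (field names and values are illustrative, not taken from the actual logs):

```python
import json

# Hypothetical audit-log batches. The same fields appear with different
# inferred types across files -- the kind of drift that shows up as a
# "schema change" to a downstream operator. (Illustrative data only.)
file_a = ['{"operation": "LOOKUP", "status": 0}',
          '{"operation": "MKDIR", "status": 0}']
file_b = ['{"operation": "LOOKUP", "status": "0"}',  # status is now a string
          '{"operation": null}']                     # operation null, status absent

def batch_schema(lines):
    """Infer a per-batch schema the way a schema-on-read reader might:
    field name -> set of JSON value types seen in the batch."""
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema

schema_a = batch_schema(file_a)
schema_b = batch_schema(file_b)
print(schema_a != schema_b)  # True: batch B arrives with a different schema
```

      Which files land in which minor fragment, and in what order, is not deterministic, which is consistent with the same query shape succeeding on one data set and failing on another.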
      

      Stack trace from drillbit.log

      2016-01-08 11:35:35,093 [297060f9-1c7a-b32c-09e8-24b5ad863e73:frag:3:3] INFO  o.a.d.e.p.i.aggregate.HashAggBatch - User Error Occurred
      org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes
      
      
      [Error Id: 1233bf68-13da-4043-a162-cf6d98c07ec9 ]
              at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext(HashAggBatch.java:144) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:132) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
              at java.security.AccessController.doPrivileged(Native Method) [na:1.7.0_65]
              at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_65]
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) [hadoop-common-2.7.0-mapr-1506.jar:na]
               at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250) [drill-java-exec-1.4.0.jar:1.4.0]
              at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
      

      Query plan for the above query:

      00-00    Screen : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.4100499276E7 rows, 1.69455861396E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.7382234176000005E8 memory}, id = 7572
      00-01      UnionExchange : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.408635556E7 rows, 1.6944171768E8 cpu, 0.0 io, 1.2165858754560001E10 network, 2.7382234176000005E8 memory}, id = 7571
      01-01        Project(operation=[$0]) : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.3944918400000006E7 rows, 1.683102204E8 cpu, 0.0 io, 1.15865321472E10 network, 2.7382234176000005E8 memory}, id = 7570
      01-02          HashAgg(group=[{0}]) : rowType = RecordType(ANY operation): rowcount = 141437.16, cumulative cost = {3.3944918400000006E7 rows, 1.683102204E8 cpu, 0.0 io, 1.15865321472E10 network, 2.7382234176000005E8 memory}, id = 7569
      01-03            Project(operation=[$0]) : rowType = RecordType(ANY operation): rowcount = 1414371.6, cumulative cost = {3.2530546800000004E7 rows, 1.569952476E8 cpu, 0.0 io, 1.15865321472E10 network, 2.4892940160000002E8 memory}, id = 7568
      01-04              HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {3.2530546800000004E7 rows, 1.569952476E8 cpu, 0.0 io, 1.15865321472E10 network, 2.4892940160000002E8 memory}, id = 7567
      02-01                UnorderedMuxExchange : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {3.1116175200000003E7 rows, 1.34365302E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7566
      03-01                  Project(operation=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)]) : rowType = RecordType(ANY operation, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1414371.6, cumulative cost = {2.97018036E7 rows, 1.329509304E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7565
      03-02                    HashAgg(group=[{0}]) : rowType = RecordType(ANY operation): rowcount = 1414371.6, cumulative cost = {2.8287432E7 rows, 1.27293444E8 cpu, 0.0 io, 0.0 network, 2.4892940160000002E8 memory}, id = 7564
      03-03                      Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/auditlogs, numFiles=31, columns=[`operation`], files=[maprfs:/tmp/auditlogs/DBAudit.log-2015-12-30-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-002.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-30-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-003.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-31-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-04-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-002.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-31-003.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-03-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-31-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-29-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2015-12-28-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-01-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-29-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2015-12-28-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-01-001.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-004.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-004.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-06-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-07-001.json, maprfs:/tmp/auditlogs/DBAudit.log-2016-01-06-002.json, maprfs:/tmp/auditlogs/FSAudit.log-2016-01-08-001.json]]]) : rowType = RecordType(ANY operation): rowcount = 1.4143716E7, cumulative cost = {1.4143716E7 rows, 1.4143716E7 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 7563
      

      Another query of exactly the same form as the failing query above; this one returns correct results.

      0: jdbc:drill:schema=dfs.tmp> select distinct t.key2 from `twoKeyJsn.json` t;
      +-------+
      | key2  |
      +-------+
      | d     |
      | c     |
      | b     |
      | 1     |
      | a     |
      | 0     |
      | k     |
      | m     |
      | j     |
      | h     |
      | e     |
      | n     |
      | g     |
      | f     |
      | l     |
      | i     |
      +-------+
      16 rows selected (27.097 seconds)
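
      The contrast suggests `twoKeyJsn.json` presents a single, stable schema to the aggregate, while the audit logs do not. A commonly cited mitigation for JSON type drift in Drill is the session option `store.json.all_text_mode`, which reads every scalar as VARCHAR; the Python sketch below (hypothetical records, not the actual file) shows how treating all scalars as text collapses type differences into one schema:

```python
import json

# Hypothetical records where the same field drifts between int and string.
records = ['{"key2": "d", "seq": 1}',
           '{"key2": "c", "seq": "2"}']

def schema(lines, all_text_mode=False):
    """Field name -> set of inferred types. With all_text_mode, every
    scalar is read as text, mimicking Drill's `store.json.all_text_mode`."""
    out = {}
    for line in lines:
        for key, value in json.loads(line).items():
            kind = 'varchar' if all_text_mode else type(value).__name__
            out.setdefault(key, set()).add(kind)
    return out

print(schema(records))                      # 'seq' carries two types
print(schema(records, all_text_mode=True))  # every field is varchar
```

      The trade-off is that all-text mode surrenders native numeric types, so casts are needed in the query; it is a workaround knob, not a fix for the aggregate operator itself.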
      

      People

      • Assignee: Unassigned
      • Reporter: Khurram Faraaz (khfaraaz)
