Hive
  1. Hive
  2. HIVE-7817

distinct/group by don't work on partition columns

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      suppose you have a table like this:

      CREATE TABLE page_view(
             viewTime INT,
             userid BIGINT,
              page_url STRING,
              referrer_url STRING,
              ip STRING COMMENT 'IP Address of the User')
      COMMENT 'This is the page view table'
      PARTITIONED BY(dt STRING, country STRING)
      CLUSTERED BY(userid) INTO 4 BUCKETS
      

      Then

      select distinct dt from page_view;
      select distinct dt, country from page_view;
      select dt, country from page_view group by dt, country;
      

      all fail with

      Query ID = ekoifman_20140820172626_b03ba819-c111-433f-a3fc-453c7d5a3e86
      Total jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks not specified. Estimated from input data size: 1
      In order to change the average load for a reducer (in bytes):
        set hive.exec.reducers.bytes.per.reducer=<number>
      In order to limit the maximum number of reducers:
        set hive.exec.reducers.max=<number>
      In order to set a constant number of reducers:
        set mapreduce.job.reduces=<number>
      Job running in-process (local Hadoop)
      Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
      2014-08-20 17:26:13,018 Stage-1 map = 0%,  reduce = 0%
      Ended Job = job_local165359429_0013 with errors
      Error during job, obtaining debugging information...
      FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
      MapReduce Jobs Launched: 
      Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
      Total MapReduce CPU Time Spent: 0 msec
      

      but

      select dt, country, count(*) from page_view group by dt, country;
      

      works fine.

        Issue Links

          Activity

          Eugene Koifman created issue -
          Eugene Koifman made changes -
          Field Original Value New Value
          Description suppose you have a table like this:
          {code:sql}
          CREATE TABLE page_view(
                 viewTime INT,
                 userid BIGINT,
                  page_url STRING,
                  referrer_url STRING,
                  ip STRING COMMENT 'IP Address of the User')
          COMMENT 'This is the page view table'
          PARTITIONED BY(dt STRING, country STRING)
          CLUSTERED BY(userid) INTO 4 BUCKETS
          {code:sql}

          Then
          {code:sql}
          select distinct dt from page_view;
          select distinct dt, country from page_view;
          select dt, country from page_view group by dt, country;
          {code:sql}

          all fail with

          {noformat}
          Query ID = ekoifman_20140820172626_b03ba819-c111-433f-a3fc-453c7d5a3e86
          Total jobs = 1
          Launching Job 1 out of 1
          Number of reduce tasks not specified. Estimated from input data size: 1
          In order to change the average load for a reducer (in bytes):
            set hive.exec.reducers.bytes.per.reducer=<number>
          In order to limit the maximum number of reducers:
            set hive.exec.reducers.max=<number>
          In order to set a constant number of reducers:
            set mapreduce.job.reduces=<number>
          Job running in-process (local Hadoop)
          Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
          2014-08-20 17:26:13,018 Stage-1 map = 0%, reduce = 0%
          Ended Job = job_local165359429_0013 with errors
          Error during job, obtaining debugging information...
          FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
          MapReduce Jobs Launched:
          Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
          Total MapReduce CPU Time Spent: 0 msec
          {noformat}

          but
          {code:sql}
          select dt, country, count(*) from page_view group by dt, country;
          {code:sql}

          works fine.
          suppose you have a table like this:
          {code:sql}
          CREATE TABLE page_view(
                 viewTime INT,
                 userid BIGINT,
                  page_url STRING,
                  referrer_url STRING,
                  ip STRING COMMENT 'IP Address of the User')
          COMMENT 'This is the page view table'
          PARTITIONED BY(dt STRING, country STRING)
          CLUSTERED BY(userid) INTO 4 BUCKETS
          {code}

          Then
          {code:sql}
          select distinct dt from page_view;
          select distinct dt, country from page_view;
          select dt, country from page_view group by dt, country;
          {code}

          all fail with

          {noformat}
          Query ID = ekoifman_20140820172626_b03ba819-c111-433f-a3fc-453c7d5a3e86
          Total jobs = 1
          Launching Job 1 out of 1
          Number of reduce tasks not specified. Estimated from input data size: 1
          In order to change the average load for a reducer (in bytes):
            set hive.exec.reducers.bytes.per.reducer=<number>
          In order to limit the maximum number of reducers:
            set hive.exec.reducers.max=<number>
          In order to set a constant number of reducers:
            set mapreduce.job.reduces=<number>
          Job running in-process (local Hadoop)
          Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
          2014-08-20 17:26:13,018 Stage-1 map = 0%, reduce = 0%
          Ended Job = job_local165359429_0013 with errors
          Error during job, obtaining debugging information...
          FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
          MapReduce Jobs Launched:
          Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
          Total MapReduce CPU Time Spent: 0 msec
          {noformat}

          but
          {code:sql}
          select dt, country, count(*) from page_view group by dt, country;
          {code}

          works fine.
          Hide
          Vikram Dixit K added a comment -

          Eugene Koifman This seems like a major outage or is this a new feature request? Has this worked in an earlier version of hive say 0.13? If so can you please mark this as a blocker for 0.14.

          Thanks
          Vikram.

          Show
          Vikram Dixit K added a comment - Eugene Koifman This seems like a major outage or is this a new feature request? Has this worked in an earlier version of hive say 0.13? If so can you please mark this as a blocker for 0.14. Thanks Vikram.
          Hide
          Eugene Koifman added a comment -

          I don't know if this ever worked. I ran into this by accident while trying to test something else in 0.14.

          Show
          Eugene Koifman added a comment - I don't know if this ever worked. I ran into this by accident while trying to test something else in 0.14.
          Pengcheng Xiong made changes -
          Link This issue is related to HIVE-3108 [ HIVE-3108 ]
          Hide
          Pengcheng Xiong added a comment -

          by the way, i do not think hive-3108 is ever solved.

          Show
          Pengcheng Xiong added a comment - by the way, i do not think hive-3108 is ever solved.

            People

            • Assignee:
              Unassigned
              Reporter:
              Eugene Koifman
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:

                Development