Hive / HIVE-4891

Distinct includes duplicate records


Details

    Description

      I have two partitions, one stored as a SequenceFile and the other as an RCFile. They contain the same data; only the file format differs.

      I run the following query:

      select distinct uid from test where (dt ='20130718' or dt ='20130718_1') and cur_url like '%cq.aa.com%';
      

      Partition dt='20130718' is a SequenceFile (the table's default input format, specified when the table was created).

      Partition dt='20130718_1' is an RCFile, set up as follows:

      ALTER TABLE test ADD IF NOT EXISTS PARTITION (dt='20130718_1') LOCATION '/user/test/test-data';
      ALTER TABLE test PARTITION(dt='20130718_1') SET FILEFORMAT RCFILE;
      

      However, the result contains duplicate uid values.

      If both partitions use the same input format, there are no duplicate records.
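A minimal sketch for confirming the symptom, assuming the query output has been saved one uid per line, for example with `hive -S -e "<query>" > uids.txt` (`-S` is the Hive CLI's silent mode; the file name `uids.txt` and script name `find_dups.py` are hypothetical). It counts values that occur more than once; for a correct `SELECT DISTINCT` result the report should be empty.

```python
import sys
from collections import Counter
from typing import Dict, Iterable


def find_duplicates(lines: Iterable[str]) -> Dict[str, int]:
    """Map each value that occurs more than once to its occurrence count.

    Only the line terminator is stripped: values that differ in other
    whitespace are genuinely different strings and are counted separately.
    """
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {value: n for value, n in counts.items() if n > 1}


def main(path: str) -> None:
    # Read the saved query output, one value per line.
    with open(path, encoding="utf-8") as f:
        dups = find_duplicates(f)
    for value, n in sorted(dups.items()):
        print(f"{value}\t{n}")
    # A correct SELECT DISTINCT result prints "0 duplicated value(s)".
    print(f"{len(dups)} duplicated value(s)")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical invocation: python find_dups.py uids.txt
    main(sys.argv[1])
```

Stripping only the line terminator is deliberate: if two "duplicates" differ by trailing spaces, they are distinct strings and DISTINCT is behaving correctly, so the script does not report them.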


          People

            Assignee: Unassigned
            Reporter: Fengdong Yu (azuryy)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved: