Hive / HIVE-4372

When populating an external HBase table with a Hive query involving joins, the data gets mixed up inside the rows.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: HBase Handler
    • Labels:
      None

      Activity

      Navis added a comment -

      Yashaswy Andavilli Could you describe the problem in detail?

      Yashaswy Andavilli added a comment -

      This is the create command I am using to create the HBase-integrated table:
      CREATE EXTERNAL TABLE CALLCOUNTBYGENDER(Id STRING, Hour STRING, Gender STRING, Count STRING)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:Hour,cf1:Gender,cf1:Count")
      TBLPROPERTIES ("hbase.table.name" = "hbase_cdr_summary_callcountbygender");

      I am using the following Hive statement to populate the above table:
      INSERT OVERWRITE TABLE CALLCOUNTBYGENDER
      SELECT concat(tod.hour,sd.SubscriberAgeGroup),tod.hour,sd.SubscriberAgeGroup,count
      FROM FACT f JOIN TimeofDayD tod on f.timeofdaykey=tod.timeofdaykey
      JOIN SubscriberDemographicsD sd on f.SubscriberDemographicsKey=sd.SubscriberDemographicsKey
      GROUP BY tod.hour,sd.SubscriberAgeGroup;

      The table gets populated, but when I view it in HBase, the third column, 'Gender', is wrong. It is supposed to contain only male/female values, but Hour column values are also being written under the Gender column. After repopulating the table again and again, it was finally populated correctly. I'm not sure what kind of bug this is.


        People

        • Assignee: Unassigned
        • Reporter: Yashaswy Andavilli
        • Votes: 0
        • Watchers: 2

          Dates

          • Created:
            Updated:
