  1. Apache Hudi
  2. HUDI-1003

Handle partitions correctly when syncing a non-partitioned table to Hive.


Details

    Description

      When syncing a non-partitioned Hudi table to Hive with the following options:

      option("hoodie.datasource.hive_sync.enable", "true").
      option("hoodie.datasource.hive_sync.table", tableName).
      option("hoodie.datasource.hive_sync.username", "root").
      option("hoodie.datasource.hive_sync.password", "123456").
      option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000").
      option("hoodie.datasource.hive_sync.partition_fields", "region,country,city").
      option("hoodie.datasource.write.operation", writeOperation).
      option("hoodie.datasource.write.table.type", tableType).
      option("hoodie.datasource.hive_sync.partition_extractor_class", "org.apache.hudi.hive.NonPartitionedExtractor")

       

      Hive sync creates the following table:

      CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
      `_hoodie_commit_time` string,
      `_hoodie_commit_seqno` string,
      `_hoodie_record_key` string,
      `_hoodie_partition_path` string,
      `_hoodie_file_name` string,
      `age` bigint,
      `location` string,
      `name` string,
      `sex` string,
      `ts` bigint)
      PARTITIONED BY (
      `region` string,
      `country` string,
      `city` string)
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT
      'org.apache.hudi.hadoop.HoodieParquetInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION
      'file:/Users/sflee/personal/hudi_java_client_dataset'
      TBLPROPERTIES (
      'last_commit_time_sync'='20200606200453',
      'transient_lastDdlTime'='1591445103')

       

      But the table actually has no partitions, so select * from hudi_trips_cow_hive_non_partitioned returns no data.

      So when a user selects NonPartitionedExtractor and also sets hoodie.datasource.hive_sync.partition_fields to some fields,

      we need to either throw an exception or generate a proper CREATE TABLE statement like the one below:

      CREATE EXTERNAL TABLE `hudi_trips_cow_hive_non_partitioned`(
      `_hoodie_commit_time` string,
      `_hoodie_commit_seqno` string,
      `_hoodie_record_key` string,
      `_hoodie_partition_path` string,
      `_hoodie_file_name` string,
      `age` bigint,
      `location` string,
      `name` string,
      `sex` string,
      `ts` bigint)
      ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT
      'org.apache.hudi.hadoop.HoodieParquetInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
      LOCATION
      'file:/Users/sflee/personal/hudi_java_client_dataset'
      TBLPROPERTIES (
      'last_commit_time_sync'='20200606201124',
      'transient_lastDdlTime'='1591445493')

       

      I am inclined to create the table normally, using the correct SQL.
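
      The preferred behavior could be sketched roughly as follows. This is only an illustration, not the actual Hudi implementation: the class PartitionFieldResolver and the method effectivePartitionFields are hypothetical names, and the only identifier taken from this issue is the extractor class name. The idea is that when NonPartitionedExtractor is configured, any configured partition fields are ignored, so the generated DDL has no PARTITIONED BY clause.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hudi: decides which partition columns
// should end up in the CREATE TABLE statement.
final class PartitionFieldResolver {

    // Extractor class name as it appears in the options in this issue.
    static final String NON_PARTITIONED_EXTRACTOR =
        "org.apache.hudi.hive.NonPartitionedExtractor";

    // Returns the partition fields to use in the DDL.
    // With the non-partitioned extractor the result is always empty,
    // regardless of what partition_fields was set to.
    static List<String> effectivePartitionFields(String extractorClass,
                                                 String partitionFields) {
        if (NON_PARTITIONED_EXTRACTOR.equals(extractorClass)) {
            return Collections.emptyList();
        }
        if (partitionFields == null || partitionFields.trim().isEmpty()) {
            return Collections.emptyList();
        }
        // Split the comma-separated option value and drop blank entries.
        return Arrays.stream(partitionFields.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    }
}
```

      With the options from the description ("region,country,city" plus NonPartitionedExtractor), this sketch yields an empty list, which corresponds to the second CREATE statement above. The stricter alternative mentioned earlier would throw an exception in the first branch instead of returning an empty list.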

            People

              LUOYAJUN luoyajun
              xleesf leesf