  1. Apache Hudi
  2. HUDI-7306

CustomKeyGenerator with TIMESTAMP skipping the microseconds part


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: writer-core
    • Labels: None

    Description

      CustomKeyGenerator does not preserve the microseconds part of a TIMESTAMP partition field. The code below reproduces the issue: the input EventTime is "2023-03-04 14:44:42.046661", but the output shows "2023-03-04 14:44:42.046000".
      ```
      import pandas as pd
      from faker import Faker

      # `spark` (a SparkSession with the Hudi bundle loaded) and `PATH` (the table location) are assumed to be defined already.
      fake = Faker()
      data = [
          {"ID": fake.uuid4(), "EventTime": "2023-03-04 14:44:42.046661",
           "FullName": fake.name(), "Address": fake.address(),
           "CompanyName": fake.company(), "JobTitle": fake.job(),
           "EmailAddress": fake.email(), "PhoneNumber": fake.phone_number(),
           "RandomText": fake.sentence(), "City": fake.city(),
           "State": fake.state(), "Country": fake.country()}
          for _ in range(5)
      ]
      pandas_df = pd.DataFrame(data)

      hoodie_properties = {
          'hoodie.table.name': "pj_poc",
          'hoodie.datasource.write.recordkey.field': 'ID',
          'hoodie.datasource.write.partitionpath.field': 'State:SIMPLE,Country:SIMPLE,EventTime:TIMESTAMP',
          'hoodie.datasource.write.table.name': "pj_poc",
          'hoodie.datasource.write.precombine.field': 'EventTime',
          'hoodie.datasource.write.hive_style_partitioning': 'true',
          'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
          'hoodie.keygen.timebased.input.dateformat': 'yyyy-MM-dd HH:mm:ss.SSSSSS',
          'hoodie.keygen.timebased.output.dateformat': 'yyyy-MM-dd HH:mm:ss.SSSSSS',
          'hoodie.keygen.timebased.timestamp.type': 'DATE_STRING',
          'hoodie.keygen.timebased.timestamp.scalar.time.unit': 'MICROSECONDS',
          'hoodie.parquet.outputtimestamptype': 'TIMESTAMP_MICROS',
      }

      spark.sparkContext.setLogLevel("WARN")
      df = spark.createDataFrame(pandas_df)
      df.write.format("hudi").options(**hoodie_properties).mode("overwrite").save(PATH)
      spark.read.options(**hoodie_properties).format("hudi").load(PATH).select("_hoodie_partition_path", "EventTime").show(10, False)
      ```
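
The reported output is what a round trip through a millisecond-precision value would produce. The sketch below illustrates that symptom in plain Python, without Spark or Hudi. It is only an illustration: the issue does not state the cause, and the helper `roundtrip_via_millis` is a hypothetical name, not part of Hudi.

```python
from datetime import datetime

# Python analogue of the 'yyyy-MM-dd HH:mm:ss.SSSSSS' pattern used in the repro.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def roundtrip_via_millis(value: str) -> str:
    """Parse a timestamp, truncate it to whole milliseconds, and format it
    again with six fractional digits (hypothetical helper for illustration)."""
    parsed = datetime.strptime(value, FMT)
    # Drop everything below one millisecond: 46661 us -> 46000 us.
    truncated = parsed.replace(microsecond=(parsed.microsecond // 1000) * 1000)
    return truncated.strftime(FMT)


expected = "2023-03-04 14:44:42.046661"
observed = roundtrip_via_millis(expected)
print(observed)  # 2023-03-04 14:44:42.046000
```

If the value were handled at microsecond precision end to end, `observed` would equal `expected`; comparing the two is a quick check for any candidate fix.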

          People

            Assignee: Unassigned
            Reporter: Aditya Goenka (adityagoenka)