Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Description
CustomKeyGenerator is not able to parse the microseconds part of the timestamp. The code below reproduces the issue. The output it gives is "2023-03-04 14:44:42.046000".
```
import pandas as pd
from faker import Faker

# Assumes an active SparkSession `spark` and a target path `PATH`.
fake = Faker()
# The per-record dict was lost in the original report; the config below
# expects each record to provide ID, State, Country and EventTime.
data = [
    ...
    for _ in range(5)
]
pandas_df = pd.DataFrame(data)
hoodie_properties = {
    'hoodie.table.name': "pj_poc",
    'hoodie.datasource.write.recordkey.field': 'ID',
    'hoodie.datasource.write.partitionpath.field': 'State:SIMPLE,Country:SIMPLE,EventTime:TIMESTAMP',
    'hoodie.datasource.write.table.name': "pj_poc",
    'hoodie.datasource.write.precombine.field': 'EventTime',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
    'hoodie.keygen.timebased.input.dateformat': 'yyyy-MM-dd HH:mm:ss.SSSSSS',
    'hoodie.keygen.timebased.output.dateformat': 'yyyy-MM-dd HH:mm:ss.SSSSSS',
    'hoodie.keygen.timebased.timestamp.type': 'DATE_STRING',
    'hoodie.keygen.timebased.timestamp.scalar.time.unit': 'MICROSECONDS',
    'hoodie.parquet.outputtimestamptype': 'TIMESTAMP_MICROS',
}
spark.sparkContext.setLogLevel("WARN")
df = spark.createDataFrame(pandas_df)
df.write.format("hudi").options(**hoodie_properties).mode("overwrite").save(PATH)
spark.read.options(**hoodie_properties).format("hudi").load(PATH).select("_hoodie_partition_path", "EventTime").show(10, False)
```
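The reported value ends in `046000`, which is what a formatter limited to millisecond precision would print for a timestamp that carries more fractional digits. The minimal sketch below only illustrates that effect in plain Python and does not call Hudi. The input value `2023-03-04 14:44:42.046123` and the helper `truncate_to_millis` are hypothetical, and whether this is what actually happens inside CustomKeyGenerator is an assumption, not something this report confirms.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # %f parses and prints six fractional digits


def truncate_to_millis(ts: str) -> str:
    """Hypothetical helper: drop everything below millisecond precision."""
    parsed = datetime.strptime(ts, FORMAT)
    # Keep only whole milliseconds, then print with six fractional digits again.
    millis_only = parsed.replace(microsecond=(parsed.microsecond // 1000) * 1000)
    return millis_only.strftime(FORMAT)


# Hypothetical input; the report does not show the original EventTime value.
original = "2023-03-04 14:44:42.046123"
print(truncate_to_millis(original))  # 2023-03-04 14:44:42.046000
```

If the partition path in the repro shows the same pattern for every row (three meaningful fractional digits followed by `000`), that would be consistent with this kind of truncation.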