Details
Type: Improvement
Status: Closed
Priority: Blocker
Resolution: Fixed
None
Description
EDIT
====
While investigating performance hits in Bulk Insert, a few issues were found:
- `NonPartitionedKeyGenerator` does not implement `getRecordKey` and `getPartitionPath` for `InternalRow`, so the default implementation is invoked, which converts each row to Avro.
- HUDI-3993: Using a UDF to fetch record keys similarly requires deserializing each `InternalRow` into a `Row`.
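
The cost pattern described in the first item can be illustrated with a minimal, self-contained sketch. Every type and name here (`KeyGenSketch`, `InternalRowLike`, `KeyGen`, `SlowKeyGen`, `FastKeyGen`, `toGenericRecord`) is a hypothetical stand-in invented for illustration; none of it is Hudi or Spark code. The assumption is only that a key generator interface offers a default row-based method that converts the whole row to a generic record first, and that an implementation can override it to read the key column directly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not Hudi or Spark classes.
final class KeyGenSketch {
    // Counts how many times the expensive whole-row conversion runs.
    static int conversions = 0;

    /** Stand-in for an engine-internal row: values addressed by position. */
    static final class InternalRowLike {
        final Object[] values;
        InternalRowLike(Object... values) { this.values = values; }
    }

    /** Stand-in for the row-to-record conversion: copies every field. */
    static Map<String, Object> toGenericRecord(InternalRowLike row, String[] fieldNames) {
        conversions++;
        Map<String, Object> record = new HashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            record.put(fieldNames[i], row.values[i]);
        }
        return record;
    }

    interface KeyGen {
        String getRecordKey(Map<String, Object> record);

        // Default path: converts the whole row just to read one field.
        default String getRecordKey(InternalRowLike row, String[] fieldNames) {
            return getRecordKey(toGenericRecord(row, fieldNames));
        }
    }

    /** Implements only the record-based method, so rows take the default path. */
    static class SlowKeyGen implements KeyGen {
        final String keyField;
        SlowKeyGen(String keyField) { this.keyField = keyField; }

        @Override
        public String getRecordKey(Map<String, Object> record) {
            return String.valueOf(record.get(keyField));
        }
    }

    /** Overrides the row-based method and reads the key column in place. */
    static final class FastKeyGen extends SlowKeyGen {
        FastKeyGen(String keyField) { super(keyField); }

        @Override
        public String getRecordKey(InternalRowLike row, String[] fieldNames) {
            // A real implementation would resolve the field position once and cache it.
            for (int i = 0; i < fieldNames.length; i++) {
                if (fieldNames[i].equals(keyField)) {
                    return String.valueOf(row.values[i]);
                }
            }
            throw new IllegalArgumentException("Unknown key field: " + keyField);
        }
    }

    public static void main(String[] args) {
        String[] fields = {"id", "ts"};
        InternalRowLike row = new InternalRowLike("k1", 42L);
        System.out.println(new SlowKeyGen("id").getRecordKey(row, fields));
        System.out.println(new FastKeyGen("id").getRecordKey(row, fields));
        System.out.println("conversions: " + conversions);
    }
}
```

In this sketch both generators return the same key, but only the one without a row-specific override pays for a full conversion on every record, which is the kind of per-row overhead that matters in a bulk write path.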
Issue Links
- is related to
  - HUDI-3993 Avoid calling into Spark UDF in Bulk Insert (Closed)
  - HUDI-4038 Avoid invoking `getDataSize` in the hot-path (Closed)
  - HUDI-4039 Make sure builtin key-generators can efficiently fetch record-key, partition-path (Closed)
- relates to
  - HUDI-4384 Hive style partition not work and record key loss prefix using ComplexKey in bulk_insert (Closed)
- links to