Apache Hudi / HUDI-5541

Disable precombine in bootstrap

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: bootstrap, hudi-utilities
    • Labels: None

    Description

      When I run a bootstrap to convert a Hive table to Hudi with version 0.12.2, it throws the following error. The table is `call_center` from the TPC-DS benchmark, and it has no `ts` field.

       

      org.apache.hudi.exception.HoodieInsertException: Failed to bulk insert for commit time 00000000000002
      
      ...
      
      Caused by: org.apache.hudi.exception.HoodieException: ts(Part -ts) field not found in record. Acceptable fields were :[cc_call_center_sk... cc_tax_percentage]
              at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:542)
              at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldValAsString(HoodieAvroUtils.java:520)
              at org.apache.hudi.bootstrap.SparkFullBootstrapDataProviderBase.lambda$generateInputRecords$5ff1ef2f$1(SparkFullBootstrapDataProviderBase.java:73)
              ... 
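To illustrate why the lookup fails, here is a simplified Python model of a dot-separated field lookup. It is not Hudi's actual Java code: the function name `get_nested_field_val`, the exception class, and the sample row values are hypothetical, and it assumes the ordering (precombine) field falls back to `ts` when nothing else is configured, which is what the stack trace suggests.

```python
class FieldNotFoundError(Exception):
    """Raised when a requested field path is missing from a record."""


def get_nested_field_val(record: dict, field_name: str):
    """Walk a dot-separated field path through a nested dict record."""
    current = record
    for part in field_name.split("."):
        if not isinstance(current, dict) or part not in current:
            acceptable = list(current) if isinstance(current, dict) else []
            raise FieldNotFoundError(
                f"{field_name}(Part -{part}) field not found in record. "
                f"Acceptable fields were :{acceptable}"
            )
        current = current[part]
    return current


# Hypothetical stand-in for one call_center row; only two columns are shown
# and the values are made up for illustration.
row = {"cc_call_center_sk": 1, "cc_tax_percentage": 0.11}

# Assumption: the precombine field is "ts" because it was never overridden.
precombine_field = "ts"

try:
    get_nested_field_val(row, precombine_field)
except FieldNotFoundError as err:
    # The table has no ts column, so the lookup fails before any record is written.
    print(err)
```

Under this model, any table without a `ts` column fails at the first record unless the ordering field lookup is skipped or made optional, which is what this issue asks for.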

       

      My bootstrap command is below. I could not find any option that disables precombine for the bootstrap.

      bin/spark-submit --master yarn \
      --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
      --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /opt/hudi-utilities-bundle_2.12-0.12.2.jar \
      --run-bootstrap \
      --target-base-path /tpcds_hudi_3.db/call_center \
      --target-table call_center \
      --table-type COPY_ON_WRITE \
      --hoodie-conf hoodie.bootstrap.base.path=/tpcds_bin_partitioned_parquet_3.db/call_center \
      --hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
      --hoodie-conf hoodie.datasource.write.recordkey.field=cc_call_center_sk \
      --hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
      --hoodie-conf hoodie.bootstrap.mode.selector=org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector \
      --hoodie-conf hoodie.bootstrap.mode.selector.regex.mode=FULL_RECORD \
      --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator 

          People

            Assignee: Unassigned
            Reporter: Luning Wang (ana4)