Apache Hudi / HUDI-4070

Better Spark SQL default configs


Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

    Description

      Default configs should be:

      1. Optimize defaults for insert/bulk_insert. For example, with the NONE sort mode as the default, a bulk_insert is essentially a plain parquet write plus some additional work to populate the meta columns. An extension of this is to keep a map of minimal optimized configs per operation type. This is partly related to better performing configs in HUDI-2151. (See the datasource sketch after this list.)
      2. Make reasonable assumptions, e.g. for the index type: the bloom index does not rely on any external system, so it is a better default candidate than, say, the HBase index.
      3. Scout all configs declared with noDefaultValue and assign a default where necessary.
      4. Keep Spark SQL and Spark datasource config keys the same as much as possible; otherwise it is operationally difficult for users. Rename/reuse existing datasource keys that are meant for the same purpose. This is related to HUDI-4071 as well. (See the Spark SQL sketch after this list.)
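      A minimal, hypothetical sketch of what points 1 and 2 describe, using the Spark datasource writer in Scala. The table name, path, and columns are placeholders; the sort-mode and index options are set explicitly only to spell out what the proposed defaults would provide out of the box.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object BulkInsertDefaultsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-bulk-insert-defaults-sketch")
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()
    import spark.implicits._

    // Placeholder input data; column names are hypothetical.
    val df = Seq(
      ("id-1", "2022-05-01", 10.5),
      ("id-2", "2022-05-01", 20.0)
    ).toDF("uuid", "ts", "fare")

    df.write.format("hudi")
      .option("hoodie.table.name", "trips_sketch")
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .option("hoodie.datasource.write.operation", "bulk_insert")
      // Point 1: NONE sort mode skips record sorting, so bulk_insert is close to a
      // plain parquet write plus the cost of populating the Hudi meta columns.
      .option("hoodie.bulkinsert.sort.mode", "NONE")
      // Point 2: the bloom index needs no external system, unlike the HBase index,
      // which makes it the safer out-of-the-box choice.
      .option("hoodie.index.type", "BLOOM")
      .mode(SaveMode.Overwrite)
      .save("/tmp/hudi/trips_sketch")

    spark.stop()
  }
}
```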
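      A hypothetical sketch illustrating the key divergence that point 4 calls out: the Spark SQL path configures the record key and precombine field via short table properties (primaryKey, preCombineField), while the datasource path uses the longer hoodie.datasource.write.* keys. Table names and paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object SqlVsDatasourceKeysSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-sql-vs-datasource-keys-sketch")
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Hudi's SQL extension enables the USING hudi / TBLPROPERTIES syntax below.
      .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
      .getOrCreate()
    import spark.implicits._

    // Spark SQL path: key settings are expressed as short table properties.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS trips_sql_sketch (
        |  uuid STRING,
        |  ts BIGINT,
        |  fare DOUBLE
        |) USING hudi
        |LOCATION '/tmp/hudi/trips_sql_sketch'
        |TBLPROPERTIES (
        |  type = 'cow',
        |  primaryKey = 'uuid',
        |  preCombineField = 'ts'
        |)""".stripMargin)

    // Datasource path: the equivalent settings use the longer
    // hoodie.datasource.write.* keys, which is the mismatch point 4 calls out.
    Seq(("id-1", 1L, 10.5)).toDF("uuid", "ts", "fare")
      .write.format("hudi")
      .option("hoodie.table.name", "trips_ds_sketch")
      .option("hoodie.datasource.write.recordkey.field", "uuid")
      .option("hoodie.datasource.write.precombine.field", "ts")
      .mode("append")
      .save("/tmp/hudi/trips_ds_sketch")

    spark.stop()
  }
}
```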

    People

    • Assignee: Sagar Sumit (codope)
    • Reporter: Sagar Sumit (codope)
    • Votes: 0
    • Watchers: 1
