Apache Arrow / ARROW-11059

[Rust] [DataFusion] Implement extensible configuration mechanism


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Component/s: Rust - DataFusion

    Description

      We are getting to the point where there are multiple settings we could add to operators to fine-tune performance. Custom operators provided by crates that extend DataFusion may also need this capability.

      I propose that we add support for key-value configuration options so that we don't need to plumb through each new configuration setting that we add.

      For example, I am about to start on a "coalesce batches" operator and would like a setting such as "coalesce.batch.size".
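
      As a rough illustration of the key-value idea, here is a minimal sketch in Rust. The `ConfigOptions` type, its methods, and the fallback value 4096 are hypothetical names and numbers chosen for this example; they are not an existing DataFusion API.

```rust
use std::collections::HashMap;

/// Hypothetical key-value settings store (an assumption for illustration,
/// not an existing DataFusion type).
#[derive(Debug, Default)]
struct ConfigOptions {
    values: HashMap<String, String>,
}

impl ConfigOptions {
    fn new() -> Self {
        Self::default()
    }

    /// Stores a setting as a string under the given key.
    fn set(&mut self, key: &str, value: &str) {
        self.values.insert(key.to_string(), value.to_string());
    }

    /// Returns the setting parsed as `usize`, or `default` when the key
    /// is unset or its value does not parse.
    fn get_usize(&self, key: &str, default: usize) -> usize {
        self.values
            .get(key)
            .and_then(|v| v.parse().ok())
            .unwrap_or(default)
    }
}

fn main() {
    let mut config = ConfigOptions::new();

    // Unset key: the operator falls back to its own default (4096 is illustrative).
    let before = config.get_usize("coalesce.batch.size", 4096);

    // A user or an extension crate overrides the setting without any new plumbing.
    config.set("coalesce.batch.size", "8192");
    let after = config.get_usize("coalesce.batch.size", 4096);

    println!("before = {}, after = {}", before, after);
}
```

      With a store like this, a new operator only needs to agree on a key name; no new field has to be threaded through the execution context for each setting.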

      For built-in settings like this, we can attach metadata such as a description and a default value, and generate user-facing documentation from it.

      For example, here is how Spark defines configs:

        val PARQUET_VECTORIZED_READER_ENABLED =
          buildConf("spark.sql.parquet.enableVectorizedReader")
            .doc("Enables vectorized parquet decoding.")
            .version("2.0.0")
            .booleanConf
            .createWithDefault(true)
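
      A comparable pattern could be sketched in Rust along the following lines. Everything here is an assumption made for illustration: `ConfigEntry`, `build_conf`, `create_with_default`, and `generate_docs` are hypothetical names, and the description text and default value for "coalesce.batch.size" are invented for the example.

```rust
/// Hypothetical description of one built-in setting: key, docs, default.
#[derive(Debug, Clone, PartialEq)]
struct ConfigEntry {
    key: String,
    doc: String,
    default: String,
}

/// Hypothetical builder, loosely mirroring the Spark `buildConf` style.
struct ConfigEntryBuilder {
    key: String,
    doc: String,
}

/// Starts the definition of a setting with the given key.
fn build_conf(key: &str) -> ConfigEntryBuilder {
    ConfigEntryBuilder {
        key: key.to_string(),
        doc: String::new(),
    }
}

impl ConfigEntryBuilder {
    /// Attaches a human-readable description.
    fn doc(mut self, doc: &str) -> Self {
        self.doc = doc.to_string();
        self
    }

    /// Finishes the definition with a default value (kept as a string here).
    fn create_with_default(self, default: &str) -> ConfigEntry {
        ConfigEntry {
            key: self.key,
            doc: self.doc,
            default: default.to_string(),
        }
    }
}

/// Renders a plain-text table of all registered settings.
fn generate_docs(entries: &[ConfigEntry]) -> String {
    let mut out = String::from("key | default | description\n");
    for e in entries {
        out.push_str(&format!("{} | {} | {}\n", e.key, e.default, e.doc));
    }
    out
}

fn main() {
    // The description and default below are illustrative only.
    let entries = vec![build_conf("coalesce.batch.size")
        .doc("Target number of rows per coalesced batch.")
        .create_with_default("4096")];

    print!("{}", generate_docs(&entries));
}
```

      Keeping the description and default next to the key means the documentation is produced from the same definitions the engine reads, so the two cannot drift apart.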


          People

            Assignee: Andy Grove (andygrove)
            Reporter: Andy Grove (andygrove)
            Votes: 0
            Watchers: 3
