Apache Open Climate Workbench (Retired) / CLIMATE-575

Implement initial config based execution of an evaluation


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 1.0.0
    • Component/s: general
    • Labels: None

    Description

      Brainstorming ideas for an initial config format for running an evaluation. One proposal is below. Note that it doesn't necessarily cover all the functionality in the system yet. Empty sections are still a work in progress and will be filled in when possible.


      At the moment, the assumption is that there will be a single config file per evaluation.

      Sections

      There will be sections for

      • Datasets
      • Metrics
      • Plotting

      Datasets

      Specified under a [datasets] tag. All datasets to be loaded are specified here. Each dataset uses the following format:

      eval_purpose_identifier: data_source_keyword data_source_locator_data optional_keyword_args

      eval_purpose_identifier

      Either "reference" or "target". If there are multiple target datasets in the evaluation, they should all share the eval_purpose_identifier "target".

      data_source_keyword

      Specifies which data source will be used to load this dataset. At the current state of the library the valid options would be "local", "dap", "rcmed", and "esgf".
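As a rough illustration of the format above, the sketch below splits one dataset entry into its purpose identifier, data source keyword, and the remaining locator text. The helper name `split_dataset_line` is hypothetical and not part of the library; it only assumes the four keywords and two identifiers named in this proposal.

```python
# Keywords and identifiers taken from the proposal text.
VALID_PURPOSES = {"reference", "target"}
VALID_SOURCES = {"local", "dap", "rcmed", "esgf"}


def split_dataset_line(line):
    """Split 'purpose: source locator...' into (purpose, source, remainder).

    Hypothetical helper for illustration only.
    """
    # Split on the first colon only, so colons inside URLs are untouched.
    purpose, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("missing ':' after eval_purpose_identifier")
    purpose = purpose.strip()
    if purpose not in VALID_PURPOSES:
        raise ValueError("unknown eval_purpose_identifier: %r" % purpose)

    # The first whitespace-delimited token is the data source keyword.
    fields = rest.split(None, 1)
    if not fields:
        raise ValueError("missing data_source_keyword")
    source = fields[0]
    if source not in VALID_SOURCES:
        raise ValueError("unknown data_source_keyword: %r" % source)

    remainder = fields[1].strip() if len(fields) > 1 else ""
    return purpose, source, remainder
```

The remainder is returned unparsed because its layout depends on the data source, as described in the following sections.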

      data_source_locator_data

      Data necessary for loading the dataset. This varies based on the data source that will be used for loading this data. If you look at the docs for the data sources, these are effectively the required elements for loading a dataset.

      local data_source_locator_data

      There will be two parts for a local data source, separated from each other by a space.

      • The path to the file to load (if it's a single file dataset) or the path to the directory where multiple files are located, the accepted separator text (tentatively "###"), and the glob pattern for the files to load.
      • The variable name
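A minimal sketch of how the two local parts might be pulled apart. It assumes the variable name contains no whitespace, that the tentative "###" separator sits between the directory and the glob pattern, and that any optional keyword args have already been removed. Both function names are hypothetical.

```python
import glob
import os

SEPARATOR = "###"  # tentative separator text from the proposal


def parse_local_locator(locator):
    """Parse '<path or dir###glob> <variable>' into a dict.

    Hypothetical helper; splitting from the right means whitespace in
    the path does not disturb the variable name.
    """
    parts = locator.rsplit(None, 1)
    if len(parts) != 2:
        raise ValueError("expected '<path> <variable name>'")
    path_part, variable = parts

    if SEPARATOR in path_part:
        # Multi-file dataset: directory, separator, glob pattern.
        directory, pattern = path_part.split(SEPARATOR, 1)
        return {
            "directory": directory.strip(),
            "pattern": pattern.strip(),
            "variable": variable,
        }
    # Single-file dataset.
    return {"path": path_part, "variable": variable}


def matching_files(directory, pattern):
    """Expand the glob pattern inside the directory, in sorted order."""
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Splitting from the right is one way around the whitespace-in-paths concern, but it only works while the variable name is the last field.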

      dap data_source_locator_data

      Each of these should be separated by a space.

      • OpenDAP URL
      • Variable name

      rcmed data_source_locator_data

      Each of these should be separated by a space.

      • dataset_id
      • parameter_id
      • min_lat
      • max_lat
      • min_lon
      • max_lon
      • start_time
      • end_time
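The eight RCMED fields could be converted to typed values along these lines. This is only a sketch: the integer IDs, float bounds, and the `YYYY-MM-DD` time format are assumptions made for illustration (the time format in particular is still an open question in this issue), and `parse_rcmed_locator` is a hypothetical name.

```python
from collections import namedtuple
from datetime import datetime

RcmedLocator = namedtuple(
    "RcmedLocator",
    "dataset_id parameter_id min_lat max_lat min_lon max_lon "
    "start_time end_time",
)

TIME_FORMAT = "%Y-%m-%d"  # assumed format; not settled in the proposal


def parse_rcmed_locator(locator):
    """Parse the eight space-separated RCMED fields (hypothetical helper)."""
    fields = locator.split()
    if len(fields) < 8:
        raise ValueError("expected 8 fields, got %d" % len(fields))
    return RcmedLocator(
        dataset_id=int(fields[0]),
        parameter_id=int(fields[1]),
        min_lat=float(fields[2]),
        max_lat=float(fields[3]),
        min_lon=float(fields[4]),
        max_lon=float(fields[5]),
        start_time=datetime.strptime(fields[6], TIME_FORMAT),
        end_time=datetime.strptime(fields[7], TIME_FORMAT),
    )
```

A date-only format keeps each time a single whitespace-free token, which matters if fields are split on spaces.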

      esgf data_source_locator_data

      Each of these should be separated by a space.

      • dataset_id
      • variable name
      • esgf username
      • esgf password

      optional_keyword_args

      Any additional keyword args should be specified as a tuple after all of the required values have been specified. Again, these should be separated by a space from each other. Check the API docs for valid keyword args.
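The proposal does not pin down what the tuple looks like. One possible reading, sketched below, is a trailing Python literal tuple of `(name, value)` pairs that can be parsed safely with `ast.literal_eval`. The function name and the pair layout are assumptions, and the argument names in the usage note are illustrative only.

```python
import ast


def split_keyword_args(locator):
    """Separate a trailing tuple of (name, value) pairs from the locator.

    Hypothetical helper. Assumes the first '(' starts the keyword tuple,
    so it would misread a path that itself contains '('.
    """
    head, paren, tail = locator.partition("(")
    if not paren:
        return locator.strip(), {}

    value = ast.literal_eval(paren + tail)
    if not isinstance(value, tuple):
        raise ValueError("keyword args must be written as a tuple")
    # A lone pair without a trailing comma evaluates to the pair itself.
    if len(value) == 2 and isinstance(value[0], str):
        value = (value,)

    kwargs = {}
    for pair in value:
        if not (isinstance(pair, tuple) and len(pair) == 2
                and isinstance(pair[0], str)):
            raise ValueError("expected (name, value) pairs, got %r" % (pair,))
        kwargs[pair[0]] = pair[1]
    return head.strip(), kwargs
```

For example, `split_keyword_args('/data/obs.nc tasmax (("name", "obs"),)')` would return the locator text `/data/obs.nc tasmax` and a dict with a single `name` entry. `ast.literal_eval` only accepts literals, so arbitrary expressions in the config are rejected rather than executed.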

      Metrics

      Plotting

      Thoughts?

      A few of my concerns are:

      • Can we use whitespace to separate multiple items that we're passing and how will we handle single elements which contain valid whitespace? For instance file paths. If we place elements in quotes will that help with grouping? Should we use a specific separator value to split everything?
      • How should we pass the time formats for RCMED datasets?
      • Can we pass keyword args as a tuple? Will this work?
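On the whitespace question, one option is shell-style quoting, which Python's standard `shlex` module already implements. The sketch below shows a quoted path with spaces surviving as a single field; the `tokenize` wrapper and the sample path are illustrative only.

```python
import shlex


def tokenize(value):
    """Split on whitespace while keeping quoted strings together."""
    return shlex.split(value)


fields = tokenize('local "/data/my runs/obs 2010.nc" tasmax')
# fields == ['local', '/data/my runs/obs 2010.nc', 'tasmax']
```

Note that `shlex.split` defaults to POSIX mode, where a backslash is an escape character, so Windows-style paths would need quoting or escaping.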

            People

              Assignee: mjoyce Michael Joyce
              Reporter: mjoyce Michael Joyce