Apache Arrow / ARROW-10695

[C++][Dataset] Allow to use a UUID in the basename_template when writing a dataset

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

      Currently the user can specify a basename_template, which can include a "{i}" placeholder that is replaced with an automatically incremented integer (so that each file written to a single partition gets a unique name):

      https://github.com/apache/arrow/blob/master/python/pyarrow/dataset.py#L713-L717

      It might be useful to also have the ability to use a UUID, to ensure the file is unique in general (not only for a single write) and to mimic the behaviour of the old write_to_dataset implementation.

      For example, we could look for a "{uuid}" in the template string, and if present replace it for each file with a new UUID.
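
      As a rough illustration of the proposed substitution (not the actual Arrow implementation), the sketch below uses a hypothetical helper, expand_basename, that fills "{i}" with the file counter and, if present, "{uuid}" with a freshly generated UUID for each file. The helper name and the choice of the hex form of the UUID are assumptions made for this example only.

```python
import uuid


def expand_basename(template: str, i: int) -> str:
    """Hypothetical helper, not part of pyarrow.

    Replaces "{i}" with the per-write file counter and, if present,
    "{uuid}" with a new random UUID (hex form) generated for this file.
    """
    name = template.replace("{i}", str(i))
    if "{uuid}" in name:
        # A new UUID per call, so every generated file name differs
        # even across separate writes to the same directory.
        name = name.replace("{uuid}", uuid.uuid4().hex)
    return name


# Three file names for one partition; the UUID part differs for each file.
names = [expand_basename("part-{uuid}-{i}.parquet", i) for i in range(3)]
for name in names:
    print(name)
```

      With only the existing "{i}" support, a caller could get a weaker form of this by generating one UUID per write and embedding it in the template string before passing it as basename_template; that makes names unique across writes, but all files of a single write share the same UUID.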

            People

              Assignee: Unassigned
              Reporter: Joris Van den Bossche (jorisvandenbossche)
              Votes: 0
              Watchers: 5
