Beam / BEAM-12201

DataFrame API: to_parquet(partition_cols=) doesn't work as intended

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.26.0, 2.27.0, 2.28.0, 2.29.0
    • Fix Version/s: None
    • Component/s: dsl-dataframe, sdk-py-core

    Description

      Currently to_parquet accepts the partition_cols keyword argument, but it doesn't work as intended. It should partition the data by the specified columns and use dynamic destinations to write each partition to its own files.

      Context: https://lists.apache.org/thread.html/ra1e647440ffb43e922d9289cbe6f59e581c00055cf7f6a71b3fab205%40%3Cuser.beam.apache.org%3E
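
      To illustrate the intended behavior, here is a minimal sketch in plain Python (not Beam code, and not the proposed fix). It groups rows by the values of the partition columns and maps each group to a Hive-style destination directory of the form col=value, which is the layout pandas produces for partition_cols. The function name partition_destinations, the sample rows, and the "out" root are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_destinations(rows, partition_cols, root):
    """Group rows by partition-column values.

    Returns a dict mapping a Hive-style directory (for example
    root/year=2020/country=US) to the rows destined for it.
    """
    groups = defaultdict(list)
    for row in rows:
        # Build the destination directory from the partition values, in order.
        parts = [f"{col}={row[col]}" for col in partition_cols]
        dest = str(PurePosixPath(root, *parts))
        # The partition values are encoded in the path, so they are
        # dropped from the data that would be written to the file.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        groups[dest].append(data)
    return dict(groups)


# Hypothetical sample data.
rows = [
    {"year": 2020, "country": "US", "value": 1},
    {"year": 2020, "country": "DE", "value": 2},
    {"year": 2021, "country": "US", "value": 3},
    {"year": 2020, "country": "US", "value": 4},
]

destinations = partition_destinations(rows, ["year", "country"], "out")
for dest, records in sorted(destinations.items()):
    print(dest, records)
```

      In a pipeline, the destination computed per row would play the role of the dynamic destination key, so that each distinct combination of partition values ends up in its own set of files.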

          People

            Assignee: Unassigned
            Reporter: Brian Hulette (bhulette)
            Votes: 0
            Watchers: 3
