Beam / BEAM-12201

DataFrame API: to_parquet(partition_cols=) doesn't work as intended


    Details

    • Type: Bug
    • Status: Open
    • Priority: P2
    • Resolution: Unresolved
    • Affects Version/s: 2.26.0, 2.27.0, 2.28.0, 2.29.0
    • Fix Version/s: None
    • Component/s: dsl-dataframe, sdk-py-core
    • Labels: None

      Description

      Currently to_parquet accepts the partition_cols keyword argument, but it doesn't work as intended. It should partition the data by the specified columns and use dynamic destinations to write each partition to a different file.

      Context: https://lists.apache.org/thread.html/ra1e647440ffb43e922d9289cbe6f59e581c00055cf7f6a71b3fab205%40%3Cuser.beam.apache.org%3E
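      For illustration only, here is a minimal pure-Python sketch of the grouping step such a partitioned write would need. It assumes the Hive-style col=value directory layout that pandas/pyarrow use for partition_cols. The helpers partition_destination and group_by_partition are hypothetical (not Beam or pandas API). Rows are plain dicts, and the partition columns are dropped from each row's payload because their values are already encoded in the path.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def partition_destination(row: Row, partition_cols: Sequence[str]) -> str:
    """Return a Hive-style relative directory such as 'year=2020/country=US'.

    Hypothetical helper: mirrors the col=value layout used by pandas'
    to_parquet(partition_cols=...). It is not part of the Beam API.
    """
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def group_by_partition(
    rows: Iterable[Row], partition_cols: Sequence[str]
) -> Dict[str, List[Row]]:
    """Group rows by destination directory (hypothetical helper).

    The partition columns are removed from each stored row, since their
    values can be recovered from the directory name on read.
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        dest = partition_destination(row, partition_cols)
        payload = {k: v for k, v in row.items() if k not in partition_cols}
        groups[dest].append(payload)
    return dict(groups)


# Usage: four rows spread across three (year, country) partitions.
rows = [
    {"year": 2020, "country": "US", "value": 1},
    {"year": 2020, "country": "DE", "value": 2},
    {"year": 2021, "country": "US", "value": 3},
    {"year": 2020, "country": "US", "value": 4},
]
parts = group_by_partition(rows, ["year", "country"])
for dest in sorted(parts):
    print(dest, parts[dest])
```

      In a Beam pipeline, a function like partition_destination is the kind of per-element key that a dynamic-destinations file write consumes (for example, as the destination callable of apache_beam.io.fileio.WriteToFiles). This sketch does not show how the DataFrame API would wire that up.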

            People

            • Assignee: Unassigned
            • Reporter: Brian Hulette (bhulette)
            • Votes: 0
            • Watchers: 2
