Apache Arrow / ARROW-5004

[Python] Confusing behaviour with boolean partition keys


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.12.1
    • Fix Version/s: None
    • Component/s: Python

    Description

      https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L686

      Here, the partition key is converted to the type of the filter value before the two are compared.

      The write_to_dataset function accepts boolean partition keys (True or False), but these silently break at the linked line: the key is read back as the string 'False', and bool('False') evaluates to True because any non-empty string is truthy.
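
      The failure can be reproduced in plain Python. This sketch assumes the linked line performs a cast equivalent to type(filter_value)(partition_key); naive_cast, matching_partitions and the column name flag are hypothetical and are not part of the pyarrow API.

```python
def naive_cast(partition_key: str, filter_value):
    # Hypothetical stand-in for the linked line: cast the string key taken
    # from a directory name such as "flag=False" to the filter value's type.
    return type(filter_value)(partition_key)


def matching_partitions(keys, filter_value):
    # Keep the partitions whose cast key equals the filter value.
    return [k for k in keys if naive_cast(k, filter_value) == filter_value]


# Integer keys round-trip as expected.
assert naive_cast("3", 3) == 3

# Boolean keys do not: any non-empty string is truthy.
print(bool("False"))                                   # True
print(matching_partitions(["True", "False"], True))    # ['True', 'False']
print(matching_partitions(["True", "False"], False))   # []
```

      With the filter ('flag', '=', True), both partitions are selected; with ('flag', '=', False), neither is.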

      I understand that a docstring (https://github.com/apache/arrow/blob/3129732a18210d0c8921b45f79be4f34eadf0cc3/python/pyarrow/parquet.py#L653) states that only string or int partition keys are supported, although this note is somewhat buried away from the user-facing API.

      It may be beneficial to detect the boolean case and raise a warning, or to ensure that the function returns a more intuitive result when the partition key is 'False' and the filter value is False.
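
      Both suggestions can be combined in a single cast step. The following minimal sketch assumes the same cast-then-compare logic as the linked line: when the filter value is a bool, it parses 'True'/'False' explicitly and warns on anything else. cast_partition_key is a hypothetical helper, not pyarrow's actual fix.

```python
import warnings


def cast_partition_key(partition_key: str, filter_value):
    """Hypothetical safer cast of a string partition key to the filter's type."""
    # Check bool first: bool is a subclass of int, and bool(str) is unusable here.
    if isinstance(filter_value, bool):
        lowered = partition_key.lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        warnings.warn(
            f"cannot interpret partition key {partition_key!r} as a boolean; "
            "comparing it as a string instead",
            UserWarning,
        )
        # A str never equals a bool, so this partition simply won't match.
        return partition_key
    return type(filter_value)(partition_key)


keys = ["True", "False"]
for flt in (True, False):
    print(flt, [k for k in keys if cast_partition_key(k, flt) == flt])
# True ['True']
# False ['False']
```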



          People

            Assignee: Unassigned
            Reporter: Scott Taylor (scotttaylor)
            Votes: 0
            Watchers: 3
