  1. Apache Arrow
  2. ARROW-6038

[Python] pyarrow.Table.from_batches produces corrupted table if any of the batches were empty

Details

    Description

      When a Table is created from a list or iterator of batches that contains an "empty" RecordBatch, the Table is produced without error, but running pyarrow built-in functions on it (such as unique()) occasionally results in a segfault.

      The MWE is attached: segfault_ex.py

      1. The segfaults happen randomly, around 30% of the time.
      2. Commenting out line 10 in the MWE results in no segfaults.
      3. The segfault is triggered by the unique() function, but I doubt the behaviour is specific to that function. From what I gather, the problem lies in Table creation.
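
Point 2 suggests a possible interim workaround: drop empty batches before building the Table. The following is a minimal sketch, not taken from the attached segfault_ex.py. It assumes that line 10 of the MWE is what adds the empty batch (the report implies this but does not show it), and the helper names drop_empty_batches and table_from_batches_skipping_empty are hypothetical. It relies only on RecordBatch.num_rows and Table.from_batches(batches, schema=...).

```python
def drop_empty_batches(batches):
    """Yield only the batches that hold at least one row.

    Works on any object exposing a ``num_rows`` attribute, which
    pyarrow.RecordBatch does.
    """
    for batch in batches:
        if batch.num_rows > 0:
            yield batch


def table_from_batches_skipping_empty(batches, schema):
    """Build a pyarrow.Table from ``batches`` after dropping empty ones.

    ``schema`` is passed explicitly so the call still works when every
    batch turns out to be empty and nothing is left to infer it from.
    """
    # Imported lazily so the filter above has no hard dependency on pyarrow.
    import pyarrow as pa

    kept = list(drop_empty_batches(batches))
    return pa.Table.from_batches(kept, schema=schema)
```

Whether this avoids the segfault on the affected versions has not been verified here. It only mirrors the observation that the crash disappears when the empty batch is left out.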

      I'm on Windows 10, using Python 3.6 and pyarrow 0.14.0 installed through pip (the problem also occurs with 0.13.0 from conda-forge).

      Attachments

        1. segfault_ex.py
          0.5 kB
          Piotr Bajger

            People

              apitrou Antoine Pitrou
              Bajger Piotr Bajger


