Apache Arrow / ARROW-5965

[Python] Regression: segfault when reading hive table with v0.14

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: Python
    • Labels:

      Description

      I'm working with pyarrow on a Cloudera cluster (CDH 6.1.1), with pyarrow installed in a conda env.

      The data I'm reading is a partitioned, Hive-registered table stored as Parquet. With v0.13, reading this table does not cause any issues.

      The code that worked before and now crashes with v0.14 is simply:

      ```
      import pyarrow.parquet as pq
      pq.ParquetDataset('hdfs:///data/raw/source/table').read()
      ```

      Since it crashes my notebook outright (and in a plain REPL the process simply ends with "Killed"), I cannot report much more, but this is a severe usability regression. So far the only workaround is to pin `pyarrow<0.14`.
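
For anyone who cannot control the environment spec and wants to fail early instead of crashing, here is a minimal sketch of a runtime guard. The affected range is an assumption read off this ticket's fields (Affects Version 0.14.0, Fix Version 0.15.0), not something verified against other releases, and the helper names `parse_version` and `is_affected` are made up for illustration.

```python
import re

# Assumption: the affected range is taken from this ticket's fields
# (Affects Version 0.14.0, Fix Version 0.15.0); it has not been verified.
FIRST_AFFECTED = (0, 14, 0)
FIRST_FIXED = (0, 15, 0)


def parse_version(version):
    """Return the leading numeric components of a version string as a 3-tuple."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # Missing components (e.g. "0.14") are treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_affected(version):
    """True if the version falls in the assumed affected range [0.14.0, 0.15.0)."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED
```

In practice one would call `is_affected(pyarrow.__version__)` before touching the dataset and raise a readable error instead of letting the process die. Pinning `pyarrow<0.14` in the environment remains the simpler option where that is possible.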

              People

              • Assignee: Unassigned
              • Reporter: H. Vetinari (h-vetinari)