  1. Apache Arrow
  2. ARROW-5965

[Python] Regression: segfault when reading hive table with v0.14

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: Python

    Description

      I'm working with pyarrow on a Cloudera cluster (CDH 6.1.1), with pyarrow installed in a conda env.

      The data I'm reading is a Hive-registered, partitioned table stored as Parquet. With v0.13, reading this table does not cause any issues.

      The code that worked before and now crashes with v0.14 is simply:

      ```
      import pyarrow.parquet as pq

      # Read the whole partitioned dataset; this is the call that crashes with v0.14.
      pq.ParquetDataset('hdfs:///data/raw/source/table').read()
      ```
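
Because the read takes the whole interpreter down, one way to gather more detail is to run it in a child process, so the notebook or REPL survives and the exit status can be inspected. The following is a minimal sketch, not something from the original report: `run_isolated` is a hypothetical helper name, it assumes a POSIX system, and the `-X faulthandler` flag only helps if the crash is a catchable fault such as SIGSEGV (a SIGKILL leaves no traceback).

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 600.0) -> str:
    """Run `snippet` in a child interpreter and describe how it ended.

    Hypothetical helper for diagnosis only: the result of the snippet is
    discarded, only the exit status and stderr are reported.
    """
    proc = subprocess.run(
        # -X faulthandler asks CPython to dump a traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        name = signal.Signals(-proc.returncode).name
        return f"killed by {name}\n{proc.stderr}"
    return f"exited with code {proc.returncode}\n{proc.stderr}"
```

Passing the two lines above as a single string, for example `run_isolated("import pyarrow.parquet as pq; pq.ParquetDataset('hdfs:///data/raw/source/table').read()")`, would then report which signal ended the child instead of killing the calling session.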

      Since it completely crashes my notebook (or, in a REPL, the process ends with "Killed"), I cannot report much more, but this is a pretty severe usability restriction. So far, the workaround is to pin `pyarrow<0.14`.
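
Where the pin cannot be enforced at install time, a runtime guard can at least make the problem visible before the read is attempted. This is a sketch under assumptions: `is_affected` and `check_pyarrow` are hypothetical names, and treating exactly the 0.14 series as affected is an assumption based on the versions listed in this report, not a confirmed range.

```python
import warnings


def is_affected(version: str) -> bool:
    """Return True if `version` belongs to the 0.14 series (assumed affected)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) == (0, 14)


def check_pyarrow(version: str) -> None:
    """Warn before reading if the given pyarrow version is assumed affected."""
    if is_affected(version):
        warnings.warn(
            f"pyarrow {version} may crash when reading this dataset; "
            "consider pinning pyarrow<0.14",
            RuntimeWarning,
        )
```

It would be called as `check_pyarrow(pyarrow.__version__)` right after `import pyarrow`.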

            People

              Assignee: Unassigned
              Reporter: H. Vetinari (h-vetinari)