  Apache Arrow / ARROW-15737

pyarrow.parquet.read_table("parquet_file") causes bus error in ipython

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.0.0
    • Fix Version/s: None
    • Component/s: Python
    • Environment: macOS 12.2.1 aarch64
      Python 3.10.1
      Arrow 7.0.0

    Description

      I have a Parquet file with two columns (int64 and double) and 9 million rows. The Parquet command-line tools (parquet, parquet-reader, parquet-schema...) read it without problems. (I actually have many such files, and they all exhibit the same behavior.)

      The following code fails with "zsh bus error  ipython":

      import pyarrow.parquet as pq
      pq.read_table("parquet_file")

      These snippets work properly.

      pq.read_table("parquet_file", use_legacy_dataset=True)

      f = pq.ParquetFile("parquet_file")
      f.read()
      for batch in f.iter_batches():
          print(len(batch))
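
Because a bus error takes the whole IPython session down with it, one way to compare the failing call with the working ones is to run each in its own child process. The sketch below is not part of the original report: `run_isolated` and `reproduction_snippets` are hypothetical helper names, and it assumes a POSIX system, where a negative return code from `subprocess` identifies the signal that killed the child.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=300):
    """Run `snippet` in a fresh Python process and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = "signal %d" % -proc.returncode
        return "killed by " + name
    return "exit code %d" % proc.returncode


def reproduction_snippets(path):
    """Build one-line programs for the calls mentioned in the report.

    `path` is whatever file triggers the crash; the report uses the
    placeholder "parquet_file".
    """
    prefix = "import pyarrow.parquet as pq; "
    return {
        "read_table": prefix + "pq.read_table(%r)" % path,
        "read_table (legacy)": prefix
        + "pq.read_table(%r, use_legacy_dataset=True)" % path,
        "ParquetFile.read": prefix + "pq.ParquetFile(%r).read()" % path,
    }
```

Looping over `reproduction_snippets("parquet_file").items()` and printing `run_isolated(code)` for each entry would then show which calls exit normally and which are killed by a signal such as SIGBUS, without losing the parent session.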

          People

            Assignee: Unassigned
            Reporter: Jay Edwards (meangrape)
            Votes: 0
            Watchers: 4
