Apache Arrow / ARROW-9456

[Python] Dataset segfault when not importing pyarrow.parquet


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: Python
    • Labels:
      None

      Description

      To reproduce:

        import pyarrow.parquet  # if we skip this...
        import pyarrow as pa
        import pyarrow.dataset as ds
        import glob
        ds = pa.dataset.dataset('/data/taxi_parquet/data_0.parquet')
        ds.to_table() # this will crash
         
        $ python pyarrow/crash.py dev
        terminate called after throwing an instance of 'parquet::ParquetException'
        what(): The file only has 19 columns, requested metadata for column: 1049198736
        [1] 1559395 abort (core dumped) python pyarrow/crash.py
         
        When the pyarrow.parquet import is present, the script works fine.
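
        The observation above can be sketched as a self-contained script. This is a minimal sketch, not a confirmed fix: the helper names read_parquet_dataset and write_sample_file are hypothetical, and the one-column sample file is an assumption standing in for /data/taxi_parquet/data_0.parquet. Imports are placed inside the functions only to make the import order explicit.

```python
import os
import tempfile


def write_sample_file(directory):
    """Write a tiny Parquet file (hypothetical stand-in for data_0.parquet)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    path = os.path.join(directory, "data_0.parquet")
    pq.write_table(pa.table({"x": [1, 2, 3]}), path)
    return path


def read_parquet_dataset(path):
    """Read a Parquet file via the dataset API, importing pyarrow.parquet first."""
    # Import order reported to avoid the crash: pyarrow.parquet is imported
    # before pyarrow.dataset is used.
    import pyarrow.parquet  # noqa: F401
    import pyarrow.dataset as ds

    dataset = ds.dataset(path, format="parquet")
    return dataset.to_table()
```

        Calling read_parquet_dataset(write_sample_file(tmp)) inside a tempfile.TemporaryDirectory() block should return a table with the three sample rows. Whether the explicit import is still needed likely depends on the pyarrow build in use.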
         


            People

            • Assignee:
              Unassigned
              Reporter:
              maartenbreddels Maarten Breddels