ARROW-2654: [Python] Error with errno 22 when loading 3.6 GB Parquet file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.12.0
    • Component/s: Python
    • Labels:

      Description

      I saved a file using the pandas to_parquet method, but I can't read it back in. Here's the full stack trace:

       

      Traceback (most recent call last):
        File "src/data/CLXP_pull.py", line 214, in <module>
          main()
        File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 722, in __call__
          return self.main(*args, **kwargs)
        File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 697, in main
          rv = self.invoke(ctx)
        File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 895, in invoke
          return ctx.invoke(self.callback, **ctx.params)
        File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/click/core.py", line 535, in invoke
          return callback(*args, **kwargs)
        File "src/data/CLXP_pull.py", line 188, in main
          results[fullname] = pd.read_parquet(os.path.join(project_dir, "data", "raw", fullname+".parquet"), engine="pyarrow")
        File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 257, in read_parquet
          return impl.read(path, columns=columns, **kwargs)
        File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pandas/io/parquet.py", line 130, in read
          **kwargs).to_pandas()
        File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 939, in read_table
          pf = ParquetFile(source, metadata=metadata)
        File "/Users/mm51929/projects/2018/03-advisor-recruiting/pyenv/lib/python3.6/site-packages/pyarrow/parquet.py", line 64, in __init__
          self.reader.open(source, metadata=metadata)
        File "_parquet.pyx", line 651, in pyarrow._parquet.ParquetReader.open
        File "error.pxi", line 79, in pyarrow.lib.check_status
      pyarrow.lib.ArrowIOError: Arrow error: IOError: [Errno 22] Invalid argument
      

      Any ideas what could cause this? The file itself is 3.6 GB.

      I'm running pandas==0.22.0.
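A quick way to rule out a truncated or incompletely written file before looking at the reader itself is to check the file's framing by hand. The sketch below is not a diagnosis of this bug. It relies only on the documented layout of an unencrypted Parquet file (a 4-byte `PAR1` magic at the start and end, with a 4-byte little-endian footer length just before the trailing magic). The function name `inspect_parquet_file` and the dictionary keys are hypothetical, chosen for illustration.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # magic bytes at both ends of an unencrypted Parquet file


def inspect_parquet_file(path):
    """Report size and framing of a Parquet file using only the standard library.

    Hypothetical helper: it does not parse the footer, it only checks that
    the leading and trailing magic bytes are present and reads the footer length.
    """
    size = os.path.getsize(path)
    info = {"size_bytes": size, "head_ok": False, "tail_ok": False, "footer_len": None}
    # Smallest possible framing: 4 (magic) + 4 (footer length) + 4 (magic).
    if size < 12:
        return info
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(size - 8)
        tail = f.read(8)
    info["head_ok"] = head == PARQUET_MAGIC
    info["tail_ok"] = tail[4:] == PARQUET_MAGIC
    # Footer length is a 4-byte little-endian unsigned integer.
    info["footer_len"] = struct.unpack("<I", tail[:4])[0]
    return info
```

If both `head_ok` and `tail_ok` come back true and `footer_len` is smaller than `size_bytes`, the file was at least closed properly by the writer, which would point toward the read path as the place to look.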

              People

              • Assignee: Wes McKinney (wesmckinn)
              • Reporter: Andy Reagan (andyreagan)
              • Votes: 0
              • Watchers: 3
