Apache Arrow / ARROW-5122

[Python] pyarrow.parquet.read_table raises non-file path error when given a Windows path to a directory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.12.1
    • Fix Version/s: 0.14.0
    • Component/s: Python
    • Labels:
    • Environment:
      Windows

      Description

      I think this might be a small bug in the read_table interface when it is used to load a directory full of Parquet files on Windows. It works fine if I use a ParquetDataset object directly to read the table represented by the directory, or if I call read_table in a Linux terminal.

      The problem appears to come from the _make_manifest() method in parquet.py, around line 1045. Either _is_path_like() or the FileSystem method isdir() fails to recognize the path as a valid directory (I tested with both a raw Windows path and a pathlib.WindowsPath object).
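
One way to rule out the path representation itself as the cause is to hand the reader a plain string with forward slashes. The sketch below is only an illustration under stated assumptions: normalize_dataset_path is a hypothetical helper, not part of pyarrow, and the example directory is made up. Whether this avoids the error on 0.12.1 has not been verified here.

```python
import os
from pathlib import PureWindowsPath


def normalize_dataset_path(path):
    """Hypothetical helper: return a plain str that uses forward slashes.

    Accepts a raw Windows path string or any os.PathLike object.
    """
    # os.fspath turns pathlib objects into str and passes str through unchanged.
    raw = os.fspath(path)
    # PureWindowsPath parses Windows syntax on any OS; as_posix swaps the separators.
    return PureWindowsPath(raw).as_posix()


# Made-up example directory, used only for illustration.
dataset_dir = normalize_dataset_path(PureWindowsPath(r"C:\data\parquets"))
print(dataset_dir)
```

The resulting string could then be passed to read_table, or to ParquetDataset, which is the route reported above as working.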

      I hope this helps a little.

      P.S. Thank you for your effort developing this package!

            People

            • Assignee:
              Unassigned
            • Reporter:
              IlyaOrson Ilya Orson Sandoval
            • Votes:
              0
            • Watchers:
              3
