  1. Apache Arrow
  2. ARROW-8286

[Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset

Details

    Description

      import pathlib
      
      import numpy as np
      import pyarrow as pa
      import pyarrow.parquet as pq
      import pyarrow.dataset as ds
      
      # write a small single-file Parquet dataset to reproduce with
      table = pa.table({'a': np.random.randn(10), 'b': range(10), 'c': ['a', 'b'] * 5})
      pq.write_table(table, "test.parquet")
      
      # passing a pathlib.Path
      ds.dataset(pathlib.Path("./test.parquet"))
      # gives UnionDataset
      
      # passing the same path as a str
      ds.dataset(str(pathlib.Path("./test.parquet")))
      # correctly gives FileSystemDataset
      

      Since those two dataset classes have different APIs, it is important that a pathlib.Path input also gives a FileSystemDataset, just like the equivalent str input.
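
      Until this is fixed, one possible workaround is to convert path-like objects to plain strings before passing them to ds.dataset. The sketch below is only an illustration: as_dataset_source is a hypothetical helper name, not part of pyarrow, and it assumes the str code path keeps giving a FileSystemDataset as shown above.

```python
import os
import pathlib


def as_dataset_source(source):
    """Hypothetical helper (not a pyarrow API).

    Converts a str or os.PathLike object (such as pathlib.Path) to a plain
    str and returns any other input (e.g. a list of paths) unchanged.
    """
    if isinstance(source, (str, os.PathLike)):
        return os.fspath(source)
    return source


# The Path is turned into a str, so ds.dataset would see a string path.
source = as_dataset_source(pathlib.Path("./test.parquet"))
print(type(source).__name__, source)
```

      With this helper, ds.dataset(as_dataset_source(pathlib.Path("./test.parquet"))) takes the same code path as the str call above.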

            People

              Assignee: jorisvandenbossche Joris Van den Bossche
              Reporter: jorisvandenbossche Joris Van den Bossche
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h