Apache Arrow / ARROW-8136

[C++][Python] Creating dataset from relative path no longer working

Details

    Description

      Since https://github.com/apache/arrow/pull/6597, local relative paths don't work anymore:

      In [1]: import pyarrow.dataset as ds  
      
      In [2]: ds.dataset("test.parquet")  
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-2-23ecfce52d13> in <module>
      ----> 1 ds.dataset("test.parquet")
      
      ~/scipy/repos/arrow/python/pyarrow/dataset.py in dataset(paths_or_factories, filesystem, partitioning, format)
          327 
          328     if isinstance(paths_or_factories, str):
      --> 329         return factory(paths_or_factories, **kwargs).finish()
          330 
          331     if not isinstance(paths_or_factories, list):
      
      ~/scipy/repos/arrow/python/pyarrow/dataset.py in factory(path_or_paths, filesystem, partitioning, format)
          246     factories = []
          247     for path in path_or_paths:
      --> 248         fs, paths_or_selector = _ensure_fs_and_paths(path, filesystem)
          249         factories.append(FileSystemDatasetFactory(fs, paths_or_selector,
          250                                                   format, options))
      
      ~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs_and_paths(path, filesystem)
          165     from pyarrow.fs import FileType, FileSelector
          166 
      --> 167     filesystem, path = _ensure_fs(filesystem, _stringify_path(path))
          168     infos = filesystem.get_target_infos([path])[0]
          169     if infos.type == FileType.Directory:
      
      ~/scipy/repos/arrow/python/pyarrow/dataset.py in _ensure_fs(filesystem, path)
          158     if filesystem is not None:
          159         return filesystem, path
      --> 160     return FileSystem.from_uri(path)
          161 
          162 
      
      ~/scipy/repos/arrow/python/pyarrow/_fs.pyx in pyarrow._fs.FileSystem.from_uri()
      
      ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
      
      ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: URI has empty scheme: 'test.parquet'
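The traceback comes down to `_ensure_fs` passing the raw string to `FileSystem.from_uri` whenever no `filesystem` is given (lines 158-160 above). The stdlib-only sketch below mirrors that branching. `fake_from_uri`, `ensure_fs`, and `EmptySchemeError` are hypothetical stand-ins, not pyarrow APIs, and `urllib.parse` only approximates Arrow's own URI parsing.

```python
from urllib.parse import urlparse


class EmptySchemeError(ValueError):
    """Hypothetical stand-in for pyarrow's ArrowInvalid."""


def fake_from_uri(uri):
    """Hypothetical stand-in for FileSystem.from_uri.

    Like the call in the traceback, it rejects strings with no URI scheme.
    urllib.parse only approximates Arrow's own URI parser.
    """
    parsed = urlparse(uri)
    if not parsed.scheme:
        raise EmptySchemeError(f"URI has empty scheme: {uri!r}")
    return f"<{parsed.scheme} filesystem>", parsed.path


def ensure_fs(filesystem, path):
    # Same branching as _ensure_fs in the traceback: an explicit filesystem
    # short-circuits, otherwise the string is parsed as a URI.
    if filesystem is not None:
        return filesystem, path
    return fake_from_uri(path)


try:
    ensure_fs(None, "test.parquet")
except EmptySchemeError as exc:
    print(exc)  # URI has empty scheme: 'test.parquet'

print(ensure_fs(None, "file:///tmp/test.parquet"))  # ('<file filesystem>', '/tmp/test.parquet')
```

Under this branching, supplying an explicit filesystem sidesteps URI parsing entirely and the path is handed through unchanged; whether that filesystem then accepts a relative path is a separate question.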
      
      

      apitrou, is this something that should be fixed in FileSystemFromUriOrPath, or rather on the Python side? (FileSystem.from_uri converts pathlib objects to an absolute path, but does not do so for strings.)
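If the fix lands on the Python side, one option is to give plain strings the same absolute-path treatment that pathlib objects already get, before they reach `FileSystem.from_uri`. The sketch below assumes, as the comment above implies, that absolute local paths are accepted there. `normalize_local_path` is a hypothetical name, and using `urllib.parse` to detect a scheme is an approximation of Arrow's own URI handling.

```python
import os
from urllib.parse import urlparse


def normalize_local_path(path):
    """Return `path` unchanged if it looks like a URI, else an absolute local path.

    Hypothetical helper: plain strings without a scheme get the same
    absolute-path treatment that pathlib objects already receive.
    """
    path = os.fspath(path)  # accept str or os.PathLike (e.g. pathlib.Path)
    scheme = urlparse(path).scheme
    # A one-letter "scheme" is most likely a Windows drive letter ("C:\\data"),
    # so only multi-letter schemes (file, s3, hdfs, ...) are left untouched.
    if len(scheme) > 1:
        return path
    # os.path.abspath resolves against the current working directory and
    # does not require the file to exist.
    return os.path.abspath(path)


print(normalize_local_path("s3://bucket/data.parquet"))  # s3://bucket/data.parquet
print(os.path.isabs(normalize_local_path("test.parquet")))  # True
```

A caller such as `_ensure_fs` could then hand `normalize_local_path(path)` to `FileSystem.from_uri`, while an explicitly passed filesystem would still bypass the conversion.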

          People

            Assignee: Joris Van den Bossche (jorisvandenbossche)
            Reporter: Joris Van den Bossche (jorisvandenbossche)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h 40m