Apache Arrow / ARROW-5310

[Python] better error message on creating ParquetDataset from empty directory

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python
    • Labels:

      Description

      Currently, when path is an existing but empty directory, you get:

      >>> dataset = pq.ParquetDataset(path)
      ---------------------------------------------------------------------------
      IndexError                                Traceback (most recent call last)
      <ipython-input-16-346f72ae525e> in <module>
      ----> 1 dataset = pq.ParquetDataset(path)
      
      ~/scipy/repos/arrow/python/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads, memory_map)
          989 
          990         if validate_schema:
      --> 991             self.validate_schemas()
          992 
          993         if filters is not None:
      
      ~/scipy/repos/arrow/python/pyarrow/parquet.py in validate_schemas(self)
         1025                 self.schema = self.common_metadata.schema
         1026             else:
      -> 1027                 self.schema = self.pieces[0].get_metadata().schema
         1028         elif self.schema is None:
         1029             self.schema = self.metadata.schema
      
      IndexError: list index out of range
      

      This should raise a clearer error message than a bare IndexError.
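One way to get a clearer message is to check for the empty case before indexing the first piece, which is where the traceback above fails (self.pieces[0]). The sketch below is stdlib-only and hypothetical: find_parquet_files and first_piece_or_error are illustrative names, not pyarrow functions. In pyarrow itself such a check would presumably go in validate_schemas.

```python
import tempfile
from pathlib import Path


def find_parquet_files(path):
    """Hypothetical helper: files ending in .parquet directly under `path`."""
    return sorted(p for p in Path(path).iterdir() if p.suffix == ".parquet")


def first_piece_or_error(path, pieces):
    """Return the piece the schema would be read from, or fail clearly.

    Hypothetical sketch, not pyarrow's API.
    """
    # Guard the empty case explicitly instead of letting pieces[0]
    # raise a bare "IndexError: list index out of range".
    if not pieces:
        raise ValueError(
            f"Cannot create a ParquetDataset from {str(path)!r}: "
            "no Parquet files were found, so there is no schema to infer"
        )
    return pieces[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as empty_dir:
        try:
            first_piece_or_error(empty_dir, find_parquet_files(empty_dir))
        except ValueError as exc:
            # The message names the directory and the actual problem.
            print(exc)
```

Raising ValueError (rather than a custom exception) is only one choice; the point is that the message says the directory is empty instead of exposing an internal list index.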

      Or do we actually want to allow this? I am not sure there are good use cases for supporting empty directories, since an empty directory gives us no schema or metadata information.
      Currently it only fails when validating the schemas: with validate_schema=False, it returns a ParquetDataset object with an empty list of pieces and no schema. So it would also be easy to not raise an error during schema validation in this empty-directory case.
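If empty directories should be allowed instead, the schema check can return early when there are no pieces, which matches what validate_schema=False already produces (no pieces, no schema). This is a simplified, hypothetical model: pieces are plain file names and read_schema is a caller-supplied function; neither reflects pyarrow's actual internals.

```python
from typing import Callable, Optional, Sequence


def validate_schemas(
    pieces: Sequence[str],
    read_schema: Callable[[str], object],
    schema: Optional[object] = None,
) -> Optional[object]:
    """Hypothetical sketch of an empty-tolerant schema check.

    Returns the dataset schema, or `schema` unchanged (None by default)
    when there are no pieces.
    """
    if not pieces:
        # Empty directory: no file to infer a schema from and nothing to
        # compare, so leave the schema as it is instead of indexing pieces[0].
        return schema
    if schema is None:
        # Infer the dataset schema from the first piece.
        schema = read_schema(pieces[0])
    # Every piece must agree with the dataset schema.
    for piece in pieces:
        if read_schema(piece) != schema:
            raise ValueError(f"Schema of {piece!r} differs from the dataset schema")
    return schema


if __name__ == "__main__":
    # Made-up schemas keyed by file name, for illustration only.
    schemas = {"a.parquet": ("x", "int64"), "b.parquet": ("x", "int64")}
    print(validate_schemas([], schemas.__getitem__))  # None
    print(validate_schemas(["a.parquet", "b.parquet"], schemas.__getitem__))  # ('x', 'int64')
```

The trade-off is that callers then receive a dataset whose schema is None, so any later code that assumes a schema exists would need its own check.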

            People

            • Assignee: Unassigned
            • Reporter: Joris Van den Bossche (jorisvandenbossche)
            • Votes: 0
            • Watchers: 1
