  1. Apache Arrow
  2. ARROW-1285

PYTHON: NotImplemented exception creates empty parquet file


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: Python
    • Labels: None

    Description

      Writing this frame correctly raises (because categorical is not implemented), but the failed write still leaves an empty file on disk.

      xref https://github.com/pandas-dev/pandas/pull/15838#pullrequestreview-52576290

      In [2]:    df = pd.DataFrame({'a': list('abc'),
         ...:                       'b': list(range(1, 4)),
         ...:                       'c': np.arange(3, 6).astype('u1'),
         ...:                       'd': np.arange(4.0, 7.0, dtype='float64'),
         ...:                       'e': [True, False, True],
         ...:                       'f': pd.Categorical(list('abc')),
         ...:                       'g': pd.date_range('20130101', periods=3),
         ...:                       'h': pd.date_range('20130101', periods=3, tz='US/Eastern'),
         ...:                       'i': pd.date_range('20130101', periods=3, freq='ns')})
         ...: 
      
      In [3]: df.to_parquet('foo.pq')
      ---------------------------------------------------------------------------
      ArrowNotImplementedError                  Traceback (most recent call last)
      <ipython-input-3-8070fb7e3e2c> in <module>()
      ----> 1 df.to_parquet('foo.pq')
      
      /Users/jreback/pandas/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
         1620         from pandas.io.parquet import to_parquet
         1621         to_parquet(self, fname, engine,
      -> 1622                    compression=compression, **kwargs)
         1623 
         1624     @Substitution(header='Write out column names. If a list of string is given, \
      
      /Users/jreback/pandas/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
          152         raise ValueError("parquet must have string column names")
          153 
      --> 154     return impl.write(df, path, compression=compression)
          155 
          156 
      
      /Users/jreback/pandas/pandas/io/parquet.py in write(self, df, path, compression, **kwargs)
           53         table = self.api.Table.from_pandas(df, timestamps_to_ms=True)
           54         self.api.parquet.write_table(
      ---> 55             table, path, compression=compression, **kwargs)
           56 
           57     def read(self, path):
      
      /Users/jreback/miniconda3/envs/pandas/lib/python3.6/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, use_deprecated_int96_timestamps, **kwargs)
          770         version=version,
          771         use_deprecated_int96_timestamps=use_deprecated_int96_timestamps)
      --> 772     writer = ParquetWriter(where, table.schema, **options)
          773     writer.write_table(table, row_group_size=row_group_size)
          774     writer.close()
      
      _parquet.pyx in pyarrow._parquet.ParquetWriter.__cinit__()
      
      error.pxi in pyarrow.lib.check_status()
      
      ArrowNotImplementedError: NotImplemented: unhandled type
      
      In [4]: !ls -ltr foo.pq
      -rw-r--r--  1 jreback  staff  0 Jul 27 06:03 foo.pq
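
Until the writer stops leaving a zero-byte file behind, a caller can guard against it. The following is a minimal caller-side sketch, not the fix applied in Arrow: it writes to a temporary file in the same directory and moves it into place only when the write succeeds. The names `write_or_clean_up` and `failing_writer` are hypothetical helpers introduced for illustration, and `failing_writer` only simulates the behaviour reported above (create the file, then raise).

```python
import os
import tempfile


def write_or_clean_up(path, write_func):
    """Run write_func(tmp_path) and move the result to `path` only on success.

    Hypothetical helper: if write_func raises, the temporary file is removed
    and `path` is never created or modified.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_func(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial output, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def failing_writer(target):
    """Stand-in for a writer that opens its output file and then fails."""
    open(target, "wb").close()
    raise NotImplementedError("unhandled type")
```

With pandas, the call from the report could be wrapped as `write_or_clean_up('foo.pq', lambda p: df.to_parquet(p))`; the exception still propagates, but no `foo.pq` is left on disk.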
      

          People

            Assignee: Wes McKinney (wesm)
            Reporter: Jeff Reback (jreback)