Apache Arrow / ARROW-5169

[Python] Non-nullable fields are converted to nullable in Table.from_pandas


Details

    Description

      In version 0.13.0, the Table.from_pandas function modifies the input schema by making all non-nullable fields nullable.

      This can cause problems, for example with the following code:

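      import io
      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq

      df = pd.DataFrame(list(range(200)), columns=['numcol'])
      schema = pa.schema([
          pa.field('numcol', pa.int64(), nullable=False),
      ])
      writer = pq.ParquetWriter(io.BytesIO(), schema, version='2.0')
      table = pa.Table.from_pandas(df, schema=schema)
      # Fails: the table schema no longer matches the writer schema
      writer.write_table(table)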
      

      This fails because the writer schema and the table schema differ.

      I believe the direct cause could be https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L622 where nullable is set to True by default, resulting in the table schema being modified.

       

      Thanks for your valuable work on this library.

      Giacomo


            People

              jorisvandenbossche Joris Van den Bossche
              giacomo1112 giacomo

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2.5h