Apache Arrow / ARROW-1681

[Python] Error writing with nulls in lists

    Description

      Created from https://github.com/apache/arrow/issues/1208

      Hi,
      I'm not sure whether this is related to, or the same as, ARROW-1584, but I can't find a way to handle arrays of lists that occasionally consist of empty lists only.

      To reproduce:

      import pyarrow as pa

      na = []  # None, [""]

      # Two list<string> columns; c2 contains only empty lists.
      arrays = {
          'c1': pa.array([["test"], na, na], type=pa.list_(pa.string())),
          'c2': pa.array([na, na, na], type=pa.list_(pa.string())),
      }

      # Build a record batch, then convert it to a pandas DataFrame.
      rb = pa.RecordBatch.from_arrays(list(arrays.values()), list(arrays.keys()))
      df = rb.to_pandas()

      # Serializing the DataFrame directly fails:
      pa.serialize_pandas(df)
      # > ArrowNotImplementedError: Unable to convert type: null

      # Converting back to a Table and writing it to a buffer fails the same way:
      tbl = pa.Table.from_pandas(df)
      sink = pa.BufferOutputStream()
      writer = pa.RecordBatchFileWriter(sink, tbl.schema)
      writer.write_table(tbl)
      # > ArrowNotImplementedError: Unable to convert type: null
      

      In my use case I'm processing data in batches where individual fields contain lists of strings. Some batches, however, may contain only empty lists, and there doesn't seem to be any representation in Arrow at the moment that handles this situation.

      Also, since I'm serializing the batches into a single file/stream, their schemas need to be consistent, which is why I tried explicitly specifying the type of the array as list_(string). The only workaround I've found is to replace empty lists with [""], but that implies lots of unnecessary glue code on the client side. Is there a better workaround until this is fixed in an official conda release?
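
For reference, the [""] workaround described above could be wrapped in a small helper along these lines. This is only a sketch of the client-side glue code, not anything provided by Arrow: the names `PLACEHOLDER`, `pad_empty_lists` and `strip_placeholders` are hypothetical, and it assumes each column is a plain Python sequence of lists (or None).

```python
# Hypothetical sentinel standing in for an empty list, as in the workaround above.
PLACEHOLDER = [""]


def pad_empty_lists(column):
    """Return a copy of `column` with every empty (or None) entry replaced by [""]."""
    return [
        list(PLACEHOLDER) if values is None or len(values) == 0 else list(values)
        for values in column
    ]


def strip_placeholders(column):
    """Reverse pad_empty_lists after reading the data back: [""] becomes []."""
    return [
        [] if values is not None and list(values) == PLACEHOLDER else values
        for values in column
    ]


# The two columns from the reproduction, padded before being handed to Arrow.
c1 = pad_empty_lists([["test"], [], []])
c2 = pad_empty_lists([[], [], []])
```

The padded columns can then be passed to pa.array(..., type=pa.list_(pa.string())) as in the reproduction. One caveat of this approach is that a genuine [""] entry cannot be told apart from a placeholder when reading the data back.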

            People

              wesm Wes McKinney