Description
Error messages raised by `applyInPandas` and `mapInPandas` are very generic or unhelpful when used with complex schemata:
KeyError: 'val'
RuntimeError: Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: 2 Actual: 3
java.lang.IllegalArgumentException: not all nodes and buffers were consumed. nodes: [ArrowFieldNode [length=3, nullCount=0]] buffers: [ArrowBuf[304], address:139860828549160, length:0, ArrowBuf[305], address:139860828549160, length:24]
pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to convert to double
These should be improved by adding column names or descriptive messages (in the same order as above):
RuntimeError: Column names of the returned pandas.DataFrame do not match specified schema. Missing: val Unexpected: v Schema: id, val
RuntimeError: Column names of the returned pandas.DataFrame do not match specified schema. Missing: val Unexpected: foo, v Schema: id, val
RuntimeError: Column names of the returned pandas.DataFrame do not match specified schema. Unexpected: v Schema: id, id
pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
The above exception was the direct cause of the following exception:
TypeError: Exception thrown when converting pandas.Series (int64) with name 'val' to Arrow Array (string).
pyarrow.lib.ArrowInvalid: Could not convert '0' with type str: tried to convert to double
The above exception was the direct cause of the following exception:
ValueError: Exception thrown when converting pandas.Series (object) with name 'val' to Arrow Array (double).
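To illustrate how the proposed column-name messages could be assembled, here is a minimal sketch in plain Python. The helper `column_name_mismatch_message` is hypothetical and is not Spark's actual implementation; it only assumes that the returned column names and the schema field names are available as lists of strings, and that missing and unexpected names are listed in sorted order.

```python
def column_name_mismatch_message(returned, schema):
    """Hypothetical helper: describe how returned column names differ from the schema.

    Returns None when every schema name is present and no extra names exist.
    """
    # Names the schema declares but the returned DataFrame lacks.
    missing = sorted(set(schema) - set(returned))
    # Names the returned DataFrame has but the schema does not declare.
    unexpected = sorted(set(returned) - set(schema))
    if not missing and not unexpected:
        return None

    parts = [
        "Column names of the returned pandas.DataFrame "
        "do not match specified schema."
    ]
    # Only mention the categories that actually apply.
    if missing:
        parts.append("Missing: " + ", ".join(missing))
    if unexpected:
        parts.append("Unexpected: " + ", ".join(unexpected))
    parts.append("Schema: " + ", ".join(schema))
    return " ".join(parts)


# Example: the function returned columns (id, v) for schema (id, val).
print(column_name_mismatch_message(["id", "v"], ["id", "val"]))
```

Under these assumptions, returned columns `id, foo, v` against schema `id, val` would produce the second proposed message, and a schema of `id, id` with returned columns `id, v` would produce the third, where the `Missing:` part is omitted because nothing is missing.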
When no column names are given, the following error was returned:
RuntimeError: Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: 2 Actual: 3
It should also contain the output schema:
RuntimeError: Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: 2 Actual: 3 Schema: id, val
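For the positional case, the same idea can be sketched as follows. The helper `column_count_mismatch_message` is again hypothetical; it assumes only the number of returned columns and the schema field names are known.

```python
def column_count_mismatch_message(actual_count, schema):
    """Hypothetical helper: describe a column-count mismatch, including the schema.

    Returns None when the number of returned columns matches the schema.
    """
    expected_count = len(schema)
    if actual_count == expected_count:
        return None
    return (
        "Number of columns of the returned pandas.DataFrame "
        "doesn't match specified schema. "
        f"Expected: {expected_count} Actual: {actual_count} "
        f"Schema: {', '.join(schema)}"
    )


# Example: three columns returned for a two-field schema (id, val).
print(column_count_mismatch_message(3, ["id", "val"]))
```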