Apache Thrift introduced a `MaxMessageSize` configuration option (https://github.com/apache/thrift/blob/master/doc/specs/thrift-tconfiguration.md#maxmessagesize) in version 0.14 (
I think this is the cause of an issue originally reported at https://github.com/dask/dask/issues/8027, where reading a large (metadata-only) Parquet file can fail with an "OSError: Couldn't deserialize thrift: MaxMessageSize reached" error.
In the original report, the file was written using the Python fastparquet library (which uses the Python Thrift bindings, which still use Thrift 0.13), but I was able to construct a reproducible example with pyarrow.
Create a Parquet file with large metadata using pyarrow in an environment where Arrow is built against Thrift 0.13 (e.g. a local install from source, or pyarrow 2.0 from conda-forge, which can be installed with libthrift 0.13):
Reading this file back in the same environment works fine, but reading it in an environment with the more recent Thrift 0.14 (e.g. the latest pyarrow from conda-forge) gives the following error: