Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.10.1
None
None
Description
Writing the following protobuf message to a Parquet file fails:
syntax = "proto3";

import "google/protobuf/struct.proto";

package test;

option java_outer_classname = "CustomMessage";

message TestMessage {
  map<string, google.protobuf.ListValue> data = 1;
}
Protobuf provides "well-known types" such as ListValue (defined in google/protobuf/struct.proto) to represent arbitrary JSON values that have no fixed schema.
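However, writing the above message sends the Parquet writer into unbounded recursion because of protobuf's generic "any JSON value" type. google.protobuf.Value is a oneof over six kinds (null, bool, number, string, struct, list), and both Struct and ListValue contain Value again. The current implementation (ProtoSchemaConverter) keeps expanding these types while building the Parquet schema, never terminates once it descends into "struct", and eventually fails with a StackOverflowError.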
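To make the cycle concrete, here is a minimal, self-contained Java model of the recursion. It is a sketch under stated assumptions, not parquet-mr code: `MsgType`, `NaiveConverter`, and the `group ... { ... }` output format are hypothetical stand-ins for protobuf descriptors and the Parquet schema, and the map/repeated wrappers around `Value` are omitted. A converter that expands every message-typed field without any guard never terminates on this type graph and overflows the stack.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a protobuf message descriptor:
// a name, some primitive fields, and some fields whose type is another message.
final class MsgType {
    final String name;
    final List<String> primitiveFields = new ArrayList<>();
    final Map<String, MsgType> messageFields = new LinkedHashMap<>();

    MsgType(String name) {
        this.name = name;
    }
}

final class NaiveConverter {
    // Simplified model of google/protobuf/struct.proto (map and repeated
    // wrappers omitted): Value -> Struct -> Value and Value -> ListValue -> Value.
    static MsgType wellKnownValue() {
        MsgType value = new MsgType("Value");
        MsgType struct = new MsgType("Struct");
        MsgType list = new MsgType("ListValue");
        value.primitiveFields.add("null_value");
        value.primitiveFields.add("number_value");
        value.primitiveFields.add("string_value");
        value.primitiveFields.add("bool_value");
        value.messageFields.put("struct_value", struct);
        value.messageFields.put("list_value", list);
        struct.messageFields.put("fields", value);
        list.messageFields.put("values", value);
        return value;
    }

    // Hypothetical converter: every message-typed field is expanded into a
    // nested group by recursing into its type, with no cycle or depth guard.
    static String convert(MsgType type) {
        StringBuilder sb = new StringBuilder("group ").append(type.name).append(" {");
        for (String f : type.primitiveFields) {
            sb.append(' ').append(f).append(';');
        }
        for (Map.Entry<String, MsgType> e : type.messageFields.entrySet()) {
            sb.append(' ').append(e.getKey()).append(": ").append(convert(e.getValue()));
        }
        return sb.append(" }").toString();
    }

    // Returns true if conversion exhausts the stack, as in the reported bug.
    static boolean overflows(MsgType root) {
        try {
            convert(root);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows(wellKnownValue())); // prints true
    }
}
```

A type without a cycle, for example one with only primitive fields, converts normally; only the Value/Struct/ListValue cycle makes the recursion unbounded.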
java.lang.StackOverflowError
	at java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418)
	at java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410)
	at java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
	at java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
	at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
	at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
	at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
	at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
	at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
	at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
	at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
	at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
	at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
	at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
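The trace shows convertFields and addField calling each other without end. One way to make such a conversion terminate is to track which message types are currently being expanded and to stop when a field refers back to one of them. This is sketched here as an assumption, not as the fix that was actually merged. In this hypothetical sketch such a field becomes an opaque `binary` leaf (in practice one might store the serialized sub-message there); `MsgType`, `GuardedConverter`, and the output format are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a protobuf message descriptor.
final class MsgType {
    final String name;
    final List<String> primitiveFields = new ArrayList<>();
    final Map<String, MsgType> messageFields = new LinkedHashMap<>();

    MsgType(String name) {
        this.name = name;
    }
}

final class GuardedConverter {
    // Simplified model of google/protobuf/struct.proto (map and repeated
    // wrappers omitted): Value -> Struct -> Value and Value -> ListValue -> Value.
    static MsgType wellKnownValue() {
        MsgType value = new MsgType("Value");
        MsgType struct = new MsgType("Struct");
        MsgType list = new MsgType("ListValue");
        value.primitiveFields.add("null_value");
        value.primitiveFields.add("number_value");
        value.primitiveFields.add("string_value");
        value.primitiveFields.add("bool_value");
        value.messageFields.put("struct_value", struct);
        value.messageFields.put("list_value", list);
        struct.messageFields.put("fields", value);
        list.messageFields.put("values", value);
        return value;
    }

    static String convert(MsgType root) {
        return convert(root, new HashSet<>());
    }

    // onPath holds the names of message types currently being expanded
    // on the recursion path from the root to this type.
    private static String convert(MsgType type, Set<String> onPath) {
        onPath.add(type.name);
        StringBuilder sb = new StringBuilder("group ").append(type.name).append(" {");
        for (String f : type.primitiveFields) {
            sb.append(' ').append(f).append(';');
        }
        for (Map.Entry<String, MsgType> e : type.messageFields.entrySet()) {
            MsgType child = e.getValue();
            if (onPath.contains(child.name)) {
                // Recursive reference: hypothetical fallback to an opaque binary leaf.
                sb.append(' ').append(e.getKey()).append(": binary;");
            } else {
                sb.append(' ').append(e.getKey()).append(": ").append(convert(child, onPath));
            }
        }
        onPath.remove(type.name);
        return sb.append(" }").toString();
    }

    public static void main(String[] args) {
        // Terminates: Struct.fields and ListValue.values become binary leaves.
        System.out.println(convert(wellKnownValue()));
    }
}
```

This guard allows zero re-entries of a type on the current path. A variant could permit a configurable number of nested expansions before falling back to `binary`, trading a deeper Parquet schema for more of the nested JSON structure being preserved as columns.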