Description
When converting proto::Type to TypeImpl, we need to check that LIST and MAP types have the correct number of subtypes (one for LIST, two for MAP). Otherwise, a malformed file is accepted at this point and causes errors later in the C++ reader.
A file with malformed types is attached.
The Java reader has the same problem, but fails immediately with an IndexOutOfBoundsException:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:3212)
at org.apache.orc.OrcProto$Type.getSubtypes(OrcProto.java:12642)
at org.apache.orc.OrcUtils.convertTypeFromProtobuf(OrcUtils.java:506)
at org.apache.orc.OrcUtils.convertTypeFromProtobuf(OrcUtils.java:515)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:386)
at org.apache.orc.OrcFile.createReader(OrcFile.java:327)
at org.apache.orc.tools.FileDump.getReader(FileDump.java:241)
at org.apache.orc.tools.FileDump.printMetaDataImpl(FileDump.java:300)
at org.apache.orc.tools.FileDump.printMetaData(FileDump.java:274)
at org.apache.orc.tools.FileDump.main(FileDump.java:135)
at org.apache.orc.tools.Driver.main(Driver.java:105)
We should also check the subtype count of the UNION type. If its subtypes_size() == 0, the 'counts' pointer used in UnionColumnReader::skip and UnionColumnReader::next will be a null pointer. The attached file (no_subtypes_union.orc) reproduces this.
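A minimal sketch of the proposed checks follows. It is not the actual patch: FooterType and Kind are hypothetical stand-ins for the protobuf-generated proto::Type, the function names are made up for illustration, and std::logic_error is only an assumed choice of exception type.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for proto::Type; only the fields needed for the
// subtype-count check are modeled here.
enum class Kind { INT, STRING, STRUCT, LIST, MAP, UNION };

struct FooterType {
  Kind kind;
  std::vector<unsigned> subtypes;  // column ids of the child types
};

// Returns true when the subtype count is legal for the given kind.
// Kinds other than LIST, MAP and UNION are not restricted by this sketch.
bool hasValidSubtypeCount(const FooterType& type) {
  const std::size_t n = type.subtypes.size();
  switch (type.kind) {
    case Kind::LIST:
      return n == 1;  // exactly one element type
    case Kind::MAP:
      return n == 2;  // exactly one key type and one value type
    case Kind::UNION:
      return n > 0;   // at least one variant
    default:
      return true;
  }
}

// Rejects a malformed type up front instead of failing later in a reader.
// The exception type is an assumption made for this sketch.
void checkSubtypeCount(const FooterType& type) {
  if (!hasValidSubtypeCount(type)) {
    throw std::logic_error("Illegal number of subtypes in file footer");
  }
}
```

Calling checkSubtypeCount before building the TypeImpl would reject both attached files at conversion time, so the column readers never see a LIST, MAP, or UNION with missing children.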