  1. Apache Drill
  2. DRILL-7629

Parquet MAP field support missing in recent stable release (?)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.17.0
    • Fix Version/s: 1.18.0
    • Component/s: Storage - Parquet
    • Labels:
      None
    • Environment:

      Drill 1.17
      Zulu OpenJDK 8 build 1.8.0_232
      Debian Buster 10.3
      Kernel version 4.19.98-1
      EC2 c5.2xlarge instances (8 cores, 16 GB RAM)

      Description

      I encountered this issue when lowering planner.slice_target (to, say, 100) in order to make Drill generate more fragments. Queries then started failing with the following error:

      Caused by: java.io.IOException: Unable to parse column [`currencyPair` STRUCT<`bfix` MAP<`map` STRUCT<`key` ARRAY<VARCHAR>, `value` ARRAY<DOUBLE>>>> not null]: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:80)
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:61)
      	at org.apache.drill.exec.record.metadata.AbstractColumnMetadata.createColumnMetadata(AbstractColumnMetadata.java:75)
      	at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at com.fasterxml.jackson.databind.introspect.AnnotatedMethod.call(AnnotatedMethod.java:109)
      	at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:283)
      	... 72 common frames omitted
      Caused by: org.apache.drill.exec.record.metadata.schema.parser.SchemaParsingException: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser$ErrorListener.syntaxError(SchemaExprParser.java:120)
      	at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
      	at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
      	at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:310)
      	at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:136)
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:403)
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column_def(SchemaParser.java:317)
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.columns(SchemaParser.java:262)
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_type(SchemaParser.java:1395)
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_column(SchemaParser.java:579)
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:383)
      	at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:78)

      All files in the queried directory are parquet files that share the same schema, just to be clear.

      Judging by the stack trace, this is an ANTLR parsing error. Assuming SchemaParser is generated from this g4 file, you can see that MAP support is lacking.
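To illustrate the kind of failure described above, here is a minimal sketch of a recursive-descent parser for a tiny type language. It is not Drill's SchemaParser and does not reproduce the real g4 grammar: the class name ToySchemaParser, the mapSupported flag, and the rule that MAP wraps a column list the same way STRUCT does are all assumptions made for illustration. With the MAP rule switched off, the column from the error message is rejected at the MAP token, much like the "no viable alternative" error in the trace.

```java
import java.util.ArrayList;
import java.util.List;

// Toy parser for a hypothetical subset of a schema type language.
// NOT Drill's SchemaParser; it only shows how a grammar without a MAP
// rule rejects a column that contains a MAP type.
final class ToySchemaParser {
    private final List<String> tokens;
    private final boolean mapSupported; // hypothetical switch for the MAP rule
    private int pos = 0;

    ToySchemaParser(String input, boolean mapSupported) {
        this.tokens = tokenize(input);
        this.mapSupported = mapSupported;
    }

    // Splits input into backtick-quoted names, bare words, and < > , symbols.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '<' || c == '>' || c == ',') {
                out.add(String.valueOf(c));
                i++;
            } else if (c == '`') {
                int end = s.indexOf('`', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated name at " + i);
                out.add(s.substring(i, end + 1));
                i = end + 1;
            } else {
                int j = i;
                while (j < s.length() && Character.isLetterOrDigit(s.charAt(j))) j++;
                if (j == i) throw new IllegalArgumentException("unexpected character at " + i);
                out.add(s.substring(i, j));
                i = j;
            }
        }
        return out;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    private IllegalArgumentException fail(String token) {
        return new IllegalArgumentException("no viable alternative at token '" + token + "'");
    }

    private void expect(String symbol) {
        String t = next();
        if (!t.equals(symbol)) throw fail(t);
    }

    // column := NAME type [NOT NULL]
    void parseColumn() {
        String name = next();
        if (!name.startsWith("`")) throw fail(name);
        parseType();
        if (peek().equalsIgnoreCase("not")) {
            next();
            String t = next();
            if (!t.equalsIgnoreCase("null")) throw fail(t);
        }
    }

    // type := VARCHAR | DOUBLE | ARRAY<type> | STRUCT<columns> | MAP<columns> (optional)
    void parseType() {
        String t = next().toUpperCase();
        if (t.equals("VARCHAR") || t.equals("DOUBLE")) return;
        if (t.equals("ARRAY")) {
            expect("<");
            parseType();
            expect(">");
            return;
        }
        if (t.equals("STRUCT") || (mapSupported && t.equals("MAP"))) {
            expect("<");
            parseColumn();
            while (peek().equals(",")) {
                next();
                parseColumn();
            }
            expect(">");
            return;
        }
        throw fail(t); // no rule matches this keyword
    }

    // Returns true only if the whole input parses as a single column.
    static boolean accepts(String column, boolean mapSupported) {
        try {
            ToySchemaParser p = new ToySchemaParser(column, mapSupported);
            p.parseColumn();
            return p.pos == p.tokens.size();
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String col = "`currencyPair` STRUCT<`bfix` MAP<`map` STRUCT<`key` ARRAY<VARCHAR>, "
                + "`value` ARRAY<DOUBLE>>>> not null";
        System.out.println("without MAP rule: " + accepts(col, false));
        System.out.println("with MAP rule:    " + accepts(col, true));
    }
}
```

Only the presence of a MAP alternative changes the outcome here, which is the same shape as the difference between the grammar in 1.17 and the one in the later SNAPSHOT build, as far as the trace suggests.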

      Looking around a bit in Jira/GitHub, I noticed that this issue had already been fixed in DRILL-7361. I can also confirm that upgrading to the latest SNAPSHOT version (built from source today) resolved the issue.

      A few questions:

      • Did you intentionally drop Parquet MAP field support in Drill 1.17 as part of the ANTLR lexer refactoring, or was it never present to begin with (I see that 1.16 does not use ANTLR parsing for the Parquet schema)?
      • Can we safely assume the (newly added) MAP field support will persist from here on out, or at least as part of the 1.18 release?
      • This is probably not the best place to ask, but is there a timeline or plan for 1.18 already? Or is there a possibility of a hot-fix release? I would be really happy to work on a stable version rather than a self-built one.

      I'd be able to provide parquet files and guidance towards re-creating this issue in 1.17, should the need arise.

      Thanks in advance!

            People

            • Assignee: Unassigned
            • Reporter: sheinbergon Idan Sheinberg
