ORC-452

Support converting MAP column from JSON to ORC

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.5, 1.6.0
    • Component/s: Java, tools
    • Labels: None

    Description

      The JSON convert tool does not yet support MAP columns. Running it with a schema that contains a map fails:

      java -jar orc-tools-1.5.4-uber.jar convert -s "struct<id:int,map_col:map<string,string>>" map.json 
      log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
      log4j:WARN Please initialize the log4j system properly.
      log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
      Processing map.json
      Exception in thread "main" java.lang.IllegalArgumentException: Unhandled type map<string,string>
              at org.apache.orc.tools.convert.JsonReader.createConverter(JsonReader.java:252)
              at org.apache.orc.tools.convert.JsonReader.<init>(JsonReader.java:277)
              at org.apache.orc.tools.convert.JsonReader.<init>(JsonReader.java:260)
              at org.apache.orc.tools.convert.ConvertTool$FileInformation.getRecordReader(ConvertTool.java:149)
              at org.apache.orc.tools.convert.ConvertTool.run(ConvertTool.java:203)
              at org.apache.orc.tools.convert.ConvertTool.main(ConvertTool.java:165)
              at org.apache.orc.tools.Driver.main(Driver.java:113)
      

      JsonReader.java has no converter for the MAP type, so createConverter throws for any schema that contains a map.
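
The failure mode can be sketched as follows. This is a minimal, self-contained illustration, not the actual JsonReader code: the `Category` enum, the method names, and the returned strings are all hypothetical stand-ins. It assumes the converter is chosen by switching on the type category, with unknown categories falling through to a default that throws, which matches the "Unhandled type" message in the trace above.

```java
public class ConverterDispatchSketch {
    // Hypothetical stand-in for the type categories; not ORC's real enum.
    enum Category { INT, STRING, STRUCT, LIST, MAP }

    // Before the fix (assumed shape): MAP has no case, so it hits the default.
    static String converterBefore(Category c) {
        switch (c) {
            case INT:    return "long converter";
            case STRING: return "string converter";
            case STRUCT: return "struct converter";
            case LIST:   return "list converter";
            default:
                throw new IllegalArgumentException("Unhandled type " + c);
        }
    }

    // After the fix (assumed shape): MAP gets its own converter.
    static String converterAfter(Category c) {
        if (c == Category.MAP) {
            return "map converter";
        }
        return converterBefore(c);
    }

    public static void main(String[] args) {
        try {
            converterBefore(Category.MAP);
        } catch (IllegalArgumentException e) {
            System.out.println("before: " + e.getMessage());
        }
        System.out.println("after: " + converterAfter(Category.MAP));
    }
}
```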

      We'd like to convert the following JSON rows into an ORC file:

      {"id": 0, "map_col": {"k1": "v1", "k2": "v2"}}
      {"id": 1, "map_col": {"k3": "v3", "k4": "v4", "k5": "v5"}}

       

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m