  1. ORC
  2. ORC-150

Add tool to convert from JSON to ORC

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None
    • Labels:
      None

      Description

      We should have a tool to convert from JSON data to ORC. Something like

      % java -jar orc-tools*uber.jar create "struct<x:int,y:string>" *.json

          Activity

          binarylogic Ben Johnson added a comment -

          Would the schema need to be specified, or could it be automatically determined from the JSON structure? If it is automatic, it would be cleaner to separate them, allowing someone to use them independently if desired.

          owen.omalley Owen O'Malley added a comment -

          Actually, I have a JSON schema detector too, so I guess we could make the schema optional.
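A minimal sketch of what detecting a schema from the JSON structure could look like. This is an illustration only, not the detector mentioned above: the function names `infer_type` and `infer_schema` are hypothetical, and the rules (first-seen type wins, floats become `double`, nulls and empty lists default to `string`) are assumptions chosen to keep the example short.

```python
import json


def infer_type(value):
    """Map one parsed JSON value to an ORC-style type string (hypothetical helper)."""
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: the first element decides; an empty list defaults to string.
        element = infer_type(value[0]) if value else "string"
        return f"array<{element}>"
    if isinstance(value, dict):
        fields = ",".join(f"{name}:{infer_type(v)}" for name, v in value.items())
        return f"struct<{fields}>"
    # JSON null carries no type information; default to string in this sketch.
    return "string"


def infer_schema(lines):
    """Infer a top-level struct schema from an iterable of JSON documents."""
    fields = {}
    for line in lines:
        record = json.loads(line)
        for name, value in record.items():
            # Assumption: the first type seen for a field wins; no type merging.
            fields.setdefault(name, infer_type(value))
    return "struct<" + ",".join(f"{n}:{t}" for n, t in fields.items()) + ">"


print(infer_schema(['{"x": 1, "y": "a"}', '{"x": 2, "z": [1.5]}']))
```

Keeping detection in its own function, separate from any conversion step, matches the suggestion above that the two be usable independently.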

          githubbot ASF GitHub Bot added a comment -

          GitHub user omalley opened a pull request:

          https://github.com/apache/orc/pull/95

          ORC-150. Add tools for finding schema from JSON and converting into ORC

          Here is a first pass for making a JSON to ORC converter along with the JSON schema discovery.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/omalley/orc orc-150

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/orc/pull/95.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #95


          commit ffc3f46a5d55287cc2b405fda402d00ce3936c53
          Author: Owen O'Malley <omalley@apache.org>
          Date: 2017-02-21T04:37:18Z

          ORC-150. Add tools for finding schema from JSON documents and converting JSON
          into ORC files.


          owen.omalley Owen O'Malley added a comment -

          I've updated the pull request with:

          • JsonReader now implements the org.apache.orc.RecordReader interface.
          • I've added constructors for OrcMapredRecordReader and OrcMapreduceRecordReader that take org.apache.orc.RecordReader so that you can layer them on top of the JsonReader.
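The layering described in the two bullets above can be sketched in a language-neutral way. The Python below is only an analogy under stated assumptions: `BatchReader`, `JsonBatchReader`, and `RowReader` are hypothetical names, not the ORC classes, and the real interfaces have different signatures. The point it shows is that a row-at-a-time reader written against an interface can sit on top of any batch source, including one backed by JSON text.

```python
import json
from typing import Iterator, List, Protocol


class BatchReader(Protocol):
    """Hypothetical stand-in for a batch-oriented reader interface."""

    def next_batch(self, size: int) -> List[dict]:
        ...


class JsonBatchReader:
    """Reads JSON documents (one per line) and returns them in batches."""

    def __init__(self, lines):
        self._lines = iter(lines)

    def next_batch(self, size: int) -> List[dict]:
        batch = []
        for line in self._lines:
            batch.append(json.loads(line))
            if len(batch) == size:
                break
        return batch  # an empty list signals end of input


class RowReader:
    """Row-at-a-time view layered on top of any BatchReader."""

    def __init__(self, source: BatchReader, batch_size: int = 2):
        self._source = source
        self._batch_size = batch_size

    def __iter__(self) -> Iterator[dict]:
        while True:
            batch = self._source.next_batch(self._batch_size)
            if not batch:
                return
            yield from batch


rows = RowReader(JsonBatchReader(['{"x": 1}', '{"x": 2}', '{"x": 3}']))
for row in rows:
    print(row)
```

Because `RowReader` depends only on the `next_batch` method, the same wrapper would work over a file-backed source without changes.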
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/orc/pull/95

          leftylev Lefty Leverenz added a comment -

          Should this be documented in the wiki (ORC Tools)?

          owen.omalley Owen O'Malley added a comment -

          Lefty Leverenz Ok, I made a pull request at https://github.com/apache/orc/pull/98 for the updated page.

          leftylev Lefty Leverenz added a comment -

          Thanks Owen, here's the link: Java ORC Tools

          By the way, the formatting in paragraph 2 is broken (subcommands).

          ctr Christian Tramnitz added a comment -

          Is this expected to work in combination with ORC-111 (files on hdfs) or only local files?

          ctr Christian Tramnitz added a comment - edited

          It seems there is a problem with schema detection when floats are used. A sample JSON file like

          {"float":5.2}
          

          will result in:

          $ java -jar orc-tools-1.4.0-SNAPSHOT-uber.jar convert sample_message.json -o sample.orc
          Scanning sample_message.json for schema
          Exception in thread "main" java.lang.IllegalArgumentException: precision 2 is out of range 1 .. 10
                  at org.apache.orc.TypeDescription.withPrecision(TypeDescription.java:410)
                  at org.apache.orc.tools.json.NumericType.getSchema(NumericType.java:105)
                  at org.apache.orc.tools.json.StructType.getSchema(StructType.java:110)
                  at org.apache.orc.tools.json.JsonSchemaFinder.getSchema(JsonSchemaFinder.java:257)
                  at org.apache.orc.tools.convert.ConvertTool.computeSchema(ConvertTool.java:48)
                  at org.apache.orc.tools.convert.ConvertTool.main(ConvertTool.java:58)
                  at org.apache.orc.tools.Driver.main(Driver.java:112)
          

          Owen O'Malley shouldn't it be "scale < precision" rather than "scale > precision" in /java/core/src/java/org/apache/orc/TypeDescription.java#L417 ?
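To make the numbers in the error message concrete: the literal 5.2 has two significant digits (precision 2), one of them after the decimal point (scale 1). The sketch below computes those values and then models one possible reading of the failure, which this thread does not confirm: if a decimal type starts with a default scale of 10 and the precision setter validates against the current scale, then setting precision 2 before lowering the scale is rejected. `DecimalType` is a hypothetical model, not the ORC class, and its defaults (precision 38, scale 10) are assumptions.

```python
from decimal import Decimal


def precision_and_scale(literal):
    """Return (precision, scale) for a finite decimal literal such as "5.2"."""
    _sign, digits, exponent = Decimal(literal).as_tuple()
    if exponent >= 0:
        # Integers, possibly with trailing zeros implied by the exponent.
        return len(digits) + exponent, 0
    scale = -exponent
    # Values like 0.05 have fewer digits than decimal places.
    return max(len(digits), scale), scale


class DecimalType:
    """Hypothetical model of a decimal type with order-sensitive setters."""

    MAX_PRECISION = 38

    def __init__(self):
        self.precision = 38  # assumed default
        self.scale = 10      # assumed default

    def with_precision(self, precision):
        # Validates against the CURRENT scale, so call order matters.
        if precision < 1 or precision > self.MAX_PRECISION or self.scale > precision:
            raise ValueError(
                f"precision {precision} is out of range 1 .. {self.scale}")
        self.precision = precision
        return self

    def with_scale(self, scale):
        if scale < 0 or scale > self.precision:
            raise ValueError(
                f"scale {scale} is out of range 0 .. {self.precision}")
        self.scale = scale
        return self


precision, scale = precision_and_scale("5.2")
print(precision, scale)

try:
    DecimalType().with_precision(precision).with_scale(scale)
except ValueError as err:
    print("precision first:", err)

ok = DecimalType().with_scale(scale).with_precision(precision)
print("scale first:", ok.precision, ok.scale)
```

Under these assumptions, flipping the comparison as suggested above is not the only conceivable fix; applying the scale before the precision also avoids the exception in this model.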

          owen.omalley Owen O'Malley added a comment -

          Released as part of ORC 1.4.0


            People

            • Assignee:
              owen.omalley Owen O'Malley
              Reporter:
              owen.omalley Owen O'Malley