ORC / ORC-120

Create a backward-compatibility mode that ignores column names for schema evolution

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: None
    • Labels: None

      Description

      ORC's schema evolution uses the column names when they are available. Hive 2.1 uses a positional schema, so ORC should support a backward compatibility mode for Hive users during the transition.

          Activity

          githubbot ASF GitHub Bot added a comment:

          GitHub user omalley opened a pull request:

          https://github.com/apache/orc/pull/72

          ORC-120. Add option to force positional matching of schema evolution.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/omalley/orc orc-120

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/orc/pull/72.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #72


          commit 65f77ee9f00792637cdee66342f0016549bb3ca1
          Author: Owen O'Malley <omalley@apache.org>
          Date: 2016-12-13T02:49:44Z

          ORC-120. Add option to force positional matching of schema evolution.


          githubbot ASF GitHub Bot added a comment:

          GitHub user asfgit closed the pull request at:

          https://github.com/apache/orc/pull/72

          owen.omalley Owen O'Malley added a comment:

          I just committed this.

          leftylev Lefty Leverenz added a comment:

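          This will need documentation, as discussed in some Dec. 12 messages on the dev@orc email thread, archived here:

          https://mail-archives.apache.org/mod_mbox/orc-dev/201612.mbox/%3cED4820A4-8EE1-462A-8658-551D543237F2@iq80.com%3e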

          It's strange that the messages don't appear here in the JIRA.

          Synopsis:

          Dain Sundstrom – Is "ORC's schema evolution uses the column names when they are available" documented somewhere?

          Owen O'Malley – No, unfortunately, but it needs to be. The basic rules from SchemaEvolution.java look like:

          structs (including the top row):
            if field names are available (post HIVE-4243), use name matching
            otherwise use positional matching

          lists, maps, unions:
            children must match

          Many primitives can convert to each other, but this list needs to be cleaned up:
            boolean, byte, short, int, long, float, double, decimal -> boolean, byte, short, int, long, float, double, decimal, string, char, varchar, timestamp
            string, char, varchar -> all
            timestamp -> boolean, byte, short, int, long, float, double, decimal, string, char, varchar, date
            date -> string, char, varchar, timestamp
            binary -> string, char, varchar, date
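          The primitive conversion list quoted above can be read as a lookup table from file type to the reader types it may be converted to. The sketch below encodes exactly that list; it is not ORC's SchemaEvolution code, and the class, enum, and method names are hypothetical. Two interpretations are assumptions: "all" is taken to mean every type in the enum, and reading a column as its own type is always allowed. Since the email notes the list "needs to be cleaned up", treat the table as a snapshot of that message rather than a specification.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical encoding of the primitive conversion list from the Dec. 12 email.
// Not ORC's actual SchemaEvolution implementation; names are illustrative only.
public class PrimitiveConversions {
  enum Kind { BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DECIMAL,
              STRING, CHAR, VARCHAR, TIMESTAMP, DATE, BINARY }

  // file type -> set of reader types it may be read as
  static final Map<Kind, EnumSet<Kind>> ALLOWED = new EnumMap<>(Kind.class);

  static {
    // boolean .. decimal -> numeric types, string types, timestamp
    EnumSet<Kind> numericTargets = EnumSet.of(Kind.BOOLEAN, Kind.BYTE, Kind.SHORT,
        Kind.INT, Kind.LONG, Kind.FLOAT, Kind.DOUBLE, Kind.DECIMAL,
        Kind.STRING, Kind.CHAR, Kind.VARCHAR, Kind.TIMESTAMP);
    for (Kind k : EnumSet.range(Kind.BOOLEAN, Kind.DECIMAL)) {
      ALLOWED.put(k, numericTargets);
    }
    // string, char, varchar -> all (assumed: every kind in this enum)
    for (Kind k : EnumSet.of(Kind.STRING, Kind.CHAR, Kind.VARCHAR)) {
      ALLOWED.put(k, EnumSet.allOf(Kind.class));
    }
    // timestamp -> numeric types, string types, date
    ALLOWED.put(Kind.TIMESTAMP, EnumSet.of(Kind.BOOLEAN, Kind.BYTE, Kind.SHORT,
        Kind.INT, Kind.LONG, Kind.FLOAT, Kind.DOUBLE, Kind.DECIMAL,
        Kind.STRING, Kind.CHAR, Kind.VARCHAR, Kind.DATE));
    // date -> string, char, varchar, timestamp
    ALLOWED.put(Kind.DATE, EnumSet.of(Kind.STRING, Kind.CHAR, Kind.VARCHAR, Kind.TIMESTAMP));
    // binary -> string, char, varchar, date (as listed in the email)
    ALLOWED.put(Kind.BINARY, EnumSet.of(Kind.STRING, Kind.CHAR, Kind.VARCHAR, Kind.DATE));
  }

  /** True if a file column of type 'from' may be read as reader type 'to'. */
  static boolean canConvert(Kind from, Kind to) {
    // Assumption: identity conversions are always permitted.
    return from == to || ALLOWED.get(from).contains(to);
  }

  public static void main(String[] args) {
    System.out.println(canConvert(Kind.INT, Kind.STRING));     // true: numeric -> string
    System.out.println(canConvert(Kind.DATE, Kind.INT));       // false: not in the list
    System.out.println(canConvert(Kind.VARCHAR, Kind.BINARY)); // true: "string -> all"
  }
}
```

          An EnumMap of EnumSets keeps each rule on one line and makes a later cleanup of the list a matter of editing a single entry.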

          Dain Sundstrom – So, rename column is not expected to work anymore?

          Owen O'Malley – ORC-120 will add an option to force positional mapping.

          Dain Sundstrom – Oh, I see this is an ORC feature like the Parquet schema evolution stuff. We implemented support for ordering by the top level struct names in Presto a while back.
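          The struct rule in the synopsis, together with the positional option that ORC-120 adds, can be sketched as follows. This is a hypothetical illustration, not ORC's code: it assumes that files written before HIVE-4243 carry placeholder names of the form _col0, _col1, ..., matches names exactly (case-sensitive), and maps a reader field with no counterpart in the file to -1 (read as nulls). The example shows why a column rename breaks under name matching but still works under forced positional matching.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of struct field matching; not ORC's SchemaEvolution code.
public class FieldMatching {
  // Assumption: pre-HIVE-4243 files have placeholder names such as _col0, _col1.
  private static final Pattern PLACEHOLDER = Pattern.compile("_col\\d+");

  /** True if at least one file field has a real (non-placeholder) name. */
  static boolean hasRealNames(List<String> fileFields) {
    for (String name : fileFields) {
      if (!PLACEHOLDER.matcher(name).matches()) {
        return true;
      }
    }
    return false;
  }

  /**
   * For each reader field, returns the index of the file field it reads from,
   * or -1 if the file has no matching column.
   */
  static int[] match(List<String> fileFields, List<String> readerFields,
                     boolean forcePositional) {
    int[] result = new int[readerFields.size()];
    // Name matching only when names exist and positional mode is not forced.
    boolean byName = !forcePositional && hasRealNames(fileFields);
    for (int i = 0; i < readerFields.size(); i++) {
      if (byName) {
        result[i] = fileFields.indexOf(readerFields.get(i));
      } else {
        result[i] = i < fileFields.size() ? i : -1;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> file = Arrays.asList("id", "name");
    List<String> reader = Arrays.asList("id", "full_name", "age"); // "name" was renamed
    System.out.println(Arrays.toString(match(file, reader, false))); // [0, -1, -1]
    System.out.println(Arrays.toString(match(file, reader, true)));  // [0, 1, -1]
  }
}
```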

          leftylev Lefty Leverenz added a comment:

          The new configuration parameter in OrcConf.java is orc.force.positional.evolution.
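          As a self-contained illustration of how a reader might consult that key, the sketch below uses java.util.Properties as a stand-in for the job configuration; the real declaration lives in OrcConf.java and is not reproduced here. The class and method names are hypothetical, and treating a missing key as false (so name matching stays the default) is an assumption.

```java
import java.util.Properties;

// Stand-in sketch: java.util.Properties plays the role of the job configuration.
// Only the key string comes from the issue; everything else is hypothetical.
public class PositionalEvolutionConfig {
  static final String FORCE_POSITIONAL_EVOLUTION = "orc.force.positional.evolution";

  /** Reads the flag; an unset key is treated as false (assumed default). */
  static boolean forcePositional(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(FORCE_POSITIONAL_EVOLUTION, "false"));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(forcePositional(conf)); // false: key unset, names are used
    conf.setProperty(FORCE_POSITIONAL_EVOLUTION, "true");
    System.out.println(forcePositional(conf)); // true: Hive 2.1-style positional mapping
  }
}
```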

          owen.omalley Owen O'Malley added a comment:

          ORC 1.3.0 was released.


            People

            • Assignee: Owen O'Malley (owen.omalley)
            • Reporter: Owen O'Malley (owen.omalley)
            • Votes: 0
            • Watchers: 3
