  1. Hive
  2. HIVE-3753

'CTAS' and INSERT OVERWRITE send different column names to the underlying SerDe

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • None
    • None

    Description

      The problem is easy to reproduce with a JSON SerDe (https://github.com/rathboma/Hive-JSON-Serde-1).
      Here is a simple example of how the two results differ:
      CREATE TABLE foo ROW FORMAT SERDE '....JsonSerDe' AS SELECT host FROM table1;
      generates =>

      {"_col0": "localhost"}

      CREATE TABLE foo(host string) ROW FORMAT SERDE '....JsonSerDe';
      INSERT OVERWRITE TABLE foo SELECT host FROM table1;
      generates =>

      {"host": "localhost"}

      The SerDe receives column names in two places:
      1) The table property Constants.LIST_COLUMNS, passed at initialization
      2) The StructObjectInspector passed to serialize

      In the CTAS example above, both of these contain '_col0' as the column name. In the second example this is not the case: the LIST_COLUMNS property contains the real column names.
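
      To make the difference concrete, here is a minimal, self-contained sketch of how a SerDe that keys its output on the LIST_COLUMNS property would produce the two results above. It does not use Hive classes: the class ColumnNameSketch and its helper methods are hypothetical, the table properties are modeled with java.util.Properties, and the JSON rendering is a simplification rather than the code of the linked SerDe. The only Hive-specific detail is that Constants.LIST_COLUMNS resolves to the property key "columns".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration only; not part of Hive or the linked JSON SerDe.
class ColumnNameSketch {
    // Constants.LIST_COLUMNS resolves to the key "columns" in the table properties.
    static final String LIST_COLUMNS = "columns";

    /** Reads the comma-separated column names from the table properties. */
    static List<String> namesFromProperties(Properties tbl) {
        List<String> names = new ArrayList<>();
        String raw = tbl.getProperty(LIST_COLUMNS, "");
        for (String name : raw.split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    /** Renders one row of string values as a JSON object keyed by column name. */
    static String toJson(List<String> names, List<String> values) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('"').append(names.get(i)).append("\": \"")
              .append(values.get(i)).append('"');
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("localhost");

        // CTAS: the property carries the internal name '_col0'.
        Properties ctas = new Properties();
        ctas.setProperty(LIST_COLUMNS, "_col0");
        System.out.println(toJson(namesFromProperties(ctas), row));
        // prints {"_col0": "localhost"}

        // CREATE TABLE followed by INSERT OVERWRITE: the property carries 'host'.
        Properties insert = new Properties();
        insert.setProperty(LIST_COLUMNS, "host");
        System.out.println(toJson(namesFromProperties(insert), row));
        // prints {"host": "localhost"}
    }
}
```

      The serialization logic is identical in both runs; only the column names handed to the SerDe differ, which is why the fix likely belongs in the query planner rather than in any individual SerDe.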

      I'd be happy to help with this change, but I suspect the fix lies somewhere in SemanticAnalyzer.java, and I'm having a hard time finding my way around it.

          People

            Assignee: Unassigned
            Reporter: Matthew Rathbone (rathboma)
            Votes: 0
            Watchers: 3

              Time Tracking

                Original Estimate: 24h
                Remaining Estimate: 24h
                Time Spent: Not Specified