  1. Hive
  2. HIVE-12936

Better support for multimap semantics


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Currently, life gets difficult when working with data of the form array<struct<key:string,value:string>>.

      When processing incoming files, both the array-of-structs type and the simpler map<string,string> are well supported. If the incoming data has duplicate keys, the array-of-structs form has to be used, because a map keeps only one value per key and the rest of the data is lost. But once the data is an array of structs, it is very difficult to write reasonable queries against it.
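
      The data-loss problem can be illustrated with a minimal Python sketch. This is not Hive code: the helper names parse_as_map and parse_as_struct_array are hypothetical, the "," and "=" delimiters are taken from the example input further down, and which duplicate value survives in the map (here the last one) is an assumption. The point is only that one of them is dropped.

```python
def parse_as_map(text, pair_delim=",", kv_delim="="):
    """Parse 'k=v,k=v2' into a dict (map semantics).

    A repeated key overwrites the earlier value, so duplicates are lost.
    Keeping the last value is an assumption made for this sketch.
    """
    result = {}
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        result[key] = value
    return result


def parse_as_struct_array(text, pair_delim=",", kv_delim="="):
    """Parse 'k=v,k=v2' into a list of (key, value) pairs.

    Every pair is kept in input order, including duplicate keys.
    """
    pairs = []
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        pairs.append((key, value))
    return pairs


as_map = parse_as_map("k=v,k=v2")
as_array = parse_as_struct_array("k=v,k=v2")
```

      With the input "k=v,k=v2", the map form holds a single entry for k, while the array form holds both pairs.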

      There are several UDFs I'd like to see, as well as SerDe support for TextInputFormat.

      Examples:
      UDF:

      • str_to_structarray - An equivalent of str_to_map that produces an array of structs instead of a map.
      • array_struct_indexof - Search the array of structs and return the offset of the first match. This is very difficult to do in a reasonable manner using straight SQL, as I believe it needs LATERAL VIEW OUTER, inline(), and a window function with OVER and PARTITION BY. I need to be able to say str_to_structarray("k=v,k=v2", "key", "value") to get array(struct(key,value)). And I need to be able to run array_struct_indexof(array(struct), "key", "k") to get an offset of 0 so I can reasonably select the value.
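
      The intended behavior of the two requested UDFs could be sketched as follows. This is a Python illustration of the semantics only, not a Hive implementation. The function names and argument order follow the calls in the request above; the "," and "=" delimiters and the -1 return value for "not found" are assumptions.

```python
def str_to_structarray(text, key_field, value_field,
                       pair_delim=",", kv_delim="="):
    """Split text into an array of structs (here: a list of dicts).

    key_field and value_field name the two struct fields, as in the
    requested call str_to_structarray("k=v,k=v2", "key", "value").
    Duplicate keys are preserved in input order.
    """
    structs = []
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        structs.append({key_field: key, value_field: value})
    return structs


def array_struct_indexof(structs, field, wanted):
    """Return the offset of the first struct whose field equals wanted.

    Returning -1 when nothing matches is an assumption of this sketch.
    """
    for offset, struct in enumerate(structs):
        if struct.get(field) == wanted:
            return offset
    return -1


rows = str_to_structarray("k=v,k=v2", "key", "value")
offset = array_struct_indexof(rows, "key", "k")
first_value = rows[offset]["value"] if offset >= 0 else None
```

      Here rows keeps both entries for k, offset points at the first of them, and first_value is the value that a query would select.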

      For TextInputFormat, I'd like to be able to process Map<string, Array<string>>. This would simply collect all values for a key instead of keeping only one value when there are duplicate keys.
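
      The collecting behavior could look like the following Python sketch. It only illustrates the requested semantics; the helper name parse_as_multimap and the "," and "=" delimiters are assumptions, and nothing here reflects an actual Hive SerDe.

```python
def parse_as_multimap(text, pair_delim=",", kv_delim="="):
    """Parse 'k=v,k=v2,j=w' into a dict of lists (multimap semantics).

    Each key maps to every value seen for it, in input order, so
    duplicate keys no longer cause values to be dropped.
    """
    multimap = {}
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        multimap.setdefault(key, []).append(value)
    return multimap


collected = parse_as_multimap("k=v,k=v2,j=w")
```

      For the input "k=v,k=v2,j=w", the key k ends up with both of its values and j with its single value.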

          People

            Assignee: Unassigned
            Reporter: Charles Pritchard (Downchuck)