HIVE-1540

Read-only, columnar data file for nested data structures

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      RCFile is a great start on an optimized layout for working with structured data with Hive. Given that Hive's data model supports nested lists and maps, and taking inspiration from the recent work by Google on Dremel, it may be useful for the Hive community to think about how to improve the RCFile format for nested data structures.

        Activity

        Joydeep Sen Sarma added a comment -

        Are there a lot of use cases for nested data structures? Google's approach is motivated by widespread use of Protocol Buffers. At Facebook, Thrift-serialized data sets (which motivated the initial support for nested data types) haven't taken off.

        I think what's much more common is JSON-serialized data (or, more restrictively, map types). It would be much more worthwhile, to begin with, to have optimized codecs and deserializers for map types.

        Jeff Hammerbacher added a comment -

        We've got an increasing number of customers using Avro's serialization format for working with data in Hadoop, and that's where our nested data structures come from. Any design which could incorporate a serialization framework like Avro would be of interest to me.


          People

          • Assignee:
            Unassigned
            Reporter:
            Jeff Hammerbacher
          • Votes:
            0
            Watchers:
            16
