Hive / HIVE-333

Add TFileTransport deserializer

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Environment: Linux

    Description

      I've been googling around all night and haven't really found what I am looking for. Basically, I want to transfer some data from my web servers to Hive in a format that's a little more verbose than plain CSV files. It seems like JSON or Thrift would be perfect for this. I am planning on sending this serialized JSON or Thrift data through Scribe and loading it into Hive. I just can't figure out how to tell Hive that the input data is a bunch of serialized Thrift records (all of the records are the "struct" type) in a TFileTransport. Hopefully this makes sense...

      Reply from Joydeep Sen Sarma (jssarma@facebook.com)

      Unfortunately, the open source code base does not have the loaders we run to convert Thrift records in a TFileTransport into a SequenceFile that Hadoop/Hive can work with. One option is that we add this to the Hive code base (should be straightforward).

      No process required. Please file a JIRA - I will try to upload a patch this weekend (just cut'n'paste for the most part). Would appreciate some help in finessing it out (the internal code is hardwired to some assumptions, etc.).
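
      For illustration only (this is not the patch attached to this issue): a minimal Java sketch of the kind of loader the reply describes. It assumes a simplified framing in which each serialized record is stored as a 4-byte big-endian length followed by the payload bytes. The names FramedRecordReader, writeRecord and readRecords are hypothetical, and the real TFileTransport on-disk layout may differ (for example in byte order or chunk padding), so treat the framing as an assumption rather than the actual format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive or Thrift.
public class FramedRecordReader {

    // Assumed framing: 4-byte big-endian length, then that many payload bytes.
    public static void writeRecord(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Reads records until the stream ends and returns the raw payloads.
    public static List<byte[]> readRecords(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> records = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = data.readInt();
            } catch (EOFException endOfStream) {
                // No further length prefix could be read: stop.
                // (A truncated prefix is also treated as end of input here.)
                break;
            }
            if (length < 0) {
                throw new IOException("Negative record length: " + length);
            }
            byte[] payload = new byte[length];
            // Throws EOFException if the payload is shorter than announced.
            data.readFully(payload);
            records.add(payload);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payloads; in practice these would be serialized Thrift structs.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        writeRecord(file, "record-1".getBytes(StandardCharsets.UTF_8));
        writeRecord(file, "record-2".getBytes(StandardCharsets.UTF_8));

        for (byte[] payload : readRecords(new ByteArrayInputStream(file.toByteArray()))) {
            System.out.println(new String(payload, StandardCharsets.UTF_8));
        }
    }
}
```

      A real loader would presumably append each payload to a Hadoop SequenceFile (for example as a BytesWritable value) instead of collecting it in a list; that step is left out to keep the sketch free of dependencies.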

      Attachments

        1. libthrift_asf.jar
          187 kB
          Joydeep Sen Sarma
        2. hive-333.patch.2
          42 kB
          Joydeep Sen Sarma
        3. hive-333.patch.1
          36 kB
          Joydeep Sen Sarma

            People

              Assignee: Joydeep Sen Sarma (jsensarma)
              Reporter: Steve Corona (scorona)
              Votes: 2
              Watchers: 3
