HBase / HBASE-2965

Implement MultipleTableInputs which is analogous to MultipleInputs in Hadoop

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: mapreduce
    • Labels:
      None

      Description

      This feature would be helpful for doing reduce-side joins, or for passing similarly structured data from multiple tables through a single MapReduce job. The API I envision would be very similar to Hadoop's existing MultipleInputs, parts of which could be reused.

      MultipleTableInputs would have a public API like:

      class MultipleTableInputs {
        public static void addInputTable(Job job, Table table, Scan scan,
            Class<? extends TableInputFormatBase> inputFormatClass,
            Class<? extends Mapper> mapperClass);
      }

      MultipleTableInputs would build a mapping of tables to configured TableInputFormats, the same way MultipleInputs builds a mapping between Paths and InputFormats. Since most people will probably use TableInputFormat.class as the input format class, the MultipleTableInputs implementation will have to replace TableInputFormatBase's private scan and table members, which are configured when an instance of TableInputFormat is created (from within its setConf() method), by calling setScan and setHTable with the table and scan passed into addInputTable above. The addInputTable() method of MultipleTableInputs would also set the job's input format to DelegatingTableInputFormat, described below.
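The mapping step described above could be sketched roughly as follows. This is only an illustration in plain Java, not HBase or Hadoop code: the `Map<String, String>` stands in for the job configuration, the key name `example.multipletableinputs.table.formats` and the class `TableInputRegistry` are hypothetical, and the Scan is left out entirely (a real implementation would also need to serialize it per table).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: records table -> (input format, mapper) entries in a
// string-valued configuration, one "table;format;mapper" entry per table.
final class TableInputRegistry {
    // Hypothetical key; not an actual HBase or Hadoop property name.
    static final String TABLE_FORMATS = "example.multipletableinputs.table.formats";

    // Appends one entry to the comma-separated list stored under TABLE_FORMATS.
    static void addInputTable(Map<String, String> conf, String tableName,
                              String inputFormatClass, String mapperClass) {
        String entry = tableName + ";" + inputFormatClass + ";" + mapperClass;
        String existing = conf.get(TABLE_FORMATS);
        conf.put(TABLE_FORMATS, existing == null ? entry : existing + "," + entry);
    }

    // Parses the stored list back into table -> [inputFormat, mapper],
    // preserving the order in which tables were added.
    static Map<String, String[]> getTableMappings(Map<String, String> conf) {
        Map<String, String[]> result = new LinkedHashMap<>();
        String raw = conf.get(TABLE_FORMATS);
        if (raw == null) {
            return result;
        }
        for (String entry : raw.split(",")) {
            String[] parts = entry.split(";");
            result.put(parts[0], new String[] {parts[1], parts[2]});
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        addInputTable(conf, "users", "TableInputFormat", "UserMapper");
        addInputTable(conf, "orders", "TableInputFormat", "OrderMapper");
        for (Map.Entry<String, String[]> e : getTableMappings(conf).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()[0] + ", " + e.getValue()[1]);
        }
    }
}
```

Storing the mapping as a delimited string is an assumption made here so that it could travel with the job configuration; table names containing `,` or `;` would need escaping.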

      A new class called DelegatingTableInputFormat would be analogous to DelegatingInputFormat: getSplits() would return TaggedInputSplits (the same TaggedInputSplit class that Hadoop's DelegatingInputFormat uses), each of which tags a split with its InputFormat and Mapper. These are created by looping through the HTable-to-InputFormat mappings, calling getSplits() on each input format, and passing the split, the input format, and the mapper as constructor arguments to TaggedInputSplit.
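The getSplits() loop described above might look something like the sketch below. All types here (`SketchInputFormat`, `SketchTableMapping`, `SketchTaggedSplit`, `DelegatingSplitSketch`) are hypothetical plain-Java stand-ins rather than the Hadoop or HBase classes, and splits are represented as simple strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the delegating getSplits(): ask every registered
// input format for its splits and tag each one with its format and mapper.
final class DelegatingSplitSketch {
    static List<SketchTaggedSplit> getSplits(List<SketchTableMapping> mappings) {
        List<SketchTaggedSplit> tagged = new ArrayList<>();
        for (SketchTableMapping m : mappings) {
            for (String split : m.inputFormat.getSplits()) {
                tagged.add(new SketchTaggedSplit(split, m.inputFormatName, m.mapperName));
            }
        }
        return tagged;
    }

    public static void main(String[] args) {
        List<SketchTableMapping> mappings = Arrays.asList(
            new SketchTableMapping(() -> Arrays.asList("users-region-1", "users-region-2"),
                "TableInputFormat", "UserMapper"),
            new SketchTableMapping(() -> Arrays.asList("orders-region-1"),
                "TableInputFormat", "OrderMapper"));
        for (SketchTaggedSplit s : getSplits(mappings)) {
            System.out.println(s.split + " -> " + s.mapperName);
        }
    }
}

// Stand-in for an input format: it only knows how to list its splits.
interface SketchInputFormat {
    List<String> getSplits();
}

// Stand-in for one table-to-format mapping entry.
final class SketchTableMapping {
    final SketchInputFormat inputFormat;
    final String inputFormatName;
    final String mapperName;

    SketchTableMapping(SketchInputFormat inputFormat, String inputFormatName, String mapperName) {
        this.inputFormat = inputFormat;
        this.inputFormatName = inputFormatName;
        this.mapperName = mapperName;
    }
}

// Stand-in for a tagged split: the underlying split plus its format and mapper.
final class SketchTaggedSplit {
    final String split;
    final String inputFormatName;
    final String mapperName;

    SketchTaggedSplit(String split, String inputFormatName, String mapperName) {
        this.split = split;
        this.inputFormatName = inputFormatName;
        this.mapperName = mapperName;
    }
}
```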

      The createRecordReader() function in DelegatingTableInputFormat could have the same implementation as the Hadoop DelegatingInputFormat.
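As a rough illustration of that delegation, the sketch below instantiates the input format class recorded on a tagged split and hands the underlying split to it. It assumes the delegating reader works by reflective instantiation, which is my understanding of Hadoop's approach but is not confirmed in this ticket. The types `ReaderFactory`, `CommaRowReaderFactory`, `TaggedSplitSketch`, and `DelegatingReaderSketch` are hypothetical stand-ins, and records are plain strings.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch: create a reader by delegating to whichever factory
// class the tagged split was tagged with.
final class DelegatingReaderSketch {
    static Iterator<String> createRecordReader(TaggedSplitSketch tagged) {
        try {
            // Instantiate the tagged class reflectively, then delegate to it.
            ReaderFactory factory = tagged.factoryClass.getDeclaredConstructor().newInstance();
            return factory.createReader(tagged.split);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + tagged.factoryClass, e);
        }
    }

    public static void main(String[] args) {
        TaggedSplitSketch tagged = new TaggedSplitSketch("row1,row2,row3", CommaRowReaderFactory.class);
        Iterator<String> reader = createRecordReader(tagged);
        while (reader.hasNext()) {
            System.out.println(reader.next());
        }
    }
}

// Stand-in for an input format's ability to create a record reader.
interface ReaderFactory {
    Iterator<String> createReader(String split);
}

// Example factory: treats the split as a comma-separated list of rows.
final class CommaRowReaderFactory implements ReaderFactory {
    @Override
    public Iterator<String> createReader(String split) {
        return Arrays.asList(split.split(",")).iterator();
    }
}

// Stand-in for a tagged split: the split plus the class that can read it.
final class TaggedSplitSketch {
    final String split;
    final Class<? extends ReaderFactory> factoryClass;

    TaggedSplitSketch(String split, Class<? extends ReaderFactory> factoryClass) {
        this.split = split;
        this.factoryClass = factoryClass;
    }
}
```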

        Activity

        Nick Dimiduk added a comment -

        I think this ticket is stale; the functionality described is already provided by MultiTableInputFormat in a modern HBase release.

        Alexey Romanenko added a comment -

        It seems this is not implemented yet, is it?

        Ophir Cohen added a comment -

        Hi Akash,
        I just began work on it, but I'll be happy to get any comments or remarks concerning this topic.
        Ophir

        Akash Ashok added a comment -

        I have not yet started working on it. Will start next Saturday if no one else has taken it up.

        Ophir Cohen added a comment -

        Yes please.
        Hopefully I'll start work on it this week.

        stack added a comment -

        I don't know of anyone working on it, Ophir. If you want me to assign you the issue, just say so and I'll add you as a contributor. Thanks.

        Ophir Cohen added a comment -

        Actually, I meant to start work on that, but was held back by work and other stuff...
        Is anybody working on that?
        Please let me know before I dive into it.
        If nobody is working on that, I'll start it soon.
        Thanks.

        stack added a comment -

        Akash, do you have a patch?

        Akash Ashok added a comment -

        Hi, I was looking for the exact same thing. As we are moving from just processing on Hadoop to using HBase, we are in dire need of MultipleTableInputs for our reduce-side joins. Could someone please tell me when this will be implemented?

        Also, can I move this feature from Minor to Major, as this is a very important feature?


          People

          • Assignee: Unassigned
          • Reporter: Adam Warrington
          • Votes: 1
          • Watchers: 4
