Hadoop Map/Reduce: MAPREDUCE-3086

Supporting range scan using TFile, TotalOrderPartitioner and partition index

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.205.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Hive and HBase already offer similar or more powerful functionality, but using them is overkill or inconvenient in some cases, so it seems reasonable to add a few lightweight utility classes that support only range scans. The utility classes include:

      1. InputFormat supporting range scan: Indexed(Text|Binary)InputFormat
        The input directory for IndexedInputFormat should contain one partition index and many TFiles. Each TFile stores a certain range of keys that does not overlap with the other TFiles; the boundaries are stored in the partition index.
        Adds 4 jobconfs, mapred.indexed(text|binary)inputformat.key.(start|end), which specify the range scan parameters.
        For a MapReduce job using IndexedInputFormat, IndexedInputFormat.getSplits uses the partition index to filter out TFiles that are not in the scan range.
        IndexedInputFormat does not support multiple directories or splitting within a single file; these can be added in the future.
      2. Tool to convert data in other formats into IndexedInputFormat: TotalOrderIndexBuilder
        If the input data is already total-order partitioned and in TFile format, just add the partition index to the input directory.
        Otherwise, run InputSampler to generate the partition index, then run a MapReduce job with TotalOrderPartitioner to generate TFile-backed data, and finally move the partition index to the output directory.
      3. Client tool to scan/search an indexed data directory
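
      The split filtering described for IndexedInputFormat.getSplits can be sketched roughly as follows. This is not code from the patch: the class and method names are hypothetical, keys are compared as plain Java strings rather than with a TFile comparator, and the scan range is assumed to be start-inclusive and end-exclusive with null meaning unbounded. The partition layout follows the usual total-order convention, where N-1 sorted split points define N partitions and partition i covers [splitPoints[i-1], splitPoints[i]).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the MAPREDUCE-3086 patch.
public class RangeSplitFilter {

    /**
     * Returns the indices of the partitions (one per data file) whose key
     * range overlaps the scan range [start, end).
     *
     * splitPoints holds the N-1 sorted boundaries of N partitions. Partition i
     * covers [splitPoints[i-1], splitPoints[i]); the first partition has no
     * lower bound and the last has no upper bound. A null start or end means
     * the scan is unbounded on that side (an assumption of this sketch).
     */
    static List<Integer> partitionsInRange(String[] splitPoints, String start, String end) {
        List<Integer> selected = new ArrayList<>();
        int partitions = splitPoints.length + 1;
        for (int i = 0; i < partitions; i++) {
            String lower = (i == 0) ? null : splitPoints[i - 1];           // inclusive bound
            String upper = (i == partitions - 1) ? null : splitPoints[i];  // exclusive bound
            // The scan must begin before this partition ends...
            boolean startsBeforeUpper =
                start == null || upper == null || start.compareTo(upper) < 0;
            // ...and must end after this partition begins.
            boolean endsAfterLower =
                end == null || lower == null || end.compareTo(lower) > 0;
            if (startsBeforeUpper && endsAfterLower) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Three split points define four partitions:
        // (-inf, g), [g, n), [n, t), [t, +inf)
        String[] splitPoints = {"g", "n", "t"};
        System.out.println(partitionsInRange(splitPoints, "h", "p")); // prints [1, 2]
    }
}
```

      In a real InputFormat, the selected indices would be mapped to the corresponding TFiles and only those files would be turned into splits; the start and end values would come from the mapred.indexed(text|binary)inputformat.key.(start|end) jobconfs.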

        Activity

        Binglin Chang added a comment -

        It seems that most test cases for o.a.h.mapred are not integrated into hadoop-mapreduce-client-core and are not activated yet. If I want to make this patch against trunk and enable unit tests, where should I put the source files?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12498597/MAPREDUCE-3086.v1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 8 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/987//console

        This message is automatically generated.

        Binglin Chang made changes -
        Attachment MAPREDUCE-3086.v1.patch [ 12498597 ]
        Binglin Chang made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.20.205.0 [ 12316391 ]
        Binglin Chang added a comment -

        Preliminary patch for 0.20.

        Binglin Chang made changes -
        Field Original Value New Value
        Assignee Binglin Chang [ decster ]
        Fix Version/s 0.23.0 [ 12315570 ]
        Fix Version/s 0.20.204.0 [ 12316318 ]
        Target Version/s 0.20.205.0 [ 12316391 ]
        Binglin Chang created issue -

          People

          • Assignee:
            Binglin Chang
            Reporter:
            Binglin Chang
          • Votes:
            0
            Watchers:
            5

            Dates

            • Created:
              Updated:
