Hadoop Map/Reduce / MAPREDUCE-3086

Supporting range scan using TFile, TotalOrderPartitioner and partition index

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.205.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Hive and HBase already offer similar or more powerful functionality, but they are overkill or inconvenient in some cases. It seems reasonable to add a few lightweight utility classes that support only range scans. The utility classes include:

      1. InputFormat supporting range scan: Indexed(Text|Binary)InputFormat
        The input directory for IndexedInputFormat should contain one partition index and many TFiles. Each TFile stores a certain range of keys that does not overlap with any other TFile; the boundaries are stored in the partition index.
        Add 4 jobconfs, mapred.indexed(text|binary)inputformat.key.(start|end), which specify the range scan parameters.
        For a MapReduce job using IndexedInputFormat, IndexedInputFormat.getSplits uses the partition index to filter out TFiles that fall outside the scan range.
        IndexedInputFormat does not support multiple input directories or splitting within a single file; these can be added in the future.
      2. Tool to convert data in other formats into the IndexedInputFormat layout: TotalOrderIndexBuilder
        If the input data is already total-order partitioned and in TFile format, just add a partition index to the input directory.
        Otherwise, run InputSampler to generate the partition index, then run a MapReduce job with TotalOrderPartitioner to generate TFile-backed data, and finally move the partition index to the output directory.
      3. Client tool to scan/search an indexed data directory
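
      The split filtering described in item 1 can be sketched in plain Java. This is only an illustration, not code from the attached patch: the class and method names are hypothetical, keys are modeled as Strings, and the scan range is assumed to be start-inclusive and end-exclusive (the description does not say). It also assumes the partition index holds N-1 sorted boundaries for N files, with a key equal to a boundary belonging to the higher partition, as TotalOrderPartitioner assigns keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides which partitions (TFiles) overlap a scan range.
class RangeSplitFilter {

    /**
     * @param splitPoints sorted partition boundaries; N-1 entries describe N partitions,
     *                    where partition i holds keys k with
     *                    splitPoints[i-1] <= k < splitPoints[i]
     * @param start       inclusive scan start, or null for "from the beginning" (assumption)
     * @param end         exclusive scan end, or null for "to the end" (assumption)
     * @return indexes of the partitions that may contain keys in [start, end)
     */
    static List<Integer> selectPartitions(String[] splitPoints, String start, String end) {
        List<Integer> selected = new ArrayList<>();
        int partitions = splitPoints.length + 1;
        for (int i = 0; i < partitions; i++) {
            // null bounds mean the partition is open-ended on that side
            String lo = (i == 0) ? null : splitPoints[i - 1];            // inclusive
            String hi = (i == partitions - 1) ? null : splitPoints[i];   // exclusive

            // The partition lies entirely below the scan range
            boolean endsBeforeScan = start != null && hi != null && hi.compareTo(start) <= 0;
            // The partition lies entirely above the scan range
            boolean startsAfterScan = end != null && lo != null && lo.compareTo(end) >= 0;

            if (!endsBeforeScan && !startsAfterScan) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Four partitions: (-inf, g), [g, n), [n, t), [t, +inf)
        String[] splitPoints = {"g", "n", "t"};
        // Scanning [h, p) only needs the two middle partitions
        System.out.println(selectPartitions(splitPoints, "h", "p")); // prints [1, 2]
    }
}
```

      A real getSplits would map each selected partition index to its TFile and emit one split per file; because the boundaries are sorted, a binary search over them could replace the linear loop when there are many files.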

        Activity

        Binglin Chang created issue -
        Binglin Chang made changes -
        Field Original Value New Value
        Assignee Binglin Chang [ decster ]
        Fix Version/s 0.23.0 [ 12315570 ]
        Fix Version/s 0.20.204.0 [ 12316318 ]
        Target Version/s 0.20.205.0 [ 12316391 ]
        Binglin Chang added a comment -

        preliminary patch for 0.20

        Binglin Chang made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.20.205.0 [ 12316391 ]
        Binglin Chang made changes -
        Attachment MAPREDUCE-3086.v1.patch [ 12498597 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12498597/MAPREDUCE-3086.v1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 8 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/987//console

        This message is automatically generated.

        Binglin Chang added a comment -

        It seems that most test cases for o.a.h.mapred are not yet integrated into hadoop-mapreduce-client-core and are not activated. If I want to make this patch against trunk and enable its unit tests, where should I put the source files?

        Allen Wittenauer made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Allen Wittenauer added a comment -

        patch no longer applies.

        Allen Wittenauer made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Allen Wittenauer made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Transition               Time In Source Status  Execution Times  Last Executer     Last Execution Date
        Open -> Patch Available  16d 5h 45m             2                Allen Wittenauer  06/Feb/15 23:19
        Patch Available -> Open  1214d 9h 16m           2                Allen Wittenauer  06/Feb/15 23:19

          People

          • Assignee:
            Binglin Chang
            Reporter:
            Binglin Chang
          • Votes:
            0
            Watchers:
            6

            Dates

            • Created:
              Updated:
