Hadoop Map/Reduce / MAPREDUCE-2112

Create a Common Data-Generator for Testing Hadoop

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      A common data-generator for testing Hadoop and related projects would be useful. Such a tool
      should generate data in a specified format and be able to use a Hadoop cluster to speed up
      data generation. It could then be shared across Hadoop (e.g. GridMix3), Pig, Hive, etc.,
      so that each project no longer needs to invent its own.
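
      As a rough illustration of the idea (not the PigMix2 generator and not an existing
      Hadoop API), the sketch below generates delimited rows from a small column schema, one
      "split" at a time. Each split is seeded from a base seed and its split index, so any
      split can be produced on its own, which is what would let separate map tasks generate
      them in parallel. The class name SplitDataGenerator, the ColumnType enum, and all
      parameters are assumptions made up for this example; it uses only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch only: not part of Hadoop, GridMix3, or PigMix2. */
public class SplitDataGenerator {

    /** Column types supported by this sketch; a real tool would take a richer format spec. */
    enum ColumnType { INT, DOUBLE, STRING }

    /**
     * Generates the rows of one split. The RNG is seeded from (baseSeed, splitIndex),
     * so the output of a split is reproducible and does not depend on any other split.
     * In a cluster setting, each map task could call this for its own split index.
     */
    static List<String> generateSplit(ColumnType[] schema, long baseSeed,
                                      int splitIndex, int rowsPerSplit, char delimiter) {
        Random rng = new Random(baseSeed * 31 + splitIndex);
        List<String> rows = new ArrayList<>(rowsPerSplit);
        for (int r = 0; r < rowsPerSplit; r++) {
            StringBuilder row = new StringBuilder();
            for (int c = 0; c < schema.length; c++) {
                if (c > 0) {
                    row.append(delimiter);
                }
                switch (schema[c]) {
                    case INT:
                        row.append(rng.nextInt(1000));      // integer in [0, 1000)
                        break;
                    case DOUBLE:
                        row.append(rng.nextDouble());       // double in [0.0, 1.0)
                        break;
                    case STRING:
                        row.append(randomWord(rng, 8));     // 8 lowercase letters
                        break;
                }
            }
            rows.add(row.toString());
        }
        return rows;
    }

    /** Builds a random lowercase word of the given length. */
    static String randomWord(Random rng, int length) {
        StringBuilder word = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            word.append((char) ('a' + rng.nextInt(26)));
        }
        return word.toString();
    }

    public static void main(String[] args) {
        ColumnType[] schema = { ColumnType.INT, ColumnType.STRING, ColumnType.DOUBLE };
        // Two splits of three rows each; on a cluster these loops would be separate tasks.
        for (int split = 0; split < 2; split++) {
            for (String row : generateSplit(schema, 42L, split, 3, '\t')) {
                System.out.println(row);
            }
        }
    }
}
```

      Deriving the seed from the split index keeps runs repeatable, which matters when the
      same test data has to be regenerated for different projects. A real implementation
      would presumably wrap this logic in a map-only job with a custom input format.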

      We can use the data-generator from PigMix2 (PIG-200) as a starting point. It is described
      at http://wiki.apache.org/pig/DataGeneratorHadoop. Because it depends on the SDSU
      Java library (http://www.eli.sdsu.edu/java-SDSU/), which is released under the GNU GPL, it must
      be modified to remove this dependency before it can be included in Apache Hadoop.

          People

            Assignee: Unassigned
            Reporter: Ranjit Mathew