Hadoop Map/Reduce · MAPREDUCE-6453

Repeatable Input File Format


Details

    Description

      We are interested in running the training process of deep learning architectures on Hadoop clusters, and we have developed an algorithm that carries out this training in a MapReduce fashion. However, one part of the workflow can still be improved.

      In deep learning, the training data is usually iterated over multiple times (10 passes or even more). However, we were not able to find a way to go through the input training file multiple times without having to reduce first, then map and reduce again, and so on. So, to carry out the experiments, we were forced to physically duplicate the files 10 or 20 times. This is obviously not a good solution: first, the file size becomes much larger, and second, it is not a clean way to carry out the job.

      Thus, what we aim to do is create an interface that input file formats can implement to gain the ability to repeat a file n times before eventually reducing. This would solve the problem and make Hadoop more suitable for training deep learning algorithms, and for any problem that requires going over the data multiple times before reducing.
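      The core of the proposal can be sketched, independently of the Hadoop InputFormat/RecordReader API, as a reader wrapper that replays the records of a split n times before the map output is ever sent to the reducers. This is a minimal illustration of the repetition logic only; the class name `RepeatingReader` and its shape are hypothetical, not part of any Hadoop API.

      ```java
      import java.util.Iterator;
      import java.util.List;

      // Hypothetical sketch: replays a split's records `repeats` times,
      // so each mapper sees every record once per "epoch" without the
      // input file being physically duplicated on disk.
      public class RepeatingReader<T> implements Iterator<T> {
          private final List<T> records; // stands in for the records of one input split
          private final int repeats;     // n, the number of passes over the data
          private int pass = 0;
          private int pos = 0;

          public RepeatingReader(List<T> records, int repeats) {
              this.records = records;
              this.repeats = repeats;
          }

          @Override
          public boolean hasNext() {
              return pass < repeats && pos < records.size();
          }

          @Override
          public T next() {
              T rec = records.get(pos++);
              // At the end of a pass, rewind to the start of the split
              // unless this was the final pass.
              if (pos == records.size() && ++pass < repeats) {
                  pos = 0;
              }
              return rec;
          }
      }
      ```

      In a real Hadoop input format, the rewind step would correspond to re-seeking the underlying `RecordReader` to the beginning of its split rather than buffering records in memory; the list here only keeps the sketch self-contained.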


          People

            Assignee: AbdulRahman AlHamali
            Reporter: AbdulRahman AlHamali
            Votes: 0
            Watchers: 2


              Time Tracking

                Original Estimate: 672h
                Remaining Estimate: 672h
                Time Spent: Not Specified