Mahout / MAHOUT-1700

OutOfMemory Problem in ABtDenseOutJob in Distributed SSVD


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.9, 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: classic

    Description

      Recently, I tried Mahout's Hadoop SSVD job (mahout-0.9 or mahout-1.0). There is a Java heap space OutOfMemory problem in ABtDenseOutJob. I found the reason: the ABtDenseOutJob map code is as below:

      protected void map(Writable key, VectorWritable value, Context context)
          throws IOException, InterruptedException {

        Vector vec = value.get();

        int vecSize = vec.size();
        if (aCols == null) {
          aCols = new Vector[vecSize];
        } else if (aCols.length < vecSize) {
          aCols = Arrays.copyOf(aCols, vecSize);
        }

        if (vec.isDense()) {
          for (int i = 0; i < vecSize; i++) {
            extendAColIfNeeded(i, aRowCount + 1);
            aCols[i].setQuick(aRowCount, vec.getQuick(i));
          }
        } else if (vec.size() > 0) {
          for (Vector.Element vecEl : vec.nonZeroes()) {
            int i = vecEl.index();
            extendAColIfNeeded(i, aRowCount + 1);
            aCols[i].setQuick(aRowCount, vecEl.get());
          }
        }
        aRowCount++;
      }

      If the input is a RandomAccessSparseVector, as is usual with big data, its vec.size() is Integer.MAX_VALUE (2^31 - 1), so aCols = new Vector[vecSize] triggers the OutOfMemory problem. The obvious remedy would be to enlarge every TaskTracker's maximum heap:
      <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m</value>
      </property>
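      The allocation failure described above can be reproduced in isolation. Below is a minimal sketch in plain Java (not Mahout code, and the class name is illustrative): on HotSpot, requesting an Object array of Integer.MAX_VALUE elements fails immediately, since the VM's maximum array length is slightly below Integer.MAX_VALUE and the backing storage (gigabytes of references) would not fit a typical heap anyway.

      // Minimal sketch, not Mahout code: mirrors aCols = new Vector[vecSize]
      // when vecSize == Integer.MAX_VALUE, as reported by a sparse vector of
      // nominal full cardinality.
      public class ArrayOomDemo {

        // Returns a short description of what happened instead of crashing the JVM.
        static String tryAllocate(int size) {
          try {
            Object[] aCols = new Object[size];
            return "allocated " + aCols.length;
          } catch (OutOfMemoryError e) {
            // Typically "Requested array size exceeds VM limit" or "Java heap space".
            return "OutOfMemoryError";
          }
        }

        public static void main(String[] args) {
          System.out.println(tryAllocate(10));                // small sizes are fine
          System.out.println(tryAllocate(Integer.MAX_VALUE)); // fails on a typical heap
        }
      }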
      However, if you are NOT the Hadoop administrator or ops, you have no permission to modify that config. So I tried modifying the ABtDenseOutJob map code to support the RandomAccessSparseVector situation: I use a HashMap to represent aCols instead of the original Vector[] aCols array. The modified code is as below:

      private Map<Integer, Vector> aColsMap = new HashMap<Integer, Vector>();

      protected void map(Writable key, VectorWritable value, Context context)
          throws IOException, InterruptedException {

        Vector vec = value.get();
        if (vec.isDense()) {
          int vecSize = vec.size();
          for (int i = 0; i < vecSize; i++) {
            // No extendAColIfNeeded(i, aRowCount + 1): columns are allocated lazily.
            if (aColsMap.get(i) == null) {
              aColsMap.put(i, new RandomAccessSparseVector(Integer.MAX_VALUE, 100));
            }
            aColsMap.get(i).setQuick(aRowCount, vec.getQuick(i));
          }
        } else if (vec.size() > 0) {
          for (Vector.Element vecEl : vec.nonZeroes()) {
            int i = vecEl.index();
            if (aColsMap.get(i) == null) {
              aColsMap.put(i, new RandomAccessSparseVector(Integer.MAX_VALUE, 100));
            }
            aColsMap.get(i).setQuick(aRowCount, vecEl.get());
          }
        }
        aRowCount++;
      }

      With this change, the OutOfMemory problem goes away.
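      The idea behind the patch can be sketched without any Mahout dependency: a lazily populated map keeps memory proportional to the number of columns actually touched, not to the vector's nominal cardinality. The class and method names below are illustrative only, not Mahout APIs; a nested HashMap stands in for Map<Integer, Vector>.

      import java.util.HashMap;
      import java.util.Map;

      // Minimal sketch of the lazy column-map idea (plain Java, no Mahout).
      public class LazyColumnStore {
        // column index -> (row index -> value), standing in for Map<Integer, Vector>
        private final Map<Integer, Map<Integer, Double>> aColsMap = new HashMap<>();

        void set(int col, int row, double value) {
          // Allocate a column only on first touch, like the patched map() does.
          aColsMap.computeIfAbsent(col, c -> new HashMap<>()).put(row, value);
        }

        int allocatedColumns() {
          return aColsMap.size();
        }

        public static void main(String[] args) {
          LazyColumnStore store = new LazyColumnStore();
          // A "row" with nominal width near Integer.MAX_VALUE but only 3 nonzeros:
          store.set(5, 0, 1.0);
          store.set(1_000_000, 0, 2.0);
          store.set(2_000_000_000, 0, 3.0);
          System.out.println(store.allocatedColumns()); // 3 columns, not 2^31 - 1
        }
      }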


          People

            Assignee: smarthi (Suneel Marthi)
            Reporter: lastarsenal (Ethan Yi)
