Mahout
  1. Mahout
  2. MAHOUT-235

GenericSorting.java also needs replacing

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.3
    • Component/s: Math
    • Labels:
      None

      Description

      It looks like GenericSorting.java is more code of the same dubious parentage that needs the same treatment.

      1. 235_and_hash_bases.patch
        98 kB
        Benson Margulies
      2. oops.diff
        6 kB
        Benson Margulies
      3. patch.235.take2.diff
        99 kB
        Benson Margulies

        Activity

        Sean Owen made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Benson Margulies added a comment -

        looks good, please go ahead and resolve.

        Show
        Benson Margulies added a comment - looks good, please go ahead and resolve.
        Sean Owen made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Sean Owen added a comment -

        OK committed, per Benson

        Show
        Sean Owen added a comment - OK committed, per Benson
        Grant Ingersoll made changes -
        Assignee Grant Ingersoll [ gsingers ]
        Hide
        Drew Farris added a comment -

        Ok, applied patch against r895535 by running the following from the top level mahout directory:

        patch -p0 -E < patch.235.take2.diff

        Full build and tests complete successfully for me.

        Show
        Drew Farris added a comment - Ok, applied patch against r895535 by running the following from the top level mahout directory: patch -p0 -E < patch.235.take2.diff Full build and tests complete successfully for me.
        Hide
        Benson Margulies added a comment -

        I had still posted the wrong patch. Correct patch now available.

        Show
        Benson Margulies added a comment - I had still posted the wrong patch. Correct patch now available.
        Benson Margulies made changes -
        Attachment patch.235.take2.diff [ 12429337 ]
        Hide
        Benson Margulies added a comment -

        Here's the correct patch

        Show
        Benson Margulies added a comment - Here's the correct patch
        Sean Owen made changes -
        Affects Version/s 0.2 [ 12313278 ]
        Affects Version/s 0.3 [ 12314281 ]
        Fix Version/s 0.3 [ 12314281 ]
        Benson Margulies made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Benson Margulies added a comment -

        Please ignore oops.patch and take the main patch.

        Show
        Benson Margulies added a comment - Please ignore oops.patch and take the main patch.
        Benson Margulies made changes -
        Attachment 235_and_hash_bases.patch [ 12429264 ]
        Hide
        Benson Margulies added a comment -

        Here is an IP-clean replacement for the GenericSorting quicksort and the first raft of templatization of the hash table code.

        Show
        Benson Margulies added a comment - Here is an IP-clean replacement for the GenericSorting quicksort and the first raft of templatization of the hash table code.
        Hide
        Benson Margulies added a comment - - edited

        I think I see the problem, it's the partitionValue. In all the working versions, that is a VALUE, not an index. And the value is captured, even as the cell it came from might get swapped.

        Show
        Benson Margulies added a comment - - edited I think I see the problem, it's the partitionValue. In all the working versions, that is a VALUE, not an index. And the value is captured, even as the cell it came from might get swapped.
        Hide
        Benson Margulies added a comment -

        the odd thing is that I got here by modifying
        one of the many working copies to use an
        external swapper.

        On Dec 31, 2009, at 6:24 PM, "Ted Dunning (JIRA)" <jira@apache.org>

        Show
        Benson Margulies added a comment - the odd thing is that I got here by modifying one of the many working copies to use an external swapper. On Dec 31, 2009, at 6:24 PM, "Ted Dunning (JIRA)" <jira@apache.org>
        Hide
        Ted Dunning added a comment - - edited

        I just looked at this. Didn't find the problem, but did make some comments in the code and clarified it a bit. My guess is that the main partition loop leaves b and c in the wrong order. If i get another chance to look at this, I will put in some assertions that match what I think the loop invariants should be. That will probably make things obvious.

        Here is my commented version fo the key sort routine:

          private static void quickSort0(int start, int end, IntComparator comp, Swapper swap) {
            int length = end - start;
            if (length < 7) {
              insertionSort(start, end, comp, swap);
              return;
            }
            int middle = (start + end) / 2;
            if (length > 7) {
              int bottom = start;
              int top = end - 1;
              if (length > 40) {
                // for lots of data, bottom, middle and top are medians near the beginning, middle or end of the data
                int skosh = length / 8;
                bottom = med3(bottom, bottom + skosh, bottom + (2 * skosh), comp);
                middle = med3(middle - skosh, middle, middle + skosh, comp);
                top = med3(top - (2 * skosh), top - skosh, top, comp);
              }
              middle = med3(bottom, middle, top, comp);
            }
        
            int partitionIndex = middle; // an index, not a value.
        
            // regions from a to b and from c to d are what we will recursively sort
            int a, b, c, d;
            a = b = start;
            c = d = end - 1;
            while (b <= c) {
              // copy all values equal to the partition value to before a..b.  In the process, advance b
              // as long as values less than the partition or equal are found, also stop when a..b collides with c..d
              int comparison = comp.compare(b, partitionIndex);
              while (b <= c && comparison <= 0) {
                if (comparison == 0) {
                  swap.swap(a, b);
                  a++;
                }
                b++;
                comparison = comp.compare(b, partitionIndex);
              }
              // at this point [start..a) has partition values, [a..b) has values < partition
              // also, either b>c or v[b] > partition value
        
              comparison = comp.compare(c, partitionIndex);
              while (c >= b && comparison >= 0) {
                if (comparison == 0) {
                  swap.swap(c, d);
                  d--;
                }
                c--;
                comparison = comp.compare(c, partitionIndex);
              }
              // now we also know that [d..end] contains partition values,
              // [c..d) contains values > partition value
              // also, either b>c or (v[b] > partition OR v[c] < partition)
        
              if (b <= c) {
                // v[b] > partition OR v[c] < partition
                // swapping will let us continue to grow the two regions
                swap.swap(b, c);
                b++;
                c--;
              }
            }
            // now we know
            // b = c+1
            // [start..a) and [d..end) contain partition value
            // all of [a..b) are less than partition
            // all of [c..d) are greater than partition
        
            // shift [a..b) to beginning
            length = Math.min(a - start, b - a);
            int l = start;
            int h = b - length;
            while (length-- > 0) {
              swap.swap(l, h);
              l++;
              h++;
            }
        
            // shift [c..d) to end
            length = Math.min(d - c, end - 1 - d);
            l = b;
            h = end - length;
             while (length-- > 0) {
              swap.swap(l, h);
              l++;
              h++;
            }
        
            // recurse left and right
            length = b - a;
            if (length > 0) {
              quickSort0(start, start + length, comp, swap);
            }
        
            length = d - c;
            if (length > 0) {
              quickSort0(end - length, end, comp, swap);
            }
          }
        
          /**
           * In-place insertion sort that is fast for pre-sorted data.
           *
           * @param start Where to start sorting (inclusive)
           * @param end   Where to stop (exclusive)
           * @param comp  Sort order.
           * @param swap  How to swap items.
           */
          private static void insertionSort(int start, int end, IntComparator comp, Swapper swap) {
            for (int i = start + 1; i < end; i++) {
              for (int j = i; j > start && comp.compare(j - 1, j) > 0; j--) {
                swap.swap(j - 1, j);
              }
            }
          }
        
        Show
        Ted Dunning added a comment - - edited I just looked at this. Didn't find the problem, but did make some comments in the code and clarified it a bit. My guess is that the main partition loop leaves b and c in the wrong order. If i get another chance to look at this, I will put in some assertions that match what I think the loop invariants should be. That will probably make things obvious. Here is my commented version fo the key sort routine: private static void quickSort0(int start, int end, IntComparator comp, Swapper swap) { int length = end - start; if (length < 7) { insertionSort(start, end, comp, swap); return; } int middle = (start + end) / 2; if (length > 7) { int bottom = start; int top = end - 1; if (length > 40) { // for lots of data, bottom, middle and top are medians near the beginning, middle or end of the data int skosh = length / 8; bottom = med3(bottom, bottom + skosh, bottom + (2 * skosh), comp); middle = med3(middle - skosh, middle, middle + skosh, comp); top = med3(top - (2 * skosh), top - skosh, top, comp); } middle = med3(bottom, middle, top, comp); } int partitionIndex = middle; // an index, not a value. // regions from a to b and from c to d are what we will recursively sort int a, b, c, d; a = b = start; c = d = end - 1; while (b <= c) { // copy all values equal to the partition value to before a..b. In the process, advance b // as long as values less than the partition or equal are found, also stop when a..b collides with c..d int comparison = comp.compare(b, partitionIndex); while (b <= c && comparison <= 0) { if (comparison == 0) { swap.swap(a, b); a++; } b++; comparison = comp.compare(b, partitionIndex); } // at this point [start..a) has partition values, [a..b) has values < partition // also, either b>c or v[b] > partition value comparison = comp.compare(c, partitionIndex); while (c >= b && comparison >= 0) { if (comparison == 0) { swap.swap(c, d); d--; } c--; comparison = comp.compare(c, partitionIndex); } // now we also know that [d..end] contains partition values, // [c..d) contains values > partition value // also, either b>c or (v[b] > partition OR v[c] < partition) if (b <= c) { // v[b] > partition OR v[c] < partition // swapping will let us continue to grow the two regions swap.swap(b, c); b++; c--; } } // now we know // b = c+1 // [start..a) and [d..end) contain partition value // all of [a..b) are less than partition // all of [c..d) are greater than partition // shift [a..b) to beginning length = Math.min(a - start, b - a); int l = start; int h = b - length; while (length-- > 0) { swap.swap(l, h); l++; h++; } // shift [c..d) to end length = Math.min(d - c, end - 1 - d); l = b; h = end - length; while (length-- > 0) { swap.swap(l, h); l++; h++; } // recurse left and right length = b - a; if (length > 0) { quickSort0(start, start + length, comp, swap); } length = d - c; if (length > 0) { quickSort0(end - length, end, comp, swap); } } /** * In-place insertion sort that is fast for pre-sorted data. * * @param start Where to start sorting (inclusive) * @param end Where to stop (exclusive) * @param comp Sort order. * @param swap How to swap items. */ private static void insertionSort(int start, int end, IntComparator comp, Swapper swap) { for (int i = start + 1; i < end; i++) { for (int j = i; j > start && comp.compare(j - 1, j) > 0; j--) { swap.swap(j - 1, j); } } }
        Benson Margulies made changes -
        Field Original Value New Value
        Attachment oops.diff [ 12429217 ]
        Hide
        Benson Margulies added a comment -

        This doesn't work, and I'm a bit stumped.

        Show
        Benson Margulies added a comment - This doesn't work, and I'm a bit stumped.
        Benson Margulies created issue -

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Benson Margulies
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development