Crunch (Retired) / CRUNCH-23

PCollection#sort doesn't do a full sort on values

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.3.0

    Description

      When a PCollection is sorted with PCollection#sort, the sort is performed only within each reducer rather than as a total sort over all values. As a result, iterating over the materialized collection does not yield the values in sorted order, and the sorted files output by a sort operation cannot simply be concatenated into a single sorted file.
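
The behaviour described above can be reproduced without Crunch or Hadoop. The following is a minimal, self-contained sketch in plain Java, not the project's code: the class `PartitionedSortSketch` and all of its methods are hypothetical names introduced for illustration. It assumes two reducers and compares hash-style partitioning (each reducer sorts its own share) with range partitioning on an ascending split point, which is roughly the idea behind Hadoop's TotalOrderPartitioner referenced in one of the attached patch names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionedSortSketch {

    // Hash-style partitioning: the reducer a value lands on is unrelated to its order.
    static int hashPartition(int value, int numReducers) {
        return Math.floorMod(Integer.hashCode(value), numReducers);
    }

    // Range partitioning: reducer i only receives values below splitPoints[i];
    // the last reducer receives everything else. splitPoints must be ascending
    // and contain exactly (number of reducers - 1) entries.
    static int rangePartition(int value, int[] splitPoints) {
        int p = 0;
        while (p < splitPoints.length && value >= splitPoints[p]) {
            p++;
        }
        return p;
    }

    // Assigns values to reducers, sorts each reducer's share independently,
    // then concatenates the shares in reducer order (like concatenating part files).
    // Pass splitPoints == null to use hash partitioning.
    static List<Integer> sortAndConcat(List<Integer> input, int numReducers, int[] splitPoints) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int value : input) {
            int p = (splitPoints == null)
                    ? hashPartition(value, numReducers)
                    : rangePartition(value, splitPoints);
            partitions.get(p).add(value);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> part : partitions) {
            Collections.sort(part);
            out.addAll(part);
        }
        return out;
    }

    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 1, 4, 2, 6, 3);

        // Per-reducer sort only: each share is sorted, the concatenation is not.
        System.out.println(sortAndConcat(input, 2, null));           // [2, 4, 6, 1, 3, 5]

        // Range partitioning with a split point at 4: concatenation is fully sorted.
        System.out.println(sortAndConcat(input, 2, new int[] {4}));  // [1, 2, 3, 4, 5, 6]
    }
}
```

With hash partitioning every reducer's output is sorted on its own, but the concatenated result is not, which matches the symptom reported here. With range partitioning the reducer boundaries follow the sort order, so concatenating the outputs in reducer order gives a single sorted sequence.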

      Attachments

        1. 0001-CRUNCH-23-fix-sorting.patch
          15 kB
          Rahul Sharma
        2. CRUNCH-23-hardcoded_reducers.patch
          1 kB
          Rahul Sharma
        3. CRUNCH-23-sorting-issue.patch
          36 kB
          Rahul Sharma
        4. CRUNCH-23-used-TotalOrderpartioner-for-sorting-keys.patch
          8 kB
          Rahul Sharma
        5. SortTest.java
          0.8 kB
          Gabriel Reid

          People

            rahul.sharma Rahul Sharma
            gabriel.reid Gabriel Reid