Crunch / CRUNCH-545

Writing to HFiles starts a job per column family


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      When writing to HFiles via HFileUtils.writeToHFilesForIncrementalLoad, a separate MapReduce job is launched for each column family defined for the table, even if a column family has no data at all.

      Each of the column family jobs runs over the full set of Cells, filters for the desired column family, and then partitions the data.
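The following minimal Python model makes the cost of this behaviour concrete. It is not Crunch or HBase code. `Cell` is a hypothetical stand-in for an HBase cell, `per_family_jobs` is a hypothetical helper, each "job" is modeled as a plain loop, and sorting stands in for the partition step. Under those assumptions, the input is scanned once per declared column family, and a job is counted even for families that have no data.

```python
from collections import namedtuple

# Hypothetical stand-in for an HBase Cell, with only the fields this model needs.
Cell = namedtuple("Cell", ["row", "family", "qualifier", "value"])


def per_family_jobs(cells, families):
    """Model of the pre-fix behaviour: one 'job' per declared column family.

    Each modeled job scans every cell, keeps only cells of its own family,
    then sorts them (standing in for the partition/sort step). Returns the
    per-family output plus counters for jobs launched and cells scanned.
    """
    outputs = {}
    jobs = 0
    scanned = 0
    for family in families:
        jobs += 1  # a job is started even if this family has no cells
        selected = []
        for cell in cells:  # full pass over the input, repeated for every family
            scanned += 1
            if cell.family == family:
                selected.append(cell)
        outputs[family] = sorted(selected, key=lambda c: (c.row, c.qualifier))
    return outputs, jobs, scanned
```

With three declared families (`a`, `b`, `c`) and three cells spread over `a` and `b`, this model counts three jobs and nine cell scans, and family `c` gets an empty output.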

      For tables with multiple column families, it would be much more efficient to sort and partition all of the data together, and then split it out per column family afterwards.
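The proposed ordering of work can be sketched in Python as follows. This is a minimal model under stated assumptions, not the attached patch or Crunch's API. `Cell` and `sort_once_then_split` are hypothetical names, and the sort key only loosely mirrors the row/family/qualifier part of HBase's key ordering (timestamps and key types are ignored). The input is sorted once, then a single pass routes each cell to its column family's output, so families without data produce nothing.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an HBase Cell, with only the fields this model needs.
Cell = namedtuple("Cell", ["row", "family", "qualifier", "value"])


def sort_once_then_split(cells):
    """Model of the proposed approach: sort all cells together, then split.

    Because the sort key is (row, family, qualifier), the cells routed to any
    single family remain ordered by (row, qualifier) after the split.
    Families that have no cells never appear in the result.
    """
    ordered = sorted(cells, key=lambda c: (c.row, c.family, c.qualifier))
    outputs = defaultdict(list)
    for cell in ordered:  # one pass over the already-sorted data
        outputs[cell.family].append(cell)
    return dict(outputs)
```

For input containing cells only in families `a` and `b`, the result has exactly those two keys, regardless of how many other column families the table declares.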

        Attachments

        1. CRUNCH-545.patch (7 kB, Gabriel Reid)
        2. post.dot.png (90 kB, Gabriel Reid)
        3. pre.dot.png (110 kB, Gabriel Reid)


            People

            • Assignee: Gabriel Reid (gabriel.reid)
            • Reporter: Gabriel Reid (gabriel.reid)
            • Votes: 0
            • Watchers: 2
