HBase / HBASE-14150

Add BulkLoad functionality to HBase-Spark Module

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Build on the work done in HBASE-13992 by adding the ability to bulk load from a given RDD.

      This will do the following:
      1. Figure out the number of regions, then sort and partition the data correctly so it can be written out to HFiles.
      2. Unlike the MR bulk load, sort the columns in the shuffle stage rather than in the reducer's memory. This allows the design to support super wide records without running out of memory.
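
The two steps above can be illustrated with a small, Spark-free sketch. It is not the code from the attached patches. The names `region_index` and `partition_and_sort`, the sample region start keys, and the sample cells are all hypothetical. It assumes each region is identified by its start key, and that making the column part of the sort key (row, family, qualifier) is what lets the ordering happen during the shuffle rather than in a reducer's memory. In Spark, an RDD operation such as `repartitionAndSortWithinPartitions` with a custom partitioner would be one plausible way to get the same effect.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    Assumes start_keys is sorted and begins with b"" (the first region).
    A region's start key is inclusive, so a row equal to a start key
    belongs to the region that starts there.
    """
    return bisect_right(start_keys, row_key) - 1


def partition_and_sort(cells, start_keys):
    """Group cells by region, then sort each group by (row, family, qualifier).

    Each cell is a (row, family, qualifier, value) tuple of bytes.
    Because the column is part of the sort key, no step has to hold
    and sort all columns of one wide row on its own.
    """
    partitions = defaultdict(list)
    for row, family, qualifier, value in cells:
        key = (row, family, qualifier)
        partitions[region_index(start_keys, row)].append((key, value))
    return {
        idx: sorted(kvs, key=lambda kv: kv[0])
        for idx, kvs in sorted(partitions.items())
    }


# Hypothetical table with three regions: [b"", b"g"), [b"g", b"n"), [b"n", end)
start_keys = [b"", b"g", b"n"]

# Hypothetical unsorted input cells
cells = [
    (b"m1", b"cf", b"b", b"3"),
    (b"a1", b"cf", b"z", b"2"),
    (b"a1", b"cf", b"a", b"1"),
    (b"x9", b"cf", b"q", b"4"),
    (b"g", b"cf", b"q", b"5"),
]

result = partition_and_sort(cells, start_keys)
for idx, kvs in result.items():
    print(idx, kvs)
```

Each output partition corresponds to one region and is already in the order an HFile writer would need, so a writer can stream the cells out one at a time.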

        Attachments

        1. HBASE-14150.1.patch
          43 kB
          Theodore michael Malaska
        2. HBASE-14150.2.patch
          42 kB
          Theodore michael Malaska
        3. HBASE-14150.3.patch
          47 kB
          Theodore michael Malaska
        4. HBASE-14150.4.patch
          47 kB
          Theodore michael Malaska
        5. HBASE-14150.5.patch
          47 kB
          Theodore michael Malaska

              People

              • Assignee:
                ted.m Theodore michael Malaska
                Reporter:
                ted.m Theodore michael Malaska
              • Votes:
                0
                Watchers:
                8
