Phoenix / PHOENIX-1973

Improve CsvBulkLoadTool performance by moving keyvalue construction from map phase to reduce phase

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7.0
    • Component/s: None
    • Labels:

      Description

      This is similar to HBASE-8768, except that Phoenix needs its own custom mapper and reducer. In the map phase we only need to derive the row key from the primary key columns and write the full text of the line as the value, as usual (to ensure sorting). In the reduce phase we build the actual KeyValues by running the upsert query.
      This greatly reduces the amount of map output that has to be written to disk and the amount of data that has to be transferred over the network.
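
The proposal above can be illustrated with a minimal, Hadoop-free sketch. Everything in it is an assumption made for illustration, not the code from the attached patches: the class `BulkLoadSketch`, its methods, the single primary key column, the naive comma split, and the `row/COLUMN=value` strings that stand in for HBase KeyValues. A `TreeMap` stands in for the shuffle and sort.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration only; all names here are hypothetical.
class BulkLoadSketch {

    // Map phase: compute only the row key from the primary key column
    // and pass the raw CSV line through unchanged as the value.
    static Map.Entry<String, String> map(String csvLine, int pkIndex) {
        String[] fields = csvLine.split(",", -1); // naive split, no quoting support
        return new AbstractMap.SimpleEntry<>(fields[pkIndex], csvLine);
    }

    // Reduce phase: expand each raw line into one "row/COLUMN=value" cell
    // per non-key column. In the real tool this is where the upsert would run.
    static List<String> reduce(String rowKey, List<String> lines,
                               String[] columns, int pkIndex) {
        List<String> cells = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            for (int i = 0; i < columns.length; i++) {
                if (i != pkIndex) {
                    cells.add(rowKey + "/" + columns[i] + "=" + fields[i]);
                }
            }
        }
        return cells;
    }

    // Stand-in for the shuffle: group lines by row key in sorted order,
    // then reduce each group.
    static List<String> run(List<String> csvLines, String[] columns, int pkIndex) {
        TreeMap<String, List<String>> shuffled = new TreeMap<>();
        for (String line : csvLines) {
            Map.Entry<String, String> kv = map(line, pkIndex);
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                    .add(kv.getValue());
        }
        List<String> cells = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : shuffled.entrySet()) {
            cells.addAll(reduce(group.getKey(), group.getValue(), columns, pkIndex));
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("2,bob,30", "1,alice,25");
        String[] columns = {"ID", "NAME", "AGE"};
        for (String cell : run(lines, columns, 0)) {
            System.out.println(cell);
        }
    }
}
```

In this sketch each input line crosses the shuffle once as plain text, whereas building the cells in the map phase would send one record per non-key column, each repeating the row key. That difference is the saving the description refers to.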

        Attachments

        1. PHOENIX-1973-1.patch
          16 kB
          Sergey Soldatov
        2. PHOENIX-1973-2.patch
          20 kB
          Sergey Soldatov
        3. PHOENIX-1973-3.patch
          20 kB
          Sergey Soldatov
        4. PHOENIX-1973-4.patch
          20 kB
          Sergey Soldatov
        5. PHOENIX-1973-5.patch
          21 kB
          Sergey Soldatov
        6. PHOENIX-1973-6.patch
          24 kB
          Sergey Soldatov
        7. PHOENIX-1973-7.patch
          24 kB
          Sergey Soldatov

            People

            • Assignee:
              Sergey Soldatov (sergey.soldatov)
            • Reporter:
              Rajeshbabu Chintaguntla (rajeshbabu)
            • Votes:
              0
            • Watchers:
              8
