HBase / HBASE-14520

Optimize the number of calls for tags creation in bulk load


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      At present, the TTL and the visibility expression are specified once per TSV line, i.e. the values and the tags are the same for all the columns present in that line. In the current code, a list of tags is created for each cell. Instead of creating new tags for each cell, the tags created once for the line can be reused by the other cells.

      Assume 1 million rows and 1000 columns. Currently, tag creation happens 1M * 1000 times. If the tags are reused, tag creation is reduced to 1M times (i.e. once per TSV line).

      This applies to both the TsvImporterMapper and the TextSortReducer logic.
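
      The idea can be illustrated with a minimal, self-contained sketch. It does not use the real HBase classes: SimpleTag, SimpleCell, buildTags and the two cellsWith... methods are hypothetical stand-ins invented for this example, and the counter exists only to show how many tag lists get built under each approach. It is not the code from the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TagReuseSketch {
    /** Hypothetical stand-in for an HBase tag (not the real Tag class). */
    static final class SimpleTag {
        final String type;
        final String value;
        SimpleTag(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a cell that holds a reference to its tag list. */
    static final class SimpleCell {
        final String column;
        final String value;
        final List<SimpleTag> tags;
        SimpleCell(String column, String value, List<SimpleTag> tags) {
            this.column = column;
            this.value = value;
            this.tags = tags;
        }
    }

    /** Counts how often a tag list is built, so the two approaches can be compared. */
    static int tagListsBuilt = 0;

    /** Builds the tag list for one TSV line from its TTL and visibility expression. */
    static List<SimpleTag> buildTags(long ttlMillis, String visibilityExpr) {
        tagListsBuilt++;
        List<SimpleTag> tags = new ArrayList<>();
        tags.add(new SimpleTag("TTL", Long.toString(ttlMillis)));
        tags.add(new SimpleTag("VISIBILITY", visibilityExpr));
        return Collections.unmodifiableList(tags);
    }

    /** Behaviour described as current: a new tag list for every cell of the line. */
    static List<SimpleCell> cellsWithPerCellTags(String[] columns, String[] values,
                                                 long ttlMillis, String visibilityExpr) {
        List<SimpleCell> cells = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            cells.add(new SimpleCell(columns[i], values[i],
                                     buildTags(ttlMillis, visibilityExpr)));
        }
        return cells;
    }

    /** Proposed behaviour: build the tag list once per line and share it across cells. */
    static List<SimpleCell> cellsWithSharedTags(String[] columns, String[] values,
                                                long ttlMillis, String visibilityExpr) {
        List<SimpleTag> lineTags = buildTags(ttlMillis, visibilityExpr);
        List<SimpleCell> cells = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            cells.add(new SimpleCell(columns[i], values[i], lineTags));
        }
        return cells;
    }

    public static void main(String[] args) {
        String[] columns = {"cf:a", "cf:b", "cf:c"};
        String[] values = {"1", "2", "3"};

        tagListsBuilt = 0;
        cellsWithPerCellTags(columns, values, 60000L, "secret");
        System.out.println("tag lists built per cell: " + tagListsBuilt);

        tagListsBuilt = 0;
        cellsWithSharedTags(columns, values, 60000L, "secret");
        System.out.println("tag lists built per line: " + tagListsBuilt);
    }
}
```

      Sharing one list is only safe if no cell modifies it afterwards, which is why the sketch returns an unmodifiable list; whether the real mapper and reducer need the same guard is an assumption to check against the patch.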

      Attachments

        1. HBASE-14520.patch
          5 kB
          Bhupendra Kumar Jain


          People

            Assignee: Bhupendra Kumar Jain
            Reporter: Bhupendra Kumar Jain
            Votes: 0
            Watchers: 4

