  1. HBase
  2. HBASE-6942

Endpoint implementation for bulk deletion of data

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      This issue provides an example Endpoint implementation for efficiently deleting data in bulk from tables. A Scan object passed to the endpoint controls which data is deleted.
      Depending on the delete type passed, it deletes rows, column families, column qualifiers, or cell versions.
      A timestamp can optionally be passed. For the ROW, FAMILY and COLUMN delete types, passing a timestamp deletes all versions at or before that time (the specified time is inclusive).
      For the VERSION type, passing a timestamp deletes only one version (the one whose timestamp equals the given value) of each cell the Scan selected. With no timestamp, a VERSION delete removes all the cell versions the Scan selected. By using an appropriate Scan (with a time range, for example), the user can control which versions are deleted.
      The API returns the number of rows deleted (for types other than ROW, the entire row is not necessarily deleted); when the type is VERSION, it also returns the total number of versions deleted.
      The Scan can be created with a rowkey range, filters, a time range, and so on, depending on the delete use case.
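
The delete-type and timestamp rules above can be illustrated with a small in-memory model. This is only a sketch of the semantics as the release note describes them, not the endpoint code from the attached patches and not the HBase API: the class `BulkDeleteSketch`, its `Cell` and `Result` types, and the use of a `Predicate` in place of a Scan are all hypothetical names introduced for illustration. It also assumes that ROW, FAMILY and COLUMN deletes without a timestamp remove every version, and it counts a row as deleted when at least one of its cells is removed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory model of the bulk delete semantics; not HBase code.
class BulkDeleteSketch {
    enum DeleteType { ROW, FAMILY, COLUMN, VERSION }

    // One stored cell version: row key, column family, qualifier, timestamp.
    static final class Cell {
        final String row, family, qualifier;
        final long ts;
        Cell(String row, String family, String qualifier, long ts) {
            this.row = row; this.family = family; this.qualifier = qualifier; this.ts = ts;
        }
    }

    // Counts returned to the caller; versionsDeleted is only filled for VERSION.
    static final class Result {
        final long rowsDeleted, versionsDeleted;
        Result(long rowsDeleted, long versionsDeleted) {
            this.rowsDeleted = rowsDeleted; this.versionsDeleted = versionsDeleted;
        }
    }

    // 'scan' stands in for the Scan object; 'timestamp' may be null (not passed).
    static Result bulkDelete(List<Cell> table, Predicate<Cell> scan,
                             DeleteType type, Long timestamp) {
        List<Cell> selected = new ArrayList<>();
        for (Cell c : table) {
            if (scan.test(c)) selected.add(c);
        }
        // A cell is removed if any selected cell makes it eligible.
        List<Cell> doomed = new ArrayList<>();
        Set<String> rows = new HashSet<>();
        for (Cell c : table) {
            for (Cell s : selected) {
                if (covers(s, c, type, timestamp)) {
                    doomed.add(c);
                    rows.add(c.row);
                    break;
                }
            }
        }
        table.removeAll(doomed); // Cell uses identity equality, so only these objects go
        long versions = (type == DeleteType.VERSION) ? doomed.size() : 0;
        return new Result(rows.size(), versions);
    }

    // Does the delete triggered by selected cell 's' remove cell 'c'?
    static boolean covers(Cell s, Cell c, DeleteType type, Long ts) {
        if (!c.row.equals(s.row)) return false;
        boolean inTime = (ts == null) || c.ts <= ts; // inclusive upper bound
        boolean sameFamily = c.family.equals(s.family);
        boolean sameColumn = sameFamily && c.qualifier.equals(s.qualifier);
        switch (type) {
            case ROW:    return inTime;
            case FAMILY: return sameFamily && inTime;
            case COLUMN: return sameColumn && inTime;
            default:     // VERSION: exact timestamp, or the selected version itself
                return sameColumn && (ts == null ? c.ts == s.ts : c.ts == ts);
        }
    }
}
```

In this model, calling `BulkDeleteSketch.bulkDelete(table, c -> true, DeleteType.VERSION, 20L)` removes only the versions stamped 20 from each selected column, while passing `DeleteType.COLUMN` with the same timestamp removes every version of those columns at or before 20.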


      Description

      We can provide an endpoint implementation that performs bulk deletion of data (based on a scan) on the server side. This can reduce the time such an operation takes, since at present the client must scan the data back to itself and then issue delete(s) using the row keys.

      This supports queries like: delete from table1 where...

        Attachments

        1. HBASE-6942.patch
          15 kB
          Anoop Sam John
        2. HBASE-6942_V2.patch
          15 kB
          Anoop Sam John
        3. HBASE-6942_V3.patch
          15 kB
          Anoop Sam John
        4. HBASE-6942_V4.patch
          15 kB
          Anoop Sam John
        5. HBASE-6942_V5.patch
          28 kB
          Anoop Sam John
        6. HBASE-6942_V6.patch
          32 kB
          Anoop Sam John
        7. HBASE-6942_DeleteTemplate.patch
          30 kB
          Anoop Sam John
        8. HBASE-6942_V7.patch
          32 kB
          Anoop Sam John
        9. HBASE-6942_Trunk.patch
          92 kB
          Anoop Sam John
        10. HBASE-6942_94-V8.patch
          31 kB
          Anoop Sam John
        11. HBASE-6942_Trunk-V2.patch
          93 kB
          Anoop Sam John

            People

            • Assignee:
              anoopsamjohn Anoop Sam John
              Reporter:
              anoopsamjohn Anoop Sam John
            • Votes:
              1
              Watchers:
              15
