Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0
    • Component/s: mapreduce, snapshots
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added TableSnapshotInputFormat and TableSnapshotScanner for performing scans over HBase table snapshots from the client side, bypassing the HBase servers. The former configures a MapReduce job, while the latter performs a single client-side scan over snapshot files. Both can also be used with an offline HBase, against in-place or exported snapshot files.

      WARNING: This feature bypasses HBase-level security completely, since the files are read directly from HDFS. The user running the scan or job must have read permissions on the data files and snapshot files.

      Description

      The idea is to add an InputFormat that runs a MapReduce job directly over snapshot files, bypassing the HBase server layer. The InputFormat is similar in usage to TableInputFormat: it takes a Scan object from the user, but instead of reading from an online table, it reads from a table snapshot. We create one split per region in the snapshot and open an HRegion inside the RecordReader. A RegionScanner is used internally to perform the scan, without any HRegionServer bits.
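To make the "one split per region" rule concrete, here is a minimal, self-contained sketch. It is not the code from the attached patches: `SnapshotSplitSketch`, `RegionRange`, `Split`, and `computeSplits` are hypothetical names invented for illustration, and row keys are modeled as Java strings rather than the byte arrays HBase actually compares. It assumes that each region overlapping the user's scan range yields exactly one split, clipped to that range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model only: these types are hypothetical, not HBase classes. */
final class SnapshotSplitSketch {

    /** A region's row range [startKey, endKey); "" means unbounded on that side. */
    static final class RegionRange {
        final String name, startKey, endKey;
        RegionRange(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** One unit of map work: rows [startRow, stopRow) read from a single region. */
    static final class Split {
        final String regionName, startRow, stopRow;
        Split(String regionName, String startRow, String stopRow) {
            this.regionName = regionName;
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
        @Override public String toString() {
            return regionName + "[" + startRow + "," + stopRow + ")";
        }
    }

    /** Returns one split per region that overlaps the scan range [scanStart, scanStop). */
    static List<Split> computeSplits(List<RegionRange> regions, String scanStart, String scanStop) {
        List<Split> splits = new ArrayList<>();
        for (RegionRange r : regions) {
            // A bounded region that ends at or before the scan start holds no wanted rows.
            boolean endsBeforeScan = !r.endKey.isEmpty() && r.endKey.compareTo(scanStart) <= 0;
            // A region that starts at or after a bounded scan stop holds no wanted rows.
            boolean startsAfterScan = !scanStop.isEmpty() && r.startKey.compareTo(scanStop) >= 0;
            if (endsBeforeScan || startsAfterScan) {
                continue;
            }
            // "" sorts before every other string, so a plain max works for the lower bound.
            String start = r.startKey.compareTo(scanStart) >= 0 ? r.startKey : scanStart;
            String stop = minUpperBound(r.endKey, scanStop);
            splits.add(new Split(r.name, start, stop));
        }
        return splits;
    }

    /** Smaller of two upper bounds, treating "" as "no upper bound". */
    private static String minUpperBound(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        List<RegionRange> regions = Arrays.asList(
            new RegionRange("region-1", "", "g"),
            new RegionRange("region-2", "g", "p"),
            new RegionRange("region-3", "p", ""));
        System.out.println(computeSplits(regions, "", ""));   // full-table scan
        System.out.println(computeSplits(regions, "h", "k")); // scan confined to one region
    }
}
```

In the feature as described, each such split would be handed to a RecordReader that opens the corresponding region and scans it locally. The sketch covers only the planning step and leaves out snapshot manifests, HFile access, and locality.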

      Users have been asking and searching for ways to run MR jobs that read directly from HFiles. This enables new use cases wherever reading stale data is acceptable:

      • Take snapshots periodically, and run MR jobs only on the snapshots.
      • Export snapshots to a remote HDFS cluster and run the MR jobs there, without an HBase cluster.
      • (Future use case) Combine snapshot data with online HBase data: scan from yesterday's snapshot, but read today's data from the online HBase cluster.
      1. hbase-8369_v0.patch
        73 kB
        Enis Soztutar
      2. hbase-8369_v11.patch
        152 kB
        Enis Soztutar
      3. hbase-8369_v5.patch
        160 kB
        Enis Soztutar
      4. hbase-8369_v6.patch
        148 kB
        Enis Soztutar
      5. hbase-8369_v7.patch
        151 kB
        Enis Soztutar
      6. hbase-8369_v8.patch
        151 kB
        Enis Soztutar
      7. hbase-8369_v9.patch
        150 kB
        Enis Soztutar
      8. HBASE-8369-0.94_v2.patch
        24 kB
        Bryan Keller
      9. HBASE-8369-0.94_v3.patch
        24 kB
        Bryan Keller
      10. HBASE-8369-0.94_v4.patch
        24 kB
        Bryan Keller
      11. HBASE-8369-0.94_v5.patch
        24 kB
        Bryan Keller
      12. HBASE-8369-0.94.patch
        23 kB
        Bryan Keller
      13. HBASE-8369-trunk_v1.patch
        24 kB
        Bryan Keller
      14. HBASE-8369-trunk_v2.patch
        24 kB
        Bryan Keller
      15. HBASE-8369-trunk_v3.patch
        24 kB
        Bryan Keller

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Lars Hofhansl made changes -
          Link This issue relates to HBASE-10076 [ HBASE-10076 ]
          Enis Soztutar made changes -
          Attachment hbase-8369_v11.patch [ 12614550 ]
          Enis Soztutar made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Enis Soztutar made changes -
          Attachment hbase-8369_v9.patch [ 12613728 ]
          Enis Soztutar made changes -
          Attachment hbase-8369_v8.patch [ 12613507 ]
          Enis Soztutar made changes -
          Attachment hbase-8369_v7.patch [ 12612097 ]
          Enis Soztutar made changes -
          Release Note Added TableSnapshotInputFormat and TableSnapshotScanner for performing scans over hbase table snapshots from the client side, bypassing the hbase servers. The former configures a mapreduce job, while the latter does single client side scan over snapshot files. Can also be used with offline HBase with in-place or exported snapshot files.

          WARNING: This feature bypasses HBase-level security completely since the files are read from the hdfs directly. The user who is running the scan / job has to have read permissions to the data files and snapshot files.

          Enis Soztutar made changes -
          Attachment hbase-8369_v6.patch [ 12611171 ]
          Enis Soztutar made changes -
          Attachment hbase-8369_v5.patch [ 12610963 ]
          stack made changes -
          Fix Version/s 0.96.0 [ 12324822 ]
          stack made changes -
          Fix Version/s 0.96.0 [ 12324822 ]
          Fix Version/s 0.95.2 [ 12320040 ]
          Bryan Keller made changes -
          Attachment HBASE-8369-0.94_v5.patch [ 12590699 ]
          Attachment HBASE-8369-trunk_v3.patch [ 12590700 ]
          Bryan Keller made changes -
          Attachment HBASE-8369-trunk_v2.patch [ 12590450 ]
          Bryan Keller made changes -
          Attachment HBASE-8369-0.94_v4.patch [ 12590449 ]
          Bryan Keller made changes -
          Attachment HBASE-8369-trunk_v1.patch [ 12590328 ]
          Bryan Keller made changes -
          Attachment HBASE-8369-0.94_v3.patch [ 12590324 ]
          Bryan Keller made changes -
          Attachment HBASE-8369-0.94_v2.patch [ 12590321 ]
          Bryan Keller made changes -
          Affects Version/s 0.94.8 [ 12324145 ]
          Bryan Keller made changes -
          Release Note Snapshot map-reduce scan functionality for 0.94. This is new functionality only and should not affect other
          Bryan Keller made changes -
          Labels newbie
          Bryan Keller made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Snapshot map-reduce scan functionality for 0.94. This is new functionality only and should not affect other
          Affects Version/s 0.94.8 [ 12324145 ]
          Labels newbie
          Bryan Keller made changes -
          Attachment HBASE-8369-0.94.patch [ 12590305 ]
          Enis Soztutar made changes -
          Field Original Value New Value
          Attachment hbase-8369_v0.patch [ 12579216 ]
          Enis Soztutar created issue -

            People

            • Assignee:
              Enis Soztutar
              Reporter:
              Enis Soztutar
            • Votes:
              2
              Watchers:
              37
