Flink / FLINK-2188

Reading from big HBase Tables


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 0.9
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I detected a bug when reading from a big HBase table.

      I used a cluster of 13 machines with 13 processing slots per machine, i.e. 169 processing slots in total. The cluster runs CDH 5.4.1, and the HBase version is 1.0.0-cdh5.4.1. There is an HBase table with nearly 100 million rows. I used Spark and Hive to count the number of rows, and both results are identical (nearly 100 million).
      Then I used Flink to count the number of rows. For that, I added hbase-client 1.0.0-cdh5.4.1 as a Maven dependency and excluded the other hbase-client dependencies. The result of the Flink job is nearly 102 million rows, 2 million more than the result of Spark and Hive. Moreover, I ran the Flink job multiple times, and the result sometimes fluctuates by ±5 rows.
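
      For reference, a row-count job of this kind can be written with Flink's DataSet API on top of the flink-hbase addon's TableInputFormat. The sketch below is a minimal, hypothetical version of such a job, not the reporter's actual code: the class name, the table name "big_table", and the choice to emit only the row key are placeholders, and it assumes the flink-hbase module is a dependency and a matching hbase-site.xml is on the classpath.

      import org.apache.flink.addons.hbase.TableInputFormat;
      import org.apache.flink.api.java.DataSet;
      import org.apache.flink.api.java.ExecutionEnvironment;
      import org.apache.flink.api.java.tuple.Tuple1;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.util.Bytes;

      // Hypothetical row-count job; all names are illustrative, not taken from the report.
      public class HBaseRowCount {

          public static void main(String[] args) throws Exception {
              final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

              // Read the whole table through the flink-hbase input format.
              DataSet<Tuple1<String>> rows = env.createInput(new TableInputFormat<Tuple1<String>>() {

                  @Override
                  protected Scan getScanner() {
                      // Plain full-table scan; caching and batching are left at their defaults.
                      return new Scan();
                  }

                  @Override
                  protected String getTableName() {
                      return "big_table"; // placeholder table name
                  }

                  @Override
                  protected Tuple1<String> mapResultToTuple(Result r) {
                      // Only the row key is needed for counting.
                      return new Tuple1<String>(Bytes.toString(r.getRow()));
                  }
              });

              // Total number of rows seen by the scan.
              System.out.println("row count: " + rows.count());
          }
      }

      With the table name and Scan adjusted to the actual table, rows.count() gives the figure the report compares against the Spark and Hive counts.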

      Attachments

        Activity


          People

            Assignee: Unassigned
            Reporter: Hilmi Yildirim
            Votes: 0
            Watchers: 4

