Flink / FLINK-2188

Reading from big HBase Tables


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 0.9
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I detected a bug when reading from a big HBase table.

      I used a cluster of 13 machines with 13 processing slots per machine, i.e. 169 processing slots in total. The cluster runs CDH 5.4.1, and the HBase version is 1.0.0-cdh5.4.1. There is an HBase table with nearly 100 million rows. I used Spark and Hive to count the number of rows, and both results are identical (nearly 100 million).
      Then I used Flink to count the number of rows. For that, I added hbase-client 1.0.0-cdh5.4.1 as a Maven dependency and excluded the other hbase-client dependencies. The result of the Flink job is nearly 102 million rows, 2 million more than the result of Spark and Hive. Moreover, I ran the Flink job multiple times, and the result sometimes fluctuates by ±5 rows.
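
      For reference, a row-count job of this kind can be written with Flink's DataSet API on top of the flink-hbase addon's TableInputFormat. The sketch below is a minimal, hypothetical version of such a job, not the reporter's actual code: the class name, the table name "big_table", and the choice to emit only the row key are placeholders, and it assumes the flink-hbase module is a dependency and a matching hbase-site.xml is on the classpath.

      import org.apache.flink.addons.hbase.TableInputFormat;
      import org.apache.flink.api.java.DataSet;
      import org.apache.flink.api.java.ExecutionEnvironment;
      import org.apache.flink.api.java.tuple.Tuple1;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.util.Bytes;

      // Hypothetical row-count job; all names are illustrative, not taken from the report.
      public class HBaseRowCount {

          public static void main(String[] args) throws Exception {
              final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

              // Read the whole table through the flink-hbase input format.
              DataSet<Tuple1<String>> rows = env.createInput(new TableInputFormat<Tuple1<String>>() {

                  @Override
                  protected Scan getScanner() {
                      // Plain full-table scan; caching and batching are left at their defaults.
                      return new Scan();
                  }

                  @Override
                  protected String getTableName() {
                      return "big_table"; // placeholder table name
                  }

                  @Override
                  protected Tuple1<String> mapResultToTuple(Result r) {
                      // Only the row key is needed for counting.
                      return new Tuple1<String>(Bytes.toString(r.getRow()));
                  }
              });

              // Total number of rows seen by the scan.
              System.out.println("row count: " + rows.count());
          }
      }

      With the table name and Scan adjusted to the actual table, rows.count() gives the figure the report compares against the Spark and Hive counts.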

      Attachments

        Activity


          People

            Assignee: Unassigned
            Reporter: Hilmi Yildirim
            Votes: 0
            Watchers: 4

