Details
Type: Task
Status: Resolved
Priority: Major
Resolution: Implemented
Description
We have a >1000 node physical cluster at our disposal for a short time, before it'll be handed off to its intended use.
Loaded a bunch of data (TPC's LINEITEM table, among others) and ran a bunch of queries. Most tables are between 100G and 500G (uncompressed) and hold between 600 million and 2 billion rows.
The good news is that many things just worked. We sorted >400G in <5s with HBase and Phoenix. Scans work. Joins work (as long as one side is kept under 1 million rows or so).
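For reference, a minimal sketch of the kind of queries this covers, run through the Phoenix JDBC driver (assumes the phoenix-client jar is on the classpath; the ZooKeeper quorum, the SMALL_DIM table, and its columns are hypothetical placeholders, while the LINEITEM columns follow the TPC-H schema):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixScaleQueries {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper quorum; replace with the cluster's own.
        String url = "jdbc:phoenix:zk-host1,zk-host2,zk-host3:2181";

        try (Connection conn = DriverManager.getConnection(url)) {
            // Global sort over a large table: the kind of ORDER BY exercised above.
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT * FROM LINEITEM ORDER BY L_SHIPDATE LIMIT 100");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // consume rows
                }
            }

            // Join where the right-hand side is kept small. Phoenix's default
            // hash join builds that side in region-server memory, which is
            // presumably why keeping one side under ~1m rows matters here.
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT l.L_ORDERKEY, s.NAME "
                   + "FROM LINEITEM l JOIN SMALL_DIM s ON l.L_SUPPKEY = s.SUPPKEY");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // consume rows
                }
            }
        }
    }
}
{code}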
For the issues we observed, I'll file sub-JIRAs under this.
I'm going to write a blog post about this and attach a link here.
Sub-Tasks
1. Distinct Queries are slower than expected at scale. | Resolved | Unassigned
2. Phoenix should have an option to fail a query that would ship large amounts of data to the client. | Resolved | Unassigned
3. OFFSET is very slow at scale | Resolved | Unassigned
4. Index tables should not be configured with a custom/smaller MAX_FILESIZE | Closed | Lars Hofhansl
5. Support approximate COUNT(*) by using stats. | Resolved | Unassigned