Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Won't Fix
Description
Right now parallel scaling is governed by a constant (I think 32) that sets the number of threads/splits that can drive a single query.
This number might be too large for a small cluster and too small for a large one, and it should change as the cluster grows.
One idea is to instead have a "scaling number". This would be a floating-point number that defines the number of threads to use per involved RegionServer.
Say a query touches 10 RegionServers; then a scaling factor
- of 1.0 would mean 10 threads
- 0.1 means 1 thread
- 10.0 means 100 threads
- etc
That way one can define the cost of a query in terms of cluster resources.
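The arithmetic behind the proposal could be sketched roughly as follows. This is only an illustration, not existing code: the class `ParallelScaling` and the method `threadsFor` are hypothetical names, and rounding to the nearest integer with a floor of one thread is an assumption the description does not specify.

```java
/**
 * Hypothetical sketch of the proposed "scaling number".
 * Nothing here is an existing API; all names are made up for illustration.
 */
final class ParallelScaling {
    private ParallelScaling() {}

    /**
     * Threads for one query = involved RegionServers * scaling factor.
     * Assumptions: round to the nearest integer and never go below 1 thread.
     */
    static int threadsFor(int involvedRegionServers, double scalingFactor) {
        if (involvedRegionServers <= 0) {
            throw new IllegalArgumentException("involvedRegionServers must be positive");
        }
        if (Double.isNaN(scalingFactor) || scalingFactor <= 0.0) {
            throw new IllegalArgumentException("scalingFactor must be positive");
        }
        long threads = Math.round(involvedRegionServers * scalingFactor);
        // Clamp to [1, Integer.MAX_VALUE] so the cast below is safe.
        return (int) Math.max(1L, Math.min(threads, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // The three cases from the description: a query touching 10 RegionServers.
        double[] factors = {1.0, 0.1, 10.0};
        for (double factor : factors) {
            System.out.println(factor + " -> " + threadsFor(10, factor) + " thread(s)");
        }
    }
}
```

With a floor of one thread, a small factor on a small cluster (say 0.1 across 3 RegionServers) would still run the query on a single thread rather than none; whether that floor is desirable is a design choice left open here.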