Previously, I worked as a technical expert on an in-memory database for online businesses at Alibaba Group. It is an internal project that can run group-by aggregations on billions of rows in under one second.
I'd like to apply this technology to Tajo to make it much faster than it is now. Based on some benchmarks, we believe that Spark & Shark is currently the fastest solution among the open source interactive query systems, such as Impala, Presto, and Tajo. The main reason is that it benefits from in-memory data.
As a first step, I will work on memory-cached tables to accelerate Tajo's query speed. This is actually why I was looking into table partitioning during the Christmas and New Year holidays.
I will submit a proposal soon.
|Table partition catalog recap||Resolved|