[PHOENIX-1203] Uable to work for count (distinct col) queries via phoenix table with secondary indexes - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 4.2.0, 3.2.0
Component/s: None
Labels:
Environment:

hadoop-2.2.0 hbase: Version 0.98.3-hadoop2

Description

I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases
And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project:

0: jdbc:phoenix:zookeeper1> select count (distinct t.imsi) from ranapsignal t where t.pkttime>=1404964800000 and t.pkttime<=1404965699999 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
-------------

COUNT(IMSI)

-------------

2322

-------------
1 row selected (70.572 seconds)

As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups.

Here are some considerations:
1. count (distinct col) query over index table did not work as expectation.
2. only distinct query over index table works fine.
3. If the phoenix version got some wrong configuration, correct me.

Thanks and Best Regards,
Sun
-------------------------------
Hi Sun,
Thanks for the detailed description. Yes, your syntax is correct, and
it's definitely true that the count distinct query should return the
same result with and without the index. Would you mind trying this on
our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
file a JIRA?

One thing that make make it easier for your testing: do you know about
our NO_INDEX hint which forces the query not to use an index, like
this:

select /*+ NO_INDEX */ ...

Another question too. What about this query with and with/out the index:

select count from ranapsignal t
where t.pkttime>=1404964800000 and t.pkttime<=1404965699999
and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
and t.ranapsubmsgtype=0
group by t.imsi;

Thanks,
James

On Thu, Aug 21, 2014 at 10:38 PM, sunfl@certusnet.com.cn
<sunfl@certusnet.com.cn> wrote:
>
>
> Hi James,
>
> Recently I got trouble while trying to conduct some query performance test
> in my phoenix tables with secondary indexes.
> I created a table called RANAPSIGNAL for my projects in phoenix via
> sqlline client and load data into the table. Then I create
> an index on the specific column PKTTIME for the table RANAPSIGNAL while
> including other more columns for adjusting my
> index query, like the following DDL:
> create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
> (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
> The index creation worked successfully without any errors. So, when I am
> trying to conduct such query as:
> select count (distinct t.imsi) from ranapsignal t where
> t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
> t.ranapsubmsgtype=0 ;
> Without secondary indexes, the final result got 536 distinct imsi, wihch
> is the right distinct count results. However, after I create the above
> secondary index PKT_IDX and reconducting the above count (distinct imsi)
> query, I got 2322 imsi rows which obviously are not the expected
> distinct counts results. I used the explain grammar to observe the scan of
> the above select query and found that it definitely scaned over
> the index table PKT_IDX. I then tried to conduct the following query with
> no count function:
> select distinct t.imsi from ranapsignal t where
> t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
> t.ranapsubmsgtype=0 ;
> And the result is right 536 distinct imsi over scanning the index table.
> By the way, imsi is one of the primary key when creating the table
> RANAPSIGNAL.
> Here are several considerations for my trouble and practice:
> 1. Did you guys ever practice such count (distinct) queries over phoenix
> table via secondary index?
> 2. I am not sure whether this problem was due to the index table, but my
> practice may assume that conclusion.
> 3. Corrects my if I am wrong with my previous operations (index creation,
> DDL grammar, etc..)
> 4. Any available hints or reply are best appreciated.
>
> Thanks and Best Regards
> Sun

Uable to work for count (distinct col) queries via phoenix table with secondary indexes

Details

Description

Attachments

Attachments

Activity

People

Dates