Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-1203

Uable to work for count (distinct col) queries via phoenix table with secondary indexes

Agile BoardAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 4.2.0, 3.2.0
    • Component/s: None
    • Environment:

      hadoop-2.2.0 hbase: Version 0.98.3-hadoop2

      Description

      I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases
      And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project:

      0: jdbc:phoenix:zookeeper1> select count (distinct t.imsi) from ranapsignal t where t.pkttime>=1404964800000 and t.pkttime<=1404965699999 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
      -------------

      COUNT(IMSI)

      -------------

      2322

      -------------
      1 row selected (70.572 seconds)

      As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups.

      Here are some considerations:
      1. count (distinct col) query over index table did not work as expectation.
      2. only distinct query over index table works fine.
      3. If the phoenix version got some wrong configuration, correct me.

      Thanks and Best Regards,
      Sun
      -------------------------------
      Hi Sun,
      Thanks for the detailed description. Yes, your syntax is correct, and
      it's definitely true that the count distinct query should return the
      same result with and without the index. Would you mind trying this on
      our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
      file a JIRA?

      One thing that make make it easier for your testing: do you know about
      our NO_INDEX hint which forces the query not to use an index, like
      this:

      select /*+ NO_INDEX */ ...

      Another question too. What about this query with and with/out the index:

      select count from ranapsignal t
      where t.pkttime>=1404964800000 and t.pkttime<=1404965699999
      and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
      and t.ranapsubmsgtype=0
      group by t.imsi;

      Thanks,
      James

      On Thu, Aug 21, 2014 at 10:38 PM, sunfl@certusnet.com.cn
      <sunfl@certusnet.com.cn> wrote:
      >
      >
      > Hi James,
      >
      > Recently I got trouble while trying to conduct some query performance test
      > in my phoenix tables with secondary indexes.
      > I created a table called RANAPSIGNAL for my projects in phoenix via
      > sqlline client and load data into the table. Then I create
      > an index on the specific column PKTTIME for the table RANAPSIGNAL while
      > including other more columns for adjusting my
      > index query, like the following DDL:
      > create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
      > (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
      > The index creation worked successfully without any errors. So, when I am
      > trying to conduct such query as:
      > select count (distinct t.imsi) from ranapsignal t where
      > t.pkttime>=1404964800000 and t.pkttime<=1404965699999
      > and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
      > t.ranapsubmsgtype=0 ;
      > Without secondary indexes, the final result got 536 distinct imsi, wihch
      > is the right distinct count results. However, after I create the above
      > secondary index PKT_IDX and reconducting the above count (distinct imsi)
      > query, I got 2322 imsi rows which obviously are not the expected
      > distinct counts results. I used the explain grammar to observe the scan of
      > the above select query and found that it definitely scaned over
      > the index table PKT_IDX. I then tried to conduct the following query with
      > no count function:
      > select distinct t.imsi from ranapsignal t where
      > t.pkttime>=1404964800000 and t.pkttime<=1404965699999
      > and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
      > t.ranapsubmsgtype=0 ;
      > And the result is right 536 distinct imsi over scanning the index table.
      > By the way, imsi is one of the primary key when creating the table
      > RANAPSIGNAL.
      > Here are several considerations for my trouble and practice:
      > 1. Did you guys ever practice such count (distinct) queries over phoenix
      > table via secondary index?
      > 2. I am not sure whether this problem was due to the index table, but my
      > practice may assume that conclusion.
      > 3. Corrects my if I am wrong with my previous operations (index creation,
      > DDL grammar, etc..)
      > 4. Any available hints or reply are best appreciated.
      >
      > Thanks and Best Regards
      > Sun

        Attachments

          Activity

            People

            • Assignee:
              anoop.hbase Anoop Sam John
              Reporter:
              sunfl Sun Fulin

              Dates

              • Created:
                Updated:
                Resolved:

                Issue deployment