Uploaded image for project: 'Phoenix'
  1. Phoenix
  2. PHOENIX-914

Native HBase timestamp support to optimize date range queries in Phoenix

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 4.0.0
    • 4.6.0
    • None
    • None

    Description

      For many applications one of the column of a table can be (and must be) naturally mapped

      to HBase timestamp. What it gives us is the optimization on StoreScanner where HFiles with timestamps out of range of

      a Scan operator will be omitted. Let us say that we have time-series type of data (EVENTS) and custom compaction, where we create
      series of HFiles with continuous non-overlapping timestamp ranges.

      CREATE TABLE IF NOT EXISTS ODS.EVENTS (
      METRICID VARCHAR NOT NULL,
      METRICNAME VARCHAR,
      SERVICENAME VARCHAR NOT NULL,
      ORIGIN VARCHAR NOT NULL,
      APPID VARCHAR,
      IPID VARCHAR,
      NVALUE DOUBLE,
      TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/,
      DATA VARCHAR,
      SVALUE VARCHAR
      CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID, TIME)
      ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';

      Make note on TIME TIMESTAMP NOT NULL /+ TIMESTAMP +/ - this is the Hint to Phoenix that the column

      TIME must be mapped to HBase timestamp.

      The Query:

      Select all events of type 'X' for last 7 days

      SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() - 7*24*3600000; (this may be not correct SQL syntax of course)

      These types of queries will be efficiently optimized if:

      1. Phoenix maps TIME column to HBase timestamp

      2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan timerange

      Although this :

      Properties props = new Properties();
      props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
      Connection conn = DriverManager.connect(myUrl, props);

      conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
      conn.commit();

      will work in my case- it may not be efficient from performance point of view because for every INSERT/UPSERT
      new Connection object and new Statement is created, beside this we still need the optimization 2. (see above).

      Attachments

        1. wip.patch
          159 kB
          Samarth Jain
        2. PHOENIX-914.patch
          9 kB
          Jeffrey Zhong
        3. PHOENIX-914.patch
          213 kB
          Samarth Jain
        4. PHOENIX-914_v3.patch
          206 kB
          Samarth Jain
        5. PHOENIX-914_v2.patch
          201 kB
          Samarth Jain

        Issue Links

          Activity

            People

              samarthjain Samarth Jain
              vrodionov Vladimir Rodionov
              Votes:
              6 Vote for this issue
              Watchers:
              22 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: