Details

    • Type: New Feature
    • Status: Open
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
    • old issue number:
      23

      Description

      Support the WINDOW, PARTITION OVER, GROUPING, RANK, DENSE RANK, ORDER BY etc. functionality.

        Activity

        Hide
        RCheungIT Haoran Zhang added a comment - - edited

        Dear James,

        I'm Haoran Zhang, a Msc(Computing) student at Imperial College.
        I had worked for about three years at Alibaba Group in the area of distributed computing.
        Here is my cv(www.doc.ic.ac.uk/~hz114/haoran_cv.pdf). Hopefully, this can make you know me better.

        I'm very keen to participate Google Summer of Code 2016 and this issue is very interesting to me.

        About this issue, I have a question. OLAP extensions are a big family and it is quite impossible to support all of them in 12 weeks by one person. Moreover, some OLAP functions may not suitable for Phoenix, as they may lead to numerous data translation that may block the network and reduce the throughput.
        I want to know could the student who is assigned to this issue only choose several OLAP functions to support?

        In addition, would you mind giving any suggestions or materials for me to start getting familiar with this issue to conduct a formal proposal?

        Thank you very much.

        Haoran Zhang

        Show
        RCheungIT Haoran Zhang added a comment - - edited Dear James, I'm Haoran Zhang, a Msc(Computing) student at Imperial College. I had worked for about three years at Alibaba Group in the area of distributed computing. Here is my cv(www.doc.ic.ac.uk/~hz114/haoran_cv.pdf). Hopefully, this can make you know me better. I'm very keen to participate Google Summer of Code 2016 and this issue is very interesting to me. About this issue, I have a question. OLAP extensions are a big family and it is quite impossible to support all of them in 12 weeks by one person. Moreover, some OLAP functions may not suitable for Phoenix, as they may lead to numerous data translation that may block the network and reduce the throughput. I want to know could the student who is assigned to this issue only choose several OLAP functions to support? In addition, would you mind giving any suggestions or materials for me to start getting familiar with this issue to conduct a formal proposal? Thank you very much. Haoran Zhang
        Hide
        jamestaylor James Taylor added a comment - - edited

        Glad to hear you're interested, Haoran Zhang. You have some impressive and relevant experience. I completely agree that the scope of this JIRA is too large to complete for a single person over a single GSoC term. I think the first step would be to file sub tasks that break this down into more manageable parts with the fundamental core pieces being developed first.

        I'd recommend doing this work on our calcite branch [1]. This branch will become our master branch as soon as the integration work is complete [2]. Maryann Xue is leading this effort, so she'll likely have information to add here. Apache Calcite [3] already supports the parsing and planning of the OLAP extensions, producing relational algebra. The core work is already done to have Phoenix work on top of calcite in terms of querying, so these OLAP extensions would fit within that framework and the work would focus on the actual runtime pieces and pushing as much of the work into the cluster as possible. There's an interesting discussion on a good initial use case here: PHOENIX-2700. One of the important core pieces IMO is pushing the sliding window building into the HBase region servers through our coprocessors but still handling the case where a the logical boundary of a window spans across our scan or region boundaries (i.e. dealing with combining the windows that are being populated on different servers without having to pull all the row-level information to the client).

        [1] https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=shortlog;h=refs/heads/calcite
        [2] https://issues.apache.org/jira/issues/?filter=12334819
        [3] https://calcite.apache.org/

        Show
        jamestaylor James Taylor added a comment - - edited Glad to hear you're interested, Haoran Zhang . You have some impressive and relevant experience. I completely agree that the scope of this JIRA is too large to complete for a single person over a single GSoC term. I think the first step would be to file sub tasks that break this down into more manageable parts with the fundamental core pieces being developed first. I'd recommend doing this work on our calcite branch [1] . This branch will become our master branch as soon as the integration work is complete [2] . Maryann Xue is leading this effort, so she'll likely have information to add here. Apache Calcite [3] already supports the parsing and planning of the OLAP extensions, producing relational algebra. The core work is already done to have Phoenix work on top of calcite in terms of querying, so these OLAP extensions would fit within that framework and the work would focus on the actual runtime pieces and pushing as much of the work into the cluster as possible. There's an interesting discussion on a good initial use case here: PHOENIX-2700 . One of the important core pieces IMO is pushing the sliding window building into the HBase region servers through our coprocessors but still handling the case where a the logical boundary of a window spans across our scan or region boundaries (i.e. dealing with combining the windows that are being populated on different servers without having to pull all the row-level information to the client). [1] https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=shortlog;h=refs/heads/calcite [2] https://issues.apache.org/jira/issues/?filter=12334819 [3] https://calcite.apache.org/
        Hide
        RCheungIT Haoran Zhang added a comment - - edited

        Thanks for your advice James Taylor.

        By reading the material you provides, I have a better understand about this issue and also have already generated an initial plan.
        However, there are still several points that I'm quite confusing.

        1. I plan to draft a proposal which will implement the window functions[1] when the window is in the format of [ PARTITION BY expression [, expression ]* ] for Apache Phenix. In other words, it adds support for the keyword: PARTITION BY. I want to know whether the workload is enough for a GSOC term.

        2. About this issue PHOENIX-2700 , I notice a suitable solution is to implement the sliding window which can improve the performance by reducing unnecessary data translation. However, in my opinion, it only works when child query exists especially when the child query is OLAP query. For example, if we have a sample query like

        SELECT country_name, 
               state_name, 
               county_name, 
               Sum(population) 
                 OVER ( 
                   PARTITION BY country_name) AS country_population, 
               Sum(population) 
                 OVER ( 
                   PARTITION BY state_name)   AS state_population, 
               Sum(population) 
                 OVER ( 
                   PARTITION BY county_name ) AS county_population 
        

        In this case, I think the sliding window may not benefit the performance. The sliding window is not the basis of window functions, but the improvement.
        Is that right?

        3. When we have an SQL contains 'PARTITION BY partition_key', I think we should guarantee each partion_key only spread in only one region server, otherwise, the situation could be quite tricky. Nonetheless, I can't find an appropriate way to guarantee it. If we have a restriction in DDL it is not a universal solution. If we just throw an exception, it is not user-friendly. Would you mind giving me any suggestions?

        Thanks

        [1] https://calcite.apache.org/docs/reference.html#window-functions

        Show
        RCheungIT Haoran Zhang added a comment - - edited Thanks for your advice James Taylor . By reading the material you provides, I have a better understand about this issue and also have already generated an initial plan. However, there are still several points that I'm quite confusing. 1. I plan to draft a proposal which will implement the window functions [1] when the window is in the format of [ PARTITION BY expression [, expression ] * ] for Apache Phenix. In other words, it adds support for the keyword: PARTITION BY. I want to know whether the workload is enough for a GSOC term. 2. About this issue PHOENIX-2700 , I notice a suitable solution is to implement the sliding window which can improve the performance by reducing unnecessary data translation. However, in my opinion, it only works when child query exists especially when the child query is OLAP query. For example, if we have a sample query like SELECT country_name, state_name, county_name, Sum(population) OVER ( PARTITION BY country_name) AS country_population, Sum(population) OVER ( PARTITION BY state_name) AS state_population, Sum(population) OVER ( PARTITION BY county_name ) AS county_population In this case, I think the sliding window may not benefit the performance. The sliding window is not the basis of window functions, but the improvement. Is that right? 3. When we have an SQL contains 'PARTITION BY partition_key', I think we should guarantee each partion_key only spread in only one region server, otherwise, the situation could be quite tricky. Nonetheless, I can't find an appropriate way to guarantee it. If we have a restriction in DDL it is not a universal solution. If we just throw an exception, it is not user-friendly. Would you mind giving me any suggestions? Thanks [1] https://calcite.apache.org/docs/reference.html#window-functions
        Hide
        jamestaylor James Taylor added a comment -

        For (1), you need to confirm whether or not Apache Calcite supports the syntax you're proposing as that's a prerequisite. If they don't, then you need to work with the syntax they have.

        I'm not sure I follow (2). Assuming the rows are ordered by the partition key, why wouldn't sliding windows help? Probably best to comment directly on PHOENIX-2700 and write the query you think will express what we want to do (and we can further discuss there).

        For (3), no, it'd need to work across the multiple region servers (that's kind of the point). You can do work on the client as well (that's how we support aggregation).

        Show
        jamestaylor James Taylor added a comment - For (1), you need to confirm whether or not Apache Calcite supports the syntax you're proposing as that's a prerequisite. If they don't, then you need to work with the syntax they have. I'm not sure I follow (2). Assuming the rows are ordered by the partition key, why wouldn't sliding windows help? Probably best to comment directly on PHOENIX-2700 and write the query you think will express what we want to do (and we can further discuss there). For (3), no, it'd need to work across the multiple region servers (that's kind of the point). You can do work on the client as well (that's how we support aggregation).
        Hide
        maryannxue Maryann Xue added a comment - - edited

        Calcite has full support for the window function OVER clause as illustrated below, so I suppose we don't need to anything in calcite for now.

        <window function> OVER
         ([PARTITION BY <expression list>]
         [ORDER BY <expression [ASC|DESC] list>]
         [ROWS|RANGE <window frame>])
        

        Still, we have to implement the runtime in Phoenix and the translation of Phoenix window function operator for the integration part, which is a considerable amount of work I think. Right now the difficulties include:
        1) Handle the boundaries of sliding windows if the table is ordered on the partition key. Not sure if we'll have a solution that do not have to return all row-level information to the client for region boundary rows.
        2) When the table is not ordered on the partition key. In this case, I think it will basically be a server-side aggregation (what we do for GROUP BY right now) plus a pure client side window building.
        A window with only ORDER BY key and no PARTITION key will be similar to case 2) for unordered case and will be somewhat straightforward for ordered case.
        3) Multiple windows. Things can get much more complicated with the following example (copied from calcite test case), thus I suggest we leave it for future improvement.

                select
                     "deptno",
                     "empid",
                     sum("salary" + "empid") over w as s,
                     min("salary") over w as m,
                     count(*) over w as c,
                     count(*) over w2 as c2,
                     count(*) over w11 as c11,
                     count(*) over w11dept as c11dept
                 from "hr"."emps"
                     window w as (order by "empid" rows 1 preceding),
                     w2 as (order by "empid" rows 2 preceding),
                     w11 as (order by "empid" rows between 1 preceding and 1 following),
                     w11dept as (partition by "deptno" order by "empid" rows between 1 preceding and 1 following)
        

        Think it might make sense to start with the simplest case: sliding window with ORDER BY key and ROWS/RANGE only, so that we can get all the basic facilities for sliding windows ready before we move onto more difficult problems with PARTITION. What do you think, James Taylor and Haoran Zhang ?

        Show
        maryannxue Maryann Xue added a comment - - edited Calcite has full support for the window function OVER clause as illustrated below, so I suppose we don't need to anything in calcite for now. <window function> OVER ([PARTITION BY <expression list>] [ORDER BY <expression [ASC|DESC] list>] [ROWS|RANGE <window frame>]) Still, we have to implement the runtime in Phoenix and the translation of Phoenix window function operator for the integration part, which is a considerable amount of work I think. Right now the difficulties include: 1) Handle the boundaries of sliding windows if the table is ordered on the partition key. Not sure if we'll have a solution that do not have to return all row-level information to the client for region boundary rows. 2) When the table is not ordered on the partition key. In this case, I think it will basically be a server-side aggregation (what we do for GROUP BY right now) plus a pure client side window building. A window with only ORDER BY key and no PARTITION key will be similar to case 2) for unordered case and will be somewhat straightforward for ordered case. 3) Multiple windows. Things can get much more complicated with the following example (copied from calcite test case), thus I suggest we leave it for future improvement. select "deptno" , "empid" , sum( "salary" + "empid" ) over w as s, min( "salary" ) over w as m, count(*) over w as c, count(*) over w2 as c2, count(*) over w11 as c11, count(*) over w11dept as c11dept from "hr" . "emps" window w as (order by "empid" rows 1 preceding), w2 as (order by "empid" rows 2 preceding), w11 as (order by "empid" rows between 1 preceding and 1 following), w11dept as (partition by "deptno" order by "empid" rows between 1 preceding and 1 following) Think it might make sense to start with the simplest case: sliding window with ORDER BY key and ROWS/RANGE only, so that we can get all the basic facilities for sliding windows ready before we move onto more difficult problems with PARTITION. What do you think, James Taylor and Haoran Zhang ?
        Hide
        jamestaylor James Taylor added a comment -

        I agree, Maryann Xue. Do you think this would make a good GSoC project or would it require too much in depth Phoenix and Calcite knowledge?

        Show
        jamestaylor James Taylor added a comment - I agree, Maryann Xue . Do you think this would make a good GSoC project or would it require too much in depth Phoenix and Calcite knowledge?
        Hide
        maryannxue Maryann Xue added a comment -

        I think it would. We can even divide the work into two parts:
        1) Runtime of sliding windows with ORDER BY key and ROWS/RANGE.
        2) Integration with Calcite-Phoenix (the translation of calcite Window operator into Phoenix runtime).
        We do the same thing with implementation of nested-loop join in Phoenix as well, as you can see we have a CorrelatePlan and then we have standalone tests to check this runtime implementation. Besides, we have the compiler implementation for the PhoenixCorrelate operator, and altogether we can run SQL queries in our end-to-end tests.
        Part 2 will be rather simple once part 1 is ready, and it does not require much knowledge of Calcite either. Simply think of it as writing compiler code in standalone Phoenix but without even having to worry about how it interacts with the compiler framework. The workload of this part is more about writing end-to-end test cases.

        Show
        maryannxue Maryann Xue added a comment - I think it would. We can even divide the work into two parts: 1) Runtime of sliding windows with ORDER BY key and ROWS/RANGE. 2) Integration with Calcite-Phoenix (the translation of calcite Window operator into Phoenix runtime). We do the same thing with implementation of nested-loop join in Phoenix as well, as you can see we have a CorrelatePlan and then we have standalone tests to check this runtime implementation. Besides, we have the compiler implementation for the PhoenixCorrelate operator, and altogether we can run SQL queries in our end-to-end tests. Part 2 will be rather simple once part 1 is ready, and it does not require much knowledge of Calcite either. Simply think of it as writing compiler code in standalone Phoenix but without even having to worry about how it interacts with the compiler framework. The workload of this part is more about writing end-to-end test cases.
        Hide
        maryannxue Maryann Xue added a comment -

        What would be useful here is to make the sliding window an independent module so that it can ported on PARTITION BY easily, both for client-side use and for server-side use.
        To achieve this, you may start with ORDER BY when table is unordered to figure out the basics for implementing sliding windows (since it will be purely client-side), then after that you can move onto ORDER BY when table is ordered and see if your original sliding window module can pushed to the server side and see how you'd deal with region boundary rows. This might not give you significant performance improvement in the ORDER BY case alone (without PARTITION BY), but it would definitely be very helpful for the PARTITION BY case (when ordered).

        Show
        maryannxue Maryann Xue added a comment - What would be useful here is to make the sliding window an independent module so that it can ported on PARTITION BY easily, both for client-side use and for server-side use. To achieve this, you may start with ORDER BY when table is unordered to figure out the basics for implementing sliding windows (since it will be purely client-side), then after that you can move onto ORDER BY when table is ordered and see if your original sliding window module can pushed to the server side and see how you'd deal with region boundary rows. This might not give you significant performance improvement in the ORDER BY case alone (without PARTITION BY), but it would definitely be very helpful for the PARTITION BY case (when ordered).
        Hide
        RCheungIT Haoran Zhang added a comment -

        Dear James Taylor and Maryann Xue,

        Thanks for both of you.
        I'm very interested in the part: Runtime of sliding windows with ORDER BY key and ROWS/RANGE.

        However, although a clear roadmap is provided, I‘m still confused on details.
        I think it's that's because I lack the knowledge about how to implement sliding window and window functions.

        Would you mind recommending me some further reading about them?
        I tried to search them on the Internet. Nevertheless, there are several meanings of sliding window even in the area of database and all the textbooks only introduce how to use window functions.

        Thank you very much.

        Show
        RCheungIT Haoran Zhang added a comment - Dear James Taylor and Maryann Xue , Thanks for both of you. I'm very interested in the part: Runtime of sliding windows with ORDER BY key and ROWS/RANGE. However, although a clear roadmap is provided, I‘m still confused on details. I think it's that's because I lack the knowledge about how to implement sliding window and window functions. Would you mind recommending me some further reading about them? I tried to search them on the Internet. Nevertheless, there are several meanings of sliding window even in the area of database and all the textbooks only introduce how to use window functions. Thank you very much.
        Hide
        jamestaylor James Taylor added a comment -

        Thanks, Haoran Zhang. You've convinced us that the scope of this work is too big for GSoC. Please take a look at some of the other JIRAs marked for gsoc2016 and see if anything with a smaller scope that you can wrap your hands around catches your eye.

        Show
        jamestaylor James Taylor added a comment - Thanks, Haoran Zhang . You've convinced us that the scope of this work is too big for GSoC. Please take a look at some of the other JIRAs marked for gsoc2016 and see if anything with a smaller scope that you can wrap your hands around catches your eye.
        Hide
        RCheungIT Haoran Zhang added a comment -

        Dear James Taylor,

        Thanks for you advice. This issue is probably too difficult for me. However, I will keep watching this issue. Hopefully, when somebody else resolves it, I can make an effort to help him. See you in another JIRA.

        Show
        RCheungIT Haoran Zhang added a comment - Dear James Taylor , Thanks for you advice. This issue is probably too difficult for me. However, I will keep watching this issue. Hopefully, when somebody else resolves it, I can make an effort to help him. See you in another JIRA.

          People

          • Assignee:
            Unassigned
            Reporter:
            jamestaylor James Taylor
          • Votes:
            10 Vote for this issue
            Watchers:
            16 Start watching this issue

            Dates

            • Created:
              Updated:

              Development