Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-391

Change the default type of real values from FLOAT4 to FLOAT8 when parsing the user queries

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Function/UDF
    • Labels:
      None

      Description

      FLOAT4 is default parsed float type in tajo.
      but it can cause some trobles when we use it as FLOAT8.

      for examples.
      0.4(f) will changed to 0.40000000000005(d)

      so it cause different result from java.Math functions.

      1. TAJO-391.patch
        14 kB
        DaeMyung Kang

        Activity

        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-trunk-postcommit #594 (See https://builds.apache.org/job/Tajo-trunk-postcommit/594/)
        TAJO-391: Change the default type of real values from FLOAT4 to FLOAT8 when parsing the user queries. (DaeMyung Kang via jihoon) (jihoonson: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=b444061e2739f71de6d5ce39c2b8bc60b2fa0d9b)

        • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestInsertQuery.java
        • tajo-project/src/site/markdown/tajo-0.2.0-doc.md
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java
        • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/function/TestMathFunctions.java
        • CHANGES.txt
        • tajo-project/src/site/markdown/tajo-0.8.0-doc.md
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/benchmark/TPCH.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-trunk-postcommit #594 (See https://builds.apache.org/job/Tajo-trunk-postcommit/594/ ) TAJO-391 : Change the default type of real values from FLOAT4 to FLOAT8 when parsing the user queries. (DaeMyung Kang via jihoon) (jihoonson: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=b444061e2739f71de6d5ce39c2b8bc60b2fa0d9b ) tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestInsertQuery.java tajo-project/src/site/markdown/tajo-0.2.0-doc.md tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/function/TestMathFunctions.java CHANGES.txt tajo-project/src/site/markdown/tajo-0.8.0-doc.md tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/benchmark/TPCH.java
        Hide
        hyunsik Hyunsik Choi added a comment -

        Thanks, all guys.

        Show
        hyunsik Hyunsik Choi added a comment - Thanks, all guys.
        Hide
        jihoonson Jihoon Son added a comment -

        I've just committed the patch.
        DaeMyung, really thanks for your contribution!

        Show
        jihoonson Jihoon Son added a comment - I've just committed the patch. DaeMyung, really thanks for your contribution!
        Hide
        jihoonson Jihoon Son added a comment -

        I changed the issue title to represent the changes in this issue more clearly.

        Show
        jihoonson Jihoon Son added a comment - I changed the issue title to represent the changes in this issue more clearly.
        Hide
        jihoonson Jihoon Son added a comment -

        DaeMyung, thanks for your additional explanation.
        I finally understand your point.
        Sorry for my misunderstanding.
        I'll commit this patch.

        Show
        jihoonson Jihoon Son added a comment - DaeMyung, thanks for your additional explanation. I finally understand your point. Sorry for my misunderstanding. I'll commit this patch.
        Hide
        charsyam DaeMyung Kang added a comment -

        Jihoon Son
        Actually I didn't mean "every real value should be float8". I am sorry to make you confused.

        I investigated some other dbms's operations(mysql, postgresql)
        They parsed floating point type as double in query.((not mean select query from table.it is always determined)
        so. below example show the behaviors;

        #postgresql
        create table tbl_test(a real);
        insert into tbl_test values(0.4);
        select cbrt(a) from tbl_test1;
        ===> 0.736806303387834

        create table tbl_test2(a float);
        insert into tbl_test2 values(0.4);
        select cbrt(a) from tbl_test2;
        ===> 0.736806299728077

        select cbrt(0.4);
        ===> 0.736806299728077

        but tajo just parse floating point type as float4.
        so.it always causes upper casting to Float8 in some functions.

        I want to point out this Thank you for advice

        Show
        charsyam DaeMyung Kang added a comment - Jihoon Son Actually I didn't mean "every real value should be float8". I am sorry to make you confused. I investigated some other dbms's operations(mysql, postgresql) They parsed floating point type as double in query.((not mean select query from table.it is always determined) so. below example show the behaviors; #postgresql create table tbl_test(a real); insert into tbl_test values(0.4); select cbrt(a) from tbl_test1; ===> 0.736806303387834 create table tbl_test2(a float); insert into tbl_test2 values(0.4); select cbrt(a) from tbl_test2; ===> 0.736806299728077 select cbrt(0.4); ===> 0.736806299728077 but tajo just parse floating point type as float4. so.it always causes upper casting to Float8 in some functions. I want to point out this Thank you for advice
        Hide
        jihoonson Jihoon Son added a comment - - edited

        Basically, I agree with changing the default parsed type of real values from float4 to float8, because float8 covers a wider range of values. Also, the attached patch looks good.
        But, this issue seems to propose that every real value should be float8 for the more precise math function results.
        Hyunsik said that the precision lose problem is occurred only when the constant values are used as function parameters, but, in my opinion, it is hard to guess the reason.

        Currently, the results of math functions are tested by comparing with the exact values of them.
        If the precision problem of the real numbers bothers the testing math functions, it is better to use assertEquals(double expected, double actual, double delta) instead of assertEquals(double expected, double actual).

        Show
        jihoonson Jihoon Son added a comment - - edited Basically, I agree with changing the default parsed type of real values from float4 to float8, because float8 covers a wider range of values. Also, the attached patch looks good. But, this issue seems to propose that every real value should be float8 for the more precise math function results. Hyunsik said that the precision lose problem is occurred only when the constant values are used as function parameters, but, in my opinion, it is hard to guess the reason. Currently, the results of math functions are tested by comparing with the exact values of them. If the precision problem of the real numbers bothers the testing math functions, it is better to use assertEquals(double expected, double actual, double delta) instead of assertEquals(double expected, double actual).
        Hide
        jihoonson Jihoon Son added a comment -

        Right.
        My above comment means that supporting more precise types is required to get the exact results.
        Thus this is just a temporal solution to get more precise results.
        However, I can't understand why the problem is occurred for only when the function parameters are the constant values.
        The internal representations are same for whether the values are from user inputs or stored data.

        Show
        jihoonson Jihoon Son added a comment - Right. My above comment means that supporting more precise types is required to get the exact results. Thus this is just a temporal solution to get more precise results. However, I can't understand why the problem is occurred for only when the function parameters are the constant values. The internal representations are same for whether the values are from user inputs or stored data.
        Hide
        hyunsik Hyunsik Choi added a comment - - edited

        To deal with exact real numbers throughout query processing, Tajo has to support numeric type and lots of functions that takes numeric types as parameters. However, currently, Tajo does not have them. Even though Tajo uses more precise types (e.g., BigDecimal), it will lost precision during passing functions and operations. I think that, in this time, parsing real values via FLOAT8 is a realistic solution because FLOAT8 is the most precise real value type in Tajo.

        Show
        hyunsik Hyunsik Choi added a comment - - edited To deal with exact real numbers throughout query processing, Tajo has to support numeric type and lots of functions that takes numeric types as parameters. However, currently, Tajo does not have them. Even though Tajo uses more precise types (e.g., BigDecimal), it will lost precision during passing functions and operations. I think that, in this time, parsing real values via FLOAT8 is a realistic solution because FLOAT8 is the most precise real value type in Tajo.
        Hide
        jihoonson Jihoon Son added a comment -

        Wait.
        My point is that the same problem exists for float8.
        Is this solution just a temporal solution?

        Show
        jihoonson Jihoon Son added a comment - Wait. My point is that the same problem exists for float8. Is this solution just a temporal solution?
        Hide
        hyunsik Hyunsik Choi added a comment -

        +1

        If there is no objection, I'll commit it after for a while.

        Show
        hyunsik Hyunsik Choi added a comment - +1 If there is no objection, I'll commit it after for a while.
        Hide
        hyunsik Hyunsik Choi added a comment -

        This proposal is reasonable for me. I have also experienced the problem caused by downcast. The function type system works well when column references are passed to functions. However, it is likely to lose precision when constant values are used as function parameters. This is because real values are parsed via FLOAT4. This change only affects function call with constant parameters, so it does not cause performance degrade.

        Show
        hyunsik Hyunsik Choi added a comment - This proposal is reasonable for me. I have also experienced the problem caused by downcast. The function type system works well when column references are passed to functions. However, it is likely to lose precision when constant values are used as function parameters. This is because real values are parsed via FLOAT4. This change only affects function call with constant parameters, so it does not cause performance degrade.
        Hide
        jihoonson Jihoon Son added a comment -

        Right, but the precision problem is also occurred when we use float8.
        To store the exact floating point value, we should use other types such as BigDecimal which significantly degrade the performance.

        If we have to get the exact results, we can accept those overheads.
        However, the difference of sine values from float4 and float8 is very small, and I think that it can be currently ignored.

        Show
        jihoonson Jihoon Son added a comment - Right, but the precision problem is also occurred when we use float8. To store the exact floating point value, we should use other types such as BigDecimal which significantly degrade the performance. If we have to get the exact results, we can accept those overheads. However, the difference of sine values from float4 and float8 is very small, and I think that it can be currently ignored.
        Hide
        charsyam DaeMyung Kang added a comment -

        Jihoon Son Thank you for your advice. Yes. you are right. bug in Floating point., even though we use type casting, it can cause precision problem
        when casting small size to large size.

        so next code doesn't work with our expect.

        public static void main(String[] args) {
        float f = 0.4f;
        double d = 0.4;
        System.out.println((f));
        System.out.println(((double)f));
        System.out.println((d));
        System.out.println(((float)d));
        System.out.println(Math.sin(f));
        System.out.println(Math.sin((double)f));
        System.out.println(Math.sin(d));
        System.out.println(Math.sin((float)d));
        }

        0.4
        0.4000000059604645(only case when casting small size to large size)
        0.4
        0.4
        0.3894183477986018
        0.3894183477986018
        0.3894183423086505
        0.3894183477986018

        and currently tajo just parse FLOAT4 only in shell.
        Thank you.

        Show
        charsyam DaeMyung Kang added a comment - Jihoon Son Thank you for your advice. Yes. you are right. bug in Floating point., even though we use type casting, it can cause precision problem when casting small size to large size. so next code doesn't work with our expect. public static void main(String[] args) { float f = 0.4f; double d = 0.4; System.out.println((f)); System.out.println(((double)f)); System.out.println((d)); System.out.println(((float)d)); System.out.println(Math.sin(f)); System.out.println(Math.sin((double)f)); System.out.println(Math.sin(d)); System.out.println(Math.sin((float)d)); } 0.4 0.4000000059604645(only case when casting small size to large size) 0.4 0.4 0.3894183477986018 0.3894183477986018 0.3894183423086505 0.3894183477986018 and currently tajo just parse FLOAT4 only in shell. Thank you.
        Hide
        jihoonson Jihoon Son added a comment -

        I think that this is not a problem.
        You can use the type casting to get the right results from math functions.

        Show
        jihoonson Jihoon Son added a comment - I think that this is not a problem. You can use the type casting to get the right results from math functions.

          People

          • Assignee:
            charsyam DaeMyung Kang
            Reporter:
            charsyam DaeMyung Kang
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development