HIVE-25510

Incorrect lineage for compare expressions in select statements

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: lineage

    Description

      Incorrect lineage is generated for queries whose select statements contain comparison expressions: columns that appear only in the condition are reported as projection sources instead of predicate sources. For example:

      `CASE WHEN` in select statement:

      Query: 

      select place, (case when city == "aa" then id else 0 end)/id from t1;
      

      Actual lineage:

      {
        "edges": [
          {
            "sources": [
              2
            ],
            "targets": [
              0
            ],
            "edgeType": "PROJECTION"
          },
          {
            "sources": [
              3,
              4
            ],
            "targets": [
              1
            ],
            "expression": "(UDFToDouble(CASE WHEN ((UDFToString(t1.city) = 'aa')) THEN (t1.id) ELSE (0) END) / UDFToDouble(t1.id))",
            "edgeType": "PROJECTION"
          }
        ],
        "vertices": [
          {
            "id": 0,
            "vertexType": "COLUMN",
            "vertexId": "place"
          },
          {
            "id": 1,
            "vertexType": "COLUMN",
            "vertexId": "_c1"
          },
          {
            "id": 2,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.place"
          },
          {
            "id": 3,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.city"
          },
          {
            "id": 4,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.id"
          }
        ]
      }
      

      Expected Lineage:

      {
        "edges": [
          {
            "sources": [
              2
            ],
            "targets": [
              0
            ],
            "edgeType": "PROJECTION"
          },
          {
            "sources": [
              3
            ],
            "targets": [
              1
            ],
            "expression": "(UDFToDouble(CASE WHEN ((UDFToString(t1.city) = 'aa')) THEN (t1.id) ELSE (0) END) / UDFToDouble(t1.id))",
            "edgeType": "PROJECTION"
          },
          {
            "sources": [
              4
            ],
            "targets": [
              1
            ],
            "expression": "CASE WHEN ((UDFToString(t1.city) = 'aa')) THEN (t1.id) ELSE (0) END",
            "edgeType": "PREDICATE"
          }
        ],
        "vertices": [
          {
            "id": 0,
            "vertexType": "COLUMN",
            "vertexId": "place"
          },
          {
            "id": 1,
            "vertexType": "COLUMN",
            "vertexId": "_c1"
          },
          {
            "id": 2,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.place"
          },
          {
            "id": 3,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.id"
          },
          {
            "id": 4,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.city"
          }
        ]
      }
      

       

      `IF` function in select statement:

      Query:

      select IF(city='aa',place,'FALSE') from t1;
      

      Actual lineage:

      {
        "edges": [
          {
            "sources": [
              1,
              2
            ],
            "targets": [
              0
            ],
            "expression": "if((UDFToString(t1.city) = 'aa'), t1.place, 'FALSE')",
            "edgeType": "PROJECTION"
          }
        ],
        "vertices": [
          {
            "id": 0,
            "vertexType": "COLUMN",
            "vertexId": "_c0"
          },
          {
            "id": 1,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.city"
          },
          {
            "id": 2,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.place"
          }
        ]
      }

      Expected Lineage:
      The projection edge targeting `vertex 0` should have only `vertex 2` (`default.t1.place`) as its source, and there should be one additional predicate edge with `vertex 1` (`default.t1.city`) as source and `vertex 0` as target.
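
      Written out in the same JSON shape as the other examples, the expected lineage described above might look like the sketch below. The sources, targets, and edge types follow the description; the `expression` strings are assumptions (the predicate edge reuses the full `if(...)` expression, mirroring the `CASE WHEN` example), and the vertex numbering is kept identical to the actual output.

      ```json
      {
        "edges": [
          {
            "sources": [
              2
            ],
            "targets": [
              0
            ],
            "expression": "if((UDFToString(t1.city) = 'aa'), t1.place, 'FALSE')",
            "edgeType": "PROJECTION"
          },
          {
            "sources": [
              1
            ],
            "targets": [
              0
            ],
            "expression": "if((UDFToString(t1.city) = 'aa'), t1.place, 'FALSE')",
            "edgeType": "PREDICATE"
          }
        ],
        "vertices": [
          {
            "id": 0,
            "vertexType": "COLUMN",
            "vertexId": "_c0"
          },
          {
            "id": 1,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.city"
          },
          {
            "id": 2,
            "vertexType": "COLUMN",
            "vertexId": "default.t1.place"
          }
        ]
      }
      ```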

       
      The table used in the queries above is:

      select * from t1;

      t1.id  t1.place  t1.city 
      1      a         aa      
      2      b         bb      
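
      To reproduce, a setup along the following lines should be enough. This is only a sketch: the column types are assumptions (the report does not give the DDL), and it assumes the lineage was captured with Hive's `LineageLogger` post-execution hook, which writes the lineage JSON to the Hive log.

      ```sql
      -- Hypothetical setup; column types are assumptions, not taken from the report.
      -- The UDFToString(t1.city) wrapper in the lineage suggests city is not a plain STRING.
      CREATE TABLE t1 (id INT, place STRING, city VARCHAR(10));
      INSERT INTO t1 VALUES (1, 'a', 'aa'), (2, 'b', 'bb');

      -- Assumed capture mechanism: the LineageLogger post-execution hook.
      SET hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.LineageLogger;

      -- The two queries from this report.
      SELECT place, (CASE WHEN city == "aa" THEN id ELSE 0 END)/id FROM t1;
      SELECT IF(city='aa', place, 'FALSE') FROM t1;
      ```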

          People

            shivincible Shivangi