Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
-
None
Description
Full outer join returns too many rows. The query
{
"head" : {
"type" : "apache_drill_logical_plan",
"version" : 1,
"generator" :
},
"storage" : [
,
{ "type" : "classpath", "name" : "donuts-json" } ],
"query" : [ {
"op" : "scan",
"@id" : 1,
"memo" : "initial_scan",
"storageengine" : "donuts-json",
"selection" :
,
"ref" : "_MAP"
}, {
"op" : "project",
"input" : 1,
"@id" : 2,
"projections" : [
]
}, {
"op" : "project",
"input" : 2,
"@id" : 10,
"projections" : [
]
}, {
"op" : "scan",
"@id" : 3,
"memo" : "initial_scan",
"storageengine" : "donuts-json",
"selection" :
,
"ref" : "_MAP"
}, {
"op" : "project",
"input" : 3,
"@id" : 4,
"projections" : [
]
}, {
"op": "join",
"left": 10,
"right": 4,
"@id" : 5,
"type": "outer",
"conditions": [
]
}, {
"op" : "store",
"input" : 5,
"@id" : 6,
"memo" : "output sink",
"target" :
,
"partition" : null,
"storageEngine" : "queue"
} ]
}
returns
{ "deptId" : 31, "deptId1" : 31} { "deptId" : 33, "deptId1" : 33} { "deptId" : 33, "deptId1" : 33} { "deptId" : 34, "deptId1" : 34} { "deptId" : 34, "deptId1" : 34} { "deptId1" : null} { "deptId" : 31} { "deptId" : 33} { "deptId" : 34} { "deptId" : 35}but I think it should return
{ "deptId" : 31, "deptId1" : 31} { "deptId" : 33, "deptId1" : 33} { "deptId" : 33, "deptId1" : 33} { "deptId" : 34, "deptId1" : 34} { "deptId" : 34, "deptId1" : 34} { "deptId1" : null} { "deptId" : 35}because only 35 on the left is unmatched on the right;
{31, 33, 34}have at least one match.
Also, it's cosmetic, but the "outer" join type should be called "full" because "left" is also a kind of outer join, at least in SQL parlance.