Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.13.0
-
None
-
Ubuntu Mate 18.04
Apache Drill 1.14.0-SNAPSHOT
Description
Sample data: sample.json
Result of
select flatten(data) as d from dfs.root.`sample.json`
is
d |
---|
Honored Boy Scout |
Yawning Wolf |
Closed Queen |
Innocent Volunteer |
Junior Wing |
Lame Mantis |
Old Master |
Numb Pawn |
Hollow Guardian |
Twin Hurricane |
Helpless Avalange |
Let's try to get first 3 rows:
select flatten(data) as d from dfs.root.`sample.json` limit 3
Result has only 2 rows:
d |
---|
Honored Boy Scout |
Yawning Wolf |
Reason: Limit was pushed down below flatten and only 3 top rows from json was selected. In this 3 rows only 2nd has items in "data" field.
Let's try to get 3 rows from the middle:
select flatten(data) as d from dfs.root.`sample.json` limit 3 offset 5
Result is empty.
Reason: Limit and offset was pushed down below flatten and only 6, 7 and 8 row from json was selected. This 3 rows contains only 3 items in "data" field. After flatten limit and offset applies second time and reject all select items.
Error in org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
Attachments
Attachments
- sample.json
- 0.6 kB
- Oleg Zinoviev
Issue Links
Activity
GitHub user oleg-zinovev opened a pull request:
https://github.com/apache/drill/pull/1204
Test for DRILL-6099 was changed due it's incorrect behavior
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/oleg-zinovev/drill drill-6318
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1204.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1204
commit 5a6c96240d8e7d0409481f63036d63b572fe4d26
Author: Oleg <ozinoviev@...>
Date: 2018-04-10T08:26:25Z
Github user priteshm commented on the issue:
https://github.com/apache/drill/pull/1204
@ppadma can you review this?
Github user priteshm commented on the issue:
https://github.com/apache/drill/pull/1204
@arina-ielchiieva or @vdiravka can you review this?
Github user amansinha100 commented on the issue:
https://github.com/apache/drill/pull/1204
@oleg-zinovev good catch with the empty array input for Flatten. I had reviewed the original JIRA DRILL-6099 but did not consider the empty arrays. It would be good for @gparai to also take a look at this fix.
Github user gparai commented on a diff in the pull request:
https://github.com/apache/drill/pull/1204#discussion_r180538860
— Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java —
@@ -74,9 +74,12 @@ public boolean matches(RelOptRuleCall call) {
// mess up the schema since Convert_FromJson() is different from other regular functions in that it only knows
// the output schema after evaluation is performed. When input has 0 row, Drill essentially does not have a way
// to know the output type.
+ // DRILL-6318:
— End diff –
Please remove JIRA ref here and below - others can get it via annotations.
Github user gparai commented on a diff in the pull request:
https://github.com/apache/drill/pull/1204#discussion_r180539313
— Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/flatten/TestFlattenPlanning.java —
@@ -66,8 +66,11 @@ public void testPushFilterPastProjectWithFlattenNeg() throws Exception {
@Test // DRILL-6099 : push limit past flatten(project)
public void testLimitPushdownPastFlatten() throws Exception {
final String query = "select rownum, flatten(complex) comp from cp.`store/json/test_flatten_mappify2.json` limit 1";
- final String[] expectedPatterns =
{".*Limit\\(fetch=\\[1\\]\\).*",".*Flatten.*",".*Limit\\(fetch=\\[1\\]\\).*"}
;
- final String[] excludedPatterns = null;
+ //DRILL-6318: limit should not push past flatten(project)
+ //P.S. Where was an error in this pattern. Even then Limit missing after Flatten it matches to plan-
- End diff –
-
Remove the P.S. and commented out pattern.
Github user gparai commented on a diff in the pull request:
https://github.com/apache/drill/pull/1204#discussion_r180539160
— Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java —
@@ -89,14 +92,17 @@ public void onMatch(RelOptRuleCall call) {
RelNode child = projectRel.getInput();
final RelNode limitUnderProject = limitRel.copy(limitRel.getTraitSet(), ImmutableList.of(child));
final RelNode newProject = projectRel.copy(projectRel.getTraitSet(), ImmutableList.of(limitUnderProject));
- if (DrillRelOptUtil.isProjectOutputRowcountUnknown(projectRel))
{
- //Preserve limit above the project since Flatten can produce more rows. Also mark it so we do not fire the rule again.
- final RelNode limitAboveProject = new DrillLimitRel(limitRel.getCluster(), limitRel.getTraitSet(), newProject,
- limitRel.getOffset(), limitRel.getFetch(), true);
- call.transformTo(limitAboveProject);
- }
else
{ - call.transformTo(newProject); - }+ call.transformTo(newProject);
+ //DRILL-6318:-
- End diff –
-
Cleanup commented out code.
Github user oleg-zinovev commented on the issue:
https://github.com/apache/drill/pull/1204
Reason for build error: "The job is the maximum time limit for jobs, and has been terminated."
What should I do next?
Github user kkhatua commented on the issue:
https://github.com/apache/drill/pull/1204
You could try re-pushing the last commit with a different commit ID (change a trailing text in the PR to have Git generate a different ID). I don't see an option within Travis to resubmit.
Github user arina-ielchiieva commented on the issue:
https://github.com/apache/drill/pull/1204
@oleg-zinovev could you please rebase to the latest master?
oleg-zinovev commented on issue #1204: DRILL-6318
URL: https://github.com/apache/drill/pull/1204#issuecomment-386322683
Rebased
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
asfgit closed pull request #1204: DRILL-6318
URL: https://github.com/apache/drill/pull/1204
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:
As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
index 2d33d3842a..79ba9b0ab6 100644
— a/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java
@@ -67,9 +67,11 @@ public boolean matches(RelOptRuleCall call) {
// mess up the schema since Convert_FromJson() is different from other regular functions in that it only knows
// the output schema after evaluation is performed. When input has 0 row, Drill essentially does not have a way
// to know the output type.
+ // Cannot pushdown limit and offset in to flatten as long as we don't know data distribution in flattened field
if (!limitRel.isPushDown() && (limitRel.getFetch() != null)
&& (!DrillRelOptUtil.isLimit0(limitRel.getFetch())
!DrillRelOptUtil.isProjectOutputSchemaUnknown(projectRel))) {
+!DrillRelOptUtil.isProjectOutputSchemaUnknown(projectRel))
+ && !DrillRelOptUtil.isProjectOutputRowcountUnknown(projectRel)) { return true; }return false;
@@ -82,14 +84,7 @@ public void onMatch(RelOptRuleCall call) {
RelNode child = projectRel.getInput();
final RelNode limitUnderProject = limitRel.copy(limitRel.getTraitSet(), ImmutableList.of(child));
final RelNode newProject = projectRel.copy(projectRel.getTraitSet(), ImmutableList.of(limitUnderProject));- if (DrillRelOptUtil.isProjectOutputRowcountUnknown(projectRel))
{
- //Preserve limit above the project since Flatten can produce more rows. Also mark it so we do not fire the rule again.
- final RelNode limitAboveProject = new DrillLimitRel(limitRel.getCluster(), limitRel.getTraitSet(), newProject,
- limitRel.getOffset(), limitRel.getFetch(), true);
- call.transformTo(limitAboveProject);
- }
else
{ - call.transformTo(newProject); - }+ call.transformTo(newProject);
}
};
diff --git a/exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java b/exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java
index 100d194cff..f22db7b371 100644
— a/exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java
+++ b/exec/java-exec/src/test/java/org/apache/drill/TestBugFixes.java
@@ -22,6 +22,7 @@
import org.apache.drill.common.exceptions.UserException;
import org.apache.drill.exec.planner.physical.PlannerSettings;
import org.apache.drill.test.BaseTestQuery;
+import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Ignore;
import org.junit.Test;
@@ -300,4 +301,16 @@ public void testDRILL5269() throws Exception
}
+
+ @Test
+ public void testDRILL6318() throws Exception
}
diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/flatten/TestFlattenPlanning.java b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/flatten/TestFlattenPlanning.java
index 0e2d92c5c2..9731aa2591 100644
— a/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/flatten/TestFlattenPlanning.java
+++ b/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/flatten/TestFlattenPlanning.java
@@ -66,8 +66,9 @@ public void testPushFilterPastProjectWithFlattenNeg() throws Exception {
@Test // DRILL-6099 : push limit past flatten(project)
public void testLimitPushdownPastFlatten() throws Exception {
final String query = "select rownum, flatten(complex) comp from cp.`store/json/test_flatten_mappify2.json` limit 1";
- final String[] expectedPatterns =
{".*Limit\\(fetch=\\[1\\]\\).*",".*Flatten.*",".*Limit\\(fetch=\\[1\\]\\).*"}
;
- final String[] excludedPatterns = null;
+ //DRILL-6318: limit should not push past flatten(project)
+ final String[] expectedPatterns = {"(?s).*Limit.*Flatten.*Project.*"};
{"(?s).*Limit.*Flatten.*Limit.*"}
+ final String[] excludedPatterns =;
PlanTestBase.testPlanMatchingPatterns(query, expectedPatterns, excludedPatterns);
}
diff --git a/exec/java-exec/src/test/resources/jsoninput/bug6318.json b/exec/java-exec/src/test/resources/jsoninput/bug6318.json
new file mode 100644
index 0000000000..1fdef8e824
— /dev/null
+++ b/exec/java-exec/src/test/resources/jsoninput/bug6318.json
@@ -0,0 +1,12 @@
+[
+
,
+
,
+
,
+
,
+
,
+
,
+
,
+
,
+
,
+
+]
\ No newline at end of file
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
P.S. Sorry for my terrible English