[DRILL-5502] Parallelized external sort is slower compared to the single fragment scenario on some data sets - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.10.0
Fix Version/s: None
Component/s: Execution - Relational Operators
Labels: None

Description

git.commit.id.abbrev=1e0a14c

The query below runs in a single fragment and completes in about 14 minutes (832.7 seconds):

ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 62600000;
alter session set `planner.width.max_per_query` = 17;
select count(*) from (select * from dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by columns[0]) d where d.columns[0] = '4041054511';
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+
1 row selected (832.705 seconds)

I then increased the parallelization to 10 and raised the memory allocated to the sort tenfold, so that each individual fragment still gets a similar amount of memory. In this case, however, the query takes about 31 minutes (1845.5 seconds) to complete, more than twice as long as the single-fragment run, which is unexpected:

ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 10;
alter session set `planner.memory.max_query_memory_per_node` = 626000000;
alter session set `planner.width.max_per_query` = 17;
select count(*) from (select * from dfs.`/drill/testdata/resource-manager/5kwidecolumns_500k.tbl` order by columns[0]) d where d.columns[0] = '4041054511';
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+
1 row selected (1845.508 seconds)
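
To make the comparison concrete, here is a minimal arithmetic sketch using only the settings and timings quoted above. It assumes that `planner.memory.max_query_memory_per_node` is divided evenly among the fragments on a node and that each fragment runs one sort; the exact allocation inside Drill may differ, and the helper name `memory_per_fragment` is hypothetical.

```python
# Settings and timings copied from the two runs in this report.
single = {
    "width_per_node": 1,
    "max_query_memory_per_node": 62_600_000,
    "seconds": 832.705,
}
parallel = {
    "width_per_node": 10,
    "max_query_memory_per_node": 626_000_000,
    "seconds": 1845.508,
}


def memory_per_fragment(run):
    """Bytes available to each fragment.

    Assumption (not taken from Drill's source): the per-node query memory
    is split evenly across the fragments running on that node.
    """
    return run["max_query_memory_per_node"] // run["width_per_node"]


single_mem = memory_per_fragment(single)
parallel_mem = memory_per_fragment(parallel)
slowdown = parallel["seconds"] / single["seconds"]

print(f"single fragment:   {single_mem} bytes per fragment")
print(f"ten fragments:     {parallel_mem} bytes per fragment")
print(f"slowdown factor:   {slowdown:.2f}x")
```

Under that assumption both runs give each fragment the same memory budget, so the slowdown cannot be explained by a smaller per-fragment allocation alone.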

My data set contains wide columns (5k characters wide). I will try to reproduce this with a data set where the column width is < 256 bytes.

Attached are the profile and log file from each of the two scenarios. The data set is too large to attach to a JIRA.

Attachments

multiple_fragments.log | 10/May/17 23:16 | 13.81 MB | Rahul Kumar Challapalli
multiple_fragments.sys.drill | 10/May/17 23:16 | 38 kB | Rahul Kumar Challapalli
single_fragment.log | 10/May/17 23:16 | 2.46 MB | Rahul Kumar Challapalli
single_fragment.sys.drill | 10/May/17 23:16 | 13 kB | Rahul Kumar Challapalli

People

Assignee: Paul Rogers
Reporter: Rahul Kumar Challapalli
Votes: 0
Watchers: 2

Dates

Created: 10/May/17 22:34
Updated: 19/Jun/17 05:04