Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- 1.7.1
- None
Description
When the processor is configured with the "Column for Value Partitioning" property, it still calculates COUNT(*), even though the count is not used later in the code.
At line 292, the code adds an explicit COUNT(*) to the query that calculates the bounds of the partitions:
    List<String> maxValueSelectColumns = new ArrayList<>(numMaxValueColumns + 1);
    maxValueSelectColumns.add("COUNT(*)");
At line 414 we see that the count is only used if no columns have been set for paging:
    if (useColumnValsForPaging) {
        final long valueRangeSize = maxValueForPartitioning == null ? 0 : (maxValueForPartitioning - minValueForPartitioning + 1);
        numberOfFetches = (partitionSize == 0) ? 1 : (valueRangeSize / partitionSize) + (valueRangeSize % partitionSize == 0 ? 0 : 1);
    } else {
        numberOfFetches = (partitionSize == 0) ? 1 : (rowCount / partitionSize) + (rowCount % partitionSize == 0 ? 0 : 1);
    }
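To make the arithmetic in that branch concrete, here is a small standalone sketch of the same ceiling division. The class and method names (FetchCountSketch, numberOfFetches, fetchesForValueRange) are hypothetical and are not taken from the processor; only the formulas mirror the snippet quoted in this issue.

```java
public class FetchCountSketch {

    // Ceiling division as in the quoted snippet: a partition size of 0 means a single fetch.
    static long numberOfFetches(long total, long partitionSize) {
        if (partitionSize == 0) {
            return 1;
        }
        return (total / partitionSize) + (total % partitionSize == 0 ? 0 : 1);
    }

    // Value-range variant: the size is derived from MIN/MAX of the partitioning column,
    // so no row count is needed. A null max is treated as an empty range.
    static long fetchesForValueRange(Long min, Long max, long partitionSize) {
        final long valueRangeSize = (max == null) ? 0 : (max - min + 1);
        return numberOfFetches(valueRangeSize, partitionSize);
    }

    public static void main(String[] args) {
        // Values 1..1000 with a partition size of 300: three full partitions plus one partial.
        System.out.println(fetchesForValueRange(1L, 1000L, 300)); // prints 4
        // Partition size 0 always yields a single fetch.
        System.out.println(numberOfFetches(1000, 0)); // prints 1
    }
}
```

Note that the value-range path never reads rowCount, which is why the COUNT(*) is wasted work in that configuration.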
Since the SELECT COUNT can take a long time to complete, we want to optimize this code and perform the SELECT COUNT only when the "Column for Value Partitioning" property has not been set.
The idea is to select the constant -1 in place of COUNT(*), so that the first column is still present and code changes are kept to a minimum:
    if (useColumnValsForPaging) {
        maxValueSelectColumns.add("-1");
    } else {
        maxValueSelectColumns.add("COUNT(*)");
    }
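A rough sketch of how the proposed placeholder might affect the generated bounds query is shown below. The class name, the buildBoundsQuery helper, and the MIN/MAX columns on the partitioning column are assumptions made for illustration; the actual processor builds its query through its own database adapter and may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectColumnsSketch {

    // Hypothetical helper: builds the bounds query for a table.
    // partitionColumn stands in for the "Column for Value Partitioning" property.
    static String buildBoundsQuery(String table, String partitionColumn) {
        final boolean useColumnValsForPaging = partitionColumn != null && !partitionColumn.isEmpty();
        final List<String> selectColumns = new ArrayList<>();
        if (useColumnValsForPaging) {
            // Constant placeholder keeps the first result column in place without scanning rows.
            selectColumns.add("-1");
            // Assumed here: the partition bounds come from MIN/MAX of the partitioning column.
            selectColumns.add("MIN(" + partitionColumn + ")");
            selectColumns.add("MAX(" + partitionColumn + ")");
        } else {
            // Without a partitioning column, the row count drives the number of fetches.
            selectColumns.add("COUNT(*)");
        }
        return "SELECT " + String.join(", ", selectColumns) + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(buildBoundsQuery("orders", "id"));
        System.out.println(buildBoundsQuery("orders", null));
    }
}
```

Because the placeholder occupies the same position as COUNT(*), code that reads the first column of the result set can stay unchanged; it simply receives -1, which the value-partitioning branch ignores.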
Issue Links
- is related to: NIFI-5855 Optimize GenerateTableFetch to remove unnecessary ORDER BY clause (Resolved)