Details
- Type: Bug
- Status: Resolved
- Priority: P1
- Resolution: Fixed
Description
When there is only a single split key, splitKeysToFilters does not compute the correct result. For example, if the split key is "_id: 56", only the range filter "_id less than or equal to 56" is produced; the filter "_id greater than 56" is missing. As a result, the PCollection contains only the data up to the first split, and the remainder is not included.
This can be remedied with the following few lines:
if (i == 0) {
  // This is the first split in the list; the filter defines
  // the range from the beginning up to this split.
  rangeFilter =
      String.format("{ $and: [ {\"_id\":{$lte:%s}}", getFilterString(idType, splitKey));
  filters.add(formatFilter(rangeFilter, additionalFilter));
  // If there is only one split, also generate a range from the split to the end.
  if (splitKeys.size() == 1) {
    rangeFilter =
        String.format("{ $and: [ {\"_id\":{$gt:%s}}", getFilterString(idType, splitKey));
    filters.add(formatFilter(rangeFilter, additionalFilter));
  }
}
The corresponding test case in MongoDbIOTest should be updated to the following:
@Test
public void testSplitIntoFilters() throws Exception {
  // A single split will result in two filters
  ArrayList<Document> documents = new ArrayList<>();
  documents.add(new Document("_id", 56));
  List<String> filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
  assertEquals(2, filters.size());
  assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
  assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"56\")}} ]}", filters.get(1));

  // Add two more splits; now we should have 4 filters
  documents.add(new Document("_id", 109));
  documents.add(new Document("_id", 256));
  filters = MongoDbIO.BoundedMongoDbSource.splitKeysToFilters(documents, null);
  assertEquals(4, filters.size());
  assertEquals("{ $and: [ {\"_id\":{$lte:ObjectId(\"56\")}} ]}", filters.get(0));
  assertEquals(
      "{ $and: [ {\"_id\":{$gt:ObjectId(\"56\"),$lte:ObjectId(\"109\")}} ]}",
      filters.get(1));
  assertEquals(
      "{ $and: [ {\"_id\":{$gt:ObjectId(\"109\"),$lte:ObjectId(\"256\")}} ]}",
      filters.get(2));
  assertEquals("{ $and: [ {\"_id\":{$gt:ObjectId(\"256\")}} ]}", filters.get(3));
}
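The invariant this test checks is that N split keys should yield N + 1 contiguous ranges, including when N is 1. A minimal standalone sketch of that partitioning follows. The class SplitRangeSketch, the method rangesFor, and the simplified output strings are hypothetical names chosen for illustration; this is not the MongoDbIO implementation and does not reproduce its exact filter format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the Beam MongoDbIO code. */
public class SplitRangeSketch {

  /**
   * Turns N sorted split keys into N + 1 contiguous range descriptions on _id.
   * The strings are simplified and are NOT MongoDbIO's actual filter format.
   */
  public static List<String> rangesFor(List<String> splitKeys) {
    List<String> ranges = new ArrayList<>();
    if (splitKeys.isEmpty()) {
      return ranges; // no split keys: nothing to partition in this sketch
    }
    // First range: everything up to and including the first split key.
    ranges.add(String.format("{_id:{$lte:%s}}", splitKeys.get(0)));
    // Middle ranges: (previous split key, current split key].
    for (int i = 1; i < splitKeys.size(); i++) {
      ranges.add(
          String.format("{_id:{$gt:%s,$lte:%s}}", splitKeys.get(i - 1), splitKeys.get(i)));
    }
    // Last range: everything after the final split key. With a single split
    // key, this is the range that the bug described above leaves out.
    ranges.add(String.format("{_id:{$gt:%s}}", splitKeys.get(splitKeys.size() - 1)));
    return ranges;
  }

  public static void main(String[] args) {
    System.out.println(rangesFor(Arrays.asList("56")));
    System.out.println(rangesFor(Arrays.asList("56", "109", "256")));
  }
}
```

Emitting the open-ended last range unconditionally after the loop, rather than as a special case inside it, is one way to keep the single-key case from being missed; the patch above instead adds an explicit splitKeys.size() == 1 check inside the existing branch.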