[ASTERIXDB-1879] Filter doesn't filter out components after restart - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: STO - Storage
Labels:
None
Environment:
master
"git.commit.id": "05e4256"

Description

It's difficult for me to find the exact code, but here is the symptom.

Previously, when we ran a query with a highly selective time range on create_at (the filter field), like the following:

count(for $d in dataset twitter.ds_tweet 
  where $d.'create_at' >= datetime('2017-04-02T16:23:13.333Z') and $d.'create_at' < datetime('2017-04-03T16:23:13.333Z')  
  return $d)

The running time was fast. Now the running time is the same as a scan query.
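
For context, the fast case relies on the filter pruning LSM components: each component records the minimum and maximum filter value it contains, and a range query can skip any component whose range does not overlap the predicate. The sketch below only illustrates that expected behavior. It is not AsterixDB code; the `Component` class, `components_to_search` function, and the component date ranges are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Component:
    """Hypothetical stand-in for an LSM disk component with filter metadata."""
    name: str
    min_filter: datetime  # smallest create_at stored in this component
    max_filter: datetime  # largest create_at stored in this component


def components_to_search(components: List[Component],
                         lo: datetime, hi: datetime) -> List[Component]:
    """Keep only components whose [min_filter, max_filter] overlaps [lo, hi)."""
    return [c for c in components if c.max_filter >= lo and c.min_filter < hi]


# Made-up components covering disjoint time ranges.
components = [
    Component("c1", datetime(2017, 3, 1), datetime(2017, 3, 31)),
    Component("c2", datetime(2017, 4, 1), datetime(2017, 4, 3, 23, 59, 59)),
    Component("c3", datetime(2017, 4, 4), datetime(2017, 4, 9)),
]

# Same time range as the query above.
lo = datetime(2017, 4, 2, 16, 23, 13, 333000)
hi = datetime(2017, 4, 3, 16, 23, 13, 333000)

searched = [c.name for c in components_to_search(components, lo, hi)]
print(searched)
```

With working pruning only `c2` would be searched. The symptom reported here matches what one would see if every component were searched regardless of its filter range.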

Here is the DDL:

drop dataverse twitter if exists;
create dataverse twitter if not exists;
use dataverse twitter;

create type typeUser if not exists as open {
    id: int64,
    name: string,
    screen_name : string,
    lang : string,
    location: string,
    create_at: date,
    description: string,
    followers_count: int32,
    friends_count: int32,
    statues_count: int64
}

create type typePlace if not exists as open{
    country : string,
    country_code : string,
    full_name : string,
    id : string,
    name : string,
    place_type : string,
    bounding_box : rectangle
}

create type typeGeoTag if not exists as open {
    stateID: int32,
    stateName: string,
    countyID: int32,
    countyName: string,
    cityID: int32?,
    cityName: string?
}

create type typeTweet if not exists as open{
    create_at : datetime,
    id: int64,
    "text": string,
    in_reply_to_status : int64,
    in_reply_to_user : int64,
    favorite_count : int64,
    coordinate: point?,
    retweet_count : int64,
    lang : string,
    is_retweet: boolean,
    hashtags : {{ string }} ?,
    user_mentions : {{ int64 }} ? ,
    user : typeUser,
    place : typePlace?,
    geo_tag: typeGeoTag
}

create dataset ds_tweet(typeTweet) if not exists primary key id
using compaction policy prefix (("max-mergable-component-size"="134217728"),("max-tolerance-component-count"="10"))
with filter on create_at;
create index text_idx if not exists on ds_tweet("text") type keyword;

The optimized logical plan is exactly the same as before, so I'm wondering whether the problem is in the implementation.

Issue Links

relates to

ASTERIXDB-1872 java.lang.IndexOutOfBoundsException when ingesting data using small memory component size

Resolved

People

Assignee: Ian Maxon

Reporter: Jianfeng Jia

Votes: 0

Watchers: 5

Dates

Created: 09/Apr/17 21:04

Updated: 13/Apr/17 22:48

Resolved: 13/Apr/17 22:48