We have a scenario where we generate our own Parquet files every X seconds.
The files are organized in a date-based directory structure, and only the file for today gets updated.
The process is as follows:
1. Generate the Parquet file in a temp directory.
2. When generation finishes, mv the file into a Drill workspace (e.g. data/2017/03/10/data.parquet, ...).
3. Restart the process.
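The steps above can be sketched roughly like this (paths and file contents are illustrative placeholders, not our real pipeline):

```shell
#!/bin/sh
set -e

# Illustrative locations; the real workspace path comes from the
# Drill storage plugin configuration.
TMP_DIR=$(mktemp -d)
WORKSPACE="$TMP_DIR/workspace/data/2017/03/10"
mkdir -p "$WORKSPACE"

# 1. Generate the file in a temp location first, so the workspace
#    never contains a half-written file.
echo "parquet-bytes" > "$TMP_DIR/data.parquet.tmp"

# 2. Move it into the Drill workspace. Note mv is only an atomic
#    rename when source and destination are on the same filesystem;
#    across filesystems it degrades to a copy + delete.
mv "$TMP_DIR/data.parquet.tmp" "$WORKSPACE/data.parquet"

cat "$WORKSPACE/data.parquet"
```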
We have noticed that if the file is moved in while a query is already running,
Drill throws an error saying the Parquet magic number is incorrect.
This appears to be because the file length is cached and reused, so what seems to happen is:
1. Drill plans the query.
2. The file changes under Drill's feet.
3. Drill executes the query and tries to read at a now-incorrect offset in the changed file.
Is there any way to fix or avoid this scenario?
Another side effect of constantly regenerating a file is that the metadata cache gets discarded for the whole workspace, even though only one file changed.
Is there a way to avoid that?