Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- Impala 1.4
- None
- None
Description
- In impala
> create table ids(id string);
- In hive
hive> insert overwrite table ids select id from ...;
- In impala, refresh and make sure the read works.
> refresh ids;
> select count(*) from ids;
Query: select count(*) from ids
+----------+
| count(*) |
+----------+
| 19899    |
+----------+
Returned 1 row(s) in 0.80s
- In hive, overwrite the table again with a query that returns no results, i.e. overwrite the existing file with an empty file.
hive> insert overwrite table ids select id from ... where id < 0;
- In impala, without another refresh, a query that reads from ids now never returns.
> select count(*) from ids;
Query: select count(*) from ids
The scanner thread repeatedly enqueues scan ranges because it thinks the file is longer than it is. We can see the same file being opened and closed repeatedly:
...
I0217 12:10:46.580857 7373 disk-io-mgr-scan-range.cc:311] hdfsCloseFile() file=hdfs://nameservice1/user/hive/warehouse/alanj.db/fds/000000_0
I0217 12:10:46.583314 18907 disk-io-mgr-scan-range.cc:257] hdfsOpenFile() file=hdfs://nameservice1/user/hive/warehouse/alanj.db/fds/000000_0
I0217 12:10:46.583454 7373 disk-io-mgr-scan-range.cc:311] hdfsCloseFile() file=hdfs://nameservice1/user/hive/warehouse/alanj.db/fds/000000_0
I0217 12:10:46.585736 18907 disk-io-mgr-scan-range.cc:257] hdfsOpenFile() file=hdfs://nameservice1/user/hive/warehouse/alanj.db/fds/000000_0
I0217 12:10:46.585836 7373 disk-io-mgr-scan-range.cc:311] hdfsCloseFile() file=hdfs://nameservice1/user/hive/warehouse/alanj.db/fds/000000_0
I0217 12:10:46.587977 18907 disk-io-mgr-scan-range.cc:257] hdfsOpenFile() file=hdfs://nameservice1/user/hive/warehouse/alanj.db/fds/000000_0
I0217 12:10:46.588083 7373 disk-io-mgr-scan-range.cc:311] hdfsCloseFile() file=hdfs://nameservice1/user/hive/warehouse/alanj.db/fds/000000_0
I0217 12:10:46.590450 18907 disk-io-mgr-scan-range.cc:257] hdfsOpenFile() file=hdfs://nameservice1/user/hive/warehouse/alanj.db/fds/000000_0
...
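The open/close loop in the log can be illustrated with a small model. This is only a sketch of the failure mode, not Impala's scanner code: `FakeFile`, `scan`, `cached_len`, `stop_at_eof` and `max_attempts` are hypothetical names, and `max_attempts` exists only so the demonstration terminates. It assumes the reader trusts a cached file length and keeps re-issuing the remaining range when a read returns zero bytes.

```python
from dataclasses import dataclass


@dataclass
class FakeFile:
    """Hypothetical stand-in for an HDFS file; `data` is what is really on disk."""
    data: bytes
    opens: int = 0
    closes: int = 0

    def read_at(self, offset: int, length: int) -> bytes:
        # Each read opens and then closes the file, mirroring the
        # paired hdfsOpenFile()/hdfsCloseFile() lines in the log.
        self.opens += 1
        chunk = self.data[offset:offset + length]
        self.closes += 1
        return chunk


def scan(f: FakeFile, cached_len: int, stop_at_eof: bool, max_attempts: int = 1000):
    """Try to read `cached_len` bytes; return (bytes_read, attempts, completed)."""
    offset = 0
    attempts = 0
    while offset < cached_len:
        if attempts >= max_attempts:
            # Safety valve for the demo; without it the loop never ends.
            return offset, attempts, False
        attempts += 1
        chunk = f.read_at(offset, cached_len - offset)
        if not chunk and stop_at_eof:
            # A zero-byte read means the file is shorter than the cached length.
            break
        offset += len(chunk)
    return offset, attempts, True


if __name__ == "__main__":
    # The file was overwritten with an empty one, but the cached length is stale.
    print(scan(FakeFile(b""), cached_len=100, stop_at_eof=False))
    print(scan(FakeFile(b""), cached_len=100, stop_at_eof=True))
```

With `stop_at_eof=False` the scan makes no progress and only stops at the artificial attempt limit; treating a zero-byte read as end of file lets it finish after one attempt. Whether the actual fix took this form is not stated in this report.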
In this scenario, file descriptors (FDs) are not leaked, because every open in the log is paired with a close. There may be another scenario, however, in which close is not called, and FDs would then be leaked.
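As a generic illustration of how such a leak could arise, and not a description of any known Impala code path, the sketch below counts open handles with a hypothetical `HandleTracker`. `read_leaky` skips the close when the read fails, while `read_safe` closes on every exit path.

```python
class HandleTracker:
    """Hypothetical counter of open handles, standing in for the process FD table."""

    def __init__(self) -> None:
        self.open_handles = 0

    def open(self) -> None:
        self.open_handles += 1

    def close(self) -> None:
        self.open_handles -= 1


def read_leaky(tracker: HandleTracker, fail: bool) -> None:
    tracker.open()
    if fail:
        # Early exit: close() is never reached, so the handle stays open.
        raise IOError("read failed")
    tracker.close()


def read_safe(tracker: HandleTracker, fail: bool) -> None:
    tracker.open()
    try:
        if fail:
            raise IOError("read failed")
    finally:
        # Runs on both the success and the failure path.
        tracker.close()
```

If reads were retried in a tight loop as in the scenario above, a path like `read_leaky` would leak one handle per failed attempt.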