Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Not A Problem
- Affects Version/s: 2.13.0, 2.14.0
Description
camel-hdfs2 consumer overwriting data instead of appending it
There is probably a bug in the Camel hdfs2 consumer.
This project contains two Camel routes: one takes files from `test-source` and uploads them to Hadoop HDFS;
the other watches a folder in HDFS and downloads the files to the `test-dest` folder in this project.
It seems that when downloading a file from HDFS to the local filesystem, the consumer keeps writing chunks of data to the beginning of the target file in `test-dest`, instead of simply appending the chunks as I would expect.
From the Camel log I suppose that each chunk of data from the Hadoop file is treated as if it were a whole file.
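The symptom can be illustrated outside Camel: if each chunk is written by reopening the target file in truncating write mode instead of append mode, only the last chunk survives. A minimal Ruby sketch of this suspected file-mode behavior (illustration only — `test-dest-sim.txt` and the chunk values are made up, and this is not Camel code):

```ruby
# Illustration only: simulates the suspected file-mode behavior, not Camel itself.
chunks = ["chunk-1\n", "chunk-2\n", "chunk-3\n"]
target = "test-dest-sim.txt"   # hypothetical local target file

# Reopening the file in truncating mode "w" for every chunk overwrites
# from the beginning each time, so only the last chunk survives.
chunks.each { |c| File.open(target, "w") { |f| f.write(c) } }
puts File.read(target)   # => "chunk-3\n"

File.delete(target)

# Reopening in append mode "a" keeps every chunk, as one would expect.
chunks.each { |c| File.open(target, "a") { |f| f.write(c) } }
puts File.read(target)   # => "chunk-1\nchunk-2\nchunk-3\n"
```

This matches the observation below that the downloaded file ends up containing only the last lines of the original.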
The Ruby script `generate_textfile.rb` can generate a file `test.txt` with the content

0 - line
1 - line
2 - line
3 - line
4 - line
5 - line
...
99999 - line
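The script itself is not attached here; a minimal sketch of what `generate_textfile.rb` presumably does, assuming it simply writes 100000 numbered lines:

```ruby
# Hypothetical reconstruction of generate_textfile.rb (the actual script is
# not shown in the report): writes 100000 numbered lines into test.txt,
# from "0 - line" up to "99999 - line".
File.open("test.txt", "w") do |f|
  100_000.times { |i| f.puts("#{i} - line") }
end
```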
Scenario
- expects a running Hadoop instance on localhost:8020
- run `mvn camel:run`
- copy `test.txt` into `test-source`
- see the log and the file `test.txt` in `test-dest`
- `test.txt` in the `test-dest` folder will contain only the last x lines of the original one
Camel log
[localhost:8020/tmp/camel-test/] toFile INFO picked up file from hdfs with name test.txt
[localhost:8020/tmp/camel-test/] toFile INFO file downloaded from hadoop
[localhost:8020/tmp/camel-test/] toFile INFO picked up file from hdfs with name test.txt
[localhost:8020/tmp/camel-test/] toFile INFO file downloaded from hadoop
[localhost:8020/tmp/camel-test/] toFile INFO picked up file from hdfs with name test.txt
[localhost:8020/tmp/camel-test/] toFile INFO file downloaded from hadoop
[localhost:8020/tmp/camel-test/] toFile INFO picked up file from hdfs with name test.txt
[localhost:8020/tmp/camel-test/] toFile INFO file downloaded from hadoop
Environment
- Camel 2.14 and 2.13
- Hadoop VirtualBox VM
  - downloaded from http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-2-x.html
  - tested with version 2.3.0-cdh5.1.0, r8e266e052e423af592871e2dfe09d54c03f6a0e8, which I couldn't find on the download page
- Hadoop Docker image
  - https://github.com/sequenceiq/hadoop-docker
  - results were the same as with the VirtualBox VM
In the case of the VirtualBox VM, HDFS is bound to `hdfs://quickstart.cloudera:8020` by default, and this needs to be changed in `/etc/hadoop/conf/core-site.xml`. It should work when `fs.defaultFS` is set to `hdfs://0.0.0.0:8020`.
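The required change would look roughly like the following `core-site.xml` fragment (a sketch based only on the property name and value mentioned above):

```xml
<!-- /etc/hadoop/conf/core-site.xml: bind HDFS to all interfaces -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://0.0.0.0:8020</value>
</property>
```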
In the case of the Hadoop Docker image, first start the Docker container, figure out its IP address, and use it for the Camel HDFS component.
Here the Camel URI would be `hdfs:172.17.0.2:9000/tmp/camel-test`.
docker run -i -t sequenceiq/hadoop-docker:2.5.1 /etc/bootstrap.sh -bash
Starting sshd: [ OK ]
Starting namenodes on [966476255fc2]
966476255fc2: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-966476255fc2.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-966476255fc2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-966476255fc2.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-966476255fc2.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-966476255fc2.out
See which IP the HDFS filesystem API is bound to inside the Docker container:
bash-4.1# netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address     Foreign Address  State   PID/Program name
...
tcp   0      0      172.17.0.2:9000   0.0.0.0:*        LISTEN  -
...
There might be an exception because of HDFS permissions. It can be solved by opening up the HDFS filesystem permissions:
bash-4.1# /usr/local/hadoop/bin/hdfs dfs -chmod 777 /
Attachments
Issue Links
- relates to
  - CAMEL-8150 camel-hdfs sending message per chunk, not per file (Closed)