Camel / CAMEL-8040

camel-hdfs2 consumer overwriting data instead of appending them


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.13.0, 2.14.0
    • Fix Version/s: None
    • Component/s: camel-hdfs
    • Labels: None
    • Estimated Complexity: Unknown

    Description

      There is probably a bug in the camel-hdfs2 consumer.

      This project contains two Camel routes: one takes files from `test-source` and uploads them to Hadoop HDFS;
      the other watches a folder in HDFS and downloads its files to the `test-dest` folder in this project.

      It seems that when a file is downloaded from HDFS to the local filesystem, each chunk of data is written to the beginning of the target file in `test-dest` instead of being appended, as I would expect.
      From the Camel log I suppose that each chunk of data from the Hadoop file is treated as if it were a whole file.
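
The suspected behaviour can be imitated without Camel or Hadoop. The following is only a sketch of the symptom, not of the component's code: it assumes a chunk size of 4096 bytes for illustration, and the helper names `chunks_of`, `write_overwriting` and `write_appending` are hypothetical. If each chunk is written as though it were a complete file, only the last chunk survives; if chunks are appended, the original content is restored. If this reading is right, the file endpoint's `fileExist=Append` option would be the setting to look at, but the report itself does not confirm that.

```ruby
require "tmpdir"

# Split content into fixed-size pieces, the way a chunked reader would
# hand them over one by one. The chunk size is an assumption.
def chunks_of(content, chunk_size)
  content.scan(/.{1,#{chunk_size}}/m)
end

# Treat every chunk as a whole file: mode "w" truncates the target each time,
# so earlier chunks are lost.
def write_overwriting(path, chunks)
  chunks.each { |chunk| File.open(path, "w") { |f| f.write(chunk) } }
end

# Append every chunk to the target: mode "a" keeps what was written before.
def write_appending(path, chunks)
  File.delete(path) if File.exist?(path)
  chunks.each { |chunk| File.open(path, "a") { |f| f.write(chunk) } }
end
```

With a multi-chunk input, `write_overwriting` leaves a file holding only the final chunk, which matches the "only the last x lines" observation in the scenario.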

      The Ruby script `generate_textfile.rb` generates a file `test.txt` with the following content:

      0 - line
      1 - line
      2 - line
      3 - line
      4 - line
      5 - line
      ...
      ...
      99999 - line
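
The actual `generate_textfile.rb` is only available in the attachment, so the following is a guess at what such a script might look like, based solely on the sample output above. The function names `generate_lines` and `write_textfile` are hypothetical.

```ruby
# Build the numbered lines "0 - line" .. "<count - 1> - line".
def generate_lines(count)
  (0...count).map { |i| "#{i} - line" }
end

# Write the lines to the given path, one per line.
# The default of 100_000 lines matches the sample, which ends at 99999.
def write_textfile(path, count = 100_000)
  File.open(path, "w") do |f|
    generate_lines(count).each { |line| f.puts(line) }
  end
end
```

Calling `write_textfile("test.txt")` would produce a file of the shape shown above, large enough to span many chunks.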
      

      Scenario

      • a running Hadoop instance is expected on localhost:8020
      • run `mvn camel:run`
      • copy test.txt into test-source
      • watch the log and the file test.txt in test-dest
      • test.txt in the test-dest folder will contain only the last x lines of the original file.

      Camel log

      [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
      [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
      [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
      [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
      [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
      [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
      [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
      [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
      

      Environment

      In the case of the VirtualBox VM, HDFS is bound by default to `hdfs://quickstart.cloudera:8020`, and this needs to be changed in `/etc/hadoop/conf/core-site.xml`. It should work when `fs.defaultFS` is set to `hdfs://0.0.0.0:8020`.

      In the case of the Docker Hadoop image, first start the Docker container, find its IP address, and use it for the Camel hdfs component.
      Here the Camel URI would be `hdfs:172.17.0.2:9000/tmp/camel-test`.

       
      docker run -i -t sequenceiq/hadoop-docker:2.5.1 /etc/bootstrap.sh -bash
      
      Starting sshd:                                             [  OK  ]
      Starting namenodes on [966476255fc2]
      966476255fc2: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-966476255fc2.out
      localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-966476255fc2.out
      Starting secondary namenodes [0.0.0.0]
      0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-966476255fc2.out
      starting yarn daemons
      starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-966476255fc2.out
      localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-966476255fc2.out
      

      Check which IP address the HDFS filesystem API is bound to inside the Docker container:

      bash-4.1# netstat -tulnp 
      Active Internet connections (only servers)
      Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
      ...
      tcp        0      0 172.17.0.2:9000             0.0.0.0:*                   LISTEN      -                   
      ...
      

      There might be an exception because of HDFS permissions. It can be solved by setting the HDFS filesystem permissions:

      bash-4.1# /usr/local/hadoop/bin/hdfs dfs -chmod 777 /
      
      

      Attachments

        1. hdfs-reproducer.zip
          474 kB
          Josef Ludvíček

          People

            Assignee: Willem Jiang (njiang)
            Reporter: Josef Ludvíček (ludvicekj)