Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: yarn
    • Labels:
      None
    • Flags:
      Patch

      Description

      Add a simple HDFS producer and related utilities. I've been using a version of this. Initial patch comes with a very basic test and it's own subproject setup which may or may not be what we want?

      Thanks!

      1. SAMZA-693-1.patch
        21 kB
        Eli Reisman
      2. SAMZA-693-2.patch
        21 kB
        Eli Reisman
      3. SAMZA-693-3.patch
        22 kB
        Eli Reisman
      4. SAMZA-693-4.patch
        44 kB
        Eli Reisman
      5. SAMZA-693-5.patch
        48 kB
        Eli Reisman
      6. SAMZA-693-6.patch
        52 kB
        Eli Reisman
      7. SAMZA-693-7.patch
        53 kB
        Eli Reisman

        Activity

        Hide
        initialcontext Eli Reisman added a comment -

        Thank you! I'll dig into the hello-samza contribution and some other updates ASAP

        Show
        initialcontext Eli Reisman added a comment - Thank you! I'll dig into the hello-samza contribution and some other updates ASAP
        Hide
        closeuris Yan Fang added a comment -

        Thank you, Eli Reisman. The new uploaded patch works. Committed to the master. Thank you again for the contribution!

        Show
        closeuris Yan Fang added a comment - Thank you, Eli Reisman . The new uploaded patch works. Committed to the master. Thank you again for the contribution!
        Hide
        initialcontext Eli Reisman added a comment -

        SAMZA-693-7 is the patch, rebased to trunk, with the a/ and b/ prefixes in the diff. Let me know if you have trouble with it, I can apply it with "patch -p1 < SAMZA-693-7.patch"

        I suppose I don't need to but I'm going to upload it to the RB issue as well. Thanks again for the reviews!

        Show
        initialcontext Eli Reisman added a comment - SAMZA-693 -7 is the patch, rebased to trunk, with the a/ and b/ prefixes in the diff. Let me know if you have trouble with it, I can apply it with "patch -p1 < SAMZA-693 -7.patch" I suppose I don't need to but I'm going to upload it to the RB issue as well. Thanks again for the reviews!
        Hide
        closeuris Yan Fang added a comment -

        Hi Eli Reisman, thank you for the patch. The RB looks good to me. Just want to do the testing before committing. BTW, could you remove "--no-prefix" when generating the patch? I still can not apply the patch with SAMZA-693-6.patch.

        Show
        closeuris Yan Fang added a comment - Hi Eli Reisman , thank you for the patch. The RB looks good to me. Just want to do the testing before committing. BTW, could you remove "--no-prefix" when generating the patch? I still can not apply the patch with SAMZA-693 -6.patch.
        Hide
        initialcontext Eli Reisman added a comment -

        Adds documentation links, new configuration page table entries, small refactor and cleanup post-review (see RB issue for details) should be ready to go now pending final look-over by committers.

        Will update the patch on the RB issue as well.

        Show
        initialcontext Eli Reisman added a comment - Adds documentation links, new configuration page table entries, small refactor and cleanup post-review (see RB issue for details) should be ready to go now pending final look-over by committers. Will update the patch on the RB issue as well.
        Hide
        initialcontext Eli Reisman added a comment -

        Updating the patch here (and once more on RB issue #35445. Adding docs/ Markdown file explaining usage, and a few more tweaks. passes 'gradle clean test'

        Thanks again for waiting!

        Show
        initialcontext Eli Reisman added a comment - Updating the patch here (and once more on RB issue #35445. Adding docs/ Markdown file explaining usage, and a few more tweaks. passes 'gradle clean test' Thanks again for waiting!
        Hide
        initialcontext Eli Reisman added a comment -

        Sorry forgot to run the patch with --no-prefix, here's the (otherwise identical) replacement copy.

        Show
        initialcontext Eli Reisman added a comment - Sorry forgot to run the patch with --no-prefix, here's the (otherwise identical) replacement copy.
        Hide
        initialcontext Eli Reisman added a comment -

        This is a big refactor. Includes:

        1) More (and better) unit tests

        2) Pluggable hdfs formats, compression codecs, and path bucketing

        3) Two default implementations of SequenceFile writers. One for binary data of any type, another for String output (text lines, JSON, whatever)

        4) Fixes some post-review issues (see Apache RB for this ticket for details)

        ...but does not remove the text job configuration properties files. Configuring the HDFS producer using real job props files has helped make the code more testable without loosening up encapsulation, and makes the tests more realistic. Hope thats OK.

        It passes tests 'gradle clean test' and is ready to go. I'll upload an update to the RB issue as well. Sorry it took so long!

        Show
        initialcontext Eli Reisman added a comment - This is a big refactor. Includes: 1) More (and better) unit tests 2) Pluggable hdfs formats, compression codecs, and path bucketing 3) Two default implementations of SequenceFile writers. One for binary data of any type, another for String output (text lines, JSON, whatever) 4) Fixes some post-review issues (see Apache RB for this ticket for details) ...but does not remove the text job configuration properties files. Configuring the HDFS producer using real job props files has helped make the code more testable without loosening up encapsulation, and makes the tests more realistic. Hope thats OK. It passes tests 'gradle clean test' and is ready to go. I'll upload an update to the RB issue as well. Sorry it took so long!
        Hide
        initialcontext Eli Reisman added a comment -

        Nearly done, sorry it's taking so long, been really busy will hopefully post the updated patch here and on RB today

        Show
        initialcontext Eli Reisman added a comment - Nearly done, sorry it's taking so long, been really busy will hopefully post the updated patch here and on RB today
        Hide
        closeuris Yan Fang added a comment -

        Thanks for the great work, Eli.

        About issue #2 above (the fatal git error) I have been running the patches with --no-prefix to remove the first (a/... b/...) path element in the diff. I can submit the next version without removing the leading path elements if that would help?

        This should solve the issue. Thank you!

        Show
        closeuris Yan Fang added a comment - Thanks for the great work, Eli. About issue #2 above (the fatal git error) I have been running the patches with --no-prefix to remove the first (a/... b/...) path element in the diff. I can submit the next version without removing the leading path elements if that would help? This should solve the issue. Thank you!
        Hide
        initialcontext Eli Reisman added a comment -

        Hey I've been reformatting the patch, refactoring the writer and compression codec to be more pluggable. I am still working through a couple issues while beefing up the unit tests, but will post the updated patch here and RB soon. I'll take a look at adding the doc too, good idea.

        About issue #2 above (the fatal git error) I have been running the patches with --no-prefix to remove the first (a/... b/...) path element in the diff. I can submit the next version without removing the leading path elements if that would help?

        Show
        initialcontext Eli Reisman added a comment - Hey I've been reformatting the patch, refactoring the writer and compression codec to be more pluggable. I am still working through a couple issues while beefing up the unit tests, but will post the updated patch here and RB soon. I'll take a look at adding the doc too, good idea. About issue #2 above (the fatal git error) I have been running the patches with --no-prefix to remove the first (a/... b/...) path element in the diff. I can submit the next version without removing the leading path elements if that would help?
        Hide
        closeuris Yan Fang added a comment -

        RB: https://reviews.apache.org/r/35445/ for other guys' convenience.

        Hi Eli Reisman, I think the code is very good.

        1. Just a few nits about the format in the RB.

        2. can not apply the patch in this JIRA, got

        fatal: git diff header lacks filename information when removing 1 leading pathname component (line 5)
        

        3. do you mind adding a doc? I think we can create a new directory under docs/learn/documentation/versioned/, called something like "Integrated Systems" (maybe a better name? ). And put the doc under that directory.

        4. in terms of making the output format pluggable, agreed with you. we can open a new ticket for that.

        Thank you!

        Show
        closeuris Yan Fang added a comment - RB: https://reviews.apache.org/r/35445/ for other guys' convenience. Hi Eli Reisman , I think the code is very good. 1. Just a few nits about the format in the RB. 2. can not apply the patch in this JIRA, got fatal: git diff header lacks filename information when removing 1 leading pathname component (line 5) 3. do you mind adding a doc? I think we can create a new directory under docs/learn/documentation/versioned/, called something like "Integrated Systems" (maybe a better name? ). And put the doc under that directory. 4. in terms of making the output format pluggable, agreed with you. we can open a new ticket for that. Thank you!
        Hide
        initialcontext Eli Reisman added a comment -

        Just uploaded via RBT, review is #35445. Thanks!

        Show
        initialcontext Eli Reisman added a comment - Just uploaded via RBT, review is #35445. Thanks!
        Hide
        closeuris Yan Fang added a comment -

        Can you try it again? It works for me now.

        Show
        closeuris Yan Fang added a comment - Can you try it again? It works for me now.
        Hide
        initialcontext Eli Reisman added a comment -

        Having some trouble with Review Board uploads. Thought it was my Mac browsers but tried on my Linux box and also no go, the diff upload just hangs every time? I'll try "rbt" tool next. sorry for the delay.

        Show
        initialcontext Eli Reisman added a comment - Having some trouble with Review Board uploads. Thought it was my Mac browsers but tried on my Linux box and also no go, the diff upload just hangs every time? I'll try "rbt" tool next. sorry for the delay.
        Hide
        initialcontext Eli Reisman added a comment -

        Fixed an integration test that was failing due to conflicting ZK versions; fixed a RAT warning. Passes Gradle "test" and "check". Will upload to ReviewBoard ASAP.

        Show
        initialcontext Eli Reisman added a comment - Fixed an integration test that was failing due to conflicting ZK versions; fixed a RAT warning. Passes Gradle "test" and "check". Will upload to ReviewBoard ASAP.
        Hide
        initialcontext Eli Reisman added a comment -

        Hi, I've been out of town and haven't looked into this or posted either patch on Review Board yet. I'll do that ASAP now that I'm back. I'll also take a look at why the integration test on this one is failing, thanks for applying the patch and sorry about the delay.

        Show
        initialcontext Eli Reisman added a comment - Hi, I've been out of town and haven't looked into this or posted either patch on Review Board yet. I'll do that ASAP now that I'm back. I'll also take a look at why the integration test on this one is failing, thanks for applying the patch and sorry about the delay.
        Hide
        navina Navina Ramesh added a comment -

        Eli Reisman : I applied your patch and as you said the integration test is failing. I am not able to exactly reproduce the error you see.

        However, I see a different error that is not related to Log4J.

        testShouldStartAndRestore FAILED
            java.lang.SecurityException: class "javax.servlet.ServletRegistration$Dynamic"'s signer information does not match signer information of other classes in the same package
                at java.lang.ClassLoader.checkCerts(ClassLoader.java:895)
                at java.lang.ClassLoader.preDefineClass(ClassLoader.java:665)
                at java.lang.ClassLoader.defineClass(ClassLoader.java:758)
                at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
                at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
                ...
        

        This usually happens when classes from same package are loaded from different jars and the jar file signatures don't match (perhaps??).

        I can take a look at the patch once it is available for review. Please note that the integration test basically brings up a zookeeper/kafka setup and tries to run a Samza job. It is a minimum requirement for a patch to work.

        Show
        navina Navina Ramesh added a comment - Eli Reisman : I applied your patch and as you said the integration test is failing. I am not able to exactly reproduce the error you see. However, I see a different error that is not related to Log4J. testShouldStartAndRestore FAILED java.lang.SecurityException: class "javax.servlet.ServletRegistration$Dynamic" 's signer information does not match signer information of other classes in the same package at java.lang. ClassLoader .checkCerts( ClassLoader .java:895) at java.lang. ClassLoader .preDefineClass( ClassLoader .java:665) at java.lang. ClassLoader .defineClass( ClassLoader .java:758) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) ... This usually happens when classes from same package are loaded from different jars and the jar file signatures don't match (perhaps??). I can take a look at the patch once it is available for review. Please note that the integration test basically brings up a zookeeper/kafka setup and tries to run a Samza job. It is a minimum requirement for a patch to work.
        Hide
        initialcontext Eli Reisman added a comment -

        Thank you for taking a look Navina. The issue seems to be a permissions issue for Log4J writing a file during the integration test in that trace. I'm on MacOSX & JDK7 if that helps. I also think I need to update something for my unit test (and maybe make it a bit more robust in general) and post one more patch before submitting to Review Board.

        Thanks for the quick response!

        Show
        initialcontext Eli Reisman added a comment - Thank you for taking a look Navina. The issue seems to be a permissions issue for Log4J writing a file during the integration test in that trace. I'm on MacOSX & JDK7 if that helps. I also think I need to update something for my unit test (and maybe make it a bit more robust in general) and post one more patch before submitting to Review Board. Thanks for the quick response!
        Hide
        navina Navina Ramesh added a comment -

        Yeah. It does seem unrelated. I can take a look at why it is failing.
        Meanwhile, it is much easier to review your changes through an RB.

        Show
        navina Navina Ramesh added a comment - Yeah. It does seem unrelated. I can take a look at why it is failing. Meanwhile, it is much easier to review your changes through an RB.
        Hide
        navina Navina Ramesh added a comment -

        Hi Eli Reisman ,
        Can you please put the patch in an RB for everyone to review?
        This feature request frequently comes up in the mailing list and will be very useful.

        Thanks!

        Show
        navina Navina Ramesh added a comment - Hi Eli Reisman , Can you please put the patch in an RB for everyone to review? This feature request frequently comes up in the mailing list and will be very useful. Thanks!
        Hide
        initialcontext Eli Reisman added a comment -

        Quick update to the patch. Having some strange issues with a (seemingly?) unrelated integration test, trace is like this:

        testShouldStartAndRestore FAILED
            java.lang.InstantiationError: org.apache.samza.checkpoint.CheckpointManager
                at org.apache.samza.coordinator.JobCoordinator$.apply(JobCoordinator.scala:74)
                at org.apache.samza.coordinator.JobCoordinator$.apply(JobCoordinator.scala:79)
                at org.apache.samza.job.local.ThreadJobFactory.getJob(ThreadJobFactory.scala:40)
                at org.apache.samza.job.JobRunner.run(JobRunner.scala:93)
                at org.apache.samza.test.integration.TestStatefulTask.startJob(TestStatefulTask.scala:324)
                at org.apache.samza.test.integration.TestStatefulTask.testShouldStartTaskForFirstTime(TestStatefulTask.scala:235)
                at org.apache.samza.test.integration.TestStatefulTask.testShouldStartAndRestore(TestStatefulTask.scala:230)
        
        137 tests completed, 1 failed
        :samza-test_2.10:test FAILED
        
        Show
        initialcontext Eli Reisman added a comment - Quick update to the patch. Having some strange issues with a (seemingly?) unrelated integration test, trace is like this: testShouldStartAndRestore FAILED java.lang.InstantiationError: org.apache.samza.checkpoint.CheckpointManager at org.apache.samza.coordinator.JobCoordinator$.apply(JobCoordinator.scala:74) at org.apache.samza.coordinator.JobCoordinator$.apply(JobCoordinator.scala:79) at org.apache.samza.job.local.ThreadJobFactory.getJob(ThreadJobFactory.scala:40) at org.apache.samza.job.JobRunner.run(JobRunner.scala:93) at org.apache.samza.test.integration.TestStatefulTask.startJob(TestStatefulTask.scala:324) at org.apache.samza.test.integration.TestStatefulTask.testShouldStartTaskForFirstTime(TestStatefulTask.scala:235) at org.apache.samza.test.integration.TestStatefulTask.testShouldStartAndRestore(TestStatefulTask.scala:230) 137 tests completed, 1 failed :samza-test_2.10:test FAILED
        Hide
        initialcontext Eli Reisman added a comment -

        patch included

        Show
        initialcontext Eli Reisman added a comment - patch included

          People

          • Assignee:
            initialcontext Eli Reisman
            Reporter:
            initialcontext Eli Reisman
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development