Kafka / KAFKA-865

Mavenize and separate the client.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: clients
    • Labels: None

      Description

      It seems that the Java client for Kafka is bundled into the server JAR file, which is generated using sbt package. This is difficult for Java developers to work with because:

      1) Many Java shops use Maven. They want to specify the GAV (groupId, artifactId, version) of Kafka in their pom and have the client jar and all of its dependencies added to the application's classpath automatically. I can't do that right now: I have to run ./sbt eclipse, get the JAR, add it to my classpath, and then manually add a whole set of dependencies (log4j, slf4j, zkClient and so on), which is a pain.

      There were 90 million Maven Central uploads/downloads in 2012 alone. Almost all Java shops use a Maven repository (either Central or an in-house Sonatype Nexus).

      2) Separation of concerns: keeping the server (core) and client classes together in the same JAR file increases the size of the bundle for a client. Also, every time the server's code changes and a release is performed, clients need to update their JAR file as well, which is not great. We don't want a ton of clients to update their JAR just because a faster replication strategy for the Kafka server cluster landed in a new release.

      Action items: separate the client and server portions of Kafka, describe each in a pom along with its compile-time dependencies, and upload them to Maven Central or, if LinkedIn has an externally exposed Nexus, to that.

      This will increase adoption of the Kafka framework.

        Activity

        Ashwanth Fernando created issue -
        Scott Carey added a comment -

        I'm a big fan of Maven (it has some big flaws, but so do SBT and Ant), but it is not necessary for this. SBT can publish artifacts to Maven Central with proper pom files.

        I'm not sure how much the client dependency tree differs from the server one at the moment, perhaps only in ZooKeeper. I do feel it is important that the dependencies declared in the published pom are accurate and do not contain cruft for other use cases (e.g. producer and consumer may have different transitive dependencies, and thus should be separate artifacts).
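
        To illustrate the point about SBT publishing Maven-style artifacts (this sketch is not part of the original comment), a multi-project build along the following lines could produce one pom per module, each listing only that module's dependencies. It assumes sbt 1.x syntax; the module names, organization, repository URL, and dependency versions are all hypothetical placeholders and do not describe Kafka's actual build.

```scala
// build.sbt: a minimal sketch, assuming sbt 1.x.
// All names, coordinates, versions and URLs below are illustrative assumptions.

ThisBuild / organization := "org.example.kafka" // hypothetical groupId
ThisBuild / version      := "0.1.0-SNAPSHOT"    // hypothetical version

// Settings shared by every published module.
lazy val publishSettings = Seq(
  publishMavenStyle := true,               // generate a pom.xml instead of an ivy.xml
  Test / publishArtifact := false,         // do not publish test jars
  pomIncludeRepository := { _ => false },  // keep extra resolvers out of the pom
  // Placeholder repository; a real build would point at its own Nexus or at Central staging.
  publishTo := Some("releases" at "https://nexus.example.com/releases")
)

// Client module: declares only what a client application needs.
lazy val clients = (project in file("clients"))
  .settings(publishSettings)
  .settings(
    name := "example-kafka-clients",
    libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.2"
  )

// Server module: depends on the client module plus server-only libraries.
lazy val core = (project in file("core"))
  .dependsOn(clients)
  .settings(publishSettings)
  .settings(
    name := "example-kafka-core",
    libraryDependencies ++= Seq(
      "org.apache.zookeeper" % "zookeeper" % "3.3.4",
      "com.101tec"           % "zkclient"  % "0.3"
    )
  )

// Aggregating root project; it publishes nothing itself.
lazy val root = (project in file("."))
  .aggregate(clients, core)
  .settings(publish / skip := true)
```

        With a layout like this, running sbt publish would emit separate jars and poms for the two modules, so a client application would pull in only the client module's transitive dependencies. Splitting producer and consumer further would follow the same pattern with additional modules.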

        Ashwanth Fernando added a comment -

        You are correct. Maven is not necessary to upload to a Maven repo; sbt can be used as well. I am just stating that Kafka is not on Maven Central right now and needs to be pushed there, with server and client as separate artifacts and the proper dependencies specified in each pom, so that integrating the Java client with a Java application becomes easy.

        Esko Suomi added a comment -

        I guess I should point out that when it comes to dependencies, SBT is a bastardization of Ivy, which on its own works really nicely with Maven repositories, among other things.

        I do agree with several points here, especially the point about separating the server and client. As an example, in our case we have to treat servers as really slowly evolving creatures, and most of the bugs we've hit while running Kafka have been exclusively client bugs. Because the two are tightly coupled, all we can do is wait for the next release. If the two were separated, we could have remedied those client bugs with a quick update, since our own software can be updated as often as we like.

        Ashwanth Fernando made changes -
        Field: Description
        Original Value:
        It seems that the java client for Kafka is also bundled with the server JAR file and this is generated using sbt package. This is difficult for java folks to work with because:

        1) Many java shops use maven (and a lot of them have a Sonatype Nexus repository in house) for dependency management. They want to specify the GAV and bang, the client jar and all its dependencies should be added to the application's classpath. I can't do that right now, because I need to run ./sbt eclipse, get the .JAR, add that to my classpath, add a whole lot of dependencies (log4j, slf4j, zkClient and so on) manually, which is a pain.

        There are 90 million maven central uploads/downloads in 2012 alone. Almost all the java shops out there have maven (either central or in house sonatype).

        2) Separation of concerns - keeping the server (core) and the client's classes increases the size of the bundle for the client and also everytime the server's code changes and a release is performed, the client also needs to update their .JAR file. which is not very great. We don't want a ton of clients to update their .JAR file, just because a faster replication strategy for my kafka cluster changed in a new release.

        Action items are to separate the client portion of Kafka, add it in a pom along with the compile time dependencies and upload it to Maven Central or if you have a LinkedIn externally exposed Nexus, over there.

        This will increase adoption of the Kafka framework.
        New Value: same text as the Description section above.
        Joe Stein added a comment -

        Hi, does KAFKA-1018 clear this up or are there other updates/changes to make?

        Jay Kreps added a comment -

        I think we will fix this in the client rewrite; the timeline would be early next year. Separating the existing code into two jars will be pretty hard.

        Proposal is here:
        https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite


          People

          • Assignee: Unassigned
          • Reporter: Ashwanth Fernando
          • Votes: 3
          • Watchers: 8
