Pig / PIG-2262

AvroStorage dependencies are missing from the release tarball

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: build, piggybank
    • Labels:
      None

      Description

      This makes AvroStorage hard to use, since users have to download the dependencies manually, or build Pig themselves.

      1. PIG-2262.patch
        3 kB
        Tom White
      2. PIG-2262.patch
        2 kB
        Tom White

        Activity

        Tom White added a comment -

        Thanks Dmitriy. Fixing in Piggybank sounds like the right thing to do. I'm unassigning myself since I'm not working on this at the moment.

        Dmitriy V. Ryaboy added a comment -

        Canceling patch to clear the review queue; let's solve this at the piggybank level.

        Dmitriy V. Ryaboy added a comment -

        AvroStorage is currently in piggybank; one would think binding piggybank dependencies should happen in piggybank?

        I don't really want to push a bunch more unnecessary jars into the main jar when they aren't even required by anything in Pig proper.

        I know, I know, HBaseStorage. That was a mistake.

        Tom White added a comment -

        Thanks for the review, Daniel.

        > 1. Pig doesn't automatically ship all classes in pig-withouthadoop.jar

        Ah, I didn't realize this. So the original patch is not the correct fix.

        > Further, these jars are not even in the Pig distribution. They are ivy dependencies and are only retrieved during compilation. My thinking is that we need to bundle some popular jars (hbase.jar, avro.jar, etc.) in lib so users know where to find them when needed.

        I've attached a new patch to do this for AvroStorage, so users don't need to find the JARs themselves (this was the problem I was trying to solve).

        > Ideally Pig should be smart enough to ship jars when needed (as we do for jython.jar)

        This would be a nice extension.

        Daniel Dai added a comment -

        There are a couple of issues with this approach. Most of them are not specific to AvroStorage; they concern how we deal with jars that UDFs depend on:

        1. Pig doesn't automatically ship all classes in pig-withouthadoop.jar
        We also need to make a code change in JarManager.java to denote the package to ship. Putting a jar into pig-withouthadoop.jar alone is equivalent to putting that jar on the classpath. This mechanism is confusing, and we should stop putting more jars into pig-withouthadoop.jar.

        2. Conflict with Hadoop-bundled jars
        Hadoop 20.204 bundles jackson-1.0.1, which is too old for AvroLoader. In the frontend, we can force Hadoop to take our jackson-1.7.3 by setting the flag HADOOP_USER_CLASSPATH_FIRST=true. But in the backend, Hadoop seems to always pick the bundled jackson-1.0.1, which results in a job failure.

        3. Do we need to bundle the jars that piggybank depends on?
        We don't even bundle hbase.jar, though HbaseLoader is in builtin. Further, these jars are not even in the Pig distribution. They are ivy dependencies and are only retrieved during compilation. My thinking is that we need to bundle some popular jars (hbase.jar, avro.jar, etc.) in lib so users know where to find them when needed. But we don't want to ship all those jars to the backend. Ideally, Pig should be smart enough to ship jars when needed (as we do for jython.jar).

        Tom White added a comment -

        This patch fixes the problem.


          People

          • Assignee:
            Unassigned
            Reporter:
            Tom White
          • Votes:
            0
            Watchers:
            2
