Pig / PIG-2433

Jython import module not working if module path is in classpath

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: impl
    • Labels:
      None

      Description

      This is a gap left by PIG-1824. If the path of a Python module is on the classpath, the job dies with the message "could not instantiate 'org.apache.pig.scripting.jython.JythonFunction'".

      Here is my observation:
      If the path of the Python module is on the classpath, the fileEntry we get at JythonScriptEngine:236 is __pyclasspath__/script$py.class rather than the script itself. As a result we cannot locate the script, and it is skipped in job.xml.

      For example:

      register 'scriptB.py' using org.apache.pig.scripting.jython.JythonScriptEngine as pig;
      
      A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
      B = foreach A generate pig.square(a0);
      
      dump B;
      
      scriptB.py:
      
      #!/usr/bin/python
      import scriptA

      @outputSchema("x:{t:(num:double)}")
      def sqrt(number):
          return number ** 0.5

      @outputSchema("x:{t:(num:long)}")
      def square(number):
          # Delegates to the nested module; this import is what triggers the bug.
          return long(scriptA.square(number))
      
      scriptA.py:
      
      #!/usr/bin/python
      def square(number):
          return number * number
      

      When we register scriptB.py, we use the Jython library to determine which modules scriptB depends on, in this case scriptA. However, if the current directory is on the classpath, we get __pyclasspath__/scriptA.class instead of scriptA.py. We then try to put __pyclasspath__/script$py.class into job.jar, and Pig complains that __pyclasspath__/script$py.class does not exist.
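
      The dependency discovery described above can be illustrated with a minimal sketch in plain CPython. It is not the code in JythonScriptEngine; it assumes the general idea is to compare the set of loaded modules before and after importing the registered script and to read each new module's __file__. The helper names modules_loaded_by and demo are hypothetical. Under CPython every __file__ is a real .py path; under Jython with the script directory on the classpath, the same lookup would yield the classpath-style entry reported in this issue.

```python
import importlib
import os
import sys
import tempfile


def modules_loaded_by(module_name):
    """Import module_name and return {name: __file__} for every module it pulled in."""
    before = set(sys.modules)
    importlib.invalidate_caches()
    importlib.import_module(module_name)
    loaded = {}
    for name in set(sys.modules) - before:
        path = getattr(sys.modules[name], "__file__", None)
        if path is not None:  # built-in modules have no __file__
            loaded[name] = path
    return loaded


def demo():
    """Recreate scriptA/scriptB (without the Pig decorator) and list their files."""
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "scriptA.py"), "w") as f:
        f.write("def square(number):\n    return number * number\n")
    with open(os.path.join(workdir, "scriptB.py"), "w") as f:
        f.write("import scriptA\n\n"
                "def square(number):\n"
                "    return scriptA.square(number)\n")
    sys.path.insert(0, workdir)
    try:
        return modules_loaded_by("scriptB")
    finally:
        sys.path.remove(workdir)
```

      Calling demo() returns a mapping that includes both scriptB and its nested dependency scriptA. Each value is the file that would have to be shipped with the job.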

      This is exactly what TestScriptUDF.testPythonNestedImport does. On Hadoop 20.x the test still succeeds because MiniCluster picks up the local classpath, so it can find scriptA.py even though it is not in job.jar. However, the script fails on a real cluster and on the MiniMRYarnCluster of Hadoop 23.
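
      Since the failure only appears when the dependent script is missing from job.jar, one way to reason about a fix is to translate a classpath-style entry back to a source file before packaging. The sketch below is plain Python and is not taken from the attached patches. It assumes the entry has the form __pyclasspath__/<name>$py.class and that the source sits in one of the directories passed in. The function name source_for_entry and the search_dirs parameter are hypothetical.

```python
import os

PYCLASSPATH_PREFIX = "__pyclasspath__/"  # assumed form of the classpath marker
COMPILED_SUFFIX = "$py.class"            # suffix of a compiled Jython module


def source_for_entry(file_entry, search_dirs):
    """Return the path of the .py source behind file_entry, or None if not found."""
    if not file_entry.startswith(PYCLASSPATH_PREFIX):
        # Ordinary entry: it should already be a path on the local filesystem.
        return file_entry if os.path.isfile(file_entry) else None

    relative = file_entry[len(PYCLASSPATH_PREFIX):]
    if not relative.endswith(COMPILED_SUFFIX):
        return None

    # scriptA$py.class -> scriptA.py
    script_name = relative[:-len(COMPILED_SUFFIX)] + ".py"
    for directory in search_dirs:
        candidate = os.path.join(directory, script_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

      A None result signals that the dependency cannot be shipped, which is the condition that a local-classpath test cluster hides.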

      1. bad.log
        155 kB
        Cheolsoo Park
      2. good.log
        202 kB
        Cheolsoo Park
      3. PIG-2433.patch
        11 kB
        Rohini Palaniswamy
      4. PIG-2433-1.patch
        13 kB
        Rohini Palaniswamy
      5. TEST-org.apache.pig.test.TestScriptUDF.txt
        331 kB
        Cheolsoo Park

          Activity

          Daniel Dai created issue -
          Daniel Dai made changes -
          Field Original Value New Value
          Link This issue relates to PIG-1824 [ PIG-1824 ]
          Daniel Dai made changes -
          Link This issue is related to PIG-2347 [ PIG-2347 ]
          Rohini Palaniswamy made changes -
          Attachment PIG-2433.patch [ 12550504 ]
          Rohini Palaniswamy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Rohini Palaniswamy [ rohini ]
          Fix Version/s 0.12 [ 12323380 ]
          Rohini Palaniswamy made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Rohini Palaniswamy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Cheolsoo Park made changes -
          Attachment bad.log [ 12563609 ]
          Attachment good.log [ 12563610 ]
          Rohini Palaniswamy made changes -
          Attachment PIG-2433-1.patch [ 12563658 ]
          Rohini Palaniswamy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Rohini Palaniswamy
              Reporter:
              Daniel Dai
            • Votes:
              0
              Watchers:
              3
