Slider / SLIDER-876

slider destroy instance fails with UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Slider 0.61, Slider 0.70
    • Fix Version/s: None
    • Component/s: process
    • Labels:
    • Environment:

      Ubuntu 14.04, Hadoop 2.6.0, Python 2.7, Slider 0.61.0, ZooKeeper 3.4.6

      Description

      Hello,

      I installed Slider 0.61.0, then installed the MEMCACHED application to test Slider under Hadoop 2.6.0
      on HDFS.

      I can create the instance cl1 on HDFS.
      I can start it.
      I can stop it.

      But when I destroy it with
      ./bin/slider destroy cl1
      I get a Python UTF-8 exception (I have Python 2.7), and I have to interrupt the destroy because it hangs forever. See the log below.

      After this failure, the cl1 instance is removed from the cluster. If I try to launch it again, I get exit code 73 (application in use) on start/stop/destroy. I don't see any active process.

      I also tested with Slider 0.70.1 and got the same error. I saw that Python 2.6
      seems to be required by Slider, but I use about 10 components in the ecosystem on my cluster and all of them work. Hadoop 2.6.0 seems to support Python 2.7.

      Thanks for any information, and for a solution if possible. I find Slider interesting
      for application deployment with HDFS.

      I don't see any JIRA about this issue.

      Log below:
      hduser@stargate:/usr/local/slider$ ./bin/slider destroy cl1
      2015-05-15 01:33:08,218 [main] INFO client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
      2015-05-15 01:33:08,603 [main] INFO zk.BlockingZKWatcher - waiting for ZK event
      2015-05-15 01:33:08,607 [main-SendThread(stargate:2180)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
      Exception in thread Thread-2:
      Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
      self.run()
      File "/usr/lib/python2.7/threading.py", line 763, in run
      self._target(*self.args, **self._kwargs)
      File "/usr/local/slider-0.61.0-incubating/bin/slider.py", line 168, in print_output
      (line, done) = read(src, line)
      File "/usr/local/slider-0.61.0-incubating/bin/slider.py", line 146, in read
      o = c.decode('utf-8')
      File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
      return codecs.utf_8_decode(input, errors, True)
      UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0: unexpected end of data

      1. screenshot-1.png
        108 kB
        JP Bordenave
      2. screenshot-2-SliderInRunningMode.png
        189 kB
        JP Bordenave
      3. screenshot-AllNodeActive.png
        189 kB
        JP Bordenave

        Activity

        yanghaogn Yang Hao added a comment -

        SLIDER-737 is fixed in Slider 0.70, so the user's environment may not have the feature.

        yanghaogn Yang Hao added a comment -

        The environment was set in SLIDER-737.
        Should the problem be reproduced to help solve the decode error?

        oarmand Olivier Armand added a comment -

        Steve, I'm having the same issue also with a fr_FR.UTF-8 OS.
        "Connection refused" is "Connexion refusée" in French, the é is probably the mis-interpreted character.

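The failure Olivier describes can be reproduced in isolation. The sketch below is an illustration, not code from slider.py: it assumes the JVM emitted the French message as UTF-8, in which case the accented e is the two bytes 0xc3 0xa9, and decoding the stream one byte at a time meets 0xc3 on its own. The names `decode_bytewise` and `fails_bytewise` are hypothetical.

```python
MESSAGE = u"Connexion refus\u00e9e"   # French for "Connection refused"
DATA = MESSAGE.encode("utf-8")        # the accented e becomes 0xc3 0xa9


def decode_bytewise(data):
    """Decode each byte on its own, as a read(1) + decode('utf-8') loop would."""
    return u"".join(data[i:i + 1].decode("utf-8") for i in range(len(data)))


def fails_bytewise(data):
    """Return True if byte-at-a-time decoding raises UnicodeDecodeError."""
    try:
        decode_bytewise(data)
    except UnicodeDecodeError:
        return True
    return False
```

Under that assumption, `fails_bytewise(DATA)` is true while `DATA.decode("utf-8")` on the whole message succeeds, which matches the "byte 0xc3 in position 0: unexpected end of data" wording in the traceback.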
        stevel@apache.org Steve Loughran added a comment -

        I'm glad it's fixed, and I did think that ZK was the root cause, but that doesn't mean you should have got a Python parse error. It means something came out of the spawned JVM in some character encoding, probably UTF-8, which the script couldn't decode as UTF-8.

        So while you had a minor config problem, you found a real bug in the Python launcher script.

        For that reason I'm going to re-open it.

        jpbordi JP Bordenave added a comment -

        Issue solved: it was caused by a wrong port number,
        set to 2180 instead of 2181 in slider-client.xml.
        KR
        JP
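A quick way to rule out this kind of misconfiguration is to test whether anything is listening on the configured port. The following is a minimal sketch using only the standard `socket` module. The function name `can_connect` is hypothetical, and a successful TCP connection only shows that something is listening, not that it is ZooKeeper.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, unresolvable host, or timed out.
        return False
    sock.close()
    return True
```

With the host and ports from this thread, one would compare `can_connect("stargate", 2180)` with `can_connect("stargate", 2181)`.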

        jpbordi JP Bordenave added a comment - - edited

        Hello,

        I am very sorry; the issue is now solved. I misunderstood the context of the UTF-8 error.

        It was just a wrong port number for ZooKeeper in slider-client.xml.
        My bad.

        Destroy is working, but I don't understand why it did not fail earlier.

        hduser@stargate:/usr/local/slider$ ./bin/slider destroy cl1
        2015-05-15 17:53:33,772 [main] INFO client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
        2015-05-15 17:53:34,074 [main] INFO zk.BlockingZKWatcher - waiting for ZK event
        2015-05-15 17:53:34,183 [main-EventThread] INFO zk.BlockingZKWatcher - ZK binding callback received
        2015-05-15 17:53:35,484 [main] INFO client.SliderClient - Destroyed cluster cl1
        2015-05-15 17:53:35,487 [main] INFO util.ExitUtil - Exiting with status 0

        Thanks for your support.

        The issue is solved and I am closing it.
        KR
        JP

        jpbordi JP Bordenave added a comment -

        I restarted my nodes,

        but I get the same issue, and all my nodes are active, so that was not the problem. I am going back to investigate the ZooKeeper
        and Slider configuration.

        hduser@stargate:/usr/local/slider$ ./bin/slider create cl1 --template appConfig.json --resources resources.json
        2015-05-15 17:28:06,718 [main] INFO client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
        2015-05-15 17:28:07,325 [main] INFO agent.AgentClientProvider - Validating app definition .slider/package/MEMCACHED/memcached.zip
        2015-05-15 17:28:07,325 [main] INFO agent.AgentUtils - Reading metainfo at .slider/package/MEMCACHED/memcached.zip
        2015-05-15 17:28:07,460 [main] INFO tools.SliderUtils - Reading metainfo.xml of size 2029
        2015-05-15 17:28:07,561 [main] ERROR tools.CoreFileSystem - Dir hdfs://stargate:9000/user/hduser/.slider/cluster/cl1 exists: hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/app_config.json 854
        hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/confdir 0
        hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/database 0
        hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/generated 0
        hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/history 0
        hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/internal.json 1257
        hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/resources.json 401
        hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/snapshot 0
        hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/tmp 0
        2015-05-15 17:28:07,562 [main] ERROR main.ServiceLauncher - Application Instance dir already exists: hdfs://stargate:9000/user/hduser/.slider/cluster/cl1
        2015-05-15 17:28:07,563 [main] INFO util.ExitUtil - Exiting with status 75

        hduser@stargate:/usr/local/slider$ ./bin/slider start cl1

        2015-05-15 17:28:16,171 [main] INFO client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
        2015-05-15 17:28:16,886 [main] INFO agent.AgentUtils - Reading metainfo at .slider/package/MEMCACHED/memcached.zip
        2015-05-15 17:28:16,903 [main] INFO tools.SliderUtils - Reading metainfo.xml of size 2029
        2015-05-15 17:28:17,016 [main] INFO launch.AbstractLauncher - Log include patterns:
        2015-05-15 17:28:17,016 [main] INFO launch.AbstractLauncher - Log exclude patterns:
        2015-05-15 17:28:18,274 [main] INFO slideram.SliderAMClientProvider - Loading all dependencies for AM.
        2015-05-15 17:28:18,275 [main] INFO tools.SliderUtils - Loading all dependencies from /usr/local/slider-0.61.0-incubating/lib
        2015-05-15 17:28:23,544 [main] INFO agent.AgentClientProvider - Automatically uploading the agent tarball at hdfs://stargate:9000/user/hduser/.slider/cluster/cl1/tmp/application_1431703599364_0001/agent
        2015-05-15 17:28:23,643 [main] INFO agent.AgentClientProvider - Validating app definition .slider/package/MEMCACHED/memcached.zip
        2015-05-15 17:28:23,643 [main] INFO agent.AgentUtils - Reading metainfo at .slider/package/MEMCACHED/memcached.zip
        2015-05-15 17:28:23,647 [main] INFO tools.SliderUtils - Reading metainfo.xml of size 2029
        2015-05-15 17:28:23,656 [main] INFO Configuration.deprecation - slider.registry.path is deprecated. Instead, use hadoop.registry.zk.root
        2015-05-15 17:28:23,658 [main] INFO launch.AppMasterLauncher - Submitting application to Resource Manager
        2015-05-15 17:28:23,846 [main] INFO impl.YarnClientImpl - Submitted application application_1431703599364_0001
        2015-05-15 17:28:23,848 [main] INFO util.ExitUtil - Exiting with status 0

        hduser@stargate:/usr/local/slider$ ./bin/slider stop cl1

        2015-05-15 17:28:32,385 [main] INFO client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
        2015-05-15 17:28:32,693 [main] INFO client.SliderClient - Cluster cl1 is in a pre-running state ACCEPTED. Force killing it
        2015-05-15 17:28:32,694 [main] INFO client.SliderYarnClientImpl - Killing application 1431703599364 - Forced stop of cl1: stop command issued
        2015-05-15 17:28:32,706 [main] INFO util.ExitUtil - Exiting with status 0

        hduser@stargate:/usr/local/slider$ ./bin/slider destroy cl1
        2015-05-15 17:28:49,170 [main] INFO client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
        2015-05-15 17:28:49,501 [main] INFO zk.BlockingZKWatcher - waiting for ZK event
        2015-05-15 17:28:49,505 [main-SendThread(stargate:2180)] WARN zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
        Exception in thread Thread-2:
        Traceback (most recent call last):
        File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
        File "/usr/lib/python2.7/threading.py", line 763, in run
        self._target(*self.args, **self._kwargs)
        File "/usr/local/slider-0.61.0-incubating/bin/slider.py", line 168, in print_output
        (line, done) = read(src, line)
        File "/usr/local/slider-0.61.0-incubating/bin/slider.py", line 146, in read
        o = c.decode('utf-8')
        File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
        UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0: unexpected end of data

        jpbordi JP Bordenave added a comment -

        Hello,

        OK, I must check the status of my cluster/nodes. On port 50070 I have 3 nodes alive, but when I go to the applications page on port 8088, it tells me 1 node is lost and 2 are alive, and it is the one where I started the job with Slider.
        Maybe that is the origin of the issue; I am investigating it.
        Thanks.

        KR
        JP

        jpbordi JP Bordenave added a comment -

        OK, I set my LC_CTYPE and LC_ALL to
        fr_FR.UTF-8

        I get the same issue. I am checking my ZK configuration; maybe I missed something for Slider.

        KR
        JP

        jpbordi JP Bordenave added a comment - - edited

        Hello,

        Thanks for your answer.

        As you requested:

        hduser@stargate:~/falcon$ echo $LANG
        fr_FR.UTF-8

        My LC_CTYPE is empty; I must check why.
        hduser@stargate:/usr/local/slider$ echo $LC_CTYPE

        My cluster is active, with a lot of other components for other tests. I am in a multi-node context with HBase, Accumulo, Storm, Spark, Pig, MapReduce, Tez and ZooKeeper.
        Multi-node start/stop and the cluster have worked fine until now.

        My demo cluster runs Ubuntu 14.04: 1 master PC (namenode) with 24 GB RAM and 1 TB of disk, plus 2 slave PCs (datanodes) with 16 GB RAM and 512 GB of disk per node.

        Slider is the last module I need to finish my Hadoop ecosystem demo.

        I will keep looking for what is wrong with my configuration and check the processes.
        KR
        JP

        stevel@apache.org Steve Loughran added a comment -

        JP: If this is a Linux system, could you also run:

        echo $LC_CTYPE

        This will tell us the locale that Python is running under.
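The same information can be collected from inside Python, which also shows what the interpreter itself derives from those variables. This is a minimal sketch using the standard `locale`, `os` and `sys` modules. The function name `locale_report` is hypothetical and is not part of slider.py.

```python
import locale
import os
import sys


def locale_report(environ=None):
    """Collect the locale variables and the encodings Python derives from them."""
    environ = os.environ if environ is None else environ
    return {
        "LANG": environ.get("LANG", ""),
        "LC_ALL": environ.get("LC_ALL", ""),
        "LC_CTYPE": environ.get("LC_CTYPE", ""),
        # Encoding Python would pick for text data in the current locale.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding attached to stdout; may be None when output is piped.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in sorted(locale_report().items()):
        print("%s=%r" % (key, value))
```

An empty LC_CTYPE alongside a set LANG, as reported elsewhere in this thread, is normal: LANG acts as the fallback when the more specific variables are unset.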

        stevel@apache.org Steve Loughran added a comment -

        Looking at the .py code, it's happening while reading input:

          c = pipe.read(1)
          if c != "":
            o = c.decode('utf-8')    # ** HERE **

        Presumably the data being read in (from the child process) is not in UTF-8 form, so the attempt to decode it is failing. How are we going to handle this? Work out the language from the environment variables?
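One possible way to handle this, sketched under assumptions rather than taken from the Slider source: if the child's output is in fact UTF-8 and the failure comes from decoding a multi-byte character one byte at a time, an incremental decoder from the standard `codecs` module holds back the partial sequence until it is complete. The function name `read_decoded` and the choice of `errors="replace"` are illustrative only.

```python
import codecs


def read_decoded(pipe, encoding="utf-8"):
    """Yield text from a byte stream read one byte at a time.

    The incremental decoder buffers incomplete multi-byte sequences, and
    errors="replace" substitutes U+FFFD for bytes that are truly invalid
    instead of raising UnicodeDecodeError.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    while True:
        chunk = pipe.read(1)
        if not chunk:
            # End of stream: flush whatever is still buffered.
            tail = decoder.decode(b"", final=True)
            if tail:
                yield tail
            return
        text = decoder.decode(chunk)
        if text:
            yield text
```

For example, `u"".join(read_decoded(io.BytesIO(data)))` reassembles a UTF-8 message containing accented characters even though each `read(1)` returns a single byte. Whether the encoding should instead be derived from the environment, as asked above, remains an open design question.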

        stevel@apache.org Steve Loughran added a comment -

        This is interesting, looks like >1 problem is arising.

        1. the unicode message looks related to this; it implies that something isn't working with the slider.py script on your system.

        can you do an echo $LANG to tell us what language your system thinks it is?

        2. The last Log4J report was zookeeper-related; it says it was trying to reconnect but failing. I wonder if something is up with ZK.

        If you try this again and it hangs, can you do a kill -QUIT against the slider process (which jps -v will help you find), so we can get a stack trace of where the client is hanging.

        3. Finally, if you look at our exit codes doc you can see our list of errors. #73 is "application in use". Have a look at the YARN Resource manager web UI —is it still listed as running?


          People

          • Assignee:
            Unassigned
            Reporter:
            jpbordi JP Bordenave
          • Votes:
            0
            Watchers:
            4
