  1. Spark
  2. SPARK-6246

spark-ec2 can't handle clusters with > 100 nodes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.5.0
    • Component/s: EC2
    • Labels: None

    Description

      This appears to be a new restriction, perhaps resulting from our upgrade of boto. Maybe it's a new restriction from EC2. Not sure yet.

      We didn't have this issue around the Spark 1.1.0 time frame from what I can remember. I'll track down where the issue is and when it started.

      Attempting to launch a cluster with 100 slaves yields the following:

      Spark AMI: ami-35b1885c
      Launching instances...
      Launched 100 slaves in us-east-1c, regid = r-9c408776
      Launched master in us-east-1c, regid = r-92408778
      Waiting for AWS to propagate instance metadata...
      Waiting for cluster to enter 'ssh-ready' state.ERROR:boto:400 Bad Request
      ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
      <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the maximum number of instance IDs that can be specificied (100). Please specify fewer than 100 instance IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
      Traceback (most recent call last):
        File "./ec2/spark_ec2.py", line 1338, in <module>
          main()
        File "./ec2/spark_ec2.py", line 1330, in main
          real_main()
        File "./ec2/spark_ec2.py", line 1170, in real_main
          cluster_state='ssh-ready'
        File "./ec2/spark_ec2.py", line 795, in wait_for_cluster_state
          statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
        File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 737, in get_all_instance_status
          InstanceStatusSet, verb='POST')
        File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1204, in get_object
          raise self.ResponseError(response.status, response.reason, body)
      boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
      <?xml version="1.0" encoding="UTF-8"?>
      <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the maximum number of instance IDs that can be specificied (100). Please specify fewer than 100 instance IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
      

      This problem seems to be with get_all_instance_status(), though I am not sure if other methods are affected too.
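
      One possible workaround, sketched here as an illustration rather than as the fix that was actually committed, is to query instance statuses in batches so that no single request carries more than 100 instance IDs. The helper name fetch_in_batches, the batch size default, and the fake status function are assumptions introduced for this example; the limit of 100 is taken from the error message above, and the fake function only stands in for conn.get_all_instance_status so the sketch runs without AWS access.

```python
def fetch_in_batches(fetch, instance_ids, batch_size=100):
    """Call fetch(instance_ids=<batch>) on slices of at most batch_size IDs
    and return the concatenated results.

    fetch is any callable that accepts an instance_ids keyword argument and
    returns a list, as conn.get_all_instance_status does in the traceback.
    """
    results = []
    for start in range(0, len(instance_ids), batch_size):
        batch = instance_ids[start:start + batch_size]
        results.extend(fetch(instance_ids=batch))
    return results


# Hypothetical stand-in for conn.get_all_instance_status: it rejects
# requests with more than 100 IDs, mimicking the error reported above.
def fake_get_all_instance_status(instance_ids):
    if len(instance_ids) > 100:
        raise ValueError("too many instance IDs: %d" % len(instance_ids))
    return ["status-for-" + i for i in instance_ids]


# 100 slaves plus 1 master gives 101 IDs, the case that fails in one request.
ids = ["i-%04d" % n for n in range(101)]
statuses = fetch_in_batches(fake_get_all_instance_status, ids)
```

      In spark_ec2.py the equivalent call would presumably look like fetch_in_batches(conn.get_all_instance_status, [i.id for i in cluster_instances]). Other boto calls that accept a list of instance IDs may need the same treatment if they hit the same limit.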

          People

            Assignee: alyaxey Alex Slusarenko
            Reporter: nchammas Nicholas Chammas
            Votes: 1
            Watchers: 4