Cassandra / CASSANDRA-955

Hadoop doesn't schedule the tasks close to the data

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.6.1
    • Component/s: None
    • Labels: None

      Description

      Hadoop relies on the data locations in input splits being represented as hostnames, not IP addresses. Currently, in my testing, tasks are more often than not scheduled on a node that does not contain the requested data.
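      To illustrate why IP addresses defeat locality, here is a deliberately simplified sketch, assuming the scheduler decides "data-local" by comparing a split's location strings with the hostname a task tracker reports. This is not Hadoop's actual scheduling code; the class `LocalityCheck` and method `isDataLocal` are hypothetical. The point is only that "10.0.0.5" and "node5.example.com" never compare equal, even when they name the same machine.

```java
import java.util.Arrays;

public class LocalityCheck {
    /**
     * Hypothetical, simplified locality test: a split counts as local to a
     * tracker only if one of its location strings equals the tracker's hostname.
     */
    static boolean isDataLocal(String[] splitLocations, String trackerHost) {
        return Arrays.asList(splitLocations).contains(trackerHost);
    }

    public static void main(String[] args) {
        String tracker = "node5.example.com";
        // Location given as an IP address: no string match, so not treated as local.
        System.out.println(isDataLocal(new String[]{"10.0.0.5"}, tracker));          // false
        // Location given as a hostname: matches the tracker, so treated as local.
        System.out.println(isDataLocal(new String[]{"node5.example.com"}, tracker)); // true
    }
}
```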

      Attachment: CASSANDRA-955.patch (1 kB, Johan Oskarsson)

        Activity

        johanoskarsson Johan Oskarsson added a comment -

        Looks up the hostname from the IP address so that Hadoop can schedule tasks as close to the data as possible. The lookup is done where the Hadoop job is scheduled, not on the Cassandra side.
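        A minimal sketch of the reverse lookup described in this comment, assuming the split's endpoints arrive as IP-address strings. The class and method names (`SplitLocations`, `toHostnames`, `reverseLookup`) are hypothetical and not taken from the patch; only `java.net.InetAddress` is a real JDK API. The resolver is passed in as a function so it can be replaced with a fake in tests, and a failed lookup falls back to the original IP rather than dropping the location.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class SplitLocations {
    /** Default resolver: reverse DNS via InetAddress, falling back to the input on failure. */
    static String reverseLookup(String ip) {
        try {
            // getByName on an IP literal only parses it; getHostName() then performs
            // the reverse lookup and returns the textual IP if no name is found.
            return InetAddress.getByName(ip).getHostName();
        } catch (UnknownHostException e) {
            return ip; // keep the IP so the split still has a location
        }
    }

    /** Converts each endpoint address to a hostname using the supplied resolver. */
    static String[] toHostnames(String[] ips, UnaryOperator<String> resolver) {
        String[] hosts = new String[ips.length];
        for (int i = 0; i < ips.length; i++) {
            hosts[i] = resolver.apply(ips[i]);
        }
        return hosts;
    }

    /** Convenience overload using real reverse DNS. */
    static String[] toHostnames(String[] ips) {
        return toHostnames(ips, SplitLocations::reverseLookup);
    }

    public static void main(String[] args) {
        // Output depends on the local resolver configuration.
        System.out.println(Arrays.toString(toHostnames(new String[]{"127.0.0.1"})));
    }
}
```

        Doing the lookup on the job-submission side, as the comment describes, means the locations handed to Hadoop are already in the form its scheduler compares against, without any change to the Cassandra nodes themselves.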

        jbellis Jonathan Ellis added a comment -

        +1

        jbellis Jonathan Ellis added a comment -

        (can you commit to 0.6 and trunk?)

        johanoskarsson Johan Oskarsson added a comment -

        Committed to 0.6 branch and trunk.

        hudson Hudson added a comment -

        Integrated in Cassandra #400 (See http://hudson.zones.apache.org/hudson/job/Cassandra/400/)
        Provide correct locations so that Hadoop can schedule map tasks close to the data. Patch by johan, review by jbellis.


          People

          • Assignee:
            johanoskarsson Johan Oskarsson
            Reporter:
            johanoskarsson Johan Oskarsson
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
