SPARK-6646

Spark 2.0: Rearchitecting Spark for Mobile Platforms

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Project Infra
    • Labels: None
    • Target Version/s:

      Description

      Mobile computing is quickly rising to dominance, and by the end of 2017, it is estimated that 90% of CPU cycles will be devoted to mobile hardware. Spark’s project goal can be accomplished only when Spark runs efficiently for the growing population of mobile users.

      Designed and optimized for modern data centers and Big Data applications, Spark is unfortunately not a good fit for mobile computing today. In the past few months, we have been prototyping the feasibility of a mobile-first Spark architecture, and today we would like to share with you our findings. This ticket outlines the technical design of Spark’s mobile support, and shares results from several early prototypes.

      Mobile friendly version of the design doc: https://databricks.com/blog/2015/04/01/spark-2-rearchitecting-spark-for-mobile.html

        Activity

        yuecong1104 Cong Yue added a comment -

        Very cool idea. Current smartphones perform much better than servers from 5-8 years ago.
        But in mobile networks, data transfer speeds between nodes are not as stable as between servers.
        So parallel computing can benefit from the CPUs, but the bottleneck will be the mobile network.

        sandyr Sandy Ryza added a comment -

        This seems like a good opportunity to finally add a DataFrame registerTempTablet API.

        rxin Reynold Xin added a comment -

        Sandy Ryza, that's an excellent idea. I hadn't thought of that yet. But now that I think about it, there will be a lot of room for optimizations using DataFrame on tablets.

        yuu.ishikawa@gmail.com Yu Ishikawa added a comment -

        That sounds very interesting! We should support a function for deploying a trained machine learning model to a smartphone.

        tdas Tathagata Das added a comment -

        I have been working on running NetworkWordCount on our iPhone prototype, and I was pleasantly surprised with the performance I was getting. The network bandwidth is definitely lower, and there is a higher cost of shuffling data, but it's still quite good. The task launch latencies are higher, though, so streaming applications will require slightly larger batch sizes. But overall you will be surprised. I will post numbers once I can compile them into graphs.

        rahulkumar-aws Rahul Kumar added a comment -

        Love this idea. What about a "private cloud in pocket": store data on the smartphone, do processing on it, and run a small mobile-based web server that powers cool visualization reports? Our smartphones are idle a lot of the time, so we can share resources: 4 GB RAM, a quad-core processor, and an LTE network are not bad for a single node in a cluster.

        freeman-lab Jeremy Freeman added a comment -

        Very promising, Tathagata Das! We should evaluate the performance of streaming machine learning algorithms. In general, I think running Spark in JavaScript via Scala.js and Node.js is extremely appealing; it will make integration with visualization very straightforward.

        srowen Sean Owen added a comment -

        Concept: a smartphone app that lets you find the nearest Spark cluster to join. Swipe left/right on photos from the worker nodes to indicate which ones you want to join. The only problem is that this must be called SparkR to be taken seriously, so I think it will have to be rolled into the R library.

        sandyr Sandy Ryza added a comment -

        Sean Owen, I like the way you think. I know a lot of good nodes out there looking for love, or at least a casual shutdown hookup.

        ilikerps Aaron Davidson added a comment -

        Please help, I tried putting spark on iphone but it ignited and now no phone.

        pzecevic Petar Zecevic added a comment -

        Good one

        kamalbanga Kamal Banga added a comment -

        We want Spark for Apple Watch. That will be the real breakthrough!

        CodingCat Nan Zhu added a comment -

        super cool, Spark enables Bigger than Bigger Data in mobile phones

        stevel@apache.org Steve Loughran added a comment -

        Obviously the barrier will be data source access; talking to remote data is going to run up bills.

        1. CouchDB has an offline mode, so its RDD/DataFrame support would allow spark-mobile to work in embedded mode.
        2. Hadoop 2.8 adds hardware CRC on ARM parts for HDFS (HADOOP-11660). A MiniDFSCluster could be instantiated locally to benefit from this.
        3. Alternatively, mDNS could be used to discover and dynamically build up an HDFS cluster from nearby devices, MANET-style. The limited connectivity guarantees of moving devices mean that a block size of <1536 bytes would be appropriate; probably 1KB blocks are safest.
        4. Those nodes on the network with limited CPU power but access to external power supplies, such as toasters and coffee machines, could have a role as the persistent co-ordinators of work and HDFS Namenodes, as well as being used as the preferred routers of Wi-Fi packets.
        5. It may be necessary to extend the Hadoop s3:// filesystem with the notion of monthly data quotas, possibly even roaming and non-roaming quotas. The S3 client would need to query the runtime to determine whether it was at home or roaming and use the relevant quota. Apps could then set something like:
          fs.s3.quota.home=15GB
          fs.s3.quota.roaming=2GB
          

          Dealing with use abroad would be more complex, as if a cost value were to be included, exchange rates would have to be dynamically assessed.

        6. It may be interesting to consider the notion of having devices publish some of their data (photos, HealthKit history, movement history) to other devices nearby. If one phone could enumerate those nearby *and submit work to them*, the bandwidth problems could be addressed.
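
        A minimal sketch of how the home/roaming quota check from item 5 might look. Everything here is an assumption for illustration: the keys fs.s3.quota.home and fs.s3.quota.roaming are only proposed in this comment and are not real Hadoop settings, and the helper names parse_size, active_quota, and can_read are hypothetical.

```python
import re

# Binary multipliers for the size suffixes used in the proposed settings.
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '15GB' into a number of bytes."""
    match = re.fullmatch(r"(\d+)\s*(KB|MB|GB)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def active_quota(conf, roaming):
    """Return the monthly quota in bytes for the current network state.

    The two keys are hypothetical; they only mirror the proposal above.
    """
    key = "fs.s3.quota.roaming" if roaming else "fs.s3.quota.home"
    return parse_size(conf[key])


def can_read(conf, roaming, used_bytes, request_bytes):
    """True if the requested read still fits within the active quota."""
    return used_bytes + request_bytes <= active_quota(conf, roaming)


conf = {"fs.s3.quota.home": "15GB", "fs.s3.quota.roaming": "2GB"}
```

        With this conf, a device that has already used 1 GB while roaming could still read one more gigabyte, but nothing beyond that. How the client would learn whether it is roaming, and any currency conversion for cost-based quotas, is left out of the sketch.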
        sparks Evan Sparks added a comment -

        Guys - you're clearly ignoring prior work. The database community solved this problem 20 years ago with the Gubba project - a mature prototype can be seen here.

        Additionally, everyone knows that joins don't scale on iOS, and you'll never be able to build indexes on this platform.

        vinayshukla@gmail.com Vinay Shukla added a comment -

        This use case can benefit from running Spark inside a Mobile App Server. An app server that takes care of horizontal issues such as security, networking, etc., will allow Spark to focus on the real hard problem of data processing in a lightning-fast manner.

        There is another idea of having Spark leverage parallel quantum computing, but I suppose that calls for another JIRA.

        deenar Deenar Toraskar added a comment -

        maybe Spark 2.0 should be branded i-Spark

        matei Matei Zaharia added a comment -

        Not to rain on the parade here, but I worry that focusing on mobile phones is short-sighted. Does this design present a path forward for the Internet of Things as well? You'd want something that runs on Arduino, Raspberry Pi, etc. We already have MQTT input in Spark Streaming so we could consider using MQTT to replace Netty for shuffle as well. Has anybody benchmarked that?

        tdas Tathagata Das added a comment -

        I vehemently disagree. I don't think we should choose names that subtly indicate Spark runs on iPhone only. That is frankly not true. We want to embrace all platforms without any bias.

        niviksha Venkat Krishnamurthy added a comment -

        I'm looking forward to the release that targets smart watches. It could have the pleasant side effect of making time stand still while executors crunch away in the background, obviating any need for performance tuning.

        srowen Sean Owen added a comment -

        I feel like people aren't taking this seriously. What do you think this is, some kind of joke?

        OK, can we resolve this one?

        rxin Reynold Xin added a comment -

        Alright – given the size of the task, I am not sure if I have enough cycles to do it at the moment. Let's revisit next year on April 1st.

        mbonaci Marko Bonaci added a comment -

        Wait a minute, don't postpone this one just yet. The hardest problems often give the biggest yields.
        Other players in the space, spurred (and a bit frightened) by your announcement, have already started acting.

        Nobody wants to be left behind, so strategies are being worked on:
        http://app.go.cloudera.com/e/es.aspx?s=1465054361&e=177939

        Cloudera Wearables™


          People

          • Assignee: Reynold Xin
          • Reporter: Reynold Xin
          • Votes: 9
          • Watchers: 38
