TensorFlowOnSpark (TFoS) was released on GitHub for distributed TensorFlow training and inference on Apache Spark clusters. TFoS is designed to:
- Migrate all existing TensorFlow programs easily, with minimal code changes;
- Support all TensorFlow functionality: synchronous/asynchronous training, model/data parallelism, inference and TensorBoard;
- Integrate easily with your existing data processing pipelines (e.g. Spark SQL) and machine learning algorithms (e.g. MLlib);
- Be easy to deploy in the cloud or on-premises, on CPUs and GPUs, over Ethernet or InfiniBand.
We propose to merge TFoS into Apache Spark as a scalable deep learning library to:
- Make deep learning easy for the Apache Spark community: provide a familiar pipeline API for training and inference; enable TensorFlow training/inference on existing Spark clusters.
- Further simplify the data scientist experience: ensure compatibility between Apache Spark and TFoS; reduce the number of installation steps.
- Help Apache Spark evolve in deep learning: establish a design pattern for additional frameworks (e.g. Caffe, CNTK); support Structured Streaming for DL training/inference.