Ignite / IGNITE-8670

Umbrella: TensorFlow integration


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 2.7
    • Component/s: ml
    • Labels: None

    Description

      What is the goal?

      TensorFlow on Apache Ignite should consist of three major components: Ignite Dataset, which feeds training data from Apache Ignite; the IGFS Plugin, which lets TensorFlow use the Apache Ignite File System (IGFS) for checkpointing and for communication with TensorBoard; and Distributed Training, which runs model training directly inside the Apache Ignite cluster to minimize data transfers and provide so-called Zero ETL.

      Ignite Dataset

      Ignite Dataset is an integration between Apache Ignite and TensorFlow that lets you use Apache Ignite as a data source for neural network training, inference, and all other computations supported by TensorFlow. Ignite Dataset has several advantages: TensorFlow gets fast access to a distributed database that can hold both training data and data for inference; the objects fed by Ignite Dataset can have any structure, so all preprocessing can be done in the TensorFlow pipeline; and SSL, Windows, and distributed training are supported as well.

      Ignite Dataset is now part of TensorFlow, so you don’t need to install any third-party packages; it works out of the box. The integration is based on tf.data on the TensorFlow side and on the Binary Client Protocol on the Apache Ignite side.
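
      The sketch below is a plain-Python simulation, not the Ignite Dataset implementation. It assumes a hypothetical in-memory dict (`CACHE`) standing in for an Ignite cache and hypothetical helpers (`fetch_page`, `scan`, `preprocess`). It illustrates the two ideas above: records are streamed page by page (in the spirit of a scan-query cursor over the thin client protocol) rather than loaded all at once, and arbitrarily structured objects are turned into features on the client side.

```python
from typing import Any, Dict, Iterator, List, Tuple

Record = Tuple[int, Dict[str, Any]]

# Hypothetical stand-in for an Ignite cache (key -> structured object).
# The real Ignite Dataset reads entries over the Binary Client Protocol instead.
CACHE: Dict[int, Dict[str, Any]] = {
    1: {"name": "Rabbit", "weight": 1.5, "tags": ["pet"]},
    2: {"name": "Kitten", "weight": 0.8, "tags": ["pet", "small"]},
    3: {"name": "Horse", "weight": 400.0, "tags": []},
}


def fetch_page(cache: Dict[int, Dict[str, Any]], cursor: int,
               page_size: int) -> Tuple[List[Record], int]:
    """Return up to `page_size` (key, value) pairs starting at `cursor`, plus the next cursor."""
    keys = sorted(cache)
    page_keys = keys[cursor:cursor + page_size]
    return [(k, cache[k]) for k in page_keys], cursor + len(page_keys)


def scan(cache: Dict[int, Dict[str, Any]], page_size: int = 2) -> Iterator[Record]:
    """Stream the whole cache one page at a time instead of loading it at once."""
    cursor = 0
    while True:
        page, cursor = fetch_page(cache, cursor, page_size)
        if not page:
            return
        yield from page


def preprocess(record: Record) -> Dict[str, Any]:
    """Client-side preprocessing: flatten an arbitrary object into flat features."""
    key, value = record
    return {
        "key": key,
        "weight_kg": value["weight"],
        "num_tags": len(value["tags"]),
        "is_pet": "pet" in value["tags"],
    }


features = [preprocess(r) for r in scan(CACHE, page_size=2)]
for f in features:
    print(f)
```

      With `page_size=2` the three records arrive in two pages. In a TensorFlow pipeline the equivalent preprocessing step would typically be expressed as a `map` over the dataset.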

      IGFS Plugin

      In addition to its database functionality, Apache Ignite provides a distributed file system called IGFS. IGFS offers functionality similar to Hadoop HDFS, but keeps data in memory. The IGFS Plugin for TensorFlow lets you use IGFS for checkpointing (for reliability and fault tolerance) and for communication with TensorBoard (even when TensorBoard runs in a different process or on a different machine).

      The IGFS Plugin is now part of TensorFlow, so you don’t need to install any third-party packages; it works out of the box. The integration is based on a custom filesystem plugin on the TensorFlow side and on the IGFS Native API on the Apache Ignite side.
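
      A minimal single-process sketch of the idea behind a filesystem plugin: paths are dispatched by URI scheme to the filesystem registered for that scheme, so every component that resolves the same `igfs://` path sees the same files. Here a hypothetical in-memory class (`InMemoryFileSystem`) plays the role of IGFS and a plain dict (`REGISTRY`) plays the role of TensorFlow's filesystem registry; the exact `igfs://` path layout and all helper names are assumptions for illustration.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


class InMemoryFileSystem:
    """Hypothetical stand-in for IGFS: a shared path -> bytes store kept in memory."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def listdir(self, prefix: str) -> List[str]:
        return sorted(p for p in self._files if p.startswith(prefix))


# Scheme -> filesystem registry (hypothetical); a plugin registers itself under a scheme.
REGISTRY: Dict[str, InMemoryFileSystem] = {"igfs": InMemoryFileSystem()}


def resolve(uri: str) -> Tuple[InMemoryFileSystem, str]:
    """Pick the filesystem by URI scheme and return it together with the path part."""
    parsed = urlparse(uri)
    return REGISTRY[parsed.scheme], parsed.path


def write_file(uri: str, data: bytes) -> None:
    fs, path = resolve(uri)
    fs.write(path, data)


def read_file(uri: str) -> bytes:
    fs, path = resolve(uri)
    return fs.read(path)


# Training side: save a (toy) checkpoint and a (toy) event log.
write_file("igfs:///checkpoints/model.ckpt-100", b"weights@step100")
write_file("igfs:///logs/events.out", b"loss=0.42")

# Reader side (e.g. a TensorBoard-like consumer): same URIs, same data.
print(read_file("igfs:///logs/events.out"))
print(REGISTRY["igfs"].listdir("/checkpoints/"))
```

      In a real deployment the shared store lives in the Ignite cluster rather than in one process, which is what lets a checkpoint survive a trainer restart and lets TensorBoard on another machine read the logs; this sketch only models that sharing with a single in-process registry.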

      Distributed Training

      Distributed training makes it possible to use the computational resources of the whole cluster and thus speed up the training of deep learning models. TensorFlow is a machine learning framework that natively supports distributed neural network training, inference, and other computations.
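
      One common way to spread training across a cluster is synchronous data parallelism: each worker computes gradients on its own shard of the batch, the gradients are averaged, and every replica applies the same update. The toy sketch below (plain Python, a one-parameter linear model with squared loss, all names hypothetical) simulates four workers sequentially; it illustrates the technique and is not how TensorFlow implements its distribution strategies.

```python
from typing import List, Sequence


def gradient(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """d/dw of the mean squared error of y_hat = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(data: Sequence[float], num_workers: int) -> List[List[float]]:
    """Split data into equal contiguous shards (assumes len(data) % num_workers == 0)."""
    size = len(data) // num_workers
    return [list(data[i * size:(i + 1) * size]) for i in range(num_workers)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]  # ground truth: w = 3

num_workers = 4
x_shards, y_shards = shard(xs, num_workers), shard(ys, num_workers)

w, lr = 0.0, 0.01
for _ in range(200):
    # Each "worker" computes a gradient on its own shard (run sequentially here).
    local_grads = [gradient(w, xsh, ysh) for xsh, ysh in zip(x_shards, y_shards)]
    # Averaging (an all-reduce in a real cluster) gives every replica the same update.
    w -= lr * sum(local_grads) / num_workers

print(round(w, 4))
```

      With equal-sized shards the averaged gradient equals the full-batch gradient, so the result matches single-machine training while the per-step work is divided among the workers.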

      Distributed Training in TensorFlow on Apache Ignite is based on the standalone client mode of TensorFlow's distributed multi-worker training. Standalone client mode assumes a cluster of workers running TensorFlow servers and a separate client that contains the actual model code. When the client calls tf.estimator.train_and_evaluate, TensorFlow uses the specified distribution strategy to distribute the computations across the workers, so that the most computationally intensive part runs on the workers.
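
      Standalone client mode needs every worker to know the cluster layout and its own place in it, while the client only needs the layout. A minimal sketch, assuming the layout is expressed in the JSON shape TensorFlow reads from the `TF_CONFIG` environment variable (`cluster` and `task` keys); the host names, port 2222, and helper names are hypothetical, and how the client itself is handed the layout differs between TensorFlow versions.

```python
import json
from typing import Dict, List


def cluster_spec(hosts: List[str], port: int = 2222) -> Dict[str, List[str]]:
    """Cluster layout shared by the client and all workers: one worker per host."""
    return {"worker": [f"{host}:{port}" for host in hosts]}


def worker_tf_config(spec: Dict[str, List[str]], index: int) -> str:
    """TF_CONFIG-style JSON for the TensorFlow server started on worker `index`."""
    return json.dumps({"cluster": spec, "task": {"type": "worker", "index": index}})


hosts = ["ignite-node-1", "ignite-node-2", "ignite-node-3"]  # hypothetical hosts
spec = cluster_spec(hosts)

# Worker side: each node starts a TensorFlow server with its own task entry.
worker_configs = [worker_tf_config(spec, i) for i in range(len(hosts))]
for cfg in worker_configs:
    print(cfg)

# Client side: the client holds the model code and is not one of the workers,
# so it only needs the cluster layout to reach the running servers.
client_view = {"cluster": spec}
print(json.dumps(client_view))
```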

            People

              chief Yury Babak
              Votes: 1
              Watchers: 2
