  1. Ignite
  2. IGNITE-12849

Add New BinaryObject Vectorizer for SparseVectors and Integer Coordinates


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.8
    • Fix Version/s: 3.0
    • Component/s: ml
    • Labels:
      None
    • Ignite Flags:
      Docs Required, Release Notes Required

      Description

      A. DenseVector-based BinaryObjectVectorizer
      When using existing caches as a source of Datasets, the BinaryObjectVectorizer is used.
      The existing BinaryObjectVectorizer only supports the creation of a SparseVector.
      The LUDecomposition utility, which supports Gaussian factorization for models like GMM, has a "Singularity indicator". A SparseVector's null handling sets the corresponding matrix column calculation to zero (0.0), which is below the minimum check value (1e-11) and thus indicates that the matrix is not square.

      This null handling of the SparseVector restricts the use of some algorithms, such as Gaussian Mixture Models, where any Vector dimension that is null will incorrectly signal that a matrix is not square.
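
      To illustrate why a zeroed column trips this kind of check, here is a minimal sketch of Gaussian elimination with partial pivoting. It is a simplified stand-in, not Ignite's LUDecomposition: the class name, method name, and constant name are hypothetical, and only the 1e-11 threshold comes from the description above. The input is assumed to be a square matrix.

```java
/** Simplified stand-in for an LU-style singularity check; not Ignite's LUDecomposition. */
final class SingularityCheckSketch {
    /** Minimum pivot magnitude, taken from the 1e-11 value quoted in the description. */
    static final double TOO_SMALL = 1e-11;

    /**
     * Runs Gaussian elimination with partial pivoting on a copy of a square matrix.
     * Returns true as soon as the best available pivot is smaller than TOO_SMALL.
     */
    static boolean isSingular(double[][] matrix) {
        int n = matrix.length;

        // Work on a copy so the caller's matrix is left untouched.
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++)
            a[i] = matrix[i].clone();

        for (int col = 0; col < n; col++) {
            // Pick the row (at or below the diagonal) with the largest magnitude in this column.
            int pivotRow = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivotRow][col]))
                    pivotRow = row;
            }

            // A column of zeros leaves no usable pivot, so the check fires here.
            if (Math.abs(a[pivotRow][col]) < TOO_SMALL)
                return true;

            // Move the pivot row onto the diagonal.
            double[] tmp = a[col];
            a[col] = a[pivotRow];
            a[pivotRow] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++)
                    a[row][k] -= factor * a[col][k];
            }
        }

        return false;
    }
}
```

      With this sketch, a 3x3 matrix whose middle column is all 0.0 is reported as failing the check, while a matrix such as {{2, 1}, {1, 3}} passes.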

      It would be great if we could:

      • Have a BinaryObjectVectorizer that produces a DenseVector to eliminate this singularity trigger and enable use of the GMM Trainer.
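
      As a rough sketch of the idea only: the code below fills a dense array from named fields and makes the value used for a missing field explicit. It does not use the Ignite API and is not the attached implementation. The class name, the map standing in for a BinaryObject's fields, and the missingValue parameter are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical dense vectorization sketch; a plain map stands in for a BinaryObject's fields. */
final class DenseVectorizerSketch {
    /**
     * Builds a dense array with one slot per requested coordinate.
     * A field that is absent or null is replaced by the caller-supplied missingValue
     * (an assumption of this sketch) instead of being silently treated as 0.0.
     */
    static double[] vectorize(Map<String, ? extends Number> fields,
                              List<String> coordinates,
                              double missingValue) {
        double[] out = new double[coordinates.size()];

        for (int i = 0; i < out.length; i++) {
            Number value = fields.get(coordinates.get(i));
            out[i] = value == null ? missingValue : value.doubleValue();
        }

        return out;
    }
}
```

      For example, vectorizing the fields {x=1, z=3} over the coordinates [x, y, z] with a missingValue of -1.0 gives [1.0, -1.0, 3.0]. Which fill value is appropriate depends on the model being trained.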

      B. CacheBasedDatasets not treated as Temporary Cache
      When using a cache-based dataset, the close() method destroys the Ignite cache, so the data loaded into the dataset cannot be reused.

      It would be great if we could:

      • Not destroy the Ignite Cache holding the dataset on close (of one step in an ML processing flow)
      • Allow for "attaching" to this prior, pre-calculated dataset in subsequent use.
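
      The two points above could look roughly like the following. This is an in-memory sketch of the requested behavior, not Ignite code: a static map stands in for the cluster's named caches, and the class name, createOrAttach, and the keepOnClose flag are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical dataset handle; a static map stands in for the cluster's named caches. */
final class RetainedDatasetSketch implements AutoCloseable {
    private static final Map<String, Map<Integer, double[]>> CACHES = new ConcurrentHashMap<>();

    private final String name;
    private final boolean keepOnClose;

    private RetainedDatasetSketch(String name, boolean keepOnClose) {
        this.name = name;
        this.keepOnClose = keepOnClose;
    }

    /** Creates the backing store if it is missing, otherwise attaches to the existing one. */
    static RetainedDatasetSketch createOrAttach(String name, boolean keepOnClose) {
        CACHES.computeIfAbsent(name, k -> new ConcurrentHashMap<>());
        return new RetainedDatasetSketch(name, keepOnClose);
    }

    /** Reports whether a backing store with this name currently exists. */
    static boolean exists(String name) {
        return CACHES.containsKey(name);
    }

    void put(int key, double[] row) {
        CACHES.get(name).put(key, row);
    }

    /** Returns the stored row, or null if the key or the backing store is gone. */
    double[] get(int key) {
        Map<Integer, double[]> store = CACHES.get(name);
        return store == null ? null : store.get(key);
    }

    /** Destroys the backing store only when the handle was not asked to keep it. */
    @Override
    public void close() {
        if (!keepOnClose)
            CACHES.remove(name);
    }
}
```

      In this sketch, a first step can call createOrAttach("features", true), load rows, and close its handle. A later step that calls createOrAttach("features", false) sees the same rows, and the store is removed only when that second handle is closed.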

      C. Vector Visibility
      Vectors (unlike other value types, e.g. BinaryObjects) are not visible in standard mechanisms, like the Ignite Web Console, where the toString() method does not present any information about the embedded vector values.

      It would be great if we could:

      • Have a Vector.toString() implementation that presents some information about what is actually in the Vector.
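
      One possible shape for such output, sketched on a plain double[] rather than on Ignite's Vector: print the size and the first few values, and truncate long vectors so that console listings stay readable. The class name, the describe method, and the maxShown parameter are hypothetical.

```java
/** Hypothetical formatter showing what a more informative Vector.toString() could print. */
final class VectorToStringSketch {
    /** Renders the size and at most maxShown leading values, marking truncation with "...". */
    static String describe(double[] values, int maxShown) {
        StringBuilder sb = new StringBuilder("Vector [size=")
            .append(values.length)
            .append(", values=[");

        int shown = Math.min(values.length, Math.max(maxShown, 0));

        for (int i = 0; i < shown; i++) {
            if (i > 0)
                sb.append(", ");
            sb.append(values[i]);
        }

        // Signal that some values were left out.
        if (shown < values.length)
            sb.append(shown > 0 ? ", ..." : "...");

        return sb.append("]]").toString();
    }
}
```

      Calling describe(new double[] {1.0, 2.5, 0.0}, 2) returns "Vector [size=3, values=[1.0, 2.5, ...]]".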

      I have implemented the above items and used them at a customer where I needed these capabilities (or, at the least, where they dramatically reduced the cost and increased the value of the solution).

      It would be great if the community were supportive of this expansion and improvement of the Ignite ML library.

      Thanks,
      Glenn

        Attachments

        1. DenseIntBinaryObjectVectorizer.java
          5 kB
          Glenn Wiebe
        2. DenseStringBinaryObjectVectorizer.java
          5 kB
          Glenn Wiebe

          Activity

            People

            • Assignee:
              zaleslaw Alexey Zinoviev
              Reporter:
              ggwiebe Glenn Wiebe
            • Votes:
              1
              Watchers:
              2

              Dates

              • Created:
                Updated: