Ignite / IGNITE-7328

Improve Labeled Dataset loading from txt file

Details

    • Type: New Feature
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ml
    • Labels: None

    Description

      1. Wouldn't it be better to parse rows in place instead of first storing them as strings? The current implementation has to keep the dataset in memory twice, which may be a problem for large datasets.

      2. What about datasets that contain non-numerical data? Do we handle this case here, or will some other "DatasetLoader" be used for that purpose?

      3. Just an idea: in case we don't want to fail on bad data (99% of cases), it would be great to report the quality of the loaded dataset, such as the number of missing rows/values.

      4. Should a row that doesn't contain the required number of columns be treated as "bad data" instead of breaking parsing with an IndexOutOfBoundsException?
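
To make points 1, 3 and 4 concrete, here is a minimal sketch of what such a loader could look like. It is not the existing Ignite ML loader: the class `InPlaceRowParser`, its `Result` holder and the `parse` signature are hypothetical names chosen only for illustration. It assumes a single-character or literal separator, a fixed expected column count, and that non-numeric or empty cells should be stored as NaN rather than abort loading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical sketch, not part of the Ignite ML API. */
class InPlaceRowParser {
    /** Parsed rows plus simple data-quality counters. */
    static final class Result {
        final List<double[]> rows = new ArrayList<>();
        int skippedRows;    // rows dropped because of a wrong column count
        int missingValues;  // cells that were empty or not numeric, stored as NaN
    }

    /**
     * Consumes lines one at a time and converts each to double[] immediately,
     * so the raw text of the whole file is never held alongside the parsed data.
     */
    static Result parse(Stream<String> lines, String separator, int expectedCols) {
        Result res = new Result();
        // Quote the separator so characters like '|' or '.' are taken literally.
        Pattern sep = Pattern.compile(Pattern.quote(separator));
        lines.forEach(line -> parseLine(line, sep, expectedCols, res));
        return res;
    }

    private static void parseLine(String line, Pattern sep, int expectedCols, Result res) {
        if (line.trim().isEmpty())
            return; // blank lines are ignored, not counted as bad data

        // Limit -1 keeps trailing empty cells, so "1,2," yields three cells.
        String[] cells = sep.split(line, -1);
        if (cells.length != expectedCols) {
            res.skippedRows++; // count the bad row instead of throwing
            return;
        }

        double[] row = new double[expectedCols];
        for (int i = 0; i < expectedCols; i++) {
            try {
                row[i] = Double.parseDouble(cells[i].trim());
            }
            catch (NumberFormatException e) {
                row[i] = Double.NaN; // placeholder for a missing or non-numeric value
                res.missingValues++;
            }
        }
        res.rows.add(row);
    }
}
```

A caller could pass `Files.lines(path)` as the stream and then log `skippedRows` and `missingValues` to judge the quality of the loaded dataset. Whether bad cells should become NaN, be imputed, or cause the whole row to be dropped is a design choice left open here.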

          People

            Assignee: Alexey Zinoviev (zaleslaw)
            Reporter: Alexey Zinoviev (zaleslaw)
            Votes: 0
            Watchers: 1
