Bigtop / BIGTOP-1212

Pick or build a framework for building fake data sets

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: blueprints
    • Labels:
      None

      Description

      • We've already seen that the Mahout smoke tests are fragile because they require many external input data sets.
      • Also, in BigPetStore (BIGTOP-1089), we are building custom fake data generators so that we can produce arbitrarily large data sets of customer transactions with patterns in them.

      So let's either (1) build a framework or (2) adopt one that is modular enough to extend to different smoke test scenarios.

      ADVANTAGES:

      • VM tests can run exactly the same smoke tests that real clusters run and simply generate smaller input data sets. Right now, we can't do this with static external data sets.
      • We can start eliminating fragile external dependencies of the smoke tests (i.e., the Mahout ones) and replace them with our own data sets generated on the fly, with no need to wget them from third parties.
      • BigPetStore can focus on demoing the Bigtop-based Hadoop ecosystem deployment rather than on generating data.
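
      To make the "modular framework" idea concrete, here is a minimal sketch, assuming a plug-in interface with one generator per smoke-test scenario. All names (RecordGenerator, TransactionGenerator, generate) and the "favored state buys more dog food" pattern are hypothetical illustrations, not BigPetStore's actual code. The point is that a single seeded generator serves both cases from the list above: a VM test asks for a small n, a real cluster asks for a large n, and the same seed always reproduces the same data.

```python
import random
from dataclasses import dataclass
from typing import Iterator, Protocol


class RecordGenerator(Protocol):
    """Hypothetical plug-in interface: one implementation per smoke-test scenario."""

    def record(self, rng: random.Random, index: int) -> dict: ...


@dataclass
class TransactionGenerator:
    """Hypothetical pet-store-style generator with a deliberate, detectable pattern:
    customers in `favored_state` buy `favored_product` more often than others."""
    states: tuple = ("AZ", "CA", "NY")
    products: tuple = ("dog-food", "cat-food", "fish-food")
    favored_state: str = "CA"
    favored_product: str = "dog-food"
    bias: float = 0.7  # extra probability of the favored product in the favored state

    def record(self, rng: random.Random, index: int) -> dict:
        state = rng.choice(self.states)
        if state == self.favored_state and rng.random() < self.bias:
            product = self.favored_product
        else:
            product = rng.choice(self.products)
        return {"id": index, "state": state, "product": product,
                "price": round(rng.uniform(1.0, 50.0), 2)}


def generate(gen: RecordGenerator, n: int, seed: int = 42) -> Iterator[dict]:
    """Lazily yield n records; the same seed always yields the same data set."""
    rng = random.Random(seed)
    for i in range(n):
        yield gen.record(rng, i)
```

      A VM smoke test might consume list(generate(TransactionGenerator(), 100)), while a full-size run streams generate(..., 10_000_000) to disk. Because records are yielded lazily, memory use does not grow with n. Adding a new scenario means writing another class with a record method, not another data download.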


          Activity

          jay vyas added a comment -

          We are currently using BigPetStore's data set generator to generate a rich data set of arbitrary size.

          The BigPetStore TransactionInputFormat can be modified to generate other types of data if needed.

          In the future, we could build a broader data set generator, or convert this one into a generic framework for producing fake data sets.

          If there is interest, we can open another JIRA to make the custom input splits in BigPetStore more generic.
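
          One way to read "make the custom input splits more generic" is sketched below, assuming each split generates its own slice of records with an independently seeded RNG, so splits share no state and can run in separate tasks. This is a plain-Python analogue, not the Hadoop InputFormat/RecordReader API, and it is not taken from BigPetStore's TransactionInputFormat; split_ranges, generate_split, fake_customer and the seed-derivation formula are hypothetical. Swapping the make_record function is what would let the same splitting logic generate other types of data.

```python
import random
from typing import Callable, Iterator, List, Tuple


def split_ranges(total: int, num_splits: int) -> List[Tuple[int, int]]:
    """Divide record indices [0, total) into num_splits contiguous, near-equal ranges.
    Assumes num_splits > 0."""
    base, extra = divmod(total, num_splits)
    ranges, start = [], 0
    for i in range(num_splits):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def generate_split(make_record: Callable[[random.Random, int], dict],
                   split_index: int, start: int, end: int,
                   base_seed: int = 42) -> Iterator[dict]:
    """Generate one split's records with its own RNG, seeded from base_seed and
    split_index, so any split can be (re)generated alone and deterministically."""
    rng = random.Random(base_seed * 1_000_003 + split_index)
    for i in range(start, end):
        yield make_record(rng, i)


def fake_customer(rng: random.Random, index: int) -> dict:
    """Hypothetical record type; pass a different function to generate other data."""
    return {"id": index, "age": rng.randint(18, 90)}
```

          For example, a job could call split_ranges(total_records, num_tasks) and hand each (start, end) pair to a separate task running generate_split. Deriving the seed from the split index, rather than sharing one RNG, is what keeps a re-run task's output identical to its first attempt.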


            People

            • Assignee:
              Unassigned
              Reporter:
              jay vyas
            • Votes:
              0
              Watchers:
              3

