Details
New Feature
Status: To Do
Minor
Resolution: Unresolved
None
Description
As a user, I would like an out-of-the-box audio data loader and audio transforms in MXNet that would allow me to:
- load audio files (only .wav files are currently supported) into a Gluon AudioDataset (NDArrays),
- apply popular audio transforms to the audio data (for example scaling, MEL, MFCC),
- load the dataset using Gluon's DataLoader and train a neural network (e.g. an MLP) on the transformed audio data,
- perform a simple audio task such as sound classification, with 1 label per audio clip (a multiclass sound classification problem).
The feature should also provide an end-to-end example for a task (Urban Sounds Classification), covering:
- reading audio files from a folder location (can be extended to an S3 bucket later) and loading them into the AudioDataset
- applying audio transforms
- training a model (a neural network) with the AudioDataset or DataLoader
- performing the multiclass classification and running inference
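
To make the requested loading and transform steps concrete, here is a minimal sketch of a folder-backed audio dataset with a scaling transform. It is not the proposed MXNet API: the names `WavFolderDataset`, `read_wav_mono16`, `scale` and `write_sine` are hypothetical, the layout `root/<label>/<clip>.wav` is an assumption, and the code uses only the Python standard library so that it runs without MXNet. In Gluon, a class like this would presumably subclass `mxnet.gluon.data.Dataset`, return NDArrays, and be wrapped in `mxnet.gluon.data.DataLoader` for batching.

```python
import math
import os
import struct
import tempfile
import wave


def read_wav_mono16(path):
    """Read a 16-bit mono PCM .wav file into a list of ints."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        raw = wf.readframes(wf.getnframes())
    # Assumes little-endian 16-bit samples, as stored in WAV files.
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def scale(samples, full_scale=32768.0):
    """Scaling transform: map int16 samples into [-1.0, 1.0)."""
    return [s / full_scale for s in samples]


class WavFolderDataset:
    """Hypothetical dataset over root/<label>/<clip>.wav (assumed layout).

    Follows the __getitem__/__len__ protocol that Gluon datasets use,
    but returns plain Python lists instead of NDArrays.
    """

    def __init__(self, root, transform=None):
        self.transform = transform
        # Sub-folder names become class labels, sorted for stable indices.
        self.classes = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        self.items = []
        for idx, name in enumerate(self.classes):
            folder = os.path.join(root, name)
            for fname in sorted(os.listdir(folder)):
                if fname.lower().endswith(".wav"):
                    self.items.append((os.path.join(folder, fname), idx))

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        path, label = self.items[i]
        samples = read_wav_mono16(path)
        if self.transform is not None:
            samples = self.transform(samples)
        return samples, label


def write_sine(path, freq, rate=8000, n=800):
    """Write a short synthetic tone so the demo needs no external data."""
    values = (
        int(16000 * math.sin(2 * math.pi * freq * t / rate)) for t in range(n)
    )
    frames = struct.pack("<%dh" % n, *values)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(frames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Two made-up classes, one synthetic clip each.
        for label, freq in (("dog_bark", 440), ("siren", 880)):
            os.makedirs(os.path.join(root, label))
            write_sine(os.path.join(root, label, "clip.wav"), freq)
        dataset = WavFolderDataset(root, transform=scale)
        for samples, label in dataset:
            print(dataset.classes[label], len(samples))
```

Passing the transform as a callable keeps loading and feature extraction separate, so a MEL or MFCC transform could later be swapped in for `scale` without changing the dataset class.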