Scalable Sound Sensing for People-Centric Applications on Mobile ...

Here, I develop a convolutional neural network based on traditional key classification pipelines to create a key classifier that performs better ...

On the Environmental Impact of Deep Generative Models for Audio
Data. Our audio-visual data is obtained from broadcast television news videos. It consists of about 100 speakers and 104,881 examples of speech, which is about ...
NOTICE. Access to this thesis is conditional on ...
The concept of Fuzzy Song Sets is defined to flexibly capture music similarities between pairs of songs. Innovative internal representations and their ...
Most methods for spatial audio processing, be it ML-based or not, use a scene representation as input that is either composed of "raw" microphone signals or ...
Signature redacted - DSpace@MIT
This study introduces the Audio Scanning Network (ASNet), designed to leverage abundant information for achieving stable and effective audio classification.
Robust Speech Recognition via Large-Scale Weak Supervision
... information source to supplement audio-visual clues that we extracted from raw video data. Another related problem, which has to be mentioned, is the video ...
Harmonic Analysis of Musical Audio using Deep Neural Networks
We explore frame-level audio feature learning for chord recognition using artificial neural networks. We present the argument that chroma vectors ...
modality attention for end-to-end audio-visual speech recognition
In this paper, we propose a novel decoding algorithm for streaming End-to-end (E2E) automatic speech recognition (ASR) models, the double decoder.
Scalable Data Management for Music Recommendation Services
They have shown high learning capabilities for open domain dialogue with huge amounts of data and also for domain adaptation in task-oriented ...
An overview of machine learning and other data-based methods for ...
We find that the dataset used to pre-train audio models has a significant effect on downstream performance. To the best of our knowledge ...
Automatic Annotation of Formula 1 Races for Content-Based Video ...
An easier way to obtain well-annotated data for sound event detection is the creation of synthetic mixtures using isolated sound events - possibly allowing ...
In this implementation, they are used to send raw audio samples from the audio splitter and receive transcribed text. Other protocols such ...
Double Decoder: Improving latency for Streaming End-to-end ASR ...
data, the additional data embedded into raw data (e.g., YUV sequences) before compression have a great risk of loss after quantization. Thus, it is ...