Source author record

György Fazekas

György Fazekas appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Sound, eess.AS, Computer Vision, Information Retrieval, Artificial Intelligence, Computation and Language, eess.IV, Multimedia

Catalog footprint

What is connected

5 works

9 topics

4 close collaborators

preprint · 2022 · arXiv

Contrastive Audio-Language Learning for Music

As one of the most intuitive interfaces known to humans, natural language has the potential to mediate many tasks that involve human-computer interaction, especially in application-focused fields like Music Information Retrieval. In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain. To this end, we propose MusCALL, a framework for Music Contrastive Audio-Language Learning. Our approach consists of a dual-encoder architecture that learns the alignment between pairs of music audio and descriptive sentences, producing multimodal embeddings that can be used for text-to-audio and audio-to-text retrieval out-of-the-box. Thanks to this property, MusCALL can be transferred to virtually any task that can be cast as text-based retrieval. Our experiments show that our method performs significantly better than the baselines at retrieving audio that matches a textual description and, conversely, text that matches an audio query. We also demonstrate that the multimodal alignment capability of our model can be successfully extended to the zero-shot transfer scenario for genre classification and auto-tagging on two public datasets.
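The dual-encoder alignment described in the abstract above can be illustrated with a small sketch. It assumes a symmetric InfoNCE-style contrastive loss over cosine similarities, which is a common choice for dual-encoder models; the abstract does not state MusCALL's exact loss, and the function names, the temperature value and the toy embeddings here are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def similarity_logits(audio_embs, text_embs, temperature):
    """Cosine similarity of every audio/text pair, divided by the temperature."""
    audio = [l2_normalize(v) for v in audio_embs]
    text = [l2_normalize(v) for v in text_embs]
    return [[sum(a * t for a, t in zip(av, tv)) / temperature for tv in text]
            for av in audio]

def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(audio_embs, text_embs, temperature=0.07):
    """Symmetric loss: audio-to-text and text-to-audio directions, averaged.

    The temperature of 0.07 is an assumed value, not one taken from the paper.
    """
    logits = similarity_logits(audio_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))

def retrieve(query_emb, candidate_embs):
    """Rank candidate indices by cosine similarity to the query, best first."""
    query = l2_normalize(query_emb)
    scores = [sum(q * c for q, c in zip(query, l2_normalize(cand)))
              for cand in candidate_embs]
    return sorted(range(len(candidate_embs)), key=lambda i: scores[i], reverse=True)
```

Once both encoders map into a shared space, text-to-audio and audio-to-text retrieval both reduce to a call like `retrieve`, with the query coming from one modality and the candidates from the other. Zero-shot tagging can be cast the same way by treating the tag names as text candidates.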

preprint · 2022 · arXiv

Seeing Sounds, Hearing Shapes: a gamified study to evaluate sound-sketches

Sound-shape associations, a subset of cross-modal associations between the auditory and visual domain, have been studied mainly in the context of matching a set of purposefully crafted shapes to sounds. Recent studies have explored how humans represent sound through free-form sketching and how a graphical sketch input could be used for sound production. In this paper, the potential of communicating sound characteristics through these free-form sketches is investigated in a gamified study that was conducted with eighty-two participants at two online exhibition events. The results show that participants managed to recognise sounds at a higher rate than the random baseline would suggest; however, it appeared difficult to visually encode nuanced timbral differences.

preprint · 2021 · arXiv

A Comparison of Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging

In this paper, we empirically investigate the effect of audio preprocessing on music tagging with deep neural networks. We perform comprehensive experiments involving audio preprocessing using different time-frequency representations, logarithmic magnitude compression, frequency weighting, and scaling. We show that many commonly used input preprocessing techniques are redundant, with the exception of magnitude compression.
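As a rough illustration of the logarithmic magnitude compression mentioned in the abstract above, the sketch below applies log(1 + gain * x) to a list of spectrogram magnitudes. The exact formula and constants used in the paper are not given here, so the function name, the `gain` parameter and the toy values are assumptions.

```python
import math

def log_compress(magnitudes, gain=1.0):
    """Compress non-negative magnitudes with log(1 + gain * x).

    Assumed form of logarithmic compression; gain=1.0 is an arbitrary default.
    The mapping keeps zero at zero, preserves ordering, and shrinks the
    ratio between loud and quiet time-frequency bins.
    """
    return [math.log1p(gain * m) for m in magnitudes]

# Toy magnitudes spanning three orders of magnitude (hypothetical values).
frame = [1.0, 10.0, 1000.0]
compressed = log_compress(frame)
```

After compression the loudest bin is roughly ten times the quietest one instead of a thousand times, which gives a network input with a much narrower dynamic range.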

preprint · 2020 · arXiv

A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis

In recent years, Markov logic networks (MLNs) have been proposed as a potentially useful paradigm for music signal analysis. Because all hidden Markov models can be reformulated as MLNs, the latter can provide an all-encompassing framework that reuses and extends previous work in the field. However, just because it is theoretically possible to reformulate previous work as MLNs does not mean that it is advantageous. In this paper, we analyse some proposed examples of MLNs for musical analysis and consider their practical disadvantages when compared to formulating the same musical dependence relationships as (dynamic) Bayesian networks. We argue that a number of practical hurdles such as the lack of support for sequences and for arbitrary continuous probability distributions make MLNs less than ideal for the proposed musical applications, both in terms of ease of formulation and computational requirements due to their required inference algorithms. These conclusions are not specific to music, but apply to other fields as well, especially when sequential data with continuous observations is involved. Finally, we show that the ideas underlying the proposed examples can be expressed perfectly well in the more commonly used framework of (dynamic) Bayesian networks.

preprint · 2020 · arXiv

Optical Music Recognition: State of the Art and Major Challenges

Optical Music Recognition (OMR) is concerned with transcribing sheet music into a machine-readable format. The transcribed copy should allow musicians to compose, play and edit music by taking a picture of a music sheet. Complete transcription of sheet music would also enable more efficient archival. OMR facilitates examining sheet music statistically or searching for patterns of notations, thus helping use cases in digital musicology too. Recently, there has been a shift in OMR from using conventional computer vision techniques towards a deep learning approach. In this paper, we review relevant works in OMR, including fundamental methods and significant outcomes, and highlight different stages of the OMR pipeline. These stages often lack standard input and output representation and standardised evaluation. Therefore, comparing different approaches and evaluating the impact of different processing methods can become rather complex. This paper provides recommendations for future work, addressing some of the highlighted issues, and represents a position in furthering this important field of research.
