Source author record

Arshdeep Singh

Arshdeep Singh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.AS Machine Learning Sound Artificial Intelligence Computational Complexity Data Structures and Algorithms Databases

Catalog footprint

What is connected

5works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

1-D CNN based Acoustic Scene Classification via Reducing Layer-wise Dimensionality

This paper presents an alternate representation framework to commonly used time-frequency representation for acoustic scene classification (ASC). A raw audio signal is represented using a pre-trained convolutional neural network (CNN) using its various intermediate layers. The study assumes that the representations obtained from the intermediate layers lie in low-dimensions intrinsically. To obtain low-dimensional embeddings, principal component analysis is performed, and the study analyzes that only a few principal components are significant. However, the appropriate number of significant components are not known. To address this, an automatic dictionary learning framework is utilized that approximates the underlying subspace. Further, the low-dimensional embeddings are aggregated in a late-fusion manner in the ensemble framework to incorporate hierarchical information learned at various intermediate layers. The experimental evaluation is performed on publicly available DCASE 2017 and 2018 ASC datasets on a pre-trained 1-D CNN, SoundNet. Empirically, it is observed that deeper layers show more compression ratio than others. At 70% compression ratio across different datasets, the performance is similar to that obtained without performing any dimensionality reduction. The proposed framework outperforms the time-frequency representation based methods.

preprint2022arXiv

A Passive Similarity based CNN Filter Pruning for Efficient Acoustic Scene Classification

We present a method to develop low-complexity convolutional neural networks (CNNs) for acoustic scene classification (ASC). The large size and high computational complexity of typical CNNs is a bottleneck for their deployment on resource-constrained devices. We propose a passive filter pruning framework, where a few convolutional filters from the CNNs are eliminated to yield compressed CNNs. Our hypothesis is that similar filters produce similar responses and give redundant information allowing such filters to be eliminated from the network. To identify similar filters, a cosine distance based greedy algorithm is proposed. A fine-tuning process is then performed to regain much of the performance lost due to filter elimination. To perform efficient fine-tuning, we analyze how the performance varies as the number of fine-tuning training examples changes. An experimental evaluation of the proposed framework is performed on the publicly available DCASE 2021 Task 1A baseline network trained for ASC. The proposed method is simple, reduces computations per inference by 27%, with 25% fewer parameters, with less than 1% drop in accuracy.

preprint2022arXiv

Continual Learning For On-Device Environmental Sound Classification

Continuously learning new classes without catastrophic forgetting is a challenging problem for on-device environmental sound classification given the restrictions on computation resources (e.g., model size, running memory). To address this issue, we propose a simple and efficient continual learning method. Our method selects the historical data for the training by measuring the per-sample classification uncertainty. Specifically, we measure the uncertainty by observing how the classification probability of data fluctuates against the parallel perturbations added to the classifier embedding. In this way, the computation cost can be significantly reduced compared with adding perturbation to the raw data. Experimental results on the DCASE 2019 Task 1 and ESC-50 dataset show that our proposed method outperforms baseline continual learning methods on classification accuracy and computational efficiency, indicating our method can efficiently and incrementally learn new classes without the catastrophic forgetting problem for on-device environmental sound classification.

preprint2022arXiv

Low-complexity CNNs for Acoustic Scene Classification

This technical report describes the SurreyAudioTeam22s submission for DCASE 2022 ASC Task 1, Low-Complexity Acoustic Scene Classification (ASC). The task has two rules, (a) the ASC framework should have maximum 128K parameters, and (b) there should be a maximum of 30 millions multiply-accumulate operations (MACs) per inference. In this report, we present low-complexity systems for ASC that follow the rules intended for the task.

preprint2020arXiv

Batching and Matching for Food Delivery in Dynamic Road Networks

Given a stream of food orders and available delivery vehicles, how should orders be assigned to vehicles so that the delivery time is minimized? Several decisions have to be made: (1) assignment of orders to vehicles, (2) grouping orders into batches to cope with limited vehicle availability, and (3) adapting to dynamic positions of delivery vehicles. We show that the minimization problem is not only NP-hard but inapproximable in polynomial time. To mitigate this computational bottleneck, we develop an algorithm called FoodMatch, which maps the vehicle assignment problem to that of minimum weight perfect matching on a bipartite graph. To further reduce the quadratic construction cost of the bipartite graph, we deploy best-first search to only compute a subgraph that is highly likely to contain the minimum matching. The solution quality is further enhanced by reducing batching to a graph clustering problem and anticipating dynamic positions of vehicles through angular distance. Extensive experiments on food-delivery data from large metropolitan cities establish that FoodMatch is substantially better than baseline strategies on a number of metrics, while being efficient enough to handle real-world workloads.

Arshdeep Singh

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

1-D CNN based Acoustic Scene Classification via Reducing Layer-wise Dimensionality

A Passive Similarity based CNN Filter Pruning for Efficient Acoustic Scene Classification

Continual Learning For On-Device Environmental Sound Classification

Low-complexity CNNs for Acoustic Scene Classification

Batching and Matching for Food Delivery in Dynamic Road Networks