Source author record

Tom Denton

Tom Denton appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: unclaimed source record

Topics: math.CO, eess.AS, math.RT, Sound, Machine Learning, math.QA

Catalog footprint

What is connected

10 works

6 topics

4 close collaborators


Preprint, 2026, arXiv

Perch 2.0: The Bittern Lesson for Bioacoustics

Perch is a performant pre-trained model for bioacoustics. It was trained in supervised fashion, providing both off-the-shelf classification scores for thousands of vocalizing species as well as strong embeddings for transfer learning. In this new release, Perch 2.0, we expand from training exclusively on avian species to a large multi-taxa dataset. The model is trained with self-distillation using a prototype-learning classifier as well as a new source-prediction training criterion. Perch 2.0 obtains state-of-the-art performance on the BirdSet and BEANS benchmarks. It also outperforms specialized marine models on marine transfer learning tasks, despite having almost no marine training data. We present hypotheses as to why fine-grained species classification is a particularly robust pre-training task for bioacoustics.

Preprint, 2022, arXiv

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While this new generation of codecs is capable of synthesizing high-fidelity speech, their use of recurrent or convolutional layers often restricts their effective receptive fields, which prevents them from compressing speech efficiently. We propose to further reduce the bitrate of neural speech codecs through the use of pretrained Transformers, capable of exploiting long-range dependencies in the input signal due to their inductive bias. As such, we use a pretrained Transformer in tandem with a convolutional encoder, which is trained end-to-end with a quantizer and a generative adversarial net decoder. Our numerical experiments show that supplementing the convolutional encoder of a neural speech codec with Transformer speech embeddings yields a speech codec with a bitrate of $600\,\mathrm{bps}$ that outperforms the original neural speech codec in synthesized speech quality when trained at the same bitrate. Subjective human evaluations suggest that the quality of the resulting codec is comparable to or better than that of conventional codecs operating at three to four times the rate.

Preprint, 2021, arXiv

Generative Speech Coding with Predictive Variance Regularization

The recent emergence of machine-learning based generative models for speech suggests a significant reduction in bit rate for speech codecs is possible. However, the performance of generative models deteriorates significantly with the distortions present in real-world input signals. We argue that this deterioration is due to the sensitivity of the maximum likelihood criterion to outliers and the ineffectiveness of modeling a sum of independent signals with a single autoregressive model. We introduce predictive-variance regularization to reduce the sensitivity to outliers, resulting in a significant increase in performance. We show that noise reduction to remove unwanted signals can significantly increase performance. We provide extensive subjective performance evaluations that show that our system based on generative modeling provides state-of-the-art coding performance at 3 kb/s for real-world speech signals at reasonable computational complexity.

Preprint, 2021, arXiv

Handling Background Noise in Neural Speech Generation

Recent advances in neural-network based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing their use in practical applications. In this paper we examine the reason and discuss methods to overcome this issue. Placing a denoising preprocessing stage before feature extraction and targeting clean speech during training is shown to be the best-performing strategy.

Preprint, 2016, arXiv

Combinatorics of the zeta map on rational Dyck paths

An $(a,b)$-Dyck path $P$ is a lattice path from $(0,0)$ to $(b,a)$ that stays above the line $y=\frac{a}{b}x$. The zeta map is a curious rule that maps the set of $(a,b)$-Dyck paths into itself; it is conjecturally bijective, and we provide progress towards a proof of bijectivity in this paper, by showing that knowing zeta of $P$ and zeta of $P$ conjugate is enough to recover $P$. Our method begets an area-preserving involution $\chi$ on the set of $(a,b)$-Dyck paths when $\zeta$ is a bijection, as well as a new method for calculating $\zeta^{-1}$ on classical Dyck paths. For certain nice $(a,b)$-Dyck paths we give an explicit formula for $\zeta^{-1}$ and $\chi$ and for additional $(a,b)$-Dyck paths we discuss how to compute $\zeta^{-1}$ and $\chi$ inductively. We also explore Armstrong's skew length statistic and present two new combinatorial methods for calculating the zeta map involving lasers and interval intersections. We provide a combinatorial statistic $\delta$ that can be used to recursively compute $\zeta^{-1}$ and show that $\delta$ is computable from $\zeta(P)$ in the Fuss-Catalan case.
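The definition of an $(a,b)$-Dyck path at the start of this abstract can be made concrete with a small sketch. It is only an illustration and rests on assumptions not stated in the abstract: paths are encoded as strings of unit north ('N') and east ('E') steps, and "stays above" is read as weakly above the line, so a lattice point $(x,y)$ is allowed when $b\,y \ge a\,x$. The function names are hypothetical, and the zeta map itself is not implemented here.

```python
from itertools import combinations


def is_rational_dyck_path(steps, a, b):
    """Return True if `steps` (a string of 'N'/'E') is an (a,b)-Dyck path.

    Assumption: the path runs from (0,0) to (b,a) and must stay weakly
    above the line y = (a/b) x, i.e. b*y >= a*x at every lattice point.
    """
    if steps.count("N") != a or steps.count("E") != b:
        return False
    x = y = 0
    for step in steps:
        if step == "N":
            y += 1
        elif step == "E":
            x += 1
        else:
            return False  # unknown step symbol
        # Integer form of y >= (a/b) x, avoiding floating point.
        if b * y < a * x:
            return False
    return True


def rational_dyck_paths(a, b):
    """Enumerate all (a,b)-Dyck paths by brute force (small a, b only)."""
    paths = []
    for north_positions in combinations(range(a + b), a):
        norths = set(north_positions)
        steps = "".join("N" if i in norths else "E" for i in range(a + b))
        if is_rational_dyck_path(steps, a, b):
            paths.append(steps)
    return paths


if __name__ == "__main__":
    print(rational_dyck_paths(2, 3))
    print(len(rational_dyck_paths(3, 3)))
```

With $a=b=n$ this reduces to classical Dyck paths, so the count for $(3,3)$ is the Catalan number 5; brute-force enumeration is only practical for small $a$ and $b$.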

Preprint, 2013, arXiv

Algebraic and Affine Pattern Avoidance

We investigate various connections between the 0-Hecke monoid, Catalan monoid, and pattern avoidance in permutations, providing new tools for approaching pattern avoidance in an algebraic framework. In particular, we characterize containment of a class of `long' patterns as equivalent to the existence of a corresponding factorization. We then generalize some of our constructions to the affine setting.

Preprint, 2012, arXiv

Canonical Decompositions of Affine Permutations, Affine Codes, and Split $k$-Schur Functions

We study the unique maximal decomposition of an arbitrary affine permutation into a product of cyclically decreasing elements, providing a new perspective on work of Thomas Lam. This decomposition is closely related to the affine code, which generalizes the $k$-bounded partition associated to Grassmannian elements. We also show that the affine code readily encodes a number of basic combinatorial properties of an affine permutation. As an application, we prove a new special case of the Littlewood-Richardson Rule for $k$-Schur functions, using the canonical decomposition to control for which permutations appear in the expansion of the $k$-Schur function in noncommuting variables over the affine nil-Coxeter algebra.

Preprint, 2011, arXiv

Excursions into Algebra and Combinatorics at $q=0$

We explore combinatorics associated with the degenerate Hecke algebra at $q=0$, obtaining a formula for a system of orthogonal idempotents, and also exploring various pattern avoidance results. Generalizing constructions for the 0-Hecke algebra, we explore the representation theory of $\mathcal{J}$-trivial monoids. We then discuss two-tensors of crystal bases for $U_q(\tilde{\mathfrak{sl}_2})$, establishing a complementary result to one of Bandlow, Schilling, and Thiéry on affine crystals arising from promotion operators. Finally, we give a computer implementation of Stembridge's local axioms for simply-laced crystal bases.

Preprint, 2011, arXiv

On the representation theory of finite J-trivial monoids

In 1979, Norton showed that the representation theory of the 0-Hecke algebra admits a rich combinatorial description. Her constructions rely heavily on some triangularity property of the product, but do not use explicitly that the 0-Hecke algebra is a monoid algebra. The thesis of this paper is that considering the general setting of monoids admitting such a triangularity, namely J-trivial monoids, sheds further light on the topic. This is a step to use representation theory to automatically extract combinatorial structures from (monoid) algebras, often in the form of posets and lattices, both from a theoretical and computational point of view, and with an implementation in Sage. Motivated by ongoing work on related monoids associated to Coxeter systems, and building on well-known results in the semi-group community (such as the description of the simple modules or the radical), we describe how most of the data associated to the representation theory (Cartan matrix, quiver) of the algebra of any J-trivial monoid M can be expressed combinatorially by counting appropriate elements in M itself. As a consequence, this data does not depend on the ground field and can be calculated in O(n^2), if not O(nm), where n=|M| and m is the number of generators. Along the way, we construct a triangular decomposition of the identity into orthogonal idempotents, using the usual Möbius inversion formula in the semi-simple quotient (a lattice), followed by an algorithmic lifting step. Applying our results to the 0-Hecke algebra (in all finite types), we recover previously known results and additionally provide an explicit labeling of the edges of the quiver. We further explore special classes of J-trivial monoids, and in particular monoids of order preserving regressive functions on a poset, generalizing known results on the monoids of nondecreasing parking functions.

Preprint, 2010, arXiv

A Combinatorial Formula for Orthogonal Idempotents in the $0$-Hecke Algebra of the Symmetric Group

Building on the work of P.N. Norton, we give combinatorial formulae for two maximal decompositions of the identity into orthogonal idempotents in the $0$-Hecke algebra of the symmetric group, $\mathbb{C}H_0(S_N)$. This construction is compatible with the branching from $S_{N-1}$ to $S_{N}$.
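Several of the abstracts above concern the $0$-Hecke algebra of the symmetric group, which is the monoid algebra of the $0$-Hecke monoid. As a minimal sketch, and not the construction used in any of these papers, the monoid's defining relations can be checked on a standard realization by "bubble sort" operators, where a generator sorts two adjacent entries of a permutation. The function names `pi` and `check_zero_hecke_relations` are hypothetical, and positions are 0-indexed here as an implementation convenience.

```python
from itertools import permutations


def pi(i, w):
    """Bubble-sort operator: put positions i and i+1 of w in increasing order."""
    w = list(w)
    if w[i] > w[i + 1]:
        w[i], w[i + 1] = w[i + 1], w[i]
    return tuple(w)


def check_zero_hecke_relations(n):
    """Check the 0-Hecke monoid relations on every permutation of 1..n.

    Relations checked (0-indexed generators):
      pi_i pi_i = pi_i                          (idempotence)
      pi_i pi_{i+1} pi_i = pi_{i+1} pi_i pi_{i+1}  (braid relation)
      pi_i pi_j = pi_j pi_i for |i - j| >= 2    (commutation)
    """
    for w in permutations(range(1, n + 1)):
        for i in range(n - 1):
            if pi(i, pi(i, w)) != pi(i, w):
                return False
        for i in range(n - 2):
            lhs = pi(i, pi(i + 1, pi(i, w)))
            rhs = pi(i + 1, pi(i, pi(i + 1, w)))
            if lhs != rhs:
                return False
        for i in range(n - 1):
            for j in range(i + 2, n - 1):
                if pi(i, pi(j, w)) != pi(j, pi(i, w)):
                    return False
    return True


if __name__ == "__main__":
    print(check_zero_hecke_relations(4))
```

The idempotence relation is what distinguishes these generators from the simple transpositions of the symmetric group, which square to the identity instead.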
