Source author record

Kuan-Yu Chen

Kuan-Yu Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · Artificial Intelligence · physics.ins-det · eess.AS · hep-ex · Information Retrieval · Sound · Machine Learning · Multimedia

Catalog footprint

What is connected

12 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification

Large language models (LLMs) have recently been adapted to tabular prediction by serializing structured features into natural language, but their performance in low-data regimes remains limited compared to gradient-boosted decision trees (GBDTs). In this work, we revisit the boosting paradigm, traditionally associated with tree ensembles, and ask whether it can be applied as a general training principle for LLM fine-tuning. We propose BoostLLM, a framework that transforms parameter-efficient fine-tuning into a multi-round residual optimization process by training sequential PEFT adapters as weak learners. To incorporate tabular inductive bias, BoostLLM integrates decision-tree paths as a second input view alongside raw features; analysis reveals that the path view acts as a structured teacher in early training steps before the model shifts toward feature-driven representations. Empirically, BoostLLM achieves consistent improvements over standard fine-tuning across multiple LLM backbones and datasets, matching or surpassing XGBoost across a wide range of shot counts and outperforming GPT-4o-based methods with a 4B model. We further show that the framework scales: pairing with stronger tree models and extended boosting horizons yields additional gains under appropriate stabilization. These results suggest that boosting can serve as a general training principle for LLM fine-tuning, particularly in low-data regimes for structured data.
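To make the "multi-round residual optimization" idea concrete, here is a minimal sketch of classical residual boosting under squared error, with one-dimensional decision stumps as weak learners. It shows the textbook procedure the abstract draws on, not BoostLLM itself: the stump learner, the toy data, and the names `fit_stump`, `boost`, and `mse` are illustrative assumptions, and the paper trains PEFT adapters rather than stumps.

```python
def fit_stump(xs, residuals):
    """Fit the best single-threshold regressor on 1-D inputs under squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)  # never empty: threshold is one of xs
        right_mean = sum(right) / len(right) if right else 0.0
        error = sum(
            (r - (left_mean if x <= threshold else right_mean)) ** 2
            for x, r in zip(xs, residuals)
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds, learning_rate=0.5):
    """Train weak learners sequentially, each on the residuals left by the previous ones."""
    learners = []
    predictions = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    # The final model is the shrunken sum of all weak learners.
    return lambda x: sum(learning_rate * h(x) for h in learners)


def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumed for illustration only).
xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 0.0, 1.0, 1.0, 4.0, 4.0]
one_round = boost(xs, ys, rounds=1)
many_rounds = boost(xs, ys, rounds=20)
print(mse(one_round, xs, ys), mse(many_rounds, xs, ys))
```

Each round fits only what the earlier rounds left unexplained, so the training error cannot increase as rounds are added at this learning rate.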

preprint · 2022 · arXiv

First Results from the Taiwan Axion Search Experiment with Haloscope at 19.6 $μ$eV

This Letter reports on the first results from the Taiwan Axion Search Experiment with Haloscope, a search for axions using a microwave cavity at frequencies between 4.70750 and 4.79815 GHz. Apart from the non-axion signals, no candidates with a significance more than 3.355 were found. The experiment excludes models with the axion-two-photon coupling $\left|g_{aγγ}\right|\gtrsim 8.2\times 10^{-14}$ GeV$^{-1}$, a factor of eleven above the benchmark KSVZ model, reaching a sensitivity three orders of magnitude better than any existing limits in the mass range 19.4687 < $m_a$ < 19.8436 $μ$eV. It is also the first time that a haloscope-type experiment places constraints on $g_{aγγ}$ in this mass region.
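The quoted frequency window and mass range are related by $m_a c^2 = h f$: a haloscope cavity resonating at frequency $f$ is sensitive to axions whose rest energy equals the photon energy. The short check below is a sketch of that conversion; the function name `axion_mass_ueV` is an illustrative assumption, and the Planck constant is the CODATA value rounded to ten significant figures.

```python
# Planck constant in eV*s (CODATA value, rounded).
PLANCK_EV_S = 4.135667696e-15


def axion_mass_ueV(freq_ghz: float) -> float:
    """Return the axion mass in micro-eV whose rest energy equals h * f."""
    freq_hz = freq_ghz * 1e9
    return PLANCK_EV_S * freq_hz * 1e6  # convert eV to micro-eV


low = axion_mass_ueV(4.70750)
high = axion_mass_ueV(4.79815)
print(f"{low:.4f} to {high:.4f} micro-eV")  # prints "19.4687 to 19.8436 micro-eV"
```

The result reproduces the mass range 19.4687 to 19.8436 $μ$eV stated in the abstract.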

preprint · 2022 · arXiv

Non-autoregressive Transformer-based End-to-end ASR using BERT

Transformer-based models have led to significant innovation in classical and practical subjects as varied as speech processing, natural language processing, and computer vision. On top of the Transformer, attention-based end-to-end automatic speech recognition (ASR) models have recently become popular. Specifically, non-autoregressive modeling, which boasts fast inference and performance comparable to conventional autoregressive methods, is an emerging research topic. In the context of natural language processing, the bidirectional encoder representations from Transformers (BERT) model has received widespread attention, partially due to its ability to infer contextualized word representations and to enable superior performance for downstream tasks while needing only simple fine-tuning. Motivated by this success, we view speech recognition as a downstream task of BERT, so that an ASR system can be obtained by fine-tuning. Consequently, to not only inherit the advantages of non-autoregressive ASR models but also enjoy the benefits of a pre-trained language model (e.g., BERT), we propose a non-autoregressive Transformer-based end-to-end ASR model based on BERT. We conduct a series of experiments on the AISHELL-1 dataset that demonstrate competitive or superior results for the model when compared to state-of-the-art ASR systems.

preprint · 2022 · arXiv

Taiwan Axion Search Experiment with Haloscope: CD102 Analysis Details

This paper presents the analysis of the data acquired during the first physics run of the Taiwan Axion Search Experiment with Haloscope (TASEH), a search for axions using a microwave cavity at frequencies between 4.70750 and 4.79815 GHz. The data were collected from October 13, 2021 to November 15, 2021, and are referred to as the CD102 data. The analysis of the TASEH CD102 data excludes models with the axion-two-photon coupling $|g_{aγγ}| \gtrsim 8.2\times 10^{-14}$ GeV$^{-1}$, a factor of eleven above the benchmark KSVZ model for the mass range 19.4687 < $m_a$ < 19.8436 $μ$eV.

preprint · 2022 · arXiv

Taiwan Axion Search Experiment with Haloscope: Designs and operations

We report on a haloscope axion search experiment near $19.6\ {\rm μeV}$ from the TASEH collaboration. The experiment is carried out via a frequency-tunable cavity detector with a volume $V = 0.234\ {\rm liter}$ in a magnetic field $B_0 = 8\ {\rm T}$. With a signal receiver that has a system noise temperature $T_{\rm sys} \cong 2.2\ {\rm K}$ and an experiment time of about 1 month, the search excludes values of the axion-photon coupling constant $g_{\rm aγγ} \gtrsim 8.1 \times 10^{-14} \ {\rm GeV}^{-1}$, a factor of 11 above the KSVZ model, at the 95% confidence level in the mass range of $19.4687-19.8436\ {\rm μeV}$. We present the experimental setup and procedures to accomplish this search.

preprint · 2021 · arXiv

Speech Recognition by Simply Fine-tuning BERT

We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, which is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations. Our assumption is that given a history context sequence, a powerful LM can narrow the range of possible choices and the speech signal can be used as a simple clue. Hence, compared to conventional ASR systems that train a powerful acoustic model (AM) from scratch, we believe that speech recognition is possible by simply fine-tuning a BERT model. As an initial study, we demonstrate the effectiveness of the proposed idea on the AISHELL dataset and show that stacking a very simple AM on top of BERT can yield reasonable performance.

preprint · 2020 · arXiv

An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering

In a spoken multiple-choice question answering (SMCQA) task, given a passage, a question, and multiple choices all in the form of speech, the machine needs to pick the correct choice to answer the question. While the audio could contain useful cues for SMCQA, usually only the auto-transcribed text is utilized in system development. Thanks to large-scale pre-trained language representation models, such as the bidirectional encoder representations from transformers (BERT), systems with only auto-transcribed text can still achieve a certain level of performance. However, previous studies have shown that acoustic-level statistics can offset text inaccuracies caused by automatic speech recognition systems or representation inadequacy lurking in word embedding generators, thereby making the SMCQA system robust. Along this line of research, this study concentrates on designing a BERT-based SMCQA framework, which not only inherits the advantages of contextualized language representations learned by BERT, but also integrates the complementary acoustic-level information distilled from audio with the text-level information. Consequently, an audio-enriched BERT-based SMCQA framework is proposed. A series of experiments demonstrates remarkable improvements in accuracy over selected baselines and SOTA systems on a published Chinese SMCQA dataset.

preprint · 2020 · arXiv

Investigation of Sentiment Controllable Chatbot

Conventional seq2seq chatbot models attempt only to find sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences. In this paper, we investigate four models to scale or adjust the sentiment of the chatbot response: a persona-based model, reinforcement learning, a plug and play model, and CycleGAN, all based on the seq2seq model. We also develop machine-evaluated metrics to estimate whether the responses are reasonable given the input. These metrics, together with human evaluation, are used to analyze the performance of the four models in terms of different aspects; reinforcement learning and CycleGAN are shown to be very attractive.

preprint · 2016 · arXiv

Improved Spoken Document Summarization with Coverage Modeling Techniques

Extractive summarization aims at selecting a set of indicative sentences from a source document as a summary that can express the major theme of the document. A general consensus on extractive summarization is that both relevance and coverage are critical issues to address. The existing methods designed to model coverage can be characterized by either reducing redundancy or increasing diversity in the summary. Maximal marginal relevance (MMR) is a widely cited method since it takes both relevance and redundancy into account when generating a summary for a given document. Beyond MMR, there is a dearth of research concentrating on reducing redundancy or increasing diversity for the spoken document summarization task, as far as we are aware. Motivated by these observations, two major contributions are presented in this paper. First, in contrast to MMR, which considers coverage by reducing redundancy, we propose two novel coverage-based methods, which directly increase diversity. With the proposed methods, a set of representative sentences, which not only are relevant to the given document but also cover most of the important sub-themes of the document, can be selected automatically. Second, we go a step further and plug several document/sentence representation methods into the proposed framework to further enhance the summarization performance. A series of empirical evaluations demonstrate the effectiveness of our proposed methods.
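As a point of reference for the MMR baseline mentioned above, the following is a minimal sketch of greedy MMR sentence selection over toy vectors. It illustrates the standard trade-off between relevance and redundancy, not the coverage-based methods the paper proposes; the vectors, the weight `lam`, and the function names are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(doc_vec, sent_vecs, k, lam=0.5):
    """Greedily pick k sentence indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(sent_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(sent_vecs[i], doc_vec)
            # Redundancy is the similarity to the closest already-selected sentence.
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy vectors: sentences 0 and 1 are near-duplicates, sentence 2 covers another theme.
doc = [1.0, 1.0, 1.0]
sents = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]
print(mmr_select(doc, sents, k=2, lam=0.5))  # prints [0, 2]
print(mmr_select(doc, sents, k=2, lam=1.0))  # prints [0, 1]
```

With `lam=1.0` the redundancy term vanishes and the near-duplicate sentence is chosen second; with `lam=0.5` the penalty steers the second pick toward the sentence that covers a different theme.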

preprint · 2016 · arXiv

Learning to Distill: The Essence Vector Modeling Framework

In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively little work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, stop or function words that occur frequently may mislead the embedding learning process into producing a blurred paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information to produce a more informative low-dimensional vector representation for the paragraph. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (D-EV) model, is proposed. The D-EV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition.

preprint · 2016 · arXiv

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words, sentences and documents in context. Celebrated methods can be categorized as prediction-based and count-based methods according to the training objectives and model architectures. Their pros and cons have been extensively analyzed and evaluated in recent studies, but there is relatively little work continuing the line of research to develop an enhanced learning method that brings together the advantages of the two model families. In addition, the interpretation of the learned word representations still remains somewhat opaque. Motivated by these observations and considering the pressing need, this paper presents a novel method for learning the word representations, which not only inherits the advantages of classic word embedding methods but also offers a clearer and more rigorous interpretation of the learned word representations. Built upon the proposed word embedding method, we further formulate a translation-based language modeling framework for the extractive speech summarization task. A series of empirical evaluations demonstrate the effectiveness of the proposed word representation learning and language modeling techniques in extractive speech summarization.

preprint · 2015 · arXiv

Leveraging Word Embeddings for Spoken Document Summarization

Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation. On the other hand, word embedding has emerged as a newly popular research subject because of its excellent performance in many natural language processing (NLP)-related tasks. However, as far as we are aware, there are relatively few studies investigating its use in extractive text or speech summarization. A common way of leveraging word embeddings in the summarization process is to represent the document (or sentence) by averaging the word embeddings of the words occurring in the document (or sentence). Then, intuitively, the cosine similarity measure can be employed to determine the relevance degree between a pair of representations. Beyond the continued efforts made to improve the representation of words, this paper focuses on building novel and efficient ranking models based on the general word embedding methods for extractive speech summarization. Experimental results demonstrate the effectiveness of our proposed methods, compared to existing state-of-the-art methods.
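The averaging-plus-cosine baseline described in this abstract can be sketched in a few lines. The example below is only an illustration of that common baseline, not the ranking models the paper proposes; the three-dimensional `EMBEDDINGS` table and the helper names are hypothetical, and a real system would load trained word vectors.

```python
import math

# Hypothetical toy embeddings; real systems use vectors learned from a corpus.
EMBEDDINGS = {
    "storm": [1.0, 0.0, 0.0],
    "rain": [0.9, 0.1, 0.0],
    "flood": [0.8, 0.2, 0.0],
    "market": [0.0, 1.0, 0.0],
    "stocks": [0.0, 0.9, 0.1],
    "today": [0.3, 0.3, 0.3],
}


def mean_vector(tokens):
    """Average the embeddings of the in-vocabulary tokens."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_sentences(sentences):
    """Rank sentence indices by similarity to the whole-document average."""
    doc_vec = mean_vector([token for sentence in sentences for token in sentence])
    scores = [cosine(mean_vector(sentence), doc_vec) for sentence in sentences]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


document = [
    ["storm", "rain", "flood"],
    ["market", "stocks"],
    ["rain", "today"],
]
print(rank_sentences(document))  # prints [2, 0, 1]
```

An extractive summary under this baseline takes the top-ranked sentences until a length budget is reached.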
