Source author record

Tsui-Wei Weng

Tsui-Wei Weng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computation Computational Engineering, Finance, and Science Computer Vision Artificial Intelligence Computation and Language math.NA math.OC math.PR Robotics

Catalog footprint

What is connected

7works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

LLM Agents Already Know When to Call Tools -- Even Without Reasoning

Tool-augmented LLM agents tend to call tools indiscriminately, even when the model can answer directly. Each unnecessary call wastes API fees and latency, yet no existing benchmark systematically studies when a tool call is actually needed. We propose When2Tool, a benchmark of 18 environments (15 single-hop, 3 multi-hop) spanning three categories of tool necessity -- computational scale, knowledge boundaries, and execution reliability -- each with controlled difficulty levels that create a clear decision boundary between tool-necessary and tool-unnecessary tasks. We evaluate two families of training-free baselines: Prompt-only (varying the prompt to discourage unnecessary calls) and Reason-then-Act (requiring the model to reason about tool necessity before acting). Both provide limited control: Prompt-only suppresses necessary calls alongside unnecessary ones, and Reason-then-Act still incurs a disproportionate accuracy cost on hard tasks. To understand why these baselines fail, we probe the models' hidden states and find that tool necessity is linearly decodable from the pre-generation representation with AUROC 0.89--0.96 across six models, substantially exceeding the model's own verbalized reasoning. This reveals that models already know when tools are needed, but fail to act on this knowledge during generation. Building on this finding, we propose Probe&Prefill, which uses a lightweight linear probe to read the hidden-state signal and prefills the model's response with a steering sentence. Across all models tested, Probe&Prefill reduces tool calls by 48% with only 1.7% accuracy loss, while the best baseline at comparable accuracy only reduces 6% of tool calls, or achieves a similar tool call reduction but incurs a 5$\times$ higher accuracy loss. Our code is available at https://github.com/Trustworthy-ML-Lab/when2tool

preprint2022arXiv

On the Equivalence between Neural Network and Support Vector Machine

Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by Neural Tangent Kernel (NTK) \citep{jacot2018neural}. Under the squared loss, the infinite-width NN trained by gradient descent with an infinitely small learning rate is equivalent to kernel regression with NTK \citep{arora2019exact}. However, the equivalence is only known for ridge regression currently \citep{arora2019harnessing}, while the equivalence between NN and other kernel machines (KMs), e.g. support vector machine (SVM), remains unknown. Therefore, in this work, we propose to establish the equivalence between NN and SVM, and specifically, the infinitely wide NN trained by soft margin loss and the standard soft margin SVM with NTK trained by subgradient descent. Our main theoretical results include establishing the equivalences between NNs and a broad family of $\ell_2$ regularized KMs with finite-width bounds, which cannot be handled by prior work, and showing that every finite-width NN trained by such regularized loss functions is approximately a KM. Furthermore, we demonstrate our theory can enable three practical applications, including (i) \textit{non-vacuous} generalization bound of NN via the corresponding KM; (ii) \textit{non-trivial} robustness certificate for the infinite-width NN (while existing robustness verification methods would provide vacuous bounds); (iii) intrinsically more robust infinite-width NNs than those from previous kernel regression. Our code for the experiments is available at \url{https://github.com/leslie-CH/equiv-nn-svm}.

preprint2022arXiv

Quantifying Safety of Learning-based Self-Driving Control Using Almost-Barrier Functions

Path-tracking control of self-driving vehicles can benefit from deep learning for tackling longstanding challenges such as nonlinearity and uncertainty. However, deep neural controllers lack safety guarantees, restricting their practical use. We propose a new approach of learning almost-barrier functions, which approximately characterizes the forward invariant set for the system under neural controllers, to quantitatively analyze the safety of deep neural controllers for path-tracking. We design sampling-based learning procedures for constructing candidate neural barrier functions, and certification procedures that utilize robustness analysis for neural networks to identify regions where the barrier conditions are fully satisfied. We use an adversarial training loop between learning and certification to optimize the almost-barrier functions. The learned barrier can also be used to construct online safety monitors through reachability analysis. We demonstrate effectiveness of our methods in quantifying safety of neural controllers in various simulation environments, ranging from simple kinematic models to the TORCS simulator with high-fidelity vehicle dynamics simulation.

preprint2021arXiv

Fast Training of Provably Robust Neural Networks by SingleProp

Recent works have developed several methods of defending neural networks against adversarial attacks with certified guarantees. However, these techniques can be computationally costly due to the use of certification during training. We develop a new regularizer that is both more efficient than existing certified defenses, requiring only one additional forward propagation through a network, and can be used to train networks with similar certified accuracy. Through experiments on MNIST and CIFAR-10 we demonstrate improvements in training speed and comparable certified accuracy compared to state-of-the-art certified defenses.

preprint2021arXiv

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning

Model-agnostic meta-learning (MAML) has emerged as one of the most successful meta-learning techniques in few-shot learning. It enables us to learn a meta-initialization} of model parameters (that we call meta-model) to rapidly adapt to new tasks using a small amount of labeled training data. Despite the generalization power of the meta-model, it remains elusive that how adversarial robustness can be maintained by MAML in few-shot learning. In addition to generalization, robustness is also desired for a meta-model to defend adversarial examples (attacks). Toward promoting adversarial robustness in MAML, we first study WHEN a robustness-promoting regularization should be incorporated, given the fact that MAML adopts a bi-level (fine-tuning vs. meta-update) learning procedure. We show that robustifying the meta-update stage is sufficient to make robustness adapted to the task-specific fine-tuning stage even if the latter uses a standard training protocol. We also make additional justification on the acquired robustness adaptation by peering into the interpretability of neurons' activation maps. Furthermore, we investigate HOW robust regularization can efficiently be designed in MAML. We propose a general but easily-optimized robustness-regularized meta-learning framework, which allows the use of unlabeled data augmentation, fast adversarial attack generation, and computationally-light fine-tuning. In particular, we for the first time show that the auxiliary contrastive learning task can enhance the adversarial robustness of MAML. Finally, extensive experiments are conducted to demonstrate the effectiveness of our proposed methods in robust few-shot learning.

preprint2016arXiv

A Big-Data Approach to Handle Many Process Variations: Tensor Recovery and Applications

Fabrication process variations are a major source of yield degradation in the nano-scale design of integrated circuits (IC), microelectromechanical systems (MEMS) and photonic circuits. Stochastic spectral methods are a promising technique to quantify the uncertainties caused by process variations. Despite their superior efficiency over Monte Carlo for many design cases, these algorithms suffer from the curse of dimensionality; i.e., their computational cost grows very fast as the number of random parameters increases. In order to solve this challenging problem, this paper presents a high-dimensional uncertainty quantification algorithm from a big-data perspective. Specifically, we show that the huge number of (e.g., $1.5 \times 10^{27}$) simulation samples in standard stochastic collocation can be reduced to a very small one (e.g., $500$) by exploiting some hidden structures of a high-dimensional data array. This idea is formulated as a tensor recovery problem with sparse and low-rank constraints; and it is solved with an alternating minimization approach. Numerical results show that our approach can simulate efficiently some ICs, as well as MEMS and photonic problems with over 50 independent random parameters, whereas the traditional algorithm can only handle several random parameters.

preprint2016arXiv

A Big-Data Approach to Handle Process Variations: Uncertainty Quantification by Tensor Recovery

Stochastic spectral methods have become a popular technique to quantify the uncertainties of nano-scale devices and circuits. They are much more efficient than Monte Carlo for certain design cases with a small number of random parameters. However, their computational cost significantly increases as the number of random parameters increases. This paper presents a big-data approach to solve high-dimensional uncertainty quantification problems. Specifically, we simulate integrated circuits and MEMS at only a small number of quadrature samples, then, a huge number of (e.g., $1.5 \times 10^{27}$) solution samples are estimated from the available small-size (e.g., $500$) solution samples via a low-rank and tensor-recovery method. Numerical results show that our algorithm can easily extend the applicability of tensor-product stochastic collocation to IC and MEMS problems with over 50 random parameters, whereas the traditional algorithm can only handle several random parameters.

Tsui-Wei Weng

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

LLM Agents Already Know When to Call Tools -- Even Without Reasoning

On the Equivalence between Neural Network and Support Vector Machine

Quantifying Safety of Learning-based Self-Driving Control Using Almost-Barrier Functions

Fast Training of Provably Robust Neural Networks by SingleProp

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning

A Big-Data Approach to Handle Many Process Variations: Tensor Recovery and Applications

A Big-Data Approach to Handle Process Variations: Uncertainty Quantification by Tensor Recovery