Source author record

Kevin Wang

Kevin Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

Machine Learning, Artificial Intelligence, Computer Science and Game Theory, Multiagent Systems, cond-mat.str-el, Computation and Language, Computer Vision, cond-mat.mes-hall, cond-mat.mtrl-sci, cond-mat.stat-mech, cs.CY, Graphics, Human-Computer Interaction, quant-ph, Symbolic Computation

Catalog footprint

What is connected

10 works

15 topics

4 close collaborators


Preprint, 2026, arXiv

Neurosymbolic LoRA: Why and When to Tune Weights vs. Rewrite Prompts

Large language models (LLMs) can be adapted either through numerical updates that alter model parameters or symbolic manipulations that work on discrete prompts or logical constraints. While numerical fine-tuning excels at injecting new factual knowledge, symbolic updates offer flexible control of style and alignment without retraining. We introduce a neurosymbolic LoRA framework that dynamically combines these two complementary strategies. Specifically, we present a unified monitoring signal and a reward-based classifier to decide when to employ LoRA for deeper factual reconstruction and when to apply TextGrad for token-level edits. Our approach remains memory-efficient by offloading the symbolic transformations to an external LLM only when needed. Additionally, the refined prompts produced during symbolic editing serve as high-quality, reusable training data, an important benefit in data-scarce domains like mathematical reasoning. Extensive experiments across multiple LLM backbones show that neurosymbolic LoRA consistently outperforms purely numerical or purely symbolic baselines, demonstrating superior adaptability and improved performance. Our findings highlight the value of interleaving numerical and symbolic updates to unlock a new level of versatility in language model fine-tuning.

Preprint, 2023, arXiv

ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education

With the rapid evolution of Natural Language Processing (NLP), Large Language Models (LLMs) like ChatGPT have emerged as powerful tools capable of transforming various sectors. Their vast knowledge base and dynamic interaction capabilities represent significant potential in improving education by operating as a personalized assistant. However, the possibility of generating incorrect, biased, or unhelpful answers is a key challenge to resolve when deploying LLMs in an education context. This work introduces an innovative architecture that combines the strengths of ChatGPT with a traditional information retrieval based chatbot framework to offer enhanced student support in higher education. Our empirical evaluations underscore the high promise of this approach.

Preprint, 2022, arXiv

Anytime PSRO for Two-Player Zero-Sum Games

Policy space response oracles (PSRO) is a multi-agent reinforcement learning algorithm that has achieved state-of-the-art performance in very large two-player zero-sum games. PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next. We propose anytime double oracle (ADO), a tabular double oracle algorithm for 2-player zero-sum games that is guaranteed to converge to a Nash equilibrium while decreasing exploitability from one iteration to the next. Unlike DO, in which the restricted distribution is based on the restricted game formed by each player's strategy sets, ADO finds the restricted distribution for each player that minimizes its exploitability against any policy in the full, unrestricted game. We also propose a method of finding this restricted distribution via a no-regret algorithm updated against best responses, called RM-BR DO. Finally, we propose anytime PSRO (APSRO), a version of ADO that calculates best responses via reinforcement learning. In experiments on Leduc poker and random normal form games, we show that our methods achieve far lower exploitability than DO and PSRO and decrease exploitability monotonically.
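The exploitability measure that the abstract refers to can be illustrated on a small normal-form game. The following is a minimal sketch, not the authors' code: it assumes a two-player zero-sum matrix game given as the row player's payoff matrix, and it uses the convention that exploitability is the sum of both players' best-response gains (some papers halve this quantity). The function names and the rock-paper-scissors example are hypothetical choices made for illustration.

```python
# Sketch: exploitability of a strategy pair in a two-player zero-sum matrix game.
# Assumption: payoff[i][j] is the ROW player's payoff; the column player gets -payoff[i][j].

def best_response_value_row(payoff, col_strategy):
    """Highest expected payoff the row player can get against col_strategy."""
    return max(
        sum(value * prob for value, prob in zip(row, col_strategy))
        for row in payoff
    )

def best_response_value_col(payoff, row_strategy):
    """Lowest expected row payoff the column player can force against row_strategy."""
    num_cols = len(payoff[0])
    return min(
        sum(row_strategy[i] * payoff[i][j] for i in range(len(payoff)))
        for j in range(num_cols)
    )

def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    return (best_response_value_row(payoff, col_strategy)
            - best_response_value_col(payoff, row_strategy))

# Rock-paper-scissors, rows and columns ordered rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
UNIFORM = [1 / 3, 1 / 3, 1 / 3]
ALWAYS_ROCK = [1.0, 0.0, 0.0]
```

Calling `exploitability(RPS, UNIFORM, UNIFORM)` evaluates the equilibrium pair, while `exploitability(RPS, ALWAYS_ROCK, UNIFORM)` shows how a pure strategy can be exploited by a best response. A double oracle method would recompute such a quantity after each iteration to track progress toward equilibrium.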

Preprint, 2022, arXiv

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

In competitive two-agent environments, deep reinforcement learning (RL) methods based on the Double Oracle (DO) algorithm, such as Policy Space Response Oracles (PSRO) and Anytime PSRO (APSRO), iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well. As a result, SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.

Preprint, 2022, arXiv

Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search

Learning efficient and interpretable policies has been a challenging task in reinforcement learning (RL), particularly in the visual RL setting with complex scenes. While neural networks have achieved competitive performance, the resulting policies are often over-parameterized black boxes that are difficult to interpret and deploy efficiently. More recent symbolic RL frameworks have shown that high-level domain-specific programming logic can be designed to handle both policy learning and symbolic planning. However, these approaches rely on coded primitives with little feature learning, and when applied to high-dimensional visual scenes, they can suffer from scalability issues and perform poorly when images have complex object interactions. To address these challenges, we propose Differentiable Symbolic Expression Search (DiffSES), a novel symbolic learning approach that discovers discrete symbolic policies using partially differentiable optimization. By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions, while also incorporating the strengths of neural networks for feature learning and optimization. Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods, with a reduced amount of symbolic prior knowledge.

Preprint, 2022, arXiv

Video-Specific Autoencoders for Exploring, Editing and Transmitting Videos

We study video-specific autoencoders that allow a human user to explore, edit, and efficiently transmit videos. Prior work has independently looked at these problems (and sub-problems) and proposed different formulations. In this work, we train a simple autoencoder (from scratch) on multiple frames of a specific video. We observe: (1) latent codes learned by a video-specific autoencoder capture spatial and temporal properties of that video; and (2) autoencoders can project out-of-sample inputs onto the video-specific manifold. These two properties allow us to explore, edit, and efficiently transmit a video using one learned representation. For example, linear operations on latent codes allow users to visualize the contents of a video. Associating latent codes of a video and manifold projection enables users to make desired edits. Interpolating latent codes and manifold projection allows the transmission of sparse low-res frames over a network.
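The idea of interpolating latent codes between sparsely transmitted frames can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the interpolation scheme, so plain linear interpolation is assumed here, latent codes are represented as flat lists of floats, and the function name `interpolate_latents` is hypothetical. Decoding the interpolated codes back into frames would require the trained video-specific decoder, which is not shown.

```python
# Sketch: linear interpolation between two latent codes (assumed scheme, not from the paper).

def interpolate_latents(z_start, z_end, num_steps):
    """Return num_steps latent codes evenly spaced from z_start to z_end, endpoints included."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent codes must have the same dimensionality")
    codes = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # interpolation weight, 0.0 at z_start and 1.0 at z_end
        codes.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return codes

# Example: two 2-dimensional codes standing in for the encodings of two received frames.
intermediate_codes = interpolate_latents([0.0, 2.0], [1.0, 4.0], 3)
```

In a transmission setting, the receiver would hold latent codes only for the frames that were actually sent and would fill in the missing frames by decoding codes produced this way.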

Preprint, 2022, arXiv

XDO: A Double Oracle Algorithm for Extensive-Form Games

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games. Experiment code is available at https://github.com/indylab/nxdo.

Preprint, 2019, arXiv

Hierarchy of relaxation timescales in local random Liouvillians

To characterize the generic behavior of open quantum systems, we consider random, purely dissipative Liouvillians with a notion of locality. We find that the positivity of the map implies a sharp separation of the relaxation timescales according to the locality of observables. Specifically, we analyze a spin-1/2 system of size $\ell$ with up to $n$-body Lindblad operators, which are $n$-local in the complexity-theory sense. Without locality ($n=\ell$), the complex Liouvillian spectrum densely covers a "lemon"-shaped support, in agreement with recent findings [Phys. Rev. Lett. 123, 140403; arXiv:1905.02155]. However, for local Liouvillians ($n<\ell$), we find that the spectrum is composed of several dense clusters with random matrix spacing statistics, each featuring a lemon-shaped support wherein all eigenvectors correspond to $n$-body decay modes. This implies a hierarchy of relaxation timescales of $n$-body observables, which we verify to be robust in the thermodynamic limit.
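For orientation, a purely dissipative Liouvillian (one with no Hamiltonian term) can be written in the standard diagonal Lindblad form, and each eigenvalue of its spectrum sets a relaxation timescale. The symbols $L_k$, $\gamma_k$, $\lambda_j$, $\rho_j$ and $\tau_j$ are notation introduced here for illustration and do not appear in the abstract; the paper's own parametrization of the random Liouvillian may differ.

```latex
\begin{align}
  \mathcal{L}(\rho) &= \sum_{k} \gamma_k \left( L_k \rho L_k^{\dagger}
      - \tfrac{1}{2} \left\{ L_k^{\dagger} L_k, \rho \right\} \right),
      \qquad \gamma_k \ge 0, \\
  \mathcal{L}(\rho_j) &= \lambda_j \rho_j, \qquad \operatorname{Re} \lambda_j \le 0,
      \qquad \tau_j = \frac{1}{\left| \operatorname{Re} \lambda_j \right|}
      \quad (\operatorname{Re} \lambda_j \neq 0).
\end{align}
```

In this reading, separate clusters of eigenvalues at different values of $\operatorname{Re} \lambda_j$ correspond to distinct groups of decay modes with distinct relaxation timescales $\tau_j$.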

Preprint, 2014, arXiv

Temperature Gated Thermal Rectifier

Heat flow control is essential for widespread applications of heating, cooling, energy conversion and utilization. Here we demonstrate the first observation of temperature-gated thermal rectification in vanadium dioxide beams, in which an environment temperature actively modulates asymmetric heat flow. In this three-terminal device, there are two switchable states, which can be accessed by global heating: Rectifier state and Resistor state. In the Rectifier state, up to 28% thermal rectification is observed. In the Resistor state, the thermal rectification is significantly suppressed (below 4%). This temperature-gated rectifier can have substantial implications ranging from autonomous thermal management of micro/nanoscale devices to thermal energy conversion and storage.

Preprint, 2011, arXiv

Field-effect modulation of conductance in VO2 nanobeam transistors with HfO2 as the gate dielectric

We study field-effect transistors realized from VO2 nanobeams with HfO2 as the gate dielectric. When heated up from low to high temperatures, VO2 undergoes an insulator-to-metal transition. We observe a change in conductance (~ 6 percent) of our devices induced by gate voltage when the system is in the insulating phase. The response is reversible and hysteretic, and the area of hysteresis loop becomes larger as the rate of gate sweep is slowed down. A phase lag exists between the response of the conductance and the gate voltage. This indicates the existence of a memory of the system and we discuss its possible origins.
