Source author record

Truong Nguyen

Truong Nguyen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Computation and Language eess.IV Machine Learning Emerging Technologies math.NA Numerical Analysis

Catalog footprint

What is connected

12works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

BSO: Safety Alignment Is Density Ratio Matching

Aligning language models for both helpfulness and safety typically requires complex pipelines-separate reward and cost models, online reinforcement learning, and primal-dual updates. Recent direct preference optimization approaches simplify training but incorporate safety through ad-hoc modifications such as multi-stage procedures or heuristic margin terms, lacking a principled derivation. We show that the likelihood ratio of the optimal safe policy admits a closed-form decomposition that reduces safety alignment to a density ratio matching problem. Minimizing Bregman divergences between the data and model ratios yields Bregman Safety Optimization (BSO), a family of single-stage loss functions, each induced by a convex generator, that provably recover the optimal safe policy. BSO is both general and simple: it requires no auxiliary models, introduces only one hyperparameter beyond standard preference optimization, and recovers existing safety-aware methods as special cases. Experiments across safety alignment benchmarks show that BSO consistently improves the safety-helpfulness trade-off.

preprint2026arXiv

CTPD: Cross Tokenizer Preference Distillation

While knowledge distillation has seen widespread use in pre-training and instruction tuning, its application to aligning language models with human preferences remains underexplored, particularly in the more realistic cross-tokenizer setting. The incompatibility of tokenization schemes between teacher and student models has largely prevented fine-grained, white-box distillation of preference information. To address this gap, we propose Cross-Tokenizer Preference Distillation (CTPD), the first unified framework for transferring human-aligned behavior between models with heterogeneous tokenizers. CTPD introduces three key innovations: (1) Aligned Span Projection, which maps teacher and student tokens to shared character-level spans for precise supervision transfer; (2) a cross-tokenizer adaptation of Token-level Importance Sampling (TIS-DPO) for improved credit assignment; and (3) a Teacher-Anchored Reference, allowing the student to directly leverage the teacher's preferences in a DPO-style objective. Our theoretical analysis grounds CTPD in importance sampling, and experiments across multiple benchmarks confirm its effectiveness, with significant performance gains over existing methods. These results establish CTPD as a practical and general solution for preference distillation across diverse tokenization schemes, opening the door to more accessible and efficient alignment of language models.

preprint2026arXiv

Establishing Robust Retinal Eye Tracking: A Weakly Supervised Algorithmic Framework

Retinal image-based eye tracking is widely used in ophthalmic imaging and vision science, and is a promising path to deliver higher gaze accuracy than the pupil- and cornea-based approaches commonly used in modern AR/VR devices. Nevertheless, existing retinal tracking algorithms still primarily rely on classical template-matching registration, which can be insufficiently robust to retinal feature variability and real-world imaging conditions. In this work, we propose a novel weakly-supervised, learning-based framework for robust retinal eye tracking. Initial studies demonstrate high accuracy, achieving the 95th-percentile gaze error < 0.45 deg across a cohort of 6 participants.

preprint2026arXiv

Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence

Muon orthogonalizes the momentum buffer before each update, replacing its singular values with ones via Newton-Schulz iterations. This simple change lets Muon tolerate far larger learning rates and converge faster than other optimizers, but why? We show that the mechanism is spectral flattening, and develop two results around it. First, we prove that Muon's maximal stable step size scales with the average singular value of the gradient rather than the largest, which bottlenecks standard gradient descent. Second, we recast Muon as a preconditioned gradient method and show, under a Kronecker-factored curvature model, that it improves the effective convergence factor, with the improvement controlled by the spectrum of the gradient covariance. Extensive experiments validate both results: Muon remains stable at learning rates that cause SGD to diverge within the first few iterations, and reaches accuracy milestones several epochs earlier even at identical step sizes. Taken together, our results offer a principled, geometric explanation for Muon's empirical success.

preprint2026arXiv

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

Direct Preference Optimization (DPO) is a widely used RL-free method for aligning language models from pairwise preferences, but it models preferences over full sequences even though generation is driven by per-token decisions. Existing token-level extensions typically decompose a sequence-level Bradley-Terry objective across timesteps, leaving per-prefix (state-wise) optimality implicit. We study how to recover token-level preference optimality using only standard sequence-level pairwise comparisons. We introduce Token-level Bregman Preference Optimization (TBPO), which posits a token-level Bradley-Terry preference model over next-token actions conditioned on the prefix, and derive a Bregman-divergence density-ratio matching objective that generalizes the logistic/DPO loss while preserving the optimal policy induced by the token-level model and maintaining DPO-like simplicity. We introduce two instantiations: TBPO-Q, which explicitly learns a lightweight state baseline, and TBPO-A, which removes the baseline through advantage normalization. Across instruction following, helpfulness/harmlessness, and summarization benchmarks, TBPO improves alignment quality and training stability and increases output diversity relative to strong sequence-level and token-level baselines.

preprint2025arXiv

Universal Vessel Segmentation for Multi-Modality Retinal Images

We identify two major limitations in the existing studies on retinal vessel segmentation: (1) Most existing works are restricted to one modality, i.e., the Color Fundus (CF). However, multi-modality retinal images are used every day in the study of the retina and diagnosis of retinal diseases, and the study of vessel segmentation on other modalities is scarce; (2) Even though a few works extended their experiments to new modalities such as the Multi-Color Scanning Laser Ophthalmoscopy (MC), these works still require fine-tuning a separate model for the new modality. The fine-tuning will require extra training data, which is difficult to acquire. In this work, we present a novel universal vessel segmentation model (URVSM) for multi-modality retinal images. In addition to performing the study on a much wider range of image modalities, we also propose a universal model to segment the vessels in all these commonly used modalities. While being much more versatile compared with existing methods, our universal model also demonstrates comparable performance to the state-of-the-art fine-tuned methods. To the best of our knowledge, this is the first work that achieves modality-agnostic retinal vessel segmentation and the first to study retinal vessel segmentation in several novel modalities.

preprint2020arXiv

Domain Decomposition Parabolic Monge-Ampère Approach for Fast Generation of Adaptive Moving Meshes

A fast method is presented for adaptive moving mesh generation in multi-dimensions using a domain decomposition parabolic Monge-Ampère approach. The domain decomposition procedure employed here is non-iterative and involves splitting the computational domain into overlapping subdomains. An adaptive mesh on each subdomain is then computed as the image of the solution of the $L^2$ optimal mass transfer problem using a parabolic Monge-Ampère method. The domain decomposition approach allows straightforward implementation for the parallel computation of adaptive meshes which helps to reduce computational time significantly. Results are presented to show the numerical convergence of the domain decomposition solution to the single domain solution. Several numerical experiments are given to demonstrate the performance and efficiency of the proposed method. The numerical results indicate that the domain decomposition parabolic Monge-Ampère method is more efficient than the standard implementation of the parabolic Monge-Ampère method on the whole domain, in particular when computing adaptive meshes in three spatial dimensions.

preprint2016arXiv

Context Matters: Refining Object Detection in Video with Recurrent Neural Networks

Given the vast amounts of video available online, and recent breakthroughs in object detection with static images, object detection in video offers a promising new frontier. However, motion blur and compression artifacts cause substantial frame-level variability, even in videos that appear smooth to the eye. Additionally, video datasets tend to have sparsely annotated frames. We present a new framework for improving object detection in videos that captures temporal context and encourages consistency of predictions. First, we train a pseudo-labeler, that is, a domain-adapted convolutional neural network for object detection. The pseudo-labeler is first trained individually on the subset of labeled frames, and then subsequently applied to all frames. Then we train a recurrent neural network that takes as input sequences of pseudo-labeled frames and optimizes an objective that encourages both accuracy on the target frame and consistency across consecutive frames. The approach incorporates strong supervision of target frames, weak-supervision on context frames, and regularization via a smoothness penalty. Our approach achieves mean Average Precision (mAP) of 68.73, an improvement of 7.1 over the strongest image-based baselines for the Youtube-Video Objects dataset. Our experiments demonstrate that neighboring frames can provide valuable information, even absent labels.

preprint2016arXiv

Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation

Object proposals for detecting moving or static video objects need to address issues such as speed, memory complexity and temporal consistency. We propose an efficient Video Object Proposal (VOP) generation method and show its efficacy in learning a better video object detector. A deep-learning based video object detector learned using the proposed VOP achieves state-of-the-art detection performance on the Youtube-Objects dataset. We further propose a clustering of VOPs which can efficiently be used for detecting objects in video in a streaming fashion. As opposed to applying per-frame convolutional neural network (CNN) based object detection, our proposed method called Objects in Video Enabler thRough LAbel Propagation (OVERLAP) needs to classify only a small fraction of all candidate proposals in every video frame through streaming clustering of object proposals and class-label propagation. Source code will be made available soon.

preprint2015arXiv

Beyond Semantic Image Segmentation : Exploring Efficient Inference in Video

We explore the efficiency of the CRF inference module beyond image level semantic segmentation. The key idea is to combine the best of two worlds of semantic co-labeling and exploiting more expressive models. Similar to [Alvarez14] our formulation enables us perform inference over ten thousand images within seconds. On the other hand, it can handle higher-order clique potentials similar to [vineet2014] in terms of region-level label consistency and context in terms of co-occurrences. We follow the mean-field updates for higher order potentials similar to [vineet2014] and extend the spatial smoothness and appearance kernels [DenseCRF13] to address video data inspired by [Alvarez14]; thus making the system amenable to perform video semantic segmentation most effectively.

preprint2015arXiv

Semantic Video Segmentation : Exploring Inference Efficiency

We explore the efficiency of the CRF inference beyond image level semantic segmentation and perform joint inference in video frames. The key idea is to combine best of two worlds: semantic co-labeling and more expressive models. Our formulation enables us to perform inference over ten thousand images within seconds and makes the system amenable to perform video semantic segmentation most effectively. On CamVid dataset, with TextonBoost unaries, our proposed method achieves up to 8% improvement in accuracy over individual semantic image segmentation without additional time overhead. The source code is available at https://github.com/subtri/video_inference

preprint2014arXiv

Improving Streaming Video Segmentation with Early and Mid-Level Visual Processing

Despite recent advances in video segmentation, many opportunities remain to improve it using a variety of low and mid-level visual cues. We propose improvements to the leading streaming graph-based hierarchical video segmentation (streamGBH) method based on early and mid level visual processing. The extensive experimental analysis of our approach validates the improvement of hierarchical supervoxel representation by incorporating motion and color with effective filtering. We also pose and illuminate some open questions towards intermediate level video analysis as further extension to streamGBH. We exploit the supervoxels as an initialization towards estimation of dominant affine motion regions, followed by merging of such motion regions in order to hierarchically segment a video in a novel motion-segmentation framework which aims at subsequent applications such as foreground recognition.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint