Source author record

Bin Kang

Bin Kang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Computer Vision, Biological Physics, eess.IV, physics.chem-ph, physics.optics

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

preprint, 2026, arXiv

AgentSteerTTS: A Multi-Agent Closed-Loop Framework for Composite-Instruction Text-to-Speech

While existing text-to-speech (TTS) models exhibit high expressiveness, fine-grained control over composite instructions remains challenging due to the structural mismatch between discrete textual intents and continuous acoustic realizations. Inspired by human cognitive decoupling, we introduce AgentSteerTTS, a multi-agent closed-loop framework designed for intent-faithful expressive control of composite instructions. First, an adversarial disentanglement agent mitigates speaker-emotion leakage by learning separable identity and emotion-prosody subspaces with leakage-suppressing regularization. Next, a Dual-Stream Anchoring Controller grounds abstract intents using a large-scale acoustic prototype library: a Retrieval Agent selects expressive anchors, while a Synthesis Agent fuses them into continuous control vectors via gated attention. Finally, a Fast-Slow Feedback Agent refines output intensity through latent gradient correction and resolves semantic-acoustic mismatches using high-level perceptual critique. Experiments on a composite-instruction benchmark and public test sets show that AgentSteerTTS yields consistent and significant improvements over the baselines, demonstrating the effectiveness of the proposed method.

preprint, 2022, arXiv

Decrypting material performance by wide-field femtosecond interferometric imaging of energy carrier evolution

Energy carrier evolution is crucial for material performance. Ultrafast microscopy has been widely applied to visualize the spatiotemporal evolution of energy carriers. However, direct imaging of small amounts of energy carriers on the nanoscale remains difficult due to extremely weak transient signals. Here we present a method for ultrasensitive and high-throughput imaging of energy carrier evolution in space and time. This method, named Femto-iSCAT, combines femtosecond pump-probe techniques with interferometric scattering microscopy (iSCAT). The interferometric principle and unique spatially-modulated contrast enhancement increase the transient image contrast by >2 orders of magnitude and enable the exploration of new science. We address three important and challenging problems: transport of different energy carriers at various interfaces, heterogeneous hot electron distribution and relaxation in single plasmonic resonators, and distinct structure-dependent edge state dynamics of carriers and excitons in optoelectronic semiconductors. Femto-iSCAT holds great potential as a universal tool for ultrasensitive imaging of energy carrier evolution in space and time.

preprint, 2022, arXiv

Thermodynamics and thermoeconomics of cell division in presence of exogenous materials in nucleus

Cell division is an essential biological process, and its regulation is relevant to many important fields of biology and medicine. Introducing exogenous substances, such as nanoparticles, into the nucleus has been experimentally studied as a way to regulate the division of cells. Herein we consider this phenomenon from a general view of energetics. By analyzing the thermodynamics of the cell division process, we investigate the optimal symmetry for cell division and the effect of nanoparticles on the energy barriers. The presence of nanoparticles inside the cell nucleus might arrest cells before cytokinesis or other stages, thereby regulating cell division.

preprint, 2020, arXiv

BiCANet: Bi-directional Contextual Aggregating Network for Image Semantic Segmentation

Exploring contextual information in convolutional neural networks (CNNs) has gained substantial attention in recent years for semantic segmentation. This paper introduces a Bi-directional Contextual Aggregating Network, called BiCANet, for semantic segmentation. Unlike previous approaches that encode context in feature space, BiCANet aggregates contextual cues from a categorical perspective and consists mainly of three parts: a contextual condensed projection block (CCPB), a bi-directional context interaction block (BCIB), and a multi-scale contextual fusion block (MCFB). More specifically, CCPB learns a category-based mapping through a split-transform-merge architecture, which condenses contextual cues with different receptive fields from intermediate layers. BCIB, on the other hand, employs dense skip connections to enhance class-level context exchange. Finally, MCFB integrates multi-scale contextual cues by investigating short- and long-range spatial dependencies. To evaluate BiCANet, we have conducted extensive experiments on three semantic segmentation datasets: PASCAL VOC 2012, Cityscapes, and ADE20K. The experimental results demonstrate that BiCANet outperforms recent state-of-the-art networks without any post-processing techniques. In particular, BiCANet achieves mIoU scores of 86.7%, 82.4% and 38.66% on the PASCAL VOC 2012, Cityscapes and ADE20K test sets, respectively.
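
For readers unfamiliar with the metric reported in this abstract, mIoU (mean intersection-over-union) can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for the example. Benchmark toolkits typically accumulate counts over a whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (illustrative sketch).

    pred and target are flat sequences of integer class labels, one per pixel.
    Classes that appear in neither sequence are skipped (an assumption here).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six pixels and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/3, 2/3 and 1/2, so the mean is 0.5.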

preprint, 2020, arXiv

TEA: Temporal Excitation and Aggregation for Action Recognition

Temporal modeling is key for action recognition in videos. It normally considers both short-range motions and long-range aggregations. In this paper, we propose a Temporal Excitation and Aggregation (TEA) block, including a motion excitation (ME) module and a multiple temporal aggregation (MTA) module, specifically designed to capture both short- and long-range temporal evolution. In particular, for short-range motion modeling, the ME module calculates the feature-level temporal differences from spatiotemporal features. It then utilizes the differences to excite the motion-sensitive channels of the features. The long-range temporal aggregations in previous works are typically achieved by stacking a large number of local temporal convolutions. Each convolution processes a local temporal window at a time. In contrast, the MTA module proposes to deform the local convolution to a group of sub-convolutions, forming a hierarchical residual architecture. Without introducing additional parameters, the features will be processed with a series of sub-convolutions, and each frame could complete multiple temporal aggregations with neighborhoods. The final equivalent receptive field of temporal dimension is accordingly enlarged, which is capable of modeling the long-range temporal relationship over distant frames. The two components of the TEA block are complementary in temporal modeling. Finally, our approach achieves impressive results at low FLOPs on several action recognition benchmarks, such as Kinetics, Something-Something, HMDB51, and UCF101, which confirms its effectiveness and efficiency.
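
The motion excitation idea described in this abstract (take feature-level differences between adjacent frames, then use them to emphasize motion-sensitive channels) can be illustrated with a small sketch. This is a simplified reading of the abstract only, not the published module: the function name `motion_excitation`, the spatially pooled `[T][C]` input layout, the `tanh(|diff|)` gate, the zero difference for the last frame, and the residual form are assumptions, and any learned layers a real module would contain are omitted.

```python
import math


def motion_excitation(features):
    """Rescale channels by a gate derived from temporal differences (sketch).

    features: list of T frames, each a list of C channel activations
    (assumed already pooled over space). Returns a list of the same shape.
    """
    num_frames = len(features)
    num_channels = len(features[0])
    out = []
    for t in range(num_frames):
        if t + 1 < num_frames:
            # Feature-level difference between the next frame and this one.
            diff = [features[t + 1][c] - features[t][c] for c in range(num_channels)]
        else:
            # Assumption: the last frame has no successor, so use zero motion.
            diff = [0.0] * num_channels
        # Assumed gate: squash the motion magnitude into [0, 1).
        gate = [math.tanh(abs(d)) for d in diff]
        # Residual form: static channels pass through, moving ones are amplified.
        out.append([features[t][c] * (1.0 + gate[c]) for c in range(num_channels)])
    return out


# Channel 0 never changes; channel 1 changes between frame 0 and frame 1.
feats = [[1.0, 2.0], [1.0, 4.0], [1.0, 4.0]]
excited = motion_excitation(feats)
```

In this toy input, channel 0 is returned unchanged at every frame, while channel 1 is amplified at frame 0, where the difference to the next frame is non-zero.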
