Source author record

Fang Wu

Fang Wu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Machine Learning; Computational Engineering, Finance, and Science; Artificial Intelligence; astro-ph.CO; cs.CY; physics.soc-ph; Computer Vision; Quantitative Methods; Robotics

Catalog footprint

What is connected

9 works

9 topics

4 close collaborators


preprint, 2026, arXiv

NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training

Diffusion models have achieved remarkable success across a wide range of generative tasks, yet their training paradigm largely treats injected noise as uniformly informative. In this work, we challenge this assumption and introduce NoiseRater, a meta-learning framework for instance-level noise valuation in diffusion model training. We propose a parametric noise rater that assigns importance scores to individual noise realizations conditioned on data and timestep, enabling adaptive reweighting of the training objective. The rater is trained via bilevel optimization to improve downstream validation performance after inner-loop diffusion updates. To enable efficient deployment, we further design a decoupled two-stage pipeline that transitions from soft weighting during meta-training to hard noise selection during standard training. Extensive experiments on FFHQ and ImageNet demonstrate that not all noise samples contribute equally, and that prioritizing informative noise improves both training efficiency and generation quality. Our results establish noise valuation as a complementary and previously underexplored axis for improving diffusion model training. Our code is available at: https://anonymous.4open.science/r/NoiseRater-DEB116.
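The two stages described in the abstract, soft weighting during meta-training and hard noise selection during standard training, can be illustrated with a minimal sketch. The abstract does not give the rater architecture or the exact weighting rule, so everything here is an assumption for illustration: the rater is replaced by a plain list of scores, the soft weights use a softmax rescaled to average 1, the hard stage keeps the top-scoring candidates, and the function names `soft_weights`, `weighted_loss` and `hard_select` are hypothetical.

```python
import math


def soft_weights(scores):
    """Softmax over rater scores, rescaled so the weights average to 1.

    Assumption: the paper's actual weighting rule is not stated in the abstract.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    n = len(scores)
    return [n * e / total for e in exps]


def weighted_loss(per_sample_losses, scores):
    """Soft stage: mean of per-sample losses reweighted by the rater scores."""
    weights = soft_weights(scores)
    weighted = [w * l for w, l in zip(weights, per_sample_losses)]
    return sum(weighted) / len(per_sample_losses)


def hard_select(scores, keep):
    """Hard stage: indices of the `keep` highest-scoring noise candidates."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 4.0]
    scores = [0.1, 0.9, 0.5, 0.7]  # stand-in for rater outputs
    print(weighted_loss(losses, scores))
    print(hard_select(scores, keep=2))
```

With equal scores the soft stage reduces to the ordinary unweighted mean loss, which is a convenient sanity check for any weighting rule of this kind.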

preprint, 2026, arXiv

Proteo-R1: Reasoning Foundation Models for De Novo Protein Design

Deep learning in de novo protein design has achieved atomic-level fidelity. However, existing models remain largely non-deliberative: they directly synthesize molecular geometries without explicitly reasoning about which residues or interactions are functionally essential. As a result, design decisions are entangled with continuous sampling dynamics, limiting interpretability, controllability, and systematic reuse of biochemical knowledge. We introduce Proteo-R1, a reasoning-guided protein design framework that explicitly decouples molecular understanding from geometric generation. Proteo-R1 adopts a dual-expert architecture in which a multimodal large language model (MLLM) serves as an understanding expert, analyzing protein sequences, structures, and textual context to identify key functional residues that govern binding and specificity. These residue-level decisions are then passed as hard constraints to a separate diffusion-based generation expert, which performs conditional co-design while respecting the fixed interaction anchors. This factorization mirrors how human experts approach molecular engineering: first, reasoning about critical interactions, then optimizing geometry subject to those constraints. By operationalizing reasoning as explicit residue-level commitments rather than latent textual guidance, Proteo-R1 achieves stable, interpretable, and modular integration of LLM reasoning with state-of-the-art geometric generative models. Code, data, and demos are available at https://smiles724.github.io/r1/.

preprint, 2023, arXiv

DiffMD: A Geometric Diffusion Model for Molecular Dynamics Simulations

Molecular dynamics (MD) has long been the de facto choice for simulating complex atomistic systems from first principles. Recently, deep learning models have become a popular way to accelerate MD. However, existing models depend on intermediate variables such as the potential energy or force fields to update atomic positions, which requires additional computations to perform back-propagation. To remove this requirement, we propose a novel model called DiffMD that directly estimates the gradient of the log density of molecular conformations. DiffMD relies on a score-based denoising diffusion generative model that perturbs the molecular structure with a conditional noise depending on atomic accelerations and treats conformations at previous timeframes as the prior distribution for sampling. Another challenge of modeling such a conformation generation process is that a molecule is kinetic instead of static, which no prior work has rigorously studied. To address this challenge, we propose an equivariant geometric Transformer as the score function in the diffusion process to calculate the corresponding gradients. It incorporates the directions and velocities of atomic motions via 3D spherical Fourier-Bessel representations. With multiple architectural improvements, we outperform state-of-the-art baselines on the MD17 and C7O2H10 isomer datasets. This work contributes to accelerating material and drug discovery.

preprint, 2023, arXiv

Molformer: Motif-based Transformer on 3D Heterogeneous Molecular Graphs

Procuring expressive molecular representations underpins AI-driven molecule design and scientific discovery. Existing research mainly focuses on atom-level homogeneous molecular graphs, ignoring the rich information in subgraphs or motifs. However, it has been widely accepted that substructures play a dominant role in identifying and determining molecular properties. To address these issues, we formulate heterogeneous molecular graphs (HMGs) and introduce a novel architecture to exploit both molecular motifs and 3D geometry. Specifically, we extract functional groups as motifs for small molecules and employ reinforcement learning to adaptively select quaternary amino acids as motif candidates for proteins. HMGs are then constructed with both atom-level and motif-level nodes. To better accommodate these HMGs, we introduce a Transformer variant named Molformer, which adopts a heterogeneous self-attention layer to distinguish the interactions between multi-level nodes. It is also coupled with a multi-scale mechanism to capture fine-grained local patterns with increasing contextual scales. An attentive farthest point sampling algorithm is also proposed to obtain the molecular representations. We validate Molformer across a broad range of domains, including quantum chemistry, physiology, and biophysics. Extensive experiments show that Molformer outperforms or performs comparably to several state-of-the-art baselines. Our work provides a promising way to utilize informative motifs from the perspective of multi-level graph construction.
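For readers unfamiliar with farthest point sampling, the sketch below shows the standard, non-attentive version of the algorithm on 3D coordinates. It is not the attentive variant proposed in the paper, whose attention-based selection criterion is not specified in the abstract. The function name `farthest_point_sampling` and the `start` parameter are hypothetical choices made for this illustration.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy farthest point sampling (standard version, purely geometric).

    Repeatedly picks the point whose distance to the already chosen set is
    largest. Returns the indices of the chosen points in selection order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
    print(farthest_point_sampling(atoms, k=3))
```

The greedy rule spreads the selected points across the structure, which is why variants of it are often used to downsample point sets before pooling them into a fixed-size representation.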

preprint, 2022, arXiv

Towards Collaborative Simultaneous Localization and Mapping: a Survey of the Current Research Landscape

Motivated by the tremendous progress we witnessed in recent years, this paper presents a survey of the scientific literature on the topic of Collaborative Simultaneous Localization and Mapping (C-SLAM), also known as multi-robot SLAM. With fleets of self-driving cars on the horizon and the rise of multi-robot systems in industrial applications, we believe that Collaborative SLAM will soon become a cornerstone of future robotic applications. In this survey, we introduce the basic concepts of C-SLAM and present a thorough literature review. We also outline the major challenges and limitations of C-SLAM in terms of robustness, communication, and resource management. We conclude by exploring the area's current trends and promising research avenues.

preprint, 2012, arXiv

Kinematics of the Compact Symmetric Object OQ 208 revisited

Aims. A long timeline kinematic study of the archetypal CSO OQ 208 sheds light on the physical properties of the most compact radio sources. Methods. Archival data from the VLBA at 15 GHz over a time span of 13.6 yr are used to investigate the kinematics of the radio source. The flux density monitoring data obtained at the Michigan 26-meter radio telescope are also used as supplementary information. Results. At 8.4 and 15 GHz, the two lobes are resolved into two sub-components, identified as hotspots. A knotty jet is linked with the NE hotspot and traces back toward the geometric center. The core is too weak to be detected. Significant flux density variation is found in the primary hotspots, with maximum levels of 62% (NE1) and 19% (SW1). The peak in the flux density of NE1 leads that of SW1 by approximately 5.00 yr, suggesting that the northeast lobe is advancing and the southwest lobe is receding. This light travel difference indicates a radial distance difference between the two hotspots of 1.53 pc, which indicates an inclination angle of about 80.8 degrees between the radio jet and the line of sight. The angular separation rate between NE1 and SW1 is 0.027 mas/yr (or 0.133 c). The inner jet knot moves at 0.047 mas/yr (or 0.230 c), about 3.5 times the hotspot advancing speed. Conclusions. The large viewing angle and the modest jet speed suggest a mildly relativistic jet. The jet axis is close to the plane of the sky. The separation rate and the distance between the two primary hotspots result in a kinematic age of 255 ± 17 yr, confirming that OQ 208 is indeed a young radio source. In addition to the hotspot advancing motions, sideways motions provide evidence that the lobes are obstructed by the external interstellar medium.
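Some of the figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below makes three assumptions that are not stated in the abstract: a conversion factor of about 3.2616 light-years per parsec, a symmetric source in which each hotspot advances at half the NE1-SW1 separation rate, and a kinematic age equal to the hotspot separation divided by the separation rate. The function names are hypothetical. The inclination angle is not recomputed because it needs the projected hotspot separation in parsecs, which the abstract does not quote.

```python
LIGHT_YEARS_PER_PARSEC = 3.2616  # approximate conversion factor


def radial_offset_pc(lag_yr):
    """Line-of-sight offset implied by a light-travel lag (1 ly per year)."""
    return lag_yr / LIGHT_YEARS_PER_PARSEC


def knot_to_hotspot_ratio(knot_rate, separation_rate):
    """Knot speed relative to one hotspot, assuming symmetric advance."""
    hotspot_rate = separation_rate / 2.0
    return knot_rate / hotspot_rate


def implied_separation_mas(age_yr, separation_rate_mas_per_yr):
    """Angular separation implied by age = separation / rate (an inference)."""
    return age_yr * separation_rate_mas_per_yr


if __name__ == "__main__":
    # 5.00 yr lag between the NE1 and SW1 flux density peaks.
    print(round(radial_offset_pc(5.00), 2), "pc")
    # 0.047 mas/yr knot motion against a 0.027 mas/yr separation rate.
    print(round(knot_to_hotspot_ratio(0.047, 0.027), 2))
    # Same ratio from the speeds in units of c.
    print(round(knot_to_hotspot_ratio(0.230, 0.133), 2))
    # Separation implied by the 255 yr kinematic age (not quoted in the abstract).
    print(round(implied_separation_mas(255, 0.027), 2), "mas")
```

Under these assumptions the 5.00 yr lag reproduces the quoted 1.53 pc, and both the angular rates and the speeds in units of c give a knot-to-hotspot ratio close to the quoted 3.5.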

preprint, 2011, arXiv

VLBI observations of 10 CSO candidates: expansion velocities of hot spots

Observations of ten Compact Symmetric Object (CSO) candidates have been made with the Very Long Baseline Array at 8.4 GHz in 2005 and with a combined Chinese and European VLBI array at 8.4 GHz in 2009. The 2009 observations incorporate for the first time the two new Chinese telescopes at Miyun and Kunming for international astrophysical observations. The observational data, in combination with archival VLBA data from previous epochs, have been used to derive the proper motions of the VLBI components. Because of the long time baseline of ~16 years of the VLBI data sets, the expansion velocities of the hot spots can be measured with an accuracy as high as ~1.3 μas/yr. Six of the ten sources are identified as CSOs with a typical double or triple morphology on the basis of both spectral index maps and the mirror symmetry of the proper motions of the terminal hot spots. The compact double source J1324+4048 is also identified as a CSO candidate. Among the three remaining sources, J1756+5748 and J2312+3847 are identified as core-jet sources with proper motions of their jet components relating to systemic source expansion. The third source, J0017+5312, is likely also a core-jet source, but a robust detection of a core is needed for an unambiguous identification. The kinematic ages of the CSOs derived from proper motions range from 300 to 2500 years. The kinematic age distribution of the CSOs confirms an overabundance of compact young CSOs with ages less than 500 years. CSOs with known kinematic ages may be used to study the dynamical evolution of extragalactic radio sources at early stages.

preprint, 2010, arXiv

Harvesting Collective Intelligence: Temporal Behavior in Yahoo Answers

When harvesting collective intelligence, a user wishes to maximize the accuracy and value of the acquired information without spending too much time collecting it. We empirically study how people behave when facing these conflicting objectives using data from Yahoo Answers, a community driven question-and-answer site. We take two complementary approaches. We first study how users behave when trying to maximize the amount of the acquired information, while minimizing the waiting time. We identify and quantify how question authors at Yahoo Answers trade off the number of answers they receive and the cost of waiting. We find that users are willing to wait more to obtain an additional answer when they have only received a small number of answers; this implies decreasing marginal returns in the amount of collected information. We also estimate the user's utility function from the data. Our second approach focuses on how users assess the qualities of the individual answers without explicitly considering the cost of waiting. We assume that users make a sequence of decisions, deciding to wait for an additional answer as long as the quality of the current answer exceeds some threshold. Under this model, the probability distribution for the number of answers that a question gets is an inverse Gaussian, which is a Zipf-like distribution. We use the data to validate this conclusion.
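The inverse Gaussian distribution named in the abstract has the standard density f(x; mu, lam) = sqrt(lam / (2 pi x^3)) * exp(-lam (x - mu)^2 / (2 mu^2 x)) for x > 0. The sketch below evaluates that textbook density and offers one possible reading of the "Zipf-like" remark: for x well above lam and well below mu^2 / lam the exponential factor is nearly constant, so the density falls off roughly as x^(-3/2). The parameter values and the helper names `trapezoid` and `log_log_slope` are illustrative assumptions, not taken from the paper, and no fit to the Yahoo Answers data is attempted.

```python
import math


def inverse_gaussian_pdf(x, mu, lam):
    """Standard inverse Gaussian density with mean mu and shape lam."""
    if x <= 0:
        return 0.0
    scale = math.sqrt(lam / (2.0 * math.pi * x ** 3))
    return scale * math.exp(-lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))


def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def log_log_slope(f, x1, x2):
    """Slope of log f against log x between two points."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))


if __name__ == "__main__":
    # Sanity check: the density integrates to about 1 and has mean about mu.
    pdf = lambda x: inverse_gaussian_pdf(x, mu=1.0, lam=1.0)
    print(trapezoid(pdf, 0.0, 60.0, 60000))
    print(trapezoid(lambda x: x * pdf(x), 0.0, 60.0, 60000))

    # Heavy-tailed regime: the slope on log-log axes is close to -1.5.
    heavy = lambda x: inverse_gaussian_pdf(x, mu=1e4, lam=1.0)
    print(log_log_slope(heavy, 10.0, 100.0))
```

A straight line on log-log axes is the usual signature of a power law, which is why an inverse Gaussian with a large mean relative to its shape parameter can look Zipf-like over a wide range of answer counts.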

preprint, 2010, arXiv

Human Speed-Accuracy Tradeoffs in Search

When foraging for information, users face a tradeoff between the accuracy and value of the acquired information and the time spent collecting it, a problem which also surfaces when seeking answers to a question posed to a large community. We empirically study how people behave when facing these conflicting objectives using data from Yahoo Answers, a community driven question-and-answer site. We first study how users behave when trying to maximize the amount of acquired information while minimizing the waiting time. We find that users are willing to wait longer for an additional answer if they have received a small number of answers. We then assume that users make a sequence of decisions, deciding to wait for an additional answer as long as the quality of the current answer exceeds some threshold. The resulting probability distribution for the number of answers that a question gets is an inverse Gaussian, a fact that is validated by our data.
