Source author record

Junjie Yu

Junjie Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computation and Language, cond-mat.mtrl-sci, cs.CY, Machine Learning, Networking and Internet Architecture, physics.chem-ph

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators


preprint · 2026 · arXiv

AcademiClaw: When Students Set Challenges for AI Agents

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows (homework, research projects, competitions, and personal projects) that they found current AI agents unable to solve effectively. Curated from 230 student-submitted candidates through rigorous expert review, the final task set spans 25+ professional domains, ranging from olympiad-level mathematics and linguistics problems to GPU-intensive reinforcement learning and full-stack system debugging, with 16 tasks requiring CUDA GPU execution. Each task executes in an isolated Docker sandbox and is scored on task completion by multi-dimensional rubrics combining six complementary techniques, with an independent five-category safety audit providing additional behavioral analysis. Experiments on six frontier models show that even the best achieves only a 55% pass rate. Further analysis uncovers sharp capability boundaries across task domains, divergent behavioral strategies among models, and a disconnect between token consumption and output quality, providing fine-grained diagnostic signals beyond what aggregate metrics reveal. We hope that AcademiClaw and its open-sourced data and code can serve as a useful resource for the OpenClaw community, driving progress toward agents that are more capable and versatile across the full breadth of real-world academic demands. All data and code are available at https://github.com/GAIR-NLP/AcademiClaw.

preprint · 2026 · arXiv

Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation

Finetuning pretrained models occurs in a low-dimensional subspace of the full parameter space. Prior work has focused on characterizing this optimization subspace, but largely ignored the complementary question: why do certain directions remain unexplored during finetuning? Are these stable directions irrelevant to downstream tasks, or do they already encode task-relevant structure that requires no further adjustment? Answering this question is central to understanding how pretrained knowledge transfers. Through systematic spectral analysis across vision and language models, we show that the leading singular vectors of pretrained weight matrices remain highly stable under finetuning and are shared across unrelated downstream tasks, revealing that pretraining establishes a reusable spectral coordinate system. Models pretrained on larger datasets exhibit greater spectral stability under distribution shift or task change, directly linking pretraining scale to geometric transferability. Motivated by these findings, we propose a parameter-efficient method that freezes pretrained singular vectors and optimizes only leading spectral coefficients, achieving competitive performance on GLUE with 0.2% trainable parameters. Our results reveal that the stable directions encode transferable structure rather than irrelevant noise: successful pretraining discovers spectral bases that downstream tasks inherit and operate within.
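The idea of freezing pretrained singular vectors and adjusting only the leading spectral coefficients can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the 2x2 rotation matrices stand in for frozen singular-vector factors, and the names `rotation`, `reconstruct`, `adapt`, `k`, and `delta` are hypothetical. In practice the factors would come from an SVD of a pretrained weight matrix and `delta` would be learned by gradient descent.

```python
import math

def rotation(theta):
    """2x2 rotation matrix; a stand-in for a frozen orthonormal factor (assumption)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reconstruct(U, sigma, V):
    """Return U @ diag(sigma) @ V^T for small matrices stored as lists of lists."""
    m, n, r = len(U), len(V), len(sigma)
    return [[sum(U[i][j] * sigma[j] * V[c][j] for j in range(r))
             for c in range(n)] for i in range(m)]

def adapt(sigma, delta, k):
    """Shift only the k leading coefficients; all other coefficients stay frozen."""
    return [s + (delta[i] if i < k else 0.0) for i, s in enumerate(sigma)]

# Frozen "pretrained" factors (hypothetical values for illustration).
U = rotation(0.3)
V = rotation(-1.1)
sigma = [2.0, 0.5]

# Only k coefficients are trainable, so the trainable parameter count is k
# rather than the full number of entries in the weight matrix.
k = 1
delta = [0.25]
new_sigma = adapt(sigma, delta, k)
W_new = reconstruct(U, new_sigma, V)
```

Because `U` and `V` stay orthonormal, the update changes how strongly each stable direction is weighted without rotating the directions themselves.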

preprint · 2026 · arXiv

SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning

We introduce SeePhys Pro, a fine-grained modality transfer benchmark that studies whether models preserve the same reasoning capability when critical information is progressively transferred from text to image. Unlike standard vision-essential benchmarks that evaluate a single input form, SeePhys Pro features four semantically aligned variants for each problem with progressively increasing visual elements. Our evaluation shows that current frontier models are far from representation-invariant reasoners: performance degrades on average as information moves from language to diagrams, with visual variable grounding as the most critical bottleneck. Motivated by this inference-time fragility, we further develop large training corpora for multimodal RLVR and use blind training as a diagnostic control, finding that RL with all training images masked can still improve performance on unmasked validation sets. To analyze this effect, we apply text-deletion, image-mask-rate, and format-saturation controls, which suggest that such gains can arise from residual textual and distributional cues rather than valid visual evidence. Our results highlight the need to evaluate multimodal reasoning not only by final-answer accuracy, but also by robustness under modality transfer and by diagnostics that test whether improvements rely on task-critical visual evidence.

preprint · 2022 · arXiv

STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction

We present a simple yet effective self-training approach, named STAD, for low-resource relation extraction. The approach first classifies the auto-annotated instances into two groups, confident instances and uncertain instances, according to the probabilities predicted by a teacher model. In contrast to most previous studies, which mainly use only the confident instances for self-training, we also make use of the uncertain instances. To this end, we propose a method to identify ambiguous but useful instances among the uncertain instances and then divide the relations into a candidate-label set and a negative-label set for each ambiguous instance. Next, we propose a set-negative training method on the negative-label sets for the ambiguous instances and a positive training method for the confident instances. Finally, a joint-training method is proposed to build the final relation extraction system on all data. Experimental results on two widely used datasets, SemEval2010 Task-8 and Re-TACRED, in low-resource settings demonstrate that this new self-training approach achieves significant and consistent improvements compared with several competitive self-training systems. Code is publicly available at https://github.com/jjyunlp/STAD
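The grouping and set-negative steps described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the released STAD code: the abstract does not give the selection criteria or the loss, so the thresholds `conf_threshold`, `top_k`, and `cand_mass`, the function names, and the specific loss form (penalizing probability mass on the negative-label set) are all hypothetical choices.

```python
import math

def split_instance(probs, conf_threshold=0.9, top_k=2, cand_mass=0.9):
    """Assign one auto-annotated instance to a group from teacher probabilities.

    Returns (group, candidate_labels, negative_labels). All thresholds are
    illustrative assumptions, not values from the paper.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Confident: the teacher's top label is nearly certain.
    if probs[ranked[0]] >= conf_threshold:
        return "confident", [ranked[0]], []
    # Ambiguous but useful: a few labels jointly hold most of the mass,
    # so every remaining label can be treated as a negative label.
    candidates = ranked[:top_k]
    if sum(probs[i] for i in candidates) >= cand_mass:
        return "ambiguous", sorted(candidates), sorted(ranked[top_k:])
    # Otherwise the instance is too uncertain to use in this sketch.
    return "discarded", [], []

def set_negative_loss(student_probs, negatives, eps=1e-12):
    """Assumed loss form: penalize student mass placed on the negative-label set."""
    neg_mass = sum(student_probs[i] for i in negatives)
    return -math.log(max(1.0 - neg_mass, eps))

confident = split_instance([0.95, 0.03, 0.02])
ambiguous = split_instance([0.55, 0.40, 0.05])
discarded = split_instance([0.40, 0.30, 0.30])
loss = set_negative_loss([0.5, 0.3, 0.2], ambiguous[2])
```

In this sketch an ambiguous instance never commits to a single label; it only tells the student which relations to avoid, which is one plausible reading of training on the negative-label set.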

preprint · 2020 · arXiv

Energy Minimization for Mobile Edge Computing Networks with Time-Sensitive Constraints

Mobile edge computing (MEC) provides users with a high quality of experience (QoE) by placing servers with rich services close to the end users. Compared with local computing, MEC can contribute to energy saving, but results in increased communication latency. In this paper, we jointly optimize task offloading and resource allocation to minimize the energy consumption in an orthogonal frequency division multiple access (OFDMA)-based MEC network, where the time-sensitive tasks can be processed at both the local users and the MEC server via partial offloading. Since the optimization variables of the problem are strongly coupled, we first decompose the original problem into two subproblems, offloading selection (PO) and subcarrier and computing resource allocation (PS), and then propose an iterative algorithm to solve them in sequence. To be specific, we derive the closed-form solution for PO, and handle PS with an alternating method in the dual domain because it is NP-hard. Simulation results demonstrate

preprint · 2014 · arXiv

Developing an aqueous approach for synthesizing Au and M@Au (M = Pd, CuPt) hybrid nanostars with plasmonic properties

Anisotropic Au nanoparticles show unique localized surface plasmon resonance (LSPR) properties, which make them attractive in optical, sensing, and biomedical applications. In this contribution, we report a general and facile strategy for the aqueous synthesis of Au and M@Au (M = Pd, CuPt) hybrid nanostars by reducing HAuCl4 with ethanolamine in the presence of cetyltrimethylammonium bromide (CTAB). According to electron microscopic observation and spectral monitoring, we found that the layered epitaxial growth mode (i.e., Frank-van der Merwe mechanism) contributes to the enlargement of the core, while the random attachment of Au nanoclusters onto the cores accounts for the formation of the branches. Both are indispensable for the formation of the nanostars. The LSPR properties of the Au nanoparticles have been investigated with morphology control via precursor amount and growth temperature. The Au nanostars showed improved surface-enhanced Raman spectroscopy (SERS) performance for rhodamine 6G due to their sharp edges and tips, and were therefore confirmed as good SERS substrates for detecting trace amounts of molecules.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint