Source author record

Jiajun Zhang

Jiajun Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language; astro-ph.CO; astro-ph.IM; astro-ph.GA; Artificial Intelligence; gr-qc; astro-ph.HE; astro-ph.SR; Computer Vision; Machine Learning; Computational Geometry; Distributed, Parallel, and Cluster Computing; hep-ph; hep-th; Quantitative Methods; Robotics; Social and Information Networks; Software Engineering; Symbolic Computation

Catalog footprint

What is connected

43 works

19 topics

4 close collaborators


preprint · 2026 · arXiv

Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements

Benchmarks play a crucial role in tracking the rapid advancement of large language models (LLMs) and identifying their capability boundaries. However, existing benchmarks predominantly curate questions at the question level, suffering from three fundamental limitations: vulnerability to data contamination, restriction to single-knowledge-point assessment, and reliance on costly domain expert annotation. We propose Encyclo-K, a statement-based benchmark that rethinks benchmark construction from the ground up. Our key insight is that knowledge statements, not questions, can serve as the unit of curation, and questions can then be constructed from them. We extract standalone knowledge statements from authoritative textbooks and dynamically compose them into evaluation questions through random sampling at test time. This design directly addresses all three limitations: the combinatorial space is too vast to memorize, and model rankings remain stable across dynamically generated question sets, enabling reliable periodic dataset refresh; each question aggregates 8-10 statements for comprehensive multi-knowledge assessment; annotators only verify formatting compliance without requiring domain expertise, substantially reducing annotation costs. Experiments on over 50 LLMs demonstrate that Encyclo-K poses substantial challenges with strong discriminative power. Even the top-performing OpenAI-GPT-5.1 achieves only 62.07% accuracy, and model performance displays a clear gradient distribution--reasoning models span from 16.04% to 62.07%, while chat models range from 9.71% to 50.40%. These results validate the challenges introduced by dynamic evaluation and multi-statement comprehensive understanding. These findings establish Encyclo-K as a scalable framework for dynamic evaluation of LLMs' comprehensive understanding over multiple fine-grained disciplinary knowledge statements.
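The test-time composition described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the `Statement` type, the true/false label, the `compose_question` helper, and the "indices of correct statements" answer format are all hypothetical, and the toy pool is placeholder data rather than benchmark content.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str
    is_correct: bool  # assumed label; the abstract does not specify a labeling scheme


def compose_question(pool, rng, min_k=8, max_k=10):
    """Randomly sample 8-10 statements and bundle them into one question."""
    k = rng.randint(min_k, max_k)
    chosen = rng.sample(pool, k)  # sampling without replacement
    # Assumed answer format: positions of the statements marked correct.
    answer = [i for i, s in enumerate(chosen) if s.is_correct]
    return chosen, answer


# Toy pool of 100 placeholder statements (not real benchmark data).
pool = [Statement(f"statement {i}", i % 2 == 0) for i in range(100)]
chosen, answer = compose_question(pool, random.Random(0))

# Number of distinct 8-statement subsets that even a 100-statement pool allows.
n_subsets = math.comb(100, 8)
print(len(chosen), n_subsets)
```

Even this toy pool admits more than 10^11 distinct 8-statement subsets, which illustrates why the combinatorial space is impractical to memorize.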

preprint · 2026 · arXiv

GR-Dexter Technical Report

Vision-language-action (VLA) models have enabled language-conditioned, long-horizon robot manipulation, but most existing systems are limited to grippers. Scaling VLA policies to bimanual robots with high degree-of-freedom (DoF) dexterous hands remains challenging due to the expanded action space, frequent hand-object occlusions, and the cost of collecting real-robot data. We present GR-Dexter, a holistic hardware-model-data framework for VLA-based generalist manipulation on a bimanual dexterous-hand robot. Our approach combines the design of a compact 21-DoF robotic hand, an intuitive bimanual teleoperation system for real-robot data collection, and a training recipe that leverages teleoperated robot trajectories together with large-scale vision-language and carefully curated cross-embodiment datasets. Across real-world evaluations spanning long-horizon everyday manipulation and generalizable pick-and-place, GR-Dexter achieves strong in-domain performance and improved robustness to unseen objects and unseen instructions. We hope GR-Dexter serves as a practical step toward generalist dexterous-hand robotic manipulation.

preprint · 2026 · arXiv

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

The rapid development of interactive and autonomous AI systems signals our entry into the agentic era. Training and evaluating agents on complex agentic tasks such as software engineering and computer use requires not only efficient model computation but also sophisticated infrastructure capable of coordinating vast agent-environment interactions. However, no open-source infrastructure can effectively support large-scale training and evaluation on such complex agentic tasks. To address this challenge, we present MegaFlow, a large-scale distributed orchestration system that enables efficient scheduling, resource allocation, and fine-grained task management for agent-environment workloads. MegaFlow abstracts agent training infrastructure into three independent services (Model Service, Agent Service, and Environment Service) that interact through unified interfaces, enabling independent scaling and flexible resource allocation across diverse agent-environment configurations. In our agent training deployments, MegaFlow successfully orchestrates tens of thousands of concurrent agent tasks while maintaining high system stability and achieving efficient resource utilization. By enabling such large-scale agent training, MegaFlow addresses a critical infrastructure gap in the emerging agentic AI landscape.

preprint · 2026 · arXiv

PlotCraft: Pushing the Limits of LLMs for Complex and Interactive Data Visualization

Recent Large Language Models (LLMs) have demonstrated remarkable proficiency in code generation. However, their ability to create complex visualizations for scaled and structured data remains largely unevaluated and underdeveloped. To address this gap, we introduce PlotCraft, a new benchmark featuring 1k challenging visualization tasks that cover a wide range of topics, such as finance, scientific research, and sociology. The benchmark is structured around seven high-level visualization tasks and encompasses 48 distinct chart types. Crucially, it is the first to systematically evaluate both single-turn generation and multi-turn refinement across a diverse spectrum of task complexities. Our comprehensive evaluation of 23 leading LLMs on PlotCraft reveals clear performance deficiencies in handling sophisticated visualization tasks. To bridge this performance gap, we develop SynthVis-30K, a large-scale, high-quality dataset of complex visualization code synthesized via a collaborative agent framework. Building upon this dataset, we develop PlotCraftor, a novel code generation model that achieves strong capabilities in complex data visualization with a remarkably small size. Across VisEval, PandasPlotBench, and our proposed PlotCraft, PlotCraftor shows performance comparable to that of leading proprietary approaches. Notably, on hard tasks, our model achieves over 50% performance improvement. We will release the benchmark, dataset, and code at https://github.com/Speakn0w/PlotCraft-Benchmark.

preprint · 2026 · arXiv

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

Tokenization is a foundational step in the text processing of Large Language Models (LLMs). Texts must first be tokenized into token IDs, which are then input to LLMs. Inefficient tokenization results in long token-ID sequences and slows down the training and inference of LLMs. Fine-grained knowledge transfer between LLMs, such as token-level distillation, is also impeded by vocabulary mismatch. To bridge this gap, we introduce a method named TokAlign++ that improves vocabulary adaptation performance by learning a better token alignment lexicon. The source and target vocabularies are treated as two different languages, and the bilingual token alignment lexicon is learned from monolingual token representations. Model parameters are rearranged following this bilingual lexicon for the new vocabulary and progressively fine-tuned for adaptation. Experimental results on 15 languages show that our method boosts multilingual text compression rates and preserves most of the multilingual ability of vanilla models. It takes as few as 1k steps to restore the performance of the vanilla model. After unifying vocabularies between vanilla models, token-level distillation remarkably improves the base model with only 235M tokens.
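As a rough illustration of rearranging parameters by a token alignment lexicon, the sketch below initializes each target-vocabulary embedding row from its aligned source row. The `rearrange_embeddings` helper, the one-to-one dict lexicon, and the toy values are assumptions made for illustration; this is not the TokAlign++ implementation.

```python
def rearrange_embeddings(source_emb, lexicon, target_vocab_size):
    """Return a target embedding table whose row t copies the source row
    aligned to target token t (hypothetical one-to-one lexicon)."""
    return [list(source_emb[lexicon[t]]) for t in range(target_vocab_size)]


# Toy source embedding table: 3 source tokens, 2 dimensions each.
source_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Hypothetical lexicon: target token id -> aligned source token id.
lexicon = {0: 2, 1: 0}

target_emb = rearrange_embeddings(source_emb, lexicon, target_vocab_size=2)
print(target_emb)
```

The rows are copied rather than shared, so later fine-tuning of the target table would leave the source table untouched.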

preprint · 2026 · arXiv

VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models

Vision-Language-Action (VLA) models, which integrate pretrained large Vision-Language Models (VLMs) into their policy backbone, are gaining significant attention for their promising generalization capabilities. This paper revisits a fundamental yet seldom systematically studied question: how do VLM choice and competence translate to downstream VLA policy performance? We introduce VLM4VLA, a minimal adaptation pipeline that converts general-purpose VLMs into VLA policies using only a small set of new learnable parameters for fair and efficient comparison. Despite its simplicity, VLM4VLA proves surprisingly competitive with more sophisticated network designs. Through extensive empirical studies on various downstream tasks across three benchmarks, we find that while VLM initialization offers a consistent benefit over training from scratch, a VLM's general capabilities are poor predictors of its downstream task performance. This challenges common assumptions, indicating that standard VLM competence is necessary but insufficient for effective embodied control. We further investigate the impact of specific embodied capabilities by fine-tuning VLMs on seven auxiliary embodied tasks (e.g., embodied QA, visual pointing, depth estimation). Contrary to intuition, improving a VLM's performance on specific embodied skills does not guarantee better downstream control performance. Finally, modality-level ablations identify the visual module in the VLM, rather than the language component, as the primary performance bottleneck. We demonstrate that injecting control-relevant supervision into the vision encoder of the VLM yields consistent gains, even when the encoder remains frozen during downstream fine-tuning. This isolates a persistent domain gap between current VLM pretraining objectives and the requirements of embodied action-planning.

preprint · 2026 · arXiv

World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning

Vision-language models (VLMs) have shown strong performance on static visual understanding, yet they still struggle with dynamic spatial reasoning that requires imagining how scenes evolve under egocentric motion. Recent efforts address this limitation either by scaling spatial supervision with synthetic data or by coupling VLMs with world models at inference time. However, the former often lacks explicit modeling of motion-conditioned state transitions, while the latter incurs substantial computational overhead. In this work, we propose World2VLM, a training framework that distills spatial imagination from a generative world model into a vision-language model. Given an initial observation and a parameterized camera trajectory, we use a view-consistent world model to synthesize geometrically aligned future views and derive structured supervision for both forward (action-to-outcome) and inverse (outcome-to-action) spatial reasoning. We post-train the VLM with a two-stage recipe on a compact dataset generated by this pipeline and evaluate it on multiple spatial reasoning benchmarks. World2VLM delivers consistent improvements over the base model across diverse benchmarks, including SAT-Real, SAT-Synthesized, VSI-Bench, and MindCube. It also outperforms the test-time world-model-coupled methods while eliminating the need for expensive inference-time generation. Our results suggest that world models can serve not only as inference-time tools, but also as effective training-time teachers, enabling VLMs to internalize spatial imagination in a scalable and efficient manner.

preprint · 2025 · arXiv

ACE-RL: Adaptive Constraint-Enhanced Reward for Long-form Generation Reinforcement Learning

Long-form generation has become a critical and challenging application for Large Language Models (LLMs). Existing studies are limited by their reliance on scarce, high-quality long-form response data and their focus on coarse-grained, general-purpose metrics (e.g., coherence and helpfulness), overlooking the nuanced, scenario-specific requirements of real-world tasks. To address these limitations, we propose a framework utilizing Adaptive Constraint-Enhanced reward for long-form generation Reinforcement Learning (ACE-RL). ACE-RL first decomposes each instruction into a set of fine-grained, adaptive constraint criteria spanning key dimensions of long-form generation tasks. Subsequently, we design a reward mechanism to quantify the response quality based on their satisfaction over corresponding constraints, converting subjective quality evaluation into constraint verification. Finally, we leverage reinforcement learning to optimize LLMs using these fine-grained signals. Experimental results show that ACE-RL significantly outperforms existing SFT and RL baselines by 18.63% and 7.61% on WritingBench, and our top-performing model even surpasses proprietary systems like GPT-4o by 8.76%, providing a more effective training paradigm in long-form generation scenarios.
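The idea of turning quality evaluation into constraint verification can be sketched as a reward equal to the fraction of constraints a response satisfies. This is a minimal sketch under assumptions: the `Constraint` callable type, the `constraint_reward` function, and the three toy checks are hypothetical, and the actual ACE-RL verifier and reward shaping are not described here.

```python
from typing import Callable, Sequence

# Hypothetical constraint type: returns True if the response satisfies it.
Constraint = Callable[[str], bool]


def constraint_reward(response: str, constraints: Sequence[Constraint]) -> float:
    """Reward in [0, 1]: the share of constraints the response satisfies."""
    if not constraints:
        return 0.0
    satisfied = sum(1 for check in constraints if check(response))
    return satisfied / len(constraints)


# Toy constraints standing in for instruction-specific criteria.
constraints = [
    lambda r: len(r.split()) >= 5,       # minimum length
    lambda r: "budget" in r.lower(),     # must mention a required topic
    lambda r: r.strip().endswith("."),   # must end as a complete sentence
]

reward_good = constraint_reward("The budget covers travel and lodging.", constraints)
reward_bad = constraint_reward("No plan yet", constraints)
print(reward_good, reward_bad)
```

A scalar of this kind could then serve as the signal passed to a reinforcement learning optimizer.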

preprint · 2023 · arXiv

Language Cognition and Language Computation -- Human and Machine Language Understanding

Language understanding is a key scientific issue in the fields of cognitive and computer science. However, the two disciplines differ substantially in the specific research questions. Cognitive science focuses on analyzing the specific mechanism of the brain and investigating the brain's response to language; few studies have examined the brain's language system as a whole. By contrast, computer scientists focus on the efficiency of practical applications when choosing research questions but may ignore the most essential laws of language. Given these differences, can a combination of the disciplines offer new insights for building intelligent language models and studying language cognitive mechanisms? In the following text, we first review the research questions, history, and methods of language understanding in cognitive and computer science, focusing on the current progress and challenges. We then compare and contrast the research of language understanding in cognitive and computer sciences. Finally, we review existing work that combines insights from language cognition and language computation and offer prospects for future development trends.

preprint · 2022 · arXiv

A Roadmap for Big Model

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and in BM applications across many fields. At present, there is a lack of research work that surveys the overall progress of BMs and guides follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. In each topic, we clearly summarize the current studies and propose some future research directions. At the end of this paper, we discuss the further development of BMs from a more general view.

preprint · 2022 · arXiv

Dark Matter Halos in Interacting Dark Energy Models: Formation History, Density Profile, Spin and Shape

The interacting dark energy (IDE) model, which considers the interaction between dark energy and dark matter, provides a natural mechanism to alleviate the coincidence problem and can also relieve the observational tensions under the $Λ$CDM model. Previous studies have put constraints on IDE models by observations of cosmic expansion history, cosmic microwave background and large-scale structures. However, these data are not yet enough to distinguish IDE models from $Λ$CDM effectively. Because the non-linear structure formation contains rich cosmological information, it can provide additional means to differentiate alternative models. In this paper, based on a set of $N$-body simulations for IDE models, we investigate the formation histories and properties of dark matter halos, and compare with their $Λ$CDM counterparts. For the model with dark matter decaying into dark energy and the parameters being the best-fit values from previous constraints, the structure formation is markedly slowed down, and the halos have systematically lower mass, looser internal structure, higher spin and anisotropy. This is inconsistent with the observed structure formation, and thus this model can be safely ruled out from the perspective of non-linear structure formation. Moreover, we find that the ratio of halo concentrations between IDE and $Λ$CDM counterparts depends sensitively on the interaction parameter and is independent of halo mass. This can act as a powerful probe to constrain IDE models. Our results concretely demonstrate that the interaction of the two dark components can affect the halo formation considerably, and therefore the constraints from non-linear structures are indispensable.

preprint · 2022 · arXiv

Improvement of cosmological constraints with the cross correlation between line-of-sight optical galaxy and FRB dispersion measure

Fast Radio Bursts (hereafter FRBs) can be used in cosmology by studying the Dispersion Measure (hereafter DM) as a function of redshift. The large-scale structure of the matter distribution is regarded as a major contributor to the error budget for such applications. Using optical galaxy and dispersion measure mocks built from N-body simulations, we have shown that the galaxy number density can be used as a tracer for the large-scale electron density and can help improve the measurement of DM as a function of redshift. We have shown that using the line-of-sight galaxy number counts within 1' around a given localized FRB source can help improve the cosmological parameter constraints by more than 20%.

preprint · 2022 · arXiv

Instance-aware Prompt Learning for Language Understanding and Generation

Recently, prompt learning has become a new paradigm to utilize pre-trained language models (PLMs) and achieves promising results in downstream tasks with a negligible increase of parameters. The current usage of discrete and continuous prompts assumes that the prompt is fixed for a specific task and all samples in the task share the same prompt. However, a task may contain quite diverse samples in which some are easy and others are difficult, and diverse prompts are desirable. In this paper, we propose an instance-aware prompt learning method that learns a different prompt for each instance. Specifically, we suppose that each learnable prompt token has a different contribution to different instances, and we learn the contribution by calculating the relevance score between an instance and each prompt token. The contribution-weighted prompt is then instance-aware. We apply our method to both unidirectional and bidirectional PLMs on both language understanding and generation tasks. Extensive experiments demonstrate that our method obtains considerable improvements compared to strong baselines. In particular, our method achieves state-of-the-art results on the SuperGLUE few-shot learning benchmark.
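The relevance-weighted prompt described in this abstract can be illustrated with a small sketch. The scoring rule used here (a dot product passed through a sigmoid) and the `instance_aware_prompt` helper are assumptions chosen for illustration; the abstract does not specify how the relevance score is computed.

```python
import math


def instance_aware_prompt(instance_vec, prompt_tokens):
    """Scale each prompt token embedding by its relevance to the instance.

    Relevance is assumed to be sigmoid(dot(instance, token)); the method's
    actual scoring function may differ.
    """
    weighted = []
    for token in prompt_tokens:
        score = sum(x * t for x, t in zip(instance_vec, token))
        weight = 1.0 / (1.0 + math.exp(-score))  # squash the score into (0, 1)
        weighted.append([weight * t for t in token])
    return weighted


# Toy 2-dimensional embeddings: one instance and two shared prompt tokens.
instance = [1.0, 0.0]
prompt = [[1.0, 0.0], [0.0, 1.0]]

weighted_prompt = instance_aware_prompt(instance, prompt)
print(weighted_prompt)
```

The same shared prompt tokens therefore yield a different effective prompt for each instance, with tokens more aligned to the instance contributing more.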

preprint · 2022 · arXiv

Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions

Role-oriented dialogue summarization is to generate summaries for different roles in the dialogue, e.g., merchants and consumers. Existing methods handle this task by summarizing each role's content separately and thus are prone to ignore the information from other roles. However, we believe that other roles' content could benefit the quality of summaries, such as the omitted information mentioned by other roles. Therefore, we propose a novel role interaction enhanced method for role-oriented dialogue summarization. It adopts cross attention and decoder self-attention interactions to interactively acquire other roles' critical information. The cross attention interaction aims to select other roles' critical dialogue utterances, while the decoder self-attention interaction aims to obtain key information from other roles' summaries. Experimental results have shown that our proposed method significantly outperforms strong baselines on two public role-oriented dialogue summarization datasets. Extensive analyses have demonstrated that other roles' content could help generate summaries with more complete semantics and correct topic structures.

preprint · 2022 · arXiv

Pre-Training on Dynamic Graph Neural Networks

The pre-training on the graph neural network model can learn the general features of large-scale networks or networks of the same type by self-supervised methods, which allows the model to work even when node labels are missing. However, the existing pre-training methods do not take network evolution into consideration. This paper proposes a pre-training method on dynamic graph neural networks (PT-DGNN), which uses dynamic attributed graph generation tasks to simultaneously learn the structure, semantics, and evolution features of the graph. The method includes two steps: 1) dynamic sub-graph sampling, and 2) pre-training with dynamic attributed graph generation task. Comparative experiments on three realistic dynamic network datasets show that the proposed method achieves the best results on the link prediction fine-tuning task.

preprint · 2022 · arXiv

The BINGO Project III: Optical design and optimisation of the focal plane

The BINGO telescope was designed to measure the fluctuations of the 21-cm radiation arising from the hyperfine transition of neutral hydrogen and aims to measure the Baryon Acoustic Oscillations (BAO) from such fluctuations, therefore serving as a pathfinder to future deeper intensity mapping surveys. The requirements for Phase 1 of the project consider a large reflector system (two 40 m-class dishes in a crossed-Dragone configuration), illuminating a focal plane with 28 horns to measure the sky with two circular polarisations in a drift scan mode to produce measurements of the radiation in intensity as well as the circular polarisation. In this paper we present the optical design for the instrument. We describe the intensity and polarisation properties of the beams and the optical arrangement of the horns in the focal plane to produce a homogeneous and well-sampled map after the end of Phase 1. Our analysis provides an optimal model for the location of the horns in the focal plane, producing a homogeneous and Nyquist sampled map after the nominal survey time. We arrive at an optimal configuration for the optical system, including the focal plane positioning and the beam behavior of the instrument. We present an estimate of the expected side lobes both for intensity and polarisation, as well as the effect of band averaging on the final side lobes. The cross polarisation leakage values for the final configuration allow us to conclude that the optical arrangement meets the requirements of the project. We conclude that the chosen optical design meets the requirements for the project in terms of polarisation purity, area coverage as well as homogeneity of coverage so that BINGO can perform a successful BAO experiment. We further conclude that the requirements on the placement and r.m.s. error on the mirrors are also achievable so that a successful experiment can be conducted. (Abridged)

preprint · 2022 · arXiv

The BINGO Project V: Further steps in Component Separation and Bispectrum Analysis

Observing the neutral hydrogen distribution across the Universe via redshifted 21cm line intensity mapping constitutes a powerful probe for cosmology. However, the redshifted 21cm signal is obscured by the foreground emission from our Galaxy and other extragalactic foregrounds. This paper addresses the capabilities of the BINGO survey to separate such signals. Specifically, this paper looks in detail at the different residuals left over by foreground components, shows that a noise-corrected spectrum is unbiased, and shows that we understand the remaining systematic residuals by analyzing nonzero contributions to the three-point function. We use the generalized needlet internal linear combination, which we apply to sky simulations of the BINGO experiment for each redshift bin of the survey. We present our recovery of the redshifted 21cm signal from sky simulations of the BINGO experiment, including foreground components. We test the recovery of the 21cm signal through the angular power spectrum at different redshifts, as well as the recovery of its non-Gaussian distribution through a bispectrum analysis. We find that non-Gaussianities from the original foreground maps can be removed down to, at least, the noise limit of the BINGO survey with such techniques. Our component separation methodology allows us to subtract the foreground contamination in the BINGO channels down to levels below the cosmological signal and the noise, and to reconstruct the 21cm power spectrum for different redshift bins without significant loss at multipoles $20 \lesssim \ell \lesssim 500$. Our bispectrum analysis yields strong tests of the level of the residual foreground contamination in the recovered 21cm signal, thereby allowing us to both optimize and validate our component separation analysis. (Abridged)

preprint · 2021 · arXiv

Numerical convergence of pre-initial conditions on dark matter halo properties

Generating pre-initial conditions (or particle loads) is the very first step to set up a cosmological N-body simulation. In this work, we revisit the numerical convergence of pre-initial conditions on dark matter halo properties using a set of simulations which differ only in initial particle loads, i.e. grid, glass, and the newly introduced capacity constrained Voronoi tessellation (CCVT). We find that the median halo properties agree fairly well (i.e. within a convergence level of a few per cent) among simulations running from different initial loads. We also notice that for some individual haloes cross-matched among different simulations, the relative difference of their properties can sometimes be several tens of per cent. By looking at the evolution history of these poorly converged haloes, we find that they are usually merging haloes or haloes that have experienced recent merger events, and their merging processes in different simulations are out-of-sync, making the convergence of halo properties temporarily poor. We show that, compared to the simulation starting with an anisotropic grid load, the simulation with an isotropic CCVT load converges slightly better to the simulation with a glass load, which is also isotropic. Among simulations with different pre-initial conditions, haloes in higher density environments tend to have their properties converged slightly better. Our results confirm that CCVT loads behave as well as the widely used grid and glass loads at small scales, and for the first time we quantify the convergence of two independent isotropic particle loads (i.e. glass and CCVT) on halo properties.

preprint · 2021 · arXiv

The BINGO Project I: Baryon Acoustic Oscillations from Integrated Neutral Gas Observations

Observations of the redshifted 21-cm line of neutral hydrogen (HI) are a new and powerful window of observation that offers us the possibility to map the spatial distribution of cosmic HI and learn about cosmology. BINGO (Baryon Acoustic Oscillations [BAO] from Integrated Neutral Gas Observations) is a unique new radio telescope designed to be one of the first to probe BAO at radio frequencies. BINGO has two science goals: cosmology and astrophysics. Cosmology is the main science goal and the driver for BINGO's design and strategy. The key goal of BINGO is to detect low-redshift BAO to put strong constraints on dark sector models. Given the versatility of the BINGO telescope, a secondary goal is astrophysics, where BINGO can help discover and study Fast Radio Bursts (FRBs) and other transients, as well as Galactic and extragalactic science. In this paper, we introduce the latest progress of the BINGO project and its science goals, describing the scientific potential of the project in each science area and the new developments obtained by the collaboration. We introduce the BINGO project and its science goals and give a general summary of recent developments in construction, science potential and pipeline development obtained by the BINGO collaboration in the past few years. We show that BINGO will be able to obtain competitive constraints for the dark sector, and that it will also allow for the discovery of several FRBs in the southern hemisphere. The capacity of BINGO to obtain information from the 21-cm line is also tested in the pipeline introduced here. There is still no measurement of the BAO in radio, and studying cosmology in this new window of observations is one of the most promising advances in the field. The BINGO project is a radio telescope that has the goal to be one of the first to perform this measurement and it is currently being built in the northeast of Brazil. (Abridged)

preprint · 2021 · arXiv

The BINGO Project II: Instrument Description

The measurement of diffuse 21-cm radiation from the hyperfine transition of neutral hydrogen (HI signal) at different redshifts is an important tool for modern cosmology. However, detecting this faint signal with non-cryogenic receivers in single-dish telescopes is a challenging task. The BINGO (Baryon Acoustic Oscillations from Integrated Neutral Gas Observations) radio telescope is an instrument designed to detect baryonic acoustic oscillations (BAOs) in the cosmological HI signal, in the redshift interval $0.127 \le z \le 0.449$. This paper describes the BINGO radio telescope, including the current status of the optics, receiver, observational strategy, calibration, and the site. BINGO has been carefully designed to minimize systematics, being a transit instrument with no moving dishes and 28 horns operating in the frequency range $980 \le ν \le 1260$ MHz. Comprehensive laboratory tests were conducted for many of the BINGO subsystems, and the prototypes of the receiver chain, horn, polarizer, magic tees, and transitions were successfully tested between 2018 and 2020. The survey was designed to cover $\sim 13\%$ of the sky, with the primary mirror pointing at declination $δ=-15^{\circ}$. The telescope will see an instantaneous declination strip of $14.75^{\circ}$. The results of the prototype tests closely match those obtained during the modeling process, suggesting BINGO will perform according to our expectations. After one year of observations with a $60\%$ duty cycle and 28 horns, BINGO should achieve an expected sensitivity of 102 $μ$K per 9.33 MHz frequency channel, one polarization, and be able to measure the HI power spectrum in a competitive time frame.
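The quoted redshift interval follows from the frequency range through the standard relation z = ν_rest / ν_obs − 1 for the 21-cm line. The short check below is only an arithmetic sketch; the function and variable names are illustrative, and the channel count is simply the band divided by the quoted 9.33 MHz channel width, not a statement about the instrument's actual channelization.

```python
HI_REST_MHZ = 1420.40575  # rest-frame frequency of the 21-cm hyperfine line


def redshift_from_observed(freq_mhz: float) -> float:
    """Redshift at which the 21-cm line is observed at freq_mhz."""
    return HI_REST_MHZ / freq_mhz - 1.0


z_min = redshift_from_observed(1260.0)  # upper band edge -> lowest redshift
z_max = redshift_from_observed(980.0)   # lower band edge -> highest redshift

# Band width divided by the quoted per-channel width.
n_channels = (1260.0 - 980.0) / 9.33

print(f"{z_min:.3f} <= z <= {z_max:.3f}, about {n_channels:.1f} channels")
```

Rounded to three decimals, this reproduces the interval 0.127 ≤ z ≤ 0.449 stated in the abstract.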

preprint · 2021 · arXiv

The BINGO Project IV: Simulations for mission performance assessment and preliminary component separation steps

The large-scale distribution of neutral hydrogen (HI) in the Universe is luminous through its 21 cm emission. The goal of the Baryon Acoustic Oscillations from Integrated Neutral Gas Observations -- BINGO -- radio telescope is to detect baryon acoustic oscillations (BAOs) at radio frequencies through 21 cm intensity mapping (IM). The telescope will span the redshift range $0.127 < z < 0.449$ with an instantaneous field-of-view of $14.75^{\circ} \times 6.0^{\circ}$. In this work we investigate different constructive and operational scenarios of the instrument by generating sky maps as they would be produced by the instrument. In doing this we use a set of end-to-end IM mission simulations. The maps will additionally be used to evaluate the efficiency of a component separation method (GNILC). We have simulated the kind of data that would be produced in a single-dish IM experiment such as BINGO. According to the results obtained, we have optimized the focal plane design of the telescope. In addition, the application of the GNILC method on simulated data shows that it is feasible to extract the cosmological signal across a wide range of multipoles and redshifts. The results are comparable with the standard principal component analysis method.

preprint2021arXiv

The BINGO Project VI: HI Halo Occupation Distribution and Mock Building

BINGO (Baryon Acoustic Oscillations from Integrated Neutral Gas Observations) is a radio telescope designed to survey from 980 MHz to 1260 MHz, observe the neutral hydrogen (HI) 21-cm line, and detect the BAO (Baryon Acoustic Oscillation) signal with the intensity mapping technique. Here we present our method to generate mock maps of the 21-cm intensity mapping signal covering the BINGO frequency range, along with related test results. (Abridged)

preprint2021arXiv

The BINGO Project VII: Cosmological Forecasts from 21cm Intensity Mapping

The 21cm line of neutral hydrogen (HI) opens a new avenue in our exploration of the structure and evolution of the Universe. It provides complementary data to the current large-scale structure observations with different systematics, and thus it will be used to improve our understanding of the $Λ$CDM model. Among several radio cosmological surveys designed to measure this line, BINGO is a single-dish telescope mainly designed to detect baryon acoustic oscillations (BAOs) at low redshifts ($0.127< z<0.449$). Our goal is to assess the fiducial BINGO setup and its capabilities of constraining the cosmological parameters, and to analyze the effect of different instrument configurations. We used the Phase 1 fiducial configuration of the BINGO telescope to perform our cosmological forecasts. In addition, we investigated the impact of several instrumental setups, taking into account some instrumental systematics, and different cosmological models. Combining BINGO with Planck temperature and polarization data, the projected constraint improves from a $13\%$ and $25\%$ precision measurement at the $68\%$ confidence level with Planck only to $1\%$ and $3\%$ for the Hubble constant and the dark energy equation of state (EoS), respectively, within the wCDM model. Assuming a Chevallier-Polarski-Linder parameterization, the EoS parameters have standard deviations given by $σ_{w_0} = 0.30$ and $σ_{w_a} = 1.2$, which are improvements on the order of $30\%$ with respect to Planck alone. Also, we can access information about the HI density and bias, obtaining $\sim 8.5\%$ and $\sim 6\%$ precision, respectively, assuming they vary with redshift at three independent bins. The fiducial BINGO configuration will be able to extract significant cosmological information from the HI distribution and provide constraints competitive with current and future cosmological surveys. (Abridged)
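For readers unfamiliar with the Chevallier-Polarski-Linder (CPL) parameterization mentioned above, it models the dark energy equation of state as w(a) = w_0 + w_a (1 − a), with scale factor a = 1/(1 + z). A minimal sketch follows; the function name and the example values of w_0 and w_a are illustrative assumptions, not results from the paper.

```python
def cpl_equation_of_state(z: float, w0: float, wa: float) -> float:
    """CPL dark energy equation of state: w(a) = w0 + wa * (1 - a), a = 1 / (1 + z)."""
    if z <= -1.0:
        raise ValueError("redshift must be greater than -1")
    a = 1.0 / (1.0 + z)  # scale factor corresponding to redshift z
    return w0 + wa * (1.0 - a)


# Illustrative parameter values only (not fitted values from the paper).
w_today = cpl_equation_of_state(z=0.0, w0=-1.0, wa=0.5)  # equals w0 at z = 0
w_at_z1 = cpl_equation_of_state(z=1.0, w0=-1.0, wa=0.5)  # -1 + 0.5 * (1 - 0.5)
```

At z = 0 the expression reduces to w_0, and at very high redshift it tends to w_0 + w_a, which is why the two parameters are constrained separately.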

preprint2021arXiv

Towards a Bias-Free Selection Function in Shear Measurement

Sample selection is a necessary preparation for weak lensing measurement. It is well known that selection itself may introduce bias in the measured shear signal. Using image simulation and the Fourier_Quad shear measurement pipeline, we quantify the selection bias in various commonly used selection functions (signal-to-noise ratio, magnitude, etc.). We propose a new selection function defined in the power spectrum of the galaxy image. This new selection function has low selection bias, and it is particularly convenient for shear measurement pipelines based on Fourier transformation.

preprint2021arXiv

White dwarfs identified in LAMOST Data Release 5

In this paper, we report white dwarfs identified in the 5th Data Release of the Large Area Multi-Object fibre Spectroscopic Telescope, including spectral types of DA, DB, DC, DZ, and so on. There are 2 625 DA spectra of 2 281 DA stars, 182 DB spectra of 166 DB stars, 62 DC spectra of 58 DC stars, 36 DZ spectra of 33 DZ stars and many other types identified, in addition to our previous paper (Data Release 2). Among those sources, 393 DA stars and 46 DB stars are new identifications after cross-matching with the literature. In order to select DA candidates, we use the classification result from the LAMOST pipeline, colour-colour cut method and a random forest machine learning method. For DBs, since there is no template for DB in the pipeline model, a random forest machine learning method is chosen to select candidates. All the WD candidates have been visually checked individually. The parameters of effective temperature, surface gravity, mass, and cooling age have been estimated for relatively high signal-to-noise ratio DAs and DBs. The peaks of the DA and DB mass distributions are found to be around 0.62Msun and 0.65Msun, respectively. Finally, the data and method we used to select white dwarf candidates for the second phase of LAMOST survey are also addressed in this paper.

preprint2020arXiv

A new moving group in the Local Arm

We present a new moving group clustered in kinematics, spatial position and elemental abundances. Its spatial position is around the center of the Local Arm of the Milky Way. A convergent point method was used to select candidate member stars. Among 206 candidate member stars, 74 are pre-main-sequence stars and some of them have stellar disks. We presume those pre-main-sequence stars belong to the Orion nebula. We suggest this moving group is caused by the density wave of the Local Arm passing by.

preprint2020arXiv

A study on Cubic Galileon Gravity Using N-body Simulations

We use N-body simulation to study the structure formation in the Cubic Galileon Gravity model where along with the usual kinetic and potential term we also have a higher derivative self-interaction term. We find that the large scale structure provides a unique constraining power for this model. The matter power spectrum, halo mass function, galaxy-galaxy weak lensing signal, marked density power spectrum as well as count in cell are measured. The simulations show that there are less massive halos in the Cubic Galileon Gravity model than corresponding $Λ$CDM model and the marked density power spectrum in these two models are different by more than $10\%$. Furthermore, the Cubic Galileon model shows significant differences in voids compared to $Λ$CDM. The number of low density cells is far higher in the Cubic Galileon model than that in the $Λ$CDM model. Therefore, it would be interesting to put constraints on this model using future large scale structure observations, especially in void regions.

preprint2020arXiv

Nearly 30,000 late-type main-sequence stars with stellar age from LAMOST DR5

We construct a sample of nearly 30,000 main-sequence stars with 4500K $<T\rm_{eff}<$ 5000K and stellar ages estimated by the chromospheric activity$-$age relation. This sample is used to determine the age distribution in the $R-Z$ plane of the Galaxy, where $R$ is the projected Galactocentric distance in the disk midplane and $Z$ is the height above the disk midplane. As $|Z|$ increases, the percentage of old stars becomes larger. It is known that scale-height of Galactic disk increases as $R$ increases, which is called flare. A mild flare from $R$ $\sim$ 8.0 to 9.0 kpc in stellar age distribution is found. We also find that the velocity dispersion increases with age as confirmed by previous studies. Finally we present spiral-shaped structures in $Z-\upsilon_{Z}$ phase space in three stellar age bins. The spiral is clearly seen in the age bin of [0, 1] Gyr, which suggests that a vertical perturbation to the disk probably took place within the last $\sim$ 1.0 Gyr.

preprint2020arXiv

Neural Machine Translation: Challenges, Progress and Future

Machine translation (MT) is a technique that leverages computers to translate human languages automatically. Nowadays, neural machine translation (NMT), which models direct mapping between source and target languages with deep neural networks, has achieved a big breakthrough in translation performance and become the de facto paradigm of MT. This article reviews the NMT framework, discusses the challenges in NMT, introduces some exciting recent progress and finally looks ahead to potential future research trends. In addition, we maintain the state-of-the-art methods for various NMT tasks at the website https://github.com/ZNLP/SOTA-MT.

preprint2020arXiv

Testing interacting dark matter and dark energy model with cosmological data

We investigate the model of dark matter-dark energy (DM-DE) interaction with coupling strength proportional to the product of the dark sector densities raised to different power indices, $Q = γρ_{\rm c}^α ρ_{\rm d}^β$. We first investigate the modification of the cosmic expansion history, and then further develop the formalism to take into account the cosmological perturbations and dark matter temperature evolution. We then use the latest observational cosmology data, including cosmic microwave background (CMB) data, baryon acoustic oscillations (BAO) data, redshift-space distortion (RSD) data and Type Ia supernovae (SNe) data to constrain the model parameters. We find that, in the phantom region, a positive $α$ is preferred by the data at above $2\, σ$ statistical significance. If we choose the power indices to be integers or half-integers for plausible physics of particle interaction, the allowed values within $1\, σ$ confidence regions are $α= 0.5$ and $β= 0, 0.5, 1$. The inclusion of BAO and RSD data from large-scale structure and SNe data improves the constraints significantly. Our model predicts lower values of $f(z) σ_8(z)$ at $z<1$ compared to the $Λ$CDM model, which alleviates the tension of $Λ$CDM with various RSD data from optical galaxy surveys. Overall, the DM-DE interaction model is consistent with the current observational data, especially providing a better fit to the RSD data.

preprint2020arXiv

The parameter-free Finger-Of-God model and its application to 21cm intensity mapping

Using the galaxy catalog built from the ELUCID N-body simulation and the semi-analytical galaxy formation model, we have built a mock HI intensity mapping map. We have implemented the Finger-of-God (FoG) effect in the map by considering the galaxy HI gas velocity dispersion. By comparing the HI power spectrum in redshift space with the measurement from the IllustrisTNG simulation, we have found that such a FoG effect can explain the discrepancy between the current mock map built from the N-body simulation and the IllustrisTNG simulation. Then we built a parameter-free FoG model and a shot-noise model to calculate the HI power spectrum. We found that our model can accurately fit both the monopole and quadrupole moments of the HI matter power spectrum. Our method of building the mock HI intensity map and the parameter-free FoG model will be widely useful for the upcoming 21cm intensity mapping experiments, such as CHIME, Tianlai, BINGO, FAST and SKA. It is also crucial for studying the non-linear effects in 21cm intensity mapping.

preprint2020arXiv

Weak equivalence principle, swampland and $H_0$ tension with fast single radio bursts FRB 180924 and FRB 190523

Two new fast single radio bursts, FRB 180924 and FRB 190523, well localized to massive galaxies, have opened a new window to probe and characterize how cosmic baryons are allocated between galaxies, their surroundings and the intergalactic medium. We are motivated to test Einstein's weak equivalence principle with these two cosmic transients, which have accurate redshifts. Using photons with different energies emitted by FRB 180924, we obtain the most stringent bound so far, $Δγ<2.16\times10^{-10}$, for non-repeating FRBs with accurate redshifts when only considering the gravitational potential of the Milky Way. If we use the gravitational potential of the Laniakea supercluster instead of that of the Milky Way, we also obtain the strictest bound to date, $Δγ<1.06\times10^{-14}$. In light of the rapid progress of FRB cosmology, for the next two decades we give a universal limit $Δγ<8.24\times10^{-22}$ from photons with different energies emitted by single FRBs with accurate redshifts. Moreover, we analyze in detail the effects of various astrophysical parameters on the precision of tests of the weak equivalence principle. We also estimate the ability of single FRBs with known redshifts to test the validity of the swampland criterion, and to distinguish which value of $H_0$ is preferred.

preprint2019arXiv

Stellar chromospheric activity and age relation from open clusters in the LAMOST Survey

We identify member stars of more than 90 open clusters in the LAMOST survey. With the method of Fang et al.(2018), the chromospheric activity (CA) indices logR'CaK for 1091 member stars in 82 open clusters and logR'Hα for 1118 member stars in 83 open clusters are calculated. The relations between the average logR'CaK, logR'Hα in each open cluster and its age are investigated in different Teff and [Fe/H] ranges. We find that CA starts to decrease slowly from logt = 6.70 to logt = 8.50, and then decreases rapidly until logt = 9.53. The trend becomes clearer for cooler stars. The quadratic functions between logR' and logt with 4000K < Teff < 5500K are constructed, which can be used to roughly estimate ages of field stars with accuracy about 40% for logR'CaK and 60% for logR'Hα.

preprint2017arXiv

Shortcut Sequence Tagging

Deep stacked RNNs are usually hard to train. Adding shortcut connections across different layers is a common way to ease the training of stacked networks. However, extra shortcuts make the recurrent step more complicated. To simplify the stacked architecture, we propose a framework called shortcut block, which is a marriage of the gating mechanism and shortcuts, while discarding the self-connected part in the LSTM cell. We present extensive empirical experiments showing that this design makes training easy and improves generalization. We propose various shortcut block topologies and compositions to explore its effectiveness. Based on this architecture, we obtain a 6% relative improvement over the state of the art on the CCGbank supertagging dataset. We also get comparable results on the POS tagging task.

preprint2016arXiv

A Dynamic Window Neural Network for CCG Supertagging

Combinatory Categorial Grammar (CCG) supertagging is the task of assigning lexical categories to each word in a sentence. Almost all previous methods use fixed context window sizes as input features. However, different tags usually rely on different context window sizes. This motivates us to build a supertagger with a dynamic window approach, which can be treated as an attention mechanism on the local contexts. Applying dropout on the dynamic filters can be seen as dropping words directly, which is superior to regular dropout on word embeddings. We use this approach to demonstrate state-of-the-art CCG supertagging performance on the standard test set.

preprint2016arXiv

An Empirical Exploration of Skip Connections for Sequential Tagging

In this paper, we empirically explore the effects of various kinds of skip connections in stacked bidirectional LSTMs for sequential tagging. We investigate three kinds of skip connections connecting to LSTM cells: (a) skip connections to the gates, (b) skip connections to the internal states and (c) skip connections to the cell outputs. We present comprehensive experiments showing that skip connections to cell outputs outperform the remaining two. Furthermore, we observe that using gated identity functions as skip mappings works pretty well. Based on these novel skip connections, we successfully train deep stacked bidirectional LSTM models and obtain state-of-the-art results on CCG supertagging and comparable results on POS tagging.
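One way to picture a gated identity skip connection to the cell output is as an element-wise sum of the layer's own LSTM output and a gated copy of the output from the layer below. The sketch below shows only that combination step in plain Python; the abstract does not say how the gate is computed, so the gate pre-activation is simply passed in, and all names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping a pre-activation to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_skip_output(lstm_output, lower_output, gate_preactivation):
    """Combine a layer's LSTM output with a gated identity copy of the lower layer.

    Computes h[i] = lstm_output[i] + sigmoid(gate_preactivation[i]) * lower_output[i].
    How the gate pre-activation is produced is an assumption left to the caller.
    """
    if not (len(lstm_output) == len(lower_output) == len(gate_preactivation)):
        raise ValueError("all vectors must have the same length")
    return [
        h + sigmoid(g) * skip
        for h, skip, g in zip(lstm_output, lower_output, gate_preactivation)
    ]


# A pre-activation of 0 gives a gate of 0.5, so half of the lower output is added.
half_open = gated_skip_output([0.2, -0.4], [1.0, 2.0], [0.0, 0.0])
# A strongly negative pre-activation closes the gate and leaves the LSTM output almost unchanged.
closed = gated_skip_output([0.2, -0.4], [1.0, 2.0], [-50.0, -50.0])
```

Because the skip path is an identity mapping scaled by a gate, the layer can pass lower-level information upward unchanged when the gate is open, which is the usual intuition for why such connections ease the training of deep stacks.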

preprint2016arXiv

Bridging Neural Machine Translation and Bilingual Dictionaries

Neural Machine Translation (NMT) has become the new state of the art in several language pairs. However, it remains a challenging problem how to integrate NMT with a bilingual dictionary which mainly contains words rarely or never seen in the bilingual training data. In this paper, we propose two methods to bridge NMT and bilingual dictionaries. The core idea is to design novel models that transform the bilingual dictionaries into adequate sentence pairs, so that NMT can distil latent bilingual mappings from the ample and repetitive phenomena. One method leverages a mixed word/character model and the other attempts to synthesize parallel sentences guaranteeing massive occurrence of the translation lexicon. Extensive experiments demonstrate that the proposed methods can remarkably improve the translation quality, and most of the rare words in the test sentences can obtain correct translations if they are covered by the dictionary.

preprint2016arXiv

Extraction of cylinders and cones from minimal point sets

We propose new algebraic methods for extracting cylinders and cones from minimal point sets, including oriented points. More precisely, we are interested in computing efficiently cylinders through a set of three points, one of them being oriented, or through a set of five simple points. We are also interested in computing efficiently cones through a set of two oriented points, through a set of four points, one of them being oriented, or through a set of six points. For these different interpolation problems, we give optimal bounds on the number of solutions. Moreover, we describe algebraic methods targeted to solve these problems efficiently.

preprint2016arXiv

Neural Name Translation Improves Neural Machine Translation

In order to control computational complexity, neural machine translation (NMT) systems convert all rare words outside the vocabulary into a single unk symbol. A previous solution (Luong et al., 2015) resorts to using multiple numbered unks to learn the correspondence between source and target rare words. However, test words unseen in the training corpus cannot be handled by this method, and it also suffers from noisy word alignment. In this paper, we focus on a major type of rare word, the named entity (NE), and propose to translate NEs with a character-level sequence-to-sequence model. The NE translation model is further used to derive high-quality NE alignment in the bilingual training corpus. With the integration of NE translation and alignment modules, our NMT system is able to surpass the baseline system by 2.9 BLEU points on the Chinese-to-English task.

preprint2016arXiv

One Sentence One Model for Neural Machine Translation

Neural machine translation (NMT) has become the new state of the art and achieves promising translation results using a simple encoder-decoder neural network. This neural network is trained once on the parallel corpus and the fixed network is used to translate all the test sentences. We argue that the general fixed network cannot best fit the specific test sentences. In this paper, we propose dynamic NMT, which learns a general network as usual and then fine-tunes the network for each test sentence. The fine-tuning is done on a small set of the bilingual training data that is obtained through similarity search according to the test sentence. Extensive experiments demonstrate that this method can significantly improve the translation performance, especially when highly similar sentences are available.
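The retrieval step described above can be sketched as a top-k similarity search over the source side of the training pairs. The abstract does not specify the similarity measure, so the example below assumes a simple token-level Jaccard overlap; the function names, the toy corpus, and the placeholder targets are all hypothetical, and the fine-tuning step itself is not shown.

```python
def jaccard_similarity(sentence_a: str, sentence_b: str) -> float:
    """Token-set overlap between two whitespace-tokenized sentences (assumed measure)."""
    tokens_a, tokens_b = set(sentence_a.split()), set(sentence_b.split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def retrieve_similar_pairs(test_sentence, training_pairs, k=2):
    """Return the k (source, target) pairs whose source is most similar to the test sentence."""
    ranked = sorted(
        training_pairs,
        key=lambda pair: jaccard_similarity(test_sentence, pair[0]),
        reverse=True,
    )
    return ranked[:k]


# Toy corpus; targets are placeholders standing in for real translations.
corpus = [
    ("the cat sat on the mat", "target-0"),
    ("a dog sat on the rug", "target-1"),
    ("stock prices fell sharply", "target-2"),
]
subset = retrieve_similar_pairs("the cat sat on a mat", corpus, k=2)
# subset holds the first two pairs; a per-sentence fine-tuning pass would use only these.
```

In this reading, the general model is trained once, and only the small retrieved subset is used to adapt it before translating each test sentence.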

preprint2015arXiv

Beyond Word-based Language Model in Statistical Machine Translation

The language model is one of the most important modules in statistical machine translation, and currently the word-based language model dominates this community. However, many translation models (e.g. phrase-based models) generate the target language sentences by rendering and compositing phrases rather than words. Thus, it is much more reasonable to model dependency between phrases, but few studies have succeeded in solving this problem. In this paper, we tackle this problem by designing a novel phrase-based language model which attempts to solve three key sub-problems: 1, how to define a phrase in a language model; 2, how to determine the phrase boundary in large-scale monolingual data in order to enlarge the training set; 3, how to alleviate the data sparsity problem due to the huge vocabulary size of phrases. With these issues carefully handled, extensive experiments on Chinese-to-English translation show that our phrase-based language model can significantly improve the translation quality by up to +1.47 absolute BLEU score.

preprint2015arXiv

Local Translation Prediction with Global Sentence Representation

Statistical machine translation models have made great progress in improving translation quality. However, the existing models predict the target translation with only the source- and target-side local context information. In practice, distinguishing good translations from bad ones depends not only on the local features but also on the global sentence-level information. In this paper, we explore the source-side global sentence-level features for target-side local translation prediction. We propose a novel bilingually-constrained chunk-based convolutional neural network to learn sentence semantic representations. With the sentence-level feature representation, we further design a feed-forward neural network to better predict translations using both local and global information. The large-scale experiments show that our method can obtain substantial improvements in translation quality over the strong baseline: the hierarchical phrase-based translation model augmented with the neural network joint model.

preprint2009arXiv

Geometric Characteristics of Dynamic Correlations for Combinatorial Regulation in Gene Expression Noise

Knowing which mode of combinatorial regulation (typically an AND or OR logic operation) a gene employs is important for determining its function in regulatory networks. Here, we introduce a dynamic cross-correlation function between the output of a gene and its upstream regulator concentrations to identify signatures of combinatorial regulation in gene expression noise. We find that the correlation function is always upwards convex for the AND operation and downwards convex for the OR operation, regardless of the source of noise (intrinsic, extrinsic, or both). In turn, this fact implies a means for inferring regulatory synergies from available experimental data. The extensions and applications are discussed.

43 published item(s)

Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements

GR-Dexter Technical Report

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

PlotCraft: Pushing the Limits of LLMs for Complex and Interactive Data Visualization

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models

World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning

ACE-RL: Adaptive Constraint-Enhanced Reward for Long-form Generation Reinforcement Learning

Language Cognition and Language Computation -- Human and Machine Language Understanding

A Roadmap for Big Model

Dark Matter Halos in Interacting Dark Energy Models: Formation History, Density Profile, Spin and Shape

Improvement of cosmological constraints with the cross correlation between line-of-sight optical galaxy and FRB dispersion measure

Instance-aware Prompt Learning for Language Understanding and Generation

Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions

Pre-Training on Dynamic Graph Neural Networks

The BINGO Project III: Optical design and optimisation of the focal plane

The BINGO Project V: Further steps in Component Separation and Bispectrum Analysis

Numerical convergence of pre-initial conditions on dark matter halo properties

The BINGO Project I: Baryon Acoustic Oscillations from Integrated Neutral Gas Observations

The BINGO Project II: Instrument Description

The BINGO Project IV: Simulations for mission performance assessment and preliminary component separation steps

The BINGO Project VI: HI Halo Occupation Distribution and Mock Building

The BINGO Project VII: Cosmological Forecasts from 21cm Intensity Mapping

Towards a Bias-Free Selection Function in Shear Measurement

White dwarfs identified in LAMOST Data Release 5

A new moving group in the Local Arm

A study on Cubic Galileon Gravity Using N-body Simulations

Nearly 30,000 late-type main-sequence stars with stellar age from LAMOST DR5

Neural Machine Translation: Challenges, Progress and Future

Testing interacting dark matter and dark energy model with cosmological data

The parameter-free Finger-Of-God model and its application to 21cm intensity mapping

Weak equivalence principle, swampland and $H_0$ tension with fast single radio bursts FRB 180924 and FRB 190523

Stellar chromospheric activity and age relation from open clusters in the LAMOST Survey

Shortcut Sequence Tagging

A Dynamic Window Neural Network for CCG Supertagging

An Empirical Exploration of Skip Connections for Sequential Tagging

Bridging Neural Machine Translation and Bilingual Dictionaries

Extraction of cylinders and cones from minimal point sets

Neural Name Translation Improves Neural Machine Translation

One Sentence One Model for Neural Machine Translation

Beyond Word-based Language Model in Statistical Machine Translation

Local Translation Prediction with Global Sentence Representation

Geometric Characteristics of Dynamic Correlations for Combinatorial Regulation in Gene Expression Noise