Source author record

Jiajun Sun

Jiajun Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Science and Game Theory; Networking and Internet Architecture; Computation and Language; Artificial Intelligence; Computer Vision; eess.AS; Sound; Cryptography and Security; Robotics; Social and Information Networks

Catalog footprint

What is connected

16 works

10 topics

4 close collaborators

preprint 2026 arXiv

Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training

The GPT-4 technical report suggests that downstream performance can be predicted from pre-training signals, but offers little methodological detail on how to quantify this. This work addresses this gap by modeling knowledge retention, the capacity of a pre-trained language model to memorize factual information from its corpus, and introduces a principled method to estimate it prior to training. We propose Size-dependent Mutual Information (SMI), an information-theoretic predictor that integrates knowledge frequency, knowledge specificity, and model size to forecast closed-book question answering (QA) accuracy. SMI is validated through large-scale document retrieval over the disclosed pre-training corpora of 21 public and 3 custom models, combined with a robust multi-template QA evaluation. Experiments show that SMI significantly outperforms repetition-based baselines and achieves $R^2 > 0.7$ in predicting QA accuracy for models above 1B parameters, without additional training. The analysis further reveals diminishing returns from scaling data and model size and provides evidence for an intrinsic upper bound on knowledge retention achievable by pre-training alone, motivating retrieval and other augmentation strategies. The dataset and code are available at https://github.com/yuhui1038/SMI.
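For readers unfamiliar with the $R^2$ figure quoted in this abstract, here is a minimal sketch of how a coefficient of determination between predicted and observed accuracies is typically computed. The function name `r_squared` and all numbers are illustrative assumptions; they are not code or results from the paper or its repository.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after using the predictor.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up accuracies for illustration only (not results from the paper).
observed_acc = [0.10, 0.20, 0.30, 0.40]
predicted_acc = [0.12, 0.18, 0.33, 0.37]
score = r_squared(observed_acc, predicted_acc)
```

A score near 1 means the predictor explains most of the variance in observed accuracy; a score of 0 means it does no better than predicting the mean.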

preprint 2026 arXiv

CMDAR: A Chinese Multi-scene Dynamic Audio Reasoning Benchmark with Diverse Challenges

The ability to reason from audio, including speech, environmental sounds, and music, is essential for AI agents to interact effectively in real-world scenarios. Existing benchmarks mainly focus on static or single-scene settings and English audio data, and do not fully capture scenarios where multiple speakers, unfolding events, and heterogeneous audio sources interact. To address these challenges, we introduce CMDAR, a Chinese benchmark for evaluating models on complex, multi-scene, and dynamically evolving audio reasoning tasks. CMDAR comprises 3,000 carefully curated question-answer pairs linked to diverse audio clips, covering five categories of complex reasoning and spanning three question types. We benchmark 26 state-of-the-art audio language models on CMDAR and observe that they exhibit limitations in complex reasoning tasks. In CMDAR-main, Qwen2.5-Omni achieves 76.67% accuracy, whereas GPT-4o Audio reaches 68.47%. However, GPT-4o Audio substantially outperforms Qwen2.5-Omni on the more challenging multiple-choice with multiple audios and open-ended tasks. We also provide a detailed analysis and corresponding suggestions for the future development of large audio language models.

preprint 2026 arXiv

Multimodal Signal Processing For Thermo-Visible-Lidar Fusion In Real-time 3D Semantic Mapping

In complex environments, autonomous robot navigation and environmental perception pose higher requirements for SLAM technology. This paper presents a novel method for semantically enhancing 3D point cloud maps with thermal information. By first performing pixel-level fusion of visible and infrared images, the system projects real-time LiDAR point clouds onto this fused image stream. It then segments heat source features in the thermal channel to instantly identify high temperature targets and applies this temperature information as a semantic layer on the final 3D map. This approach generates maps that not only have accurate geometry but also possess a critical semantic understanding of the environment, making it highly valuable for specific applications like rapid disaster assessment and industrial preventive maintenance.
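The projection and thermal-labeling steps described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: points are assumed to be already transformed into the camera frame, the camera is assumed to follow a pinhole model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, the thermal image is assumed to be pixel-aligned with the fused image, and a fixed `threshold` stands in for whatever heat-source segmentation the paper actually uses.

```python
def project_point(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point to pixel coordinates (pinhole model)."""
    x, y, z = point
    if z <= 0:
        return None  # point is behind the camera, so it has no pixel
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    return u, v


def label_hot_points(points, thermal, fx, fy, cx, cy, threshold):
    """Attach a temperature and a 'hot' flag to each point that lands in the image."""
    height, width = len(thermal), len(thermal[0])
    labeled = []
    for p in points:
        pixel = project_point(p, fx, fy, cx, cy)
        if pixel is None:
            continue
        u, v = pixel
        if 0 <= u < width and 0 <= v < height:
            temp = thermal[v][u]  # row index is v, column index is u
            labeled.append((p, temp, temp >= threshold))
    return labeled


# Toy 2x2 thermal image and four points; all values are made up.
thermal_image = [[20.0, 25.0], [30.0, 90.0]]
cloud = [(1.0, 1.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
result = label_hot_points(cloud, thermal_image, 1.0, 1.0, 0.0, 0.0, 60.0)
```

In this toy run, only the first two points fall inside the image; the third is behind the camera and the fourth projects outside the image bounds.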

preprint 2026 arXiv

Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control

Recent commercial systems such as Suno demonstrate strong capabilities in long-form song generation, while academic research remains largely non-reproducible due to the lack of publicly available training data, hindering fair comparison and progress. To this end, we release a fully open-source system for long-form song generation with fine-grained style conditioning, including a licensed synthetic dataset, training and evaluation pipelines, and Muse, an easy-to-deploy song generation model. The dataset consists of 116k fully licensed synthetic songs with automatically generated lyrics and style descriptions paired with audio synthesized by SunoV5. We train Muse via single-stage supervised finetuning of a Qwen-based language model extended with discrete audio tokens using MuCodec, without task-specific losses, auxiliary objectives, or additional architectural components. Our evaluations find that although Muse is trained with a modest data scale and model size, it achieves competitive performance on phoneme error rate, text-music style similarity, and audio aesthetic quality, while enabling controllable segment-level generation across different musical structures. All data, model weights, and training and evaluation pipelines will be publicly released, paving the way for continued progress in controllable long-form song generation research. The project repository is available at https://github.com/yuhui1038/Muse.

preprint 2026 arXiv

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study

Speech-language models (SLMs) offer a promising path toward unifying speech and text understanding and generation. However, challenges remain in achieving effective cross-modal alignment and high-quality speech generation. In this work, we systematically investigate the role of speech tokenizer designs in LLM-centric SLMs, augmented by speech heads and speaker modeling. We compare coupled, semi-decoupled, and fully decoupled speech tokenizers under a fair SLM framework and find that decoupled tokenization significantly improves alignment and synthesis quality. To address the information density mismatch between speech and text, we introduce multi-token prediction (MTP) into SLMs, enabling each hidden state to decode multiple speech tokens. This leads to up to 12$\times$ faster decoding and a substantial drop in word error rate (from 6.07 to 3.01). Furthermore, we propose a speaker-aware generation paradigm and introduce RoleTriviaQA, a large-scale role-playing knowledge QA benchmark with diverse speaker identities. Experiments demonstrate that our methods enhance both knowledge understanding and speaker consistency.

preprint 2022 arXiv

DATA: Domain-Aware and Task-Aware Self-supervised Learning

The paradigm of training models on massive unlabeled data through self-supervised learning (SSL) and then finetuning on many downstream tasks has recently become a trend. However, due to high training costs and a lack of awareness of downstream uses, most self-supervised learning methods cannot adapt to the diversity of downstream scenarios, which involve various data domains, different vision tasks, and latency constraints on models. Neural architecture search (NAS) is a widely acknowledged way to address these issues, but applying NAS to SSL seems impossible because no label or metric is available for model selection. In this paper, we present DATA, a simple yet effective NAS approach specialized for SSL that provides Domain-Aware and Task-Aware pre-training. Specifically, we (i) train, without any labels, a supernet that can be viewed as a set of millions of networks covering a wide range of model scales, and (ii) propose a flexible search mechanism compatible with SSL that finds networks of different computation costs for various downstream vision tasks and data domains without an explicit metric. Instantiated with MoCo v2, our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection, and semantic segmentation. DATA is orthogonal to most existing SSL methods and endows them with the ability to be customized for downstream needs. Extensive experiments on other SSL methods demonstrate the generalizability of the proposed method. Code is released at https://github.com/GAIA-vision/GAIA-ssl

preprint 2015 arXiv

An Incentive Mechanism for Periodical Mobile Crowdsensing from a Frugality Perspective

Mobile crowdsensing (MCS) has been intensively explored recently due to its flexible and pervasive sensing ability. Although many incentive mechanisms have been built to attract extensive user participation, most of these mechanisms focus only on independent task scenarios, where the sensing tasks are independent of each other. In contrast, we focus on a periodical task scenario, where each user participates in the same type of sensing tasks periodically. In this paper, we consider the long-term user participation incentive in a general periodical MCS system from a frugal payment perspective. We explore the issue under both semi-online (the intra-period interactive process is synchronous while the inter-period interactive process is sequential and asynchronous during each period) and online user arrival models (the previous two interactive processes are sequential and asynchronous). In particular, we first propose a semi-online frugal incentive mechanism by introducing a Lyapunov method. Moreover, we extend it to an online frugal incentive mechanism, which satisfies constant frugality. In addition, the two mechanisms satisfy computational efficiency, asymptotic optimality, individual rationality, and truthfulness. Through extensive simulations, we evaluate the performance and validate the theoretical properties of our online mechanisms.

preprint 2015 arXiv

Long-Term Profit-Maximizing Incentive for Crowd Sensing in Mobile Social Networks

Crowd sensing is a new paradigm that leverages pervasive sensor-equipped mobile devices to provide sensing services like forensic analysis, documenting public spaces, and collaboratively constructing statistical models. Extensive user participation is indispensable for achieving good service quality. Most existing mechanisms focus on guaranteeing good service quality based on instantaneous extensive user participation for crowd sensing applications. Little attention has been dedicated to maximizing long-term service quality for crowd sensing applications, given users' asymmetric interests, preferences, selfish behaviors, etc. To fill these gaps, in this paper, we derive the closed-form expression of the marginal sensing data quality based on the monopoly aggregation in economics. Furthermore, we design marginal-quality-based incentive mechanisms for long-term crowd sensing applications, not only to enhance extensive user participation by maximizing the expected total profits of mobile users, but also to stimulate mobile users to produce high-quality content by applying the marginal quality. Finally, simulation results show that our mechanisms outperform the existing solutions.

preprint 2014 arXiv

Behavior-Based online Incentive Mechanism for Crowd Sensing with Budget Constraints

Crowd sensing is a new paradigm which leverages the ubiquity of sensor-equipped mobile devices to collect data. To achieve good quality for crowd sensing, incentive mechanisms are indispensable to attract more participants. Most existing mechanisms focus on the expected utility prior to sensing, ignoring the risk of low-quality solutions and privacy leakage. Traditional incentive mechanisms such as the Vickrey-Clarke-Groves (VCG) mechanism and its variants are not applicable here. In this paper, to address these challenges, we propose a behavior-based incentive mechanism for crowd sensing applications with budget constraints by applying sequential all-pay auctions in mobile social networks (MSNs), not only to consider the effects of extensive user participation, but also to maximize the quality of the context-based sensing content submitted to the crowd sensing platform under the budget constraints, where users arrive in a sequential order. Extensive simulation results indicate that incentive mechanisms in our proposed framework outperform the best existing solution.

preprint 2014 arXiv

Collection-behavior based Multi-parameter Posted Pricing Mechanism for Crowd Sensing

Crowd sensing is a new paradigm which leverages a large number of sensor-equipped mobile phones to collect sensing data. To attract more participants to provide good quality, bidding mechanisms that solicit the Vickrey-Clarke-Groves (VCG) mechanism and its variants are natural fits for crowd sensing applications in mobile social networks. However, in practical continuous crowd sensing applications, bids cannot be solicited and only posted pricing mechanisms can be implemented, so these bidding mechanisms are not applicable. To tackle the issue, we propose a collection-behavior based multi-parameter posted pricing mechanism, not only to consider extensive user participation and sensing data submission quality under given budget constraints by applying all-pay auctions and posted pricing mechanisms, but also to maximize the coverage utilities by applying crowd aversion. Simulation results indicate that incentive mechanisms in our proposed framework outperform the best existing solution.

preprint 2014 arXiv

General Privacy-Preserving Verifiable Incentive Mechanism for Crowdsourcing Markets

In crowdsourcing markets, there are two different types of jobs, i.e., homogeneous jobs and heterogeneous jobs, which need to be allocated to workers. Incentive mechanisms are essential to attract extensive user participation for achieving good service quality, especially under a given budget constraint. To this end, Singer et al. recently proposed a novel class of auction mechanisms for determining near-optimal prices of tasks for crowdsourcing markets constrained by the given budget. Their mechanisms are very useful for motivating users to participate truthfully in crowdsourcing markets. Despite their importance, many security and privacy challenges remain in real-life environments. In this paper, we present a general privacy-preserving verifiable incentive mechanism for crowdsourcing markets with the budget constraint, not only to explore how to protect the privacy of bids and assignments, the privacy of the chosen winners in crowdsourcing markets with homogeneous jobs and heterogeneous jobs, and the identity privacy of users, but also to make the payment between the platform and users verifiable for crowdsourcing applications. Results show that our general privacy-preserving verifiable incentive mechanisms achieve the same results as the generic one without privacy preservation.

preprint 2014 arXiv

How Much Should I Pay for Privacy Concerns in Truthful Online Crowd Sensing?

Crowd sensing is a new paradigm which leverages pervasive smartphones to efficiently collect sensing data, enabling numerous novel applications. To achieve good service quality for a crowd sensing application, incentive mechanisms are indispensable to attract more user participation. Most existing mechanisms apply only to the offline scenario, where the system has full information about the users' sensing profiles, i.e., a set of locations or mobility as well as the type of smartphones used, and their true costs. In contrast, we focus on a more realistic scenario where users with their own privacy concerns arrive one by one online in a random order. We model the problem as a privacy-respecting online auction in which users are willing to negotiate access to certain private information and submit their sensing profiles satisfying privacy concerns to the platform (the provider of crowd sensing applications) over time, and the platform aims to maximize the total value of the services provided by selected users under a budget constraint. We then design two online mechanisms for a budgeted crowd sensing application, satisfying computational efficiency, individual rationality, budget feasibility, truthfulness, consumer sovereignty, constant competitiveness, and privacy concerns. Through extensive simulations, we evaluate the performance and validate the theoretical properties of our online mechanisms.

preprint 2014 arXiv

Incentive Mechanisms for Mobile Crowd Sensing: Current States and Challenges of Work

Mobile crowd sensing (MCS) is a new paradigm which leverages the ubiquity of sensor-equipped mobile devices such as smartphones, music players, and in-vehicle sensors at the edge of the Internet, to collect data. The new paradigm will drive three changes in the evolution of the Internet of Things. First, the terminal devices at the edge of the Internet change from PCs to mobile phones. Second, the interactive mode extends from the virtual space to the real physical world. Third, the forwarding manner of sensing data is undergoing a transition from the a priori to the opportunistic. To better meet the demands of MCS applications at a societal scale, incentive mechanisms are indispensable. In this paper, we first give an overview of three categories of MCS applications, and then propose a new architecture for MCS applications. Based on the architecture, we discuss various research challenges in incentive mechanism design for MCS applications. Finally, we present potential future work.

preprint 2014 arXiv

Privacy-Preserving Verifiable Incentive Mechanism for Crowdsourcing Market Applications

Recently, a novel class of incentive mechanisms has been proposed to attract extensive users to truthfully participate in crowd sensing applications with a given budget constraint. This class of mechanisms also brings good service quality for the requesters in crowd sensing applications. Despite its importance, many verification and privacy challenges remain, including the privacy of users' bids, subtask information, and identities, the privacy of the platform's winner set, and the security of the payment outcomes. In this paper, we present a privacy-preserving verifiable incentive mechanism for crowd sensing applications with the budget constraint, not only to explore how to protect the privacy of users and the platform, but also to make the payment between the platform and users verifiably correct for crowd sensing applications. Results indicate that our privacy-preserving verifiable incentive mechanism achieves the same results as the generic one without privacy preservation.

preprint 2014 arXiv

Service-Constraint Based Truthful Incentive Mechanisms for Crowd Sensing

Crowd sensing is a new paradigm which leverages pervasive smartphones to efficiently collect and upload sensing data, enabling numerous novel applications. To achieve good service quality for a crowd sensing application, incentive mechanisms are necessary for attracting more user participation. Most existing mechanisms apply only to the budget-constraint scenario where the platform (the crowd sensing organizer) has a budget limit. In contrast, we focus on a different scenario where the platform has a service limit. Based on the offline and online auction model, we consider a general problem: users submit their private profiles to the platform, and the platform aims to select a subset of users before a specified deadline so as to minimize the total payment while ensuring that a specific service can be completed. Specifically, we design offline and online service-constraint incentive mechanisms for the case where the value function of selected users is monotone submodular. The mechanisms satisfy individual rationality, task feasibility, computational efficiency, truthfulness, consumer sovereignty, and constant frugality, and also perform well in practice. Finally, we use extensive simulations to demonstrate the theoretical properties of our mechanisms.

preprint 2013 arXiv

Uplink Scheduling Strategy Based on A Population Game in Vehicular Sensor Networks

Recent advances in the integration of vehicular sensor network (VSN) technology and crowd sensing, which leverages pervasive sensors called onboard units (OBUs), like smartphones and radio frequency identification devices, to provide sensing services, have attracted increasing attention from both industry and academia. Existing vehicular sensing applications lack good mechanisms to improve the maximum achievable throughput and minimize the service time of participating sensing OBUs in vehicular sensor networks. To fill these gaps, in this paper, first, we introduce real imperfect link states to the calculation of Markov chains. Second, we incorporate the result of different link states for multiple types of vehicles into the calculations of uplink throughput and service time. Third, in order to accurately calculate the service time of an OBU, we introduce the steady state probability to calculate the exact duration of a back-off decrement, rather than using the traditional relative probability. Additionally, to the best of our knowledge, we are the first to explore a multichannel scheduling strategy of uplink data access in a single roadside unit (RSU) by using a non-cooperative game in an RSU coverage region to maximize the uplink throughput and minimize service time under saturated and unsaturated traffic loads. To this end, we conduct a theoretical analysis and find the equilibrium point of the scheduling. The numerical results show that the solutions at the equilibrium points are consistent with the optimization problems.
