Source author record

Tianyu Jia

Tianyu Jia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Hardware Architecture · Machine Learning · Applications · astro-ph.CO · Robotics

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


Preprint · 2026 · arXiv

Purely gravitational dark matter production in warm inflation

We consider an appealing scenario for the production of purely gravitational dark matter against the background of warm inflation, a mechanism that maintains a stable thermal bath during inflation. Through a systematic investigation of various gravitational production channels, we reveal features that distinguish this setting from the standard inflation scenario. Notably, the inflaton annihilation channel in warm inflation exhibits markedly different thermodynamics from the standard inflation paradigm, which suppresses the production of sub-inflaton-mass dark matter. For the production channel of inflationary vacuum fluctuations, we find an abundance-mass correlation of $\rho_\chi \propto m_\chi^{1/2}$ for sub-Hubble-mass dark matter with minimal coupling and $\rho_\chi \propto m_\chi^{5/2}$ with conformal coupling. Our results also indicate that warm inflation requires a minimum temperature of $10^{-6}M_P$ to allow adequate dark matter production. Combined with observational constraints, our results place stringent limits on the mass range of purely gravitational dark matter with sufficient density: from $10^{-8}M_P$ to $10^{-2}M_P$ for minimal coupling and from $10^{-14}M_P$ to $10^{-2}M_P$ for conformal coupling.
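The abundance-mass scalings quoted above can be made concrete by taking a ratio of two masses, which cancels the unstated proportionality constant. A short worked example, assuming both masses lie in the sub-Hubble-mass regime where the scaling holds and that every other parameter (such as the warm-inflation temperature) is held fixed:

```latex
\frac{\rho_\chi(m_2)}{\rho_\chi(m_1)} =
\begin{cases}
(m_2/m_1)^{1/2}, & \text{minimal coupling},\\
(m_2/m_1)^{5/2}, & \text{conformal coupling},
\end{cases}
\qquad
\frac{m_2}{m_1} = 10^{2} \;\Rightarrow\;
\frac{\rho_\chi(m_2)}{\rho_\chi(m_1)} =
\begin{cases}
10, & \text{minimal},\\
10^{5}, & \text{conformal}.
\end{cases}
```

A hundredfold increase in mass therefore changes the relic density far more steeply under conformal coupling, which is consistent with the two couplings admitting different viable mass windows.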

Preprint · 2025 · arXiv

Structured and sparse partial least squares coherence for multivariate cortico-muscular analysis

Multivariate cortico-muscular analysis has recently emerged as a promising approach for evaluating the corticospinal neural pathway. However, current multivariate approaches face challenges such as high dimensionality and limited sample sizes, which restrict their wider application. In this paper, we propose a structured and sparse partial least squares coherence algorithm (ssPLSC) to extract shared latent space representations related to cortico-muscular interactions. Our approach uses an embedded optimization framework that integrates a partial least squares (PLS)-based objective function, a sparsity constraint and a connectivity-based structured constraint, addressing generalizability, interpretability and spatial structure. To solve the optimization problem, we develop an efficient alternating iterative algorithm within a unified framework and verify its convergence experimentally. Extensive experiments on one synthetic and several real-world datasets demonstrate that ssPLSC achieves performance competitive with or better than representative multivariate cortico-muscular fusion methods, particularly in scenarios with limited sample sizes and high noise levels. This study provides a novel multivariate fusion method for cortico-muscular analysis, offering a transformative tool for evaluating corticospinal pathway integrity in neurological disorders.
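The combination of a PLS objective with a sparsity constraint, solved by alternating updates, can be illustrated with a generic sparse PLS sketch. This is not ssPLSC: it omits the connectivity-based structured constraint and the coherence formulation, and the relative soft-thresholding rule (parameters `frac_u`, `frac_v`) is an assumption chosen for illustration. It alternates soft-thresholded power iterations on the cross-covariance matrix of two views, here hypothetically an EEG feature matrix `X` and an EMG feature matrix `Y`.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_pls_component(X, Y, frac_u=0.1, frac_v=0.1, n_iter=100, tol=1e-8):
    """First pair of sparse PLS weight vectors (u for X, v for Y).

    Alternates soft-thresholded power iterations on X^T Y. Each threshold is a
    fraction (in [0, 1)) of the largest entry of the current update; this is an
    illustrative heuristic, not the ssPLSC penalty.
    """
    X = X - X.mean(axis=0)          # center each column
    Y = Y - Y.mean(axis=0)
    C = X.T @ Y                     # (p, q) cross-covariance, up to a 1/n factor
    _, _, vt = np.linalg.svd(C, full_matrices=False)
    v = vt[0]                       # start from the leading right singular vector
    u = np.zeros(C.shape[0])
    for _ in range(n_iter):
        a = C @ v
        u_new = soft_threshold(a, frac_u * np.abs(a).max())
        u_new /= np.linalg.norm(u_new)
        b = C.T @ u_new
        v_new = soft_threshold(b, frac_v * np.abs(b).max())
        v_new /= np.linalg.norm(v_new)
        done = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    return u, v
```

On simulated data where one latent signal drives only the first three columns of `X` and the first two columns of `Y`, the nonzero entries of `u` and `v` should recover exactly those columns, which is the interpretability benefit that sparsity is meant to provide.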

Preprint · 2022 · arXiv

FRL-FI: Transient Fault Analysis for Federated Reinforcement Learning-Based Navigation Systems

Swarm intelligence is increasingly deployed in autonomous systems such as drones and unmanned vehicles. Federated reinforcement learning (FRL), a key swarm intelligence paradigm in which agents interact with their own environments and cooperatively learn a consensus policy while preserving privacy, has recently shown potential advantages and gained popularity. However, transient hardware faults are becoming more frequent as technology nodes continue to scale, and they can threaten FRL systems. Meanwhile, conventional redundancy-based protection methods are difficult to deploy in resource-constrained edge applications. In this paper, we experimentally evaluate the fault tolerance of FRL navigation systems at various scales with respect to fault models, fault locations, learning algorithms, layer types, communication intervals and data types, at both the training and inference stages. We further propose two cost-effective fault detection and recovery techniques that improve resilience by up to 3.3x with less than 2.7% overhead in FRL systems.
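A common way to model a transient fault in a neural network weight is a single bit flip in its binary representation. The sketch below assumes float32 weights and shows why fault location matters: a flip in a high exponent bit can turn a small weight into a huge value, whereas a low mantissa flip barely changes it. The range check is a generic, low-cost detector included only for illustration; it is not necessarily one of the techniques proposed in the paper, and the names `flip_bit_float32` and `out_of_range` are hypothetical.

```python
import math
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value`, encoded as IEEE-754 binary32, with one bit inverted.

    Bits 0-22 hold the mantissa, bits 23-30 the exponent and bit 31 the sign.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def out_of_range(values, bound):
    """Indices of values that are non-finite or whose magnitude exceeds `bound`."""
    return [i for i, x in enumerate(values) if not math.isfinite(x) or abs(x) > bound]


weights = [0.5, -0.25, 0.125]
weights[0] = flip_bit_float32(weights[0], 30)  # exponent MSB: 0.5 becomes 2**127
suspicious = out_of_range(weights, bound=10.0)
```

Here `suspicious` flags index 0. Recovery could then, for example, reload the affected weights from a recent checkpoint or a peer agent's copy; which recovery strategy suits an FRL system is a design choice the paper evaluates, not something this sketch settles.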

Preprint · 2022 · arXiv

OMU: A Probabilistic 3D Occupancy Mapping Accelerator for Real-time OctoMap at the Edge

Autonomous machines (e.g., vehicles, mobile robots and drones) require sophisticated 3D mapping to perceive dynamic environments. However, maintaining a real-time 3D map is expensive in both compute and memory, especially on resource-constrained edge machines. Probabilistic OctoMap is a reliable and memory-efficient dense 3D map model that represents the full environment and supports dynamic pruning and expansion of voxel nodes. This paper presents OMU, the first efficient accelerator to enable real-time probabilistic 3D mapping at the edge. To improve performance, input map voxels are updated by parallel processing element (PE) units to exploit data parallelism. Within each PE, voxels are stored in parallel memory banks using a specially developed data structure. In addition, each PE unit contains a pruning address manager that reuses pruned memory addresses. The proposed accelerator is implemented and evaluated in a commercial 12 nm technology. Compared with the ARM Cortex-A57 CPU on the Nvidia Jetson TX2 platform, it achieves up to 62× higher performance and 708× higher energy efficiency. Furthermore, the accelerator sustains a throughput of 63 FPS, more than twice the real-time requirement, enabling real-time perception for 3D mapping.
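The per-voxel work that OMU parallelizes follows OctoMap's probabilistic update: each measurement adds a log-odds increment to the voxel, the result is clamped to fixed bounds, and a parent node can replace its eight children when they all hold the same value. The sketch below shows that logic for a single voxel and a single pruning test. The sensor-model probabilities and clamping bounds are illustrative assumptions, not values reported for OMU, and the function names are hypothetical.

```python
import math

# Illustrative sensor model and clamping bounds (assumptions, not taken from OMU).
P_HIT, P_MISS = 0.7, 0.4
L_MIN, L_MAX = -2.0, 3.5  # clamping bounds in log-odds


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


L_HIT, L_MISS = logit(P_HIT), logit(P_MISS)


def update_voxel(l_prev: float, hit: bool) -> float:
    """Add the measurement's log-odds increment, then clamp to [L_MIN, L_MAX]."""
    l = l_prev + (L_HIT if hit else L_MISS)
    return min(max(l, L_MIN), L_MAX)


def occupancy(l: float) -> float:
    """Log-odds back to an occupancy probability."""
    return 1.0 / (1.0 + math.exp(-l))


def can_prune(children) -> bool:
    """True if all 8 children exist and share one log-odds value."""
    return len(children) == 8 and all(c is not None and c == children[0] for c in children)
```

Clamping matters for memory as well as for responsiveness to change: once voxels saturate at `L_MIN` or `L_MAX`, neighboring leaves often hold identical values, so `can_prune` succeeds more often and freed node addresses become available for reuse, which is the situation OMU's pruning address manager is designed to exploit.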

Preprint · 2022 · arXiv

Trireme: Exploring Hierarchical Multi-Level Parallelism for Domain Specific Hardware Acceleration

The design of heterogeneous systems that include domain-specific accelerators is a challenging and time-consuming process. Under area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop-level, task-level and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present Trireme, a fully automated tool-chain that explores multiple levels of parallelism and produces domain-specific accelerator designs and configurations that maximize performance within a given area budget. Compared with software-only implementations, experiments revealed speedups of up to 20x on demanding XR benchmarks and up to 37x on smaller applications.
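The core hardware/software partitioning question, which kernels to accelerate within an area budget, can be illustrated in its simplest form as a 0/1 knapsack. This is a drastically simplified model and not Trireme's exploration algorithm: it assumes each kernel has one fixed accelerator area and an independent, additive time saving, and it ignores the loop-, task- and pipeline-level choices Trireme explores. Kernel names, areas and savings below are hypothetical.

```python
from typing import List, Tuple


def select_kernels(kernels: List[Tuple[str, int, float]], area_budget: int):
    """0/1 knapsack over (name, area_units, time_saved) tuples.

    Returns (best total time saved, names of the chosen kernels).
    """
    n = len(kernels)
    # best[i][a]: max time saved using the first i kernels within area a
    best = [[0.0] * (area_budget + 1) for _ in range(n + 1)]
    for i, (_, area, saved) in enumerate(kernels, start=1):
        for a in range(area_budget + 1):
            best[i][a] = best[i - 1][a]  # option 1: leave kernel i in software
            if area <= a and best[i - 1][a - area] + saved > best[i][a]:
                best[i][a] = best[i - 1][a - area] + saved  # option 2: accelerate it
    # Backtrack from the full budget to recover which kernels were chosen.
    chosen, a = [], area_budget
    for i in range(n, 0, -1):
        if best[i][a] != best[i - 1][a]:
            name, area, _ = kernels[i - 1]
            chosen.append(name)
            a -= area
    return best[n][area_budget], list(reversed(chosen))


saving, chosen = select_kernels([("fft", 4, 30.0), ("conv", 5, 40.0), ("sort", 3, 15.0)], 8)
```

With a budget of 8 area units, accelerating `conv` and `sort` (55 time units saved) beats `fft` plus `sort` (45), and `fft` plus `conv` does not fit. The overall speedup then follows Amdahl's law as the software-only time divided by the remaining time; exposing more parallelism per kernel, as Trireme does, effectively adds more (area, saving) options for each kernel.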
