Researcher profile

Chi Wang

Chi Wang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
13 topics
4 close collaborators

Published work

10 published items

preprint, 2026, arXiv

HetScene: Heterogeneity-Aware Diffusion for Dense Indoor Scene Generation

Generating controllable and physically plausible indoor scenes is a pivotal prerequisite for constructing high-fidelity simulation environments for embodied AI. However, existing deep-learning-based methods usually treat all objects as homogeneous instances within a unified generation process. While effective for sparse and simplistic layouts, they struggle to model realistic layouts with dense object arrangements and complex spatial dependencies, leading to limited scalability and degraded physical plausibility. To deal with these challenges, we revisit indoor layout generation from the perspective of structural heterogeneity and decompose the objects into primary objects and secondary objects according to their distinct roles in shaping a scene. Based on this decomposition, we propose HetScene, a heterogeneous two-stage generation framework that decouples indoor layout synthesis into Structural Layout Generation (SLG) and Contextual Layout Generation (CLG). SLG first generates globally coherent structural layouts with only primary objects conditioned on text descriptions, top-down binary room masks, and spatial relation graphs, establishing a stable global macro-skeleton of large core furniture.

preprint, 2022, arXiv

ACE: Adaptive Constraint-aware Early Stopping in Hyperparameter Optimization

Deploying machine learning models requires high model quality and needs to comply with application constraints. That motivates hyperparameter optimization (HPO) to tune model configurations under deployment constraints. The constraints often require additional computation cost to evaluate, and training ineligible configurations can waste a large amount of tuning cost. In this work, we propose an Adaptive Constraint-aware Early stopping (ACE) method to incorporate constraint evaluation into trial pruning during HPO. To minimize the overall optimization cost, ACE estimates the cost-effective constraint evaluation interval based on a theoretical analysis of the expected evaluation cost. Meanwhile, we propose a stratum early stopping criterion in ACE, which considers both optimization and constraint metrics in pruning and does not require regularization hyperparameters. Our experiments demonstrate superior performance of ACE in hyperparameter tuning of classification tasks under fairness or robustness constraints.

preprint, 2022, arXiv

Active Boundary Loss for Semantic Segmentation

This paper proposes a novel active boundary loss for semantic segmentation. It can progressively encourage the alignment between predicted boundaries and ground-truth boundaries during end-to-end training, which is not explicitly enforced in the commonly used cross-entropy loss. Based on the predicted boundaries detected from the segmentation results using current network parameters, we formulate the boundary alignment problem as a differentiable direction vector prediction problem to guide the movement of predicted boundaries in each iteration. Our loss is model-agnostic and can be plugged into the training of segmentation networks to improve the boundary details. Experimental results show that training with the active boundary loss can effectively improve the boundary F-score and mean Intersection-over-Union on challenging image and video object segmentation datasets.

preprint, 2022, arXiv

Multivariate Sparse Group Lasso Joint Model for Radiogenomics Data

Radiogenomics is an emerging field in cancer research that combines medical imaging data with genomic data to predict patients' clinical outcomes. In this paper, we propose a multivariate sparse group lasso joint model to integrate imaging and genomic data for building prediction models. Specifically, we jointly consider two models: one regresses imaging features on genomic features, and the other regresses patients' clinical outcomes on genomic features. The regularization penalties through sparse group lasso allow incorporation of intrinsic group information, e.g., biological pathway and imaging category, to select both important intrinsic groups and important features within a group. To integrate information from the two models, in each model, we introduce a weight in the penalty term of each individual genomic feature, where the weight is inversely correlated with the model coefficient of that feature in the other model. This weight allows a feature to have a higher chance of selection by one model if it is selected by the other model. Our model is applicable to both continuous and time-to-event outcomes. It also allows the use of two separate datasets to fit the two models, addressing a practical challenge that many genomic datasets do not have imaging data available. Simulations and real data analyses demonstrate that our method outperforms existing methods in the literature.

preprint, 2022, arXiv

Whistler Waves As a Signature of Converging Magnetic Holes in Space Plasmas

Magnetic holes are plasma structures that trap a large number of particles in a magnetic field that is weaker than the field in its surroundings. The unprecedented high time-resolution observations by NASA's Magnetospheric Multi-Scale (MMS) mission enable us to study the particle dynamics in magnetic holes in the Earth's magnetosheath in great detail. We reveal the local generation mechanism of whistler waves by a combination of Landau-resonant and cyclotron-resonant wave-particle interactions of electrons in response to the large-scale evolution of a magnetic hole. As the magnetic hole converges, a pair of counter-streaming electron beams form near the hole's center as a consequence of the combined action of betatron and Fermi effects. The beams trigger the generation of slightly-oblique whistler waves. Our conceptual prediction is supported by a remarkable agreement between our observations and numerical predictions from the Arbitrary Linear Plasma Solver (ALPS). Our study shows that wave-particle interactions are fundamental to the evolution of magnetic holes in space and astrophysical plasmas.

preprint, 2020, arXiv

ALEX: An Updatable Adaptive Learned Index

Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. ALEX effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint. On read-only workloads, ALEX beats the learned index from Kraska et al. by up to 2.2X on performance with up to 15X smaller index size. Across the spectrum of read-write workloads, ALEX beats B+Trees by up to 4.1X while never performing worse, with up to 2000X smaller index size. We believe ALEX presents a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.
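
The idea that an index is a model predicting a key's position can be illustrated with a minimal sketch. This is not ALEX and not the index from Kraska et al.: it is a static, read-only toy that assumes numeric keys and a single least-squares line, and the class name `LinearLearnedIndex` and its methods are hypothetical names chosen for illustration.

```python
import bisect


class LinearLearnedIndex:
    """Toy static learned index (illustrative only, not ALEX).

    A single linear model predicts where a key sits in a sorted array;
    a bounded binary search then corrects the prediction error.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k > 0 else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Record the worst prediction error so lookups can bound their search.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Round the model output and clamp it to a valid array position.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

Building the index over a list of keys and calling `lookup(key)` returns the key's position in the sorted array, or `None` when the key is absent. Because the sketch has no support for inserts, updates, or deletes, it shares the read-only limitation the abstract describes.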

preprint, 2020, arXiv

Evolution of the Earth's Magnetosheath Turbulence: A statistical study based on MMS observations

Composed of shocked solar wind, the Earth's magnetosheath serves as a natural laboratory to study the transition of turbulence from low Alfvén Mach number, $M_\mathrm{A}$, to high $M_\mathrm{A}$. The simultaneous observations of magnetic field and plasma moments with unprecedented high temporal resolution provided by NASA's Magnetospheric Multiscale Mission enable us to study the magnetosheath turbulence at both magnetohydrodynamics (MHD) and sub-ion scales. Based on 1841 burst-mode segments of MMS-1 from 2015/09 to 2019/06, comprehensive patterns of the spatial evolution of magnetosheath turbulence are obtained: (1) From the sub-solar region to the flanks, $M_\mathrm{A}$ increases from $< 1$ to $> 5$. At MHD scales, the spectral indices of the magnetic-field and velocity spectra present a positive and negative correlation with $M_\mathrm{A}$. However, no obvious correlations between the spectral indices and $M_\mathrm{A}$ are found at sub-ion scales. (2) From the bow shock to the magnetopause, the turbulent sonic Mach number, $M_\mathrm{turb}$, generally decreases from $> 0.4$ to $< 0.1$. All spectra steepen at MHD scales and flatten at sub-ion scales, representing positive/negative correlations with $M_\mathrm{turb}$. The break frequency increases by 0.1 Hz when approaching the magnetopause for the magnetic-field and velocity spectra, while it remains at 0.3 Hz for the density spectra. (3) In spite of some differences, similar results are found for the quasi-parallel and quasi-perpendicular magnetosheath. In addition, the spatial evolution of magnetosheath turbulence is found to be independent of the upstream solar wind conditions, e.g., the Z-component of the interplanetary magnetic field and the solar wind speed.

preprint, 2020, arXiv

GEE-TGDR: A longitudinal feature selection algorithm and its application to lncRNA expression profiles for psoriasis patients treated with immune therapies

With the fast evolution of high-throughput technology, longitudinal gene expression experiments have become affordable and increasingly common in biomedical fields. The generalized estimating equation (GEE) approach is a widely used statistical method for the analysis of longitudinal data. Feature selection is imperative in longitudinal omics data analysis. Among a variety of existing feature selection methods, an embedded method, namely, threshold gradient descent regularization (TGDR), stands out due to its excellent characteristics. An alignment of GEE with TGDR is a promising area for the purpose of identifying relevant markers that can explain the dynamic changes of outcomes across time. In this study, we propose a novel feature selection algorithm for longitudinal outcomes: GEE-TGDR. In the GEE-TGDR method, the corresponding quasi-likelihood function of a GEE model is the objective function to be optimized, and the optimization and feature selection are accomplished by the TGDR method. We applied the GEE-TGDR method to a longitudinal lncRNA gene expression dataset that examined the treatment response of psoriasis patients to immune therapy. Under different working correlation structures, a list of 10 relevant lncRNAs was identified with a predictive accuracy of 80% and meaningful biological interpretation. To conclude, a widespread application of the proposed GEE-TGDR method in omics data analysis is anticipated.

preprint, 2020, arXiv

Qd-tree: Learning Data Layouts for Big Data Analytics

Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar block-based data organization and compression have become standard practice in most commercial database systems. However, the problem of best assigning records to data blocks on storage is still open. For example, today's systems usually partition data by arrival time into row groups, or range/hash partition the data based on selected fields. For a given workload, however, such techniques are unable to optimize for the important metric of the number of blocks accessed by a query. This metric directly relates to the I/O cost, and therefore performance, of most analytical queries. Further, they are unable to exploit additional available storage to drive this metric down further. In this paper, we propose a new framework called a query-data routing tree, or qd-tree, to address this problem, and propose two algorithms for their construction based on greedy and deep reinforcement learning techniques. Experiments over benchmark and real workloads show that a qd-tree can provide physical speedups of more than an order of magnitude compared to current blocking schemes, and can reach within 2X of the lower bound for data skipping based on selectivity, while providing complete semantic descriptions of created blocks.
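
The metric named in the abstract, the number of blocks a query must access, can be made concrete with a small sketch. It does not implement a qd-tree or either construction algorithm. It assumes a single integer column, per-block min/max metadata, and a closed range query, and the names `Block`, `partition`, and `blocks_accessed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Min/max metadata for one data block over a single integer column."""
    lo: int
    hi: int


def partition(values, block_size, sort_first):
    """Cut values into fixed-size blocks, either in arrival order or sorted."""
    ordered = sorted(values) if sort_first else list(values)
    chunks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    return [Block(min(c), max(c)) for c in chunks]


def blocks_accessed(blocks, q_lo, q_hi):
    """Count blocks whose [lo, hi] range overlaps the query range [q_lo, q_hi].

    Blocks that do not overlap can be skipped without reading them.
    """
    return sum(1 for b in blocks if b.lo <= q_hi and q_lo <= b.hi)


# Hypothetical data: the same eight values laid out two different ways.
arrivals = [9, 1, 8, 2, 7, 3, 6, 4]
by_arrival = partition(arrivals, block_size=2, sort_first=False)
by_value = partition(arrivals, block_size=2, sort_first=True)

print(blocks_accessed(by_arrival, 4, 5))  # every block's range overlaps [4, 5]
print(blocks_accessed(by_value, 4, 5))    # only the block covering 3..4 overlaps
```

With this toy data, the arrival-order layout forces the query to touch all four blocks, while the value-sorted layout touches one. This gap between layouts is what a workload-aware layout method tries to exploit.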

preprint, 2020, arXiv

TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications. For example, online retailers (e.g., Amazon and eBay) use taxonomies for product recommendation, and web search engines (e.g., Google and Bing) leverage taxonomies to enhance query understanding. Enormous efforts have been made on constructing taxonomies either manually or semi-automatically. However, with the fast-growing volume of web content, existing taxonomies will become outdated and fail to capture emerging knowledge. Therefore, in many applications, dynamic expansions of an existing taxonomy are in great demand. In this paper, we study how to expand an existing taxonomy by adding a set of new concepts. We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of <query concept, anchor concept> pairs from the existing taxonomy as training data. Using such self-supervision data, TaxoExpan learns a model to predict whether a query concept is the direct hyponym of an anchor concept. We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data. Extensive experiments on three large-scale datasets from different domains demonstrate both the effectiveness and the efficiency of TaxoExpan for taxonomy expansion.