Trust snapshot

Quick read

Trust 21 - Emerging; Verification L1; Unclaimed author
21 works
0 followers
26 topics
4 close collaborators

Published work

21 published item(s)

preprint · 2026 · arXiv

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, local context checks, and multi-step hypothesis refinement are difficult to implement by calling a conventional off-the-shelf retriever, and evidence filtered out early cannot be recovered by stronger downstream reasoning. Agentic tasks further exacerbate this limitation because they require agents to orchestrate multiple steps, including discovering intermediate entities, combining weak clues, and revising the plan after observing partial evidence. To tackle the limitation, we study direct corpus interaction (DCI), where an agent searches the raw corpus directly with general-purpose terminal tools (e.g., grep, file reads, shell commands, lightweight scripts), without any embedding model, vector index, or retrieval API. This approach requires no offline indexing and adapts naturally to evolving local corpora. Across IR benchmarks and end-to-end agentic search tasks, this simple setup substantially outperforms strong sparse, dense, and reranking baselines on several BRIGHT and BEIR datasets, and attains strong accuracy on BrowseComp-Plus and multi-hop QA without relying on any conventional semantic retriever. Our results indicate that as language agents become stronger, retrieval quality depends not only on reasoning ability but also on the resolution of the interface through which the model interacts with the corpus, with which DCI opens a broader interface-design space for agentic search.
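To make the idea of direct corpus interaction concrete, here is a minimal sketch of the kind of lightweight script an agent might write in place of a retriever call. It is not the authors' implementation: the function name `search_corpus`, the `*.txt` file layout, and the `context` parameter are all assumptions chosen for illustration. It applies an exact, case-insensitive conjunction of terms (similar in spirit to chaining `grep -i`) and returns neighbouring lines so that a local context check is possible.

```python
import re
from pathlib import Path


def search_corpus(root, required_terms, context=1):
    """Return (path, line_number, snippet) for lines containing every term.

    Hypothetical helper: an AND over exact, case-insensitive terms,
    with `context` lines kept on each side of the matching line.
    """
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in required_terms]
    hits = []
    # Walk the raw files directly; no index or embedding model is built.
    for path in sorted(Path(root).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if all(p.search(line) for p in patterns):
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                hits.append((str(path), i + 1, "\n".join(lines[lo:hi])))
    return hits
```

An agent could call this repeatedly with different term combinations, tightening or relaxing the conjunction after reading each batch of snippets, which is the multi-step refinement that a single top-k call does not expose.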

preprint · 2023 · arXiv

Side-by-Side vs Face-to-Face: Evaluating Colocated Collaboration via a Transparent Wall-sized Display

Traditional wall-sized displays mostly only support side-by-side co-located collaboration, while transparent displays naturally support face-to-face interaction. Many previous works assume transparent displays support collaboration. Yet it is unknown how exactly its afforded face-to-face interaction can support loose or close collaboration, especially compared to the side-by-side configuration offered by traditional large displays. In this paper, we used an established experimental task that operationalizes different collaboration coupling and layout locality, to compare pairs of participants collaborating side-by-side versus face-to-face in each collaborative situation. We compared quantitative measures and collected interview and observation data to further illustrate and explain our observed user behavior patterns. The results showed that the unique face-to-face collaboration brought by transparent display can result in more efficient task performance, different territorial behavior, and both positive and negative collaborative factors. Our findings provided empirical understanding about the collaborative experience supported by wall-sized transparent displays and shed light on its future design.

preprint · 2022 · arXiv

Active Learning Over Multiple Domains in Natural Language Tasks

Studies of active learning traditionally assume the target and source data stem from a single domain. However, in realistic applications, practitioners often require active learning with multiple sources of out-of-distribution data, where it is unclear a priori which data sources will help or hurt the target domain. We survey a wide variety of techniques in active learning (AL), domain shift detection (DS), and multi-domain sampling to examine this challenging setting for question answering and sentiment analysis. We ask (1) what family of methods are effective for this task? And, (2) what properties of selected examples and domains achieve strong results? Among 18 acquisition functions from 4 families of methods, we find H-Divergence methods, and particularly our proposed variant DAL-E, yield effective results, averaging 2-3% improvements over the random baseline. We also show the importance of a diverse allocation of domains, as well as room-for-improvement of existing methods on both domain and example selection. Our findings yield the first comprehensive analysis of both existing and novel methods for practitioners faced with multi-domain active learning for natural language tasks.

preprint · 2022 · arXiv

An Extended Halo-based Group/Cluster finder: application to the DESI legacy imaging surveys DR8

We extend the halo-based group finder developed by Yang et al. (2005a) to use data simultaneously with either photometric or spectroscopic redshifts. A mock galaxy redshift survey constructed from a high-resolution N-body simulation is used to evaluate the performance of this extended group finder. For galaxies with magnitude ${\rm z} \le 21$ and redshift $0 < z \le 1.0$ in the DESI legacy imaging surveys (the Legacy Surveys), our group finder successfully identifies more than 60% of the members in about 90% of halos with mass $\gtrsim 10^{12.5}\,h^{-1}M_\odot$. Detected groups with mass $\gtrsim 10^{12.0}\,h^{-1}M_\odot$ have a purity (the fraction of true groups) greater than 90%. The halo mass assigned to each group has an uncertainty of about 0.2 dex at the high mass end $\gtrsim 10^{13.5}\,h^{-1}M_\odot$ and 0.40 dex at the low mass end. Groups with more than 10 members have a redshift accuracy of $\sim 0.008$. We apply this group finder to the Legacy Surveys DR8 and find 5.2 million groups with at least 3 members. About 387,000 of these groups have at least 10 members. The resulting catalog, containing 3D coordinates, richness, halo masses, and total group luminosities, is made publicly available.

preprint · 2022 · arXiv

COGEDAP: A COmprehensive GEnomic Data Analysis Platform

Non-sharable sensitive data collection and analysis in large-scale consortia for genomic research is complicated. Time consuming issues in installing software arise due to different operating systems, software dependencies and running the software. Therefore, easier, more standardized, automated protocols and platforms can be a solution to overcome these issues. We have developed one such solution for genomic data analysis using software container technologies. The platform, COGEDAP, consists of different software tools placed into Singularity containers with corresponding pipelines and instructions on how to perform genome-wide association studies (GWAS) and other genomic data analysis via corresponding tools. Using a provided helper script written in Python, users can obtain auto-generated scripts to conduct the desired analysis both on high-performance computing (HPC) systems and on personal computers. The analyses can be done by running these auto-generated scripts with the software containers. The helper script also performs minor re-formatting of the input/output data, so that the end user can work with a unified file format regardless of which genetic software is used for the analysis. COGEDAP is actively being used by users from different countries/projects to conduct their genomic data analyses. Thanks to this platform, users can easily run GWAS and other genomic analyses without spending much effort on software installation, data formats, and other technical requirements.

preprint · 2022 · arXiv

DePS: An improved deep learning model for de novo peptide sequencing

De novo peptide sequencing from mass spectrometry data is an important method for protein identification. Recently, various deep learning approaches have been applied to de novo peptide sequencing, and DeepNovoV2 is one of the representative models. In this study, we propose an enhanced model, DePS, which can improve the accuracy of de novo peptide sequencing even with missing signal peaks or a large number of noisy peaks in tandem mass spectrometry data. On the same test set as DeepNovoV2, the DePS model achieved 74.22%, 74.21% and 41.68% for amino acid recall, amino acid precision and peptide recall, respectively. Furthermore, the results suggest that DePS outperforms DeepNovoV2 on the cross-species dataset.

preprint · 2022 · arXiv

High spectral-resolution interferometry down to 1 micron with Asgard/BIFROST at VLTI: Science drivers and project overview

We present science cases and instrument design considerations for the BIFROST instrument that will open the short-wavelength (Y/J/H-band), high spectral dispersion (up to R=25,000) window for the VLT Interferometer. BIFROST will be part of the Asgard Suite of instruments and unlock powerful venues for studying accretion & mass-loss processes at the early/late stages of stellar evolution, for detecting accreting protoplanets around young stars, and for probing the spin-orbit alignment in directly-imaged planetary systems and multiple star systems. Our survey on GAIA binaries aims to provide masses and precision ages for a thousand stars, providing a legacy data set for improving stellar evolutionary models as well as for Galactic Archaeology. BIFROST will enable off-axis spectroscopy of exoplanets in the 0.025-1" separation range, enabling high-SNR, high spectral resolution follow-up of exoplanets detected with ELT and JWST. We give an update on the status of the project, outline our key technology choices, and discuss synergies with other instruments in the proposed Asgard Suite of instruments.

preprint · 2022 · arXiv

Magnetism in doped infinite-layer NdNiO2 studied by combined density functional theory and dynamical mean-field theory

The recent observation of superconductivity in infinite-layer nickelates has brought intense debate on the established knowledge of unconventional superconductivity based on the cuprates. Despite many similarities, the nickelates differ from the cuprates in many characteristics, the most notable one among which is the magnetism. Instead of a canonical antiferromagnetic Mott insulator as the undoped cuprates, from which the superconductivity is generally believed to arise upon doping, the undoped nickelates show no sign of magnetic ordering in experiments. Through a combined density functional theory, dynamical mean-field theory, and model study, we show that although the increased energy splitting between O-$p$ orbital and Cu/Ni-$d$ orbital ($Δ_{dp}$) results in larger magnetic moment in nickelates, it also leads to stronger antiferromagnetism/ferromagnetism competition, and weaker magnetic exchange coupling. Meanwhile, the self-doping effect caused by Nd-$d$ orbital screens the magnetic moment of Ni. The Janus-faced effect of $Δ_{dp}$ and self-doping effect together give a systematic understanding of magnetic behavior in nickelates and explain recent experimental observations.

preprint · 2022 · arXiv

Matrix Syncer -- A Multi-chain Data Aggregator For Supporting Blockchain-based Metaverses

Due to the rising complexity of the metaverse's business logic and the low-latency nature of the metaverse, developers typically encounter the challenge of effectively reading, writing, and retrieving historical on-chain data in order to facilitate their functional implementations at scale. While it is true that accessing blockchain states is simple, more advanced real-world operations such as search, aggregation, and conditional filtering are not available when interacting directly with blockchain networks, particularly when dealing with requirements for on-chain event reflection. We offer Matrix Syncer, the ultimate middleware that bridges the data access gap between blockchains and end-user applications. Matrix Syncer is designed to facilitate the consolidation of on-chain information into a distributed data warehouse while also enabling customized on-chain state transformation for scalable storage, access, and retrieval. It offers a unified layer for both on- and off-chain state, as well as a fast and flexible atomic query. Matrix Syncer is easily incorporated into any infrastructure to aggregate data from various blockchains concurrently, such as Ethereum and Flow. The system has been deployed to support several metaverse projects with a total value of more than $15 million USD.

preprint · 2022 · arXiv

Nuclear states projected from a pair condensate

Atomic nuclei exhibit deformation, pairing correlations, and rotational symmetries. To meet these competing demands in a computationally tractable formalism, we revisit the use of general pair condensates with good particle number as a trial wave function for even-even nuclei. After minimizing the energy of the condensate, we project out states with good angular momentum with a fast projection technique, allowing for general triaxial deformations. To show applicability, we present example calculations from pair condensates in several model spaces, and compare against projected Hartree-Fock and full configuration-interaction shell model calculations. This approach successfully generates spherical, vibrational and rotational spectra, demonstrating potential for modeling medium- to heavy-mass nuclei.

preprint · 2020 · arXiv

BAlN alloy for enhanced two-dimensional electron gas characteristics of GaN-based high electron mobility transistor

The emerging wide bandgap BAlN alloys have potential for improved III-nitride power devices, including the high electron mobility transistor (HEMT). Yet few relevant studies have been carried out. In this work, we have investigated the use of the B0.14Al0.86N alloy as part or all of the interlayer between the GaN buffer and the AlGaN barrier in the conventional GaN-based HEMT. The numerical results show considerable improvement of the two-dimensional electron gas (2DEG) concentration with small 2DEG leakage into the ternary layer when the conventional AlN interlayer is replaced by either the B0.14Al0.86N interlayer or the B0.14Al0.86N/AlN hybrid interlayer. Consequently, the transfer characteristics can be improved. The saturation current can be enhanced as well. For instance, the saturation currents for HEMTs with the 0.5 nm B0.14Al0.86N/0.5 nm AlN hybrid interlayer and the 1 nm B0.14Al0.86N interlayer are 5.8% and 2.2% higher than that for the AlN interlayer when VGS - Vth = +3 V.

preprint · 2020 · arXiv

BAlN for III-nitride UV light emitting diodes: undoped electron blocking layer

The undoped BAlN electron-blocking layer (EBL) is investigated to replace the conventional AlGaN EBL in light-emitting diodes (LEDs). Numerical studies of the impact of variously doped EBLs on the output characteristics of LEDs demonstrate that the LED performance shows heavy dependence on the p-doping level in the case of the AlGaN EBL, while it shows less dependence on the p-doping level for the BAlN EBL. As a result, we propose an undoped BAlN EBL for LEDs to avoid the p-doping issues, which are a major technical challenge in the AlGaN EBL. Without doping, the proposed BAlN EBL structure still possesses a superior capacity in blocking electrons and improving hole injection compared with the AlGaN EBL having high doping. This study provides a feasible route to addressing electron leakage and insufficient hole injection issues when designing UV LED structures.

preprint · 2020 · arXiv

ContourRend: A Segmentation Method for Improving Contours by Rendering

A good object segmentation should contain clear contours and complete regions. However, mask-based segmentation cannot handle contour features well on a coarse prediction grid, which causes blurry edges, while contour-based segmentation provides contours directly but misses their details. In order to obtain fine contours, we propose a segmentation method named ContourRend, which adopts a contour renderer to refine segmentation contours. We implement our method on a segmentation model based on a graph convolutional network (GCN). For the single object segmentation task on the Cityscapes dataset, the GCN-based segmentation contour is used to generate a contour of a single object; our contour renderer then focuses on the pixels around the contour and predicts the category at high resolution. By rendering the contour result, our method reaches 72.41% mean intersection over union (IoU) and surpasses the baseline Polygon-GCN by 1.22%.

preprint · 2020 · arXiv

Exact sum rules with approximate ground states

Electromagnetic and weak transitions tell us a great deal about the structure of atomic nuclei. Yet modeling transitions can be difficult: it is often easier to compute the ground state, if only as an approximation, than excited states. One alternative is through transition sum rules, in particular the non-energy-weighted and energy-weighted sum rules, which can be computed as expectation values of operators. We investigate by computing sum rules for a variety of nuclei, comparing the numerically exact full configuration-interaction shell model, as a reference, to Hartree-Fock, projected Hartree-Fock, and the nucleon pair approximation. These approximations yield reasonable agreement, which we explain by prior work on the systematics of transition moments.
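For readers unfamiliar with the two quantities named above, the standard textbook definitions can be written as follows. The notation is an assumption made here, not taken from the paper: $\hat{F}$ is a transition operator, $|i\rangle$ the initial state with energy $E_i$, and $|f\rangle$ the final states with energies $E_f$. The second equality in each line uses completeness of the final states, and the energy-weighted form additionally assumes that $|i\rangle$ is an exact eigenstate of $\hat{H}$.

```latex
\begin{align}
  S_0 &= \sum_f \bigl|\langle f|\hat{F}|i\rangle\bigr|^2
       = \langle i|\hat{F}^\dagger \hat{F}|i\rangle
       && \text{(non-energy-weighted)} \\
  S_1 &= \sum_f (E_f - E_i)\,\bigl|\langle f|\hat{F}|i\rangle\bigr|^2
       = \langle i|\hat{F}^\dagger [\hat{H},\hat{F}]|i\rangle
       && \text{(energy-weighted)}
\end{align}
```

Both right-hand sides are expectation values in the initial state alone, which is why an approximate ground state can be enough to estimate them without constructing the excited states.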

preprint · 2020 · arXiv

Graph Computing based Distributed State Estimation with PMUs

Power system state estimation plays a fundamental and critical role in the energy management system (EMS). To achieve a high performance and accurate system states estimation, a graph computing based distributed state estimation approach is proposed in this paper. Firstly, a power system network is divided into multiple areas. Reference buses are selected with PMUs being installed at these buses for each area. Then, the system network is converted into multiple independent areas. In this way, the power system state estimation could be conducted in parallel for each area and the estimated system states are obtained without compromise of accuracy. IEEE 118-bus system and MP 10790-bus system are employed to verify the results accuracy and present the promising computation performance.

preprint · 2020 · arXiv

Graph-FCN for image semantic segmentation

Semantic segmentation with deep learning has achieved great progress in classifying the pixels in an image. However, local location information, which is important for image semantic segmentation, is usually ignored in high-level feature extraction by deep learning. To avoid this problem, we propose a graph model initialized by a fully convolutional network (FCN), named Graph-FCN, for image semantic segmentation. Firstly, the image grid data is extended to graph structure data by a convolutional network, which transforms the semantic segmentation problem into a graph node classification problem. Then we apply a graph convolutional network to solve this graph node classification problem. As far as we know, this is the first application of a graph convolutional network to image semantic segmentation. Our method achieves competitive performance in mean intersection over union (mIOU) on the VOC dataset (about 1.34% improvement), compared to the original FCN model.
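The mIOU metric quoted above can be illustrated with a short sketch. This is a generic definition rather than the paper's evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions, and benchmark toolkits may differ in such details.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target.

    `pred` and `target` are equal-length sequences of integer class labels,
    one per pixel (hypothetical flat layout chosen for illustration).
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one of the two.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For instance, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.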

preprint · 2020 · arXiv

Polar Rectification Effect in Electro-Fatigued SrTiO3 Based Junctions

Rectifying semiconductor junctions are crucial to electronic devices. They convert alternating current into direct current by allowing unidirectional charge flow. In analogy to current-flow rectification for itinerant electrons, a polar rectification based on the localized oxygen vacancies (OVs) in a Ti/fatigued-SrTiO3 (fSTO) Schottky junction is demonstrated here for the first time. The fSTO with OVs is produced by an electro-degradation process. The different mobility of localized OVs and itinerant electrons in the fSTO yields a unidirectional electric polarization at the interface of the junction under the combined action of external and built-in electric fields. Moreover, the fSTO displays a pre-ferroelectric state located between the paraelectric and ferroelectric phases. The pre-ferroelectric state has three sub-states and can be easily driven into a ferroelectric state by an external electric field. These observations open up opportunities for potential polar devices and may underpin many useful polar-triggered electronic phenomena.

preprint · 2020 · arXiv

Polarity induced electronic and atomic reconstruction at NdNiO2/SrTiO3 interfaces

Superconductivity has recently been observed in Sr-doped NdNiO2 films grown on SrTiO3. Whether it is caused by or related to the interface remains an open question. To address this issue, we use density functional theory calculations and a charge transfer self-consistent model to study the effects of polar discontinuity on the electronic and atomic reconstruction at the NdNiO2/SrTiO3 interface. We find that a sharp interface with purely electronic reconstruction is energetically unfavorable, and atomic reconstruction is unavoidable. We further propose a possible interface configuration that contains residual apical oxygen. These oxygen atoms lead to hybrids of dz2 and dx2-y2 states at the Fermi level, which weaken the single-band feature and may be detrimental to superconductivity.

preprint · 2020 · arXiv

Populating HI gas in dark matter halos: I. method

We combine data from the Sloan Digital Sky Survey (SDSS) and the Arecibo Legacy Fast ALFA Survey (ALFALFA) to establish an empirical model for the HI gas content within dark matter halos. A cross-match between our SDSS DR7 galaxy group sample and the ALFALFA HI sources provides a catalog of 16,520 HI-galaxy pairs within 14,270 galaxy groups (halos). Using these matched pairs, we model the HI gas mass distributions within halos using two components: 1) in situ galaxy relations that involve the HI masses, colors $(g-r)$ and stellar masses, and 2) an ex situ dependence of the HI mass on the halo mass/environment. We find that if we solely use galaxy-associated scaling relations to predict the HI gas distribution (solely component 1), the number of HI detections is significantly over-predicted with respect to the ALFALFA observations. We introduce a concept for the survival of the HI masses/members within halos of different masses, labelled the 'efficiency' factor, to describe the probability that a halo retains its HI detections. Taking the above consideration into account, we construct a 'halo based HI mass model' which not only predicts the HI masses of galaxies, but also yields similar number, stellar, halo mass and satellite fraction distributions to the HI detections retrieved from observational data.

preprint · 2020 · arXiv

Prob2Vec: Mathematical Semantic Embedding for Problem Retrieval in Adaptive Tutoring

We propose a new application of embedding techniques for problem retrieval in adaptive tutoring. The objective is to retrieve problems whose mathematical concepts are similar. There are two challenges: First, like sentences, problems helpful to tutoring are never exactly the same in terms of the underlying concepts. Instead, good problems mix concepts in innovative ways, while still displaying continuity in their relationships. Second, it is difficult for humans to determine a similarity score that is consistent across a large enough training set. We propose a hierarchical problem embedding algorithm, called Prob2Vec, that consists of abstraction and embedding steps. Prob2Vec achieves 96.88% accuracy on a problem similarity test, in contrast to 75% from directly applying state-of-the-art sentence embedding methods. It is interesting that Prob2Vec is able to distinguish very fine-grained differences among problems, an ability humans need time and effort to acquire. In addition, the sub-problem of concept labeling with imbalanced training data set is interesting in its own right. It is a multi-label problem suffering from dimensionality explosion, which we propose ways to ameliorate. We propose the novel negative pre-training algorithm that dramatically reduces false negative and positive ratios for classification, using an imbalanced training data set.

preprint · 2020 · arXiv

Topotactic hydrogen in nickelate superconductors and akin infinite-layer oxides ABO2

Superconducting nickelates appear to be difficult to synthesize. Since the chemical reduction of ABO3 (A: rare earth; B: transition metal) with CaH2 may result in both ABO2 and ABO2H, we calculate the topotactic H binding energy by density functional theory (DFT). We find intercalating H is energetically favorable for LaNiO2 but not for Sr-doped NdNiO2. This has dramatic consequences for the electronic structure as determined by DFT+dynamical mean field theory: that of 3d9 LaNiO2 is similar to (doped) cuprates, whereas 3d8 LaNiO2H is a two-orbital Mott insulator. Topotactic H might hence explain why some nickelates are superconducting and others are not.