Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
27 works
0 followers
27 topics
4 close collaborators

Published work

27 published items

preprint · 2026 · arXiv

BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) has become a pivotal paradigm for Large Language Models (LLMs), yet current approaches struggle with visually rich documents by treating text and images as isolated retrieval targets. Existing methods relying solely on cosine similarity often fail to capture the semantic reinforcement provided by cross-modal alignment and layout-induced coherence. To address these limitations, we propose BayesRAG, a novel multimodal retrieval framework grounded in Bayesian inference and Dempster-Shafer evidence theory. Unlike traditional approaches that rank candidates strictly by similarity, BayesRAG models the intrinsic consistency of retrieved candidates across modalities as probabilistic evidence to refine retrieval confidence. Specifically, our method computes the posterior association probability for combinations of multimodal retrieval results, prioritizing text-image pairs that mutually corroborate each other in terms of both semantics and layout. Extensive experiments demonstrate that BayesRAG significantly outperforms state-of-the-art (SOTA) methods on challenging multimodal benchmarks. This study establishes a new paradigm for multimodal retrieval fusion that effectively resolves the isolation of heterogeneous modalities through an evidence fusion mechanism and enhances the robustness of retrieval outcomes. Our code is available at https://github.com/TioeAre/BayesRAG.
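
As a rough illustration of the evidence-fusion idea named in the abstract, the following minimal sketch implements the generic Dempster rule of combination from Dempster-Shafer theory. It is not the BayesRAG implementation: the two-hypothesis frame (`relevant`, `irrelevant`), the function name `dempster_combine`, and the mass values are all assumptions chosen for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass that falls on contradictory hypothesis pairs.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}


# Hypothetical frame of discernment for one retrieved candidate.
REL = frozenset({"relevant"})
IRR = frozenset({"irrelevant"})
EITHER = REL | IRR  # mass left uncommitted

# Hypothetical evidence from a text retriever and an image retriever.
text_evidence = {REL: 0.6, IRR: 0.1, EITHER: 0.3}
image_evidence = {REL: 0.5, IRR: 0.2, EITHER: 0.3}

fused = dempster_combine(text_evidence, image_evidence)
```

With these illustrative numbers, the fused mass on `relevant` is higher than either source assigned alone, which is the sense in which two agreeing sources corroborate each other under this rule.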

preprint · 2026 · arXiv

LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification

Chain-of-thought (CoT) reasoning improves large language models (LLMs) on difficult tasks, but it also makes inference expensive because every intermediate step must be generated as a discrete token. Latent reasoning reduces visible token generation by propagating continuous states, yet replacing explicit derivations with latent computation can hurt tasks that require symbolic checking. We propose Latent-Then-Explicit Reasoning (LaTER), a two-stage paradigm that first performs bounded exploration in a continuous latent space and then switches to explicit CoT for verification and answer generation. In a training-free instantiation, LaTER projects final-layer hidden states back to the input embedding space, preserves the latent KV cache, and uses entropy and model-native stop-token probes to decide when to switch. We find that strong reasoning models already exhibit structured latent trajectories under this interface. On Qwen3-14B, training-free LaTER reduces total token usage by 16%-32% on several benchmarks while matching or improving accuracy on most of them; for example, it improves AIME 2025 from 70.0% to 73.3% while reducing tokens from 15,730 to 10,661. We further construct Latent-Switch-69K, a supervised corpus that pairs condensed solution intuitions with shortened explicit derivations. Fine-tuning with latent rollout and halting supervision yields additional gains: trained LaTER reaches 80.0% accuracy on AIME 2025, 10.0 points above the standard CoT baseline, while using 33% fewer tokens. Our code, data, and model are available at https://github.com/TioeAre/LaTER.
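
The AIME 2025 figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the helper name `relative_reduction` is an illustrative assumption.

```python
def relative_reduction(before, after):
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# Token counts and accuracies quoted in the abstract for Qwen3-14B on AIME 2025.
token_reduction = relative_reduction(15_730, 10_661)
accuracy_gain_points = 73.3 - 70.0

print(f"token reduction: {token_reduction:.1%}")
print(f"accuracy gain: {accuracy_gain_points:.1f} points")
```

The reduction comes out at roughly 32%, which sits at the upper end of the 16%-32% range reported for the training-free variant.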

preprint · 2026 · arXiv

Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge

The 2025 BEHAVIOR Challenge is designed to rigorously track progress toward solving long-horizon tasks by physical agents in simulated environments. BEHAVIOR-1K focuses on everyday household tasks that people most want robots to assist with, and these tasks introduce long-horizon mobile manipulation challenges in realistic settings, bridging the gap between current research and real-world, human-centric applications. This report presents our solution to the 2025 BEHAVIOR Challenge, which placed a very close 2nd and substantially outperformed the rest of the submissions. Building on $π_{0.5}$, we focus on systematically building our solution by studying the effects of training techniques and data. Through careful ablation studies, we reveal the scaling benefits in both the pre-training and post-training phases, leading to a validation Q-score of 0.345, significantly surpassing previous state-of-the-art performance. We summarize our practical lessons and design recommendations that we hope will provide actionable insights for the broader embodied AI community when adapting powerful foundation models to complex embodied scenarios. Project page: https://github.com/mli0603/openpi-comet

preprint · 2025 · arXiv

Rheology of bidisperse suspensions at the colloidal-to-granular transition

We use particle-based simulation to study the rheology of dense suspensions comprising mixtures of small colloids and larger grains, which exhibit shear thinning at low shear rates and shear thickening at high shear rates. By systematically varying the volume fraction of the two species, we demonstrate a monotonic increase in viscosity when grains are added to colloids, but, conversely, a nonmonotonic response in both the viscosity and shear thickening onset when colloids are added to grains. Both effects are most prominent at intermediate shear rates where diffusion and convection play similar roles in the dynamics. We rationalise these results by measuring the maximum flowable volume fraction as a function of the Peclet number and composition, showing that in extreme cases increasing the solids content can allow a jammed suspension to flow. These results establish a constitutive description for the rheology of bidisperse suspensions across the colloidal-to-granular transition, with implications for flow prediction and control in multicomponent particulate systems.

preprint · 2022 · arXiv

2022 Roadmap on Neuromorphic Computing and Engineering

Modern computation based on the von Neumann architecture is today a mature cutting-edge science. In the von Neumann architecture, processing and memory units are implemented as separate blocks interchanging data intensively and continuously. This data transfer is responsible for a large part of the power consumption. The next generation computer technology is expected to solve problems at the exascale with $10^{18}$ calculations each second. Even though these future computers will be incredibly powerful, if they are based on von Neumann type architectures, they will consume between 20 and 30 megawatts of power and will not have intrinsic physically built-in capabilities to learn or deal with complex data as our brain does. These needs can be addressed by neuromorphic computing systems which are inspired by the biological concepts of the human brain. This new generation of computers has the potential to be used for the storage and processing of large amounts of digital information with much lower power consumption than conventional processors. Among their potential future applications, an important niche is moving the control from data centers to edge devices. The aim of this Roadmap is to present a snapshot of the present state of neuromorphic technology and provide an opinion on the challenges and opportunities that the future holds in the major areas of neuromorphic technology, namely materials, devices, neuromorphic circuits, neuromorphic algorithms, applications, and ethics. The Roadmap is a collection of perspectives where leading researchers in the neuromorphic community provide their own view about the current state and the future challenges. We hope that this Roadmap will be a useful resource to readers outside this field, for those who are just entering the field, and for those who are well established in the neuromorphic community. https://doi.org/10.1088/2634-4386/ac4a83

preprint · 2022 · arXiv

D-brane Superpotentials and Geometric Invariants in Complete Intersection Calabi-Yau Manifolds

By blowing up the ambient space along the curve wrapped by B-branes, we study the brane superpotentials and Ooguri-Vafa invariants on complete intersection Calabi-Yau threefolds. On the topological B-model side, B-brane superpotentials are expressed in terms of the period integral of the blow-up manifolds. By mirror maps, the superpotentials are generating functions of Ooguri-Vafa invariants counting holomorphic disks on the topological A-model side.

preprint · 2022 · arXiv

Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval

Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for its advantages of flexible input and efficient query. However, traditional methods ignore the multi-scale and redundant targets characteristic of RS images, which degrades retrieval accuracy. To cope with the problem of multi-scale scarcity and target redundancy in the RS multimodal retrieval task, we propose a novel asymmetric multimodal feature matching network (AMFMN). Our model adapts to multi-scale feature inputs, favors multi-source retrieval methods, and can dynamically filter redundant features. AMFMN employs the multi-scale visual self-attention (MVSA) module to extract the salient features of RS images and utilizes visual features to guide the text representation. Furthermore, to alleviate the positive-sample ambiguity caused by the strong intraclass similarity in RS images, we propose a triplet loss function with a dynamic variable margin based on the prior similarity of sample pairs. Finally, unlike the traditional RS image-text datasets with coarse text and higher intraclass similarity, we construct a fine-grained and more challenging Remote sensing Image-Text Match dataset (RSITMD), which supports RS image retrieval through keywords and sentences, separately and jointly. Experiments on four RS text-image datasets demonstrate that the proposed model can achieve state-of-the-art performance in the cross-modal RS text-image retrieval task.

preprint · 2022 · arXiv

Monadic Pavlovian associative learning in a backpropagation-free photonic network

Over a century ago, Ivan P. Pavlov, in a classic experiment, demonstrated how dogs can learn to associate a ringing bell with food, thereby causing a ring to result in salivation. Today, it is rare to find the use of Pavlovian type associative learning for artificial intelligence (AI) applications even though other learning concepts, in particular backpropagation on artificial neural networks (ANNs) have flourished. However, training using the backpropagation method on 'conventional' ANNs, especially in the form of modern deep neural networks (DNNs), is computationally and energy intensive. Here we experimentally demonstrate a form of backpropagation-free learning using a single (or monadic) associative hardware element. We realize this on an integrated photonic platform using phase-change materials combined with on-chip cascaded directional couplers. We then develop a scaled-up circuit network using our monadic Pavlovian photonic hardware that delivers a distinct machine-learning framework based on single-element associations and, importantly, using backpropagation-free architectures to address general learning tasks. Our approach reduces the computational burden imposed by learning in conventional neural network approaches, thereby increasing speed, whilst also offering higher bandwidth inherent to our photonic implementation.

preprint · 2022 · arXiv

Open Topological String Amplitudes and BPS Invariants on Complete Intersection Calabi-Yau Threefolds

The open topological string partition function on compact Calabi-Yau threefolds satisfies the extended holomorphic anomaly equation. By direct integration, we solve these equations and obtain partition functions for the first several genera and boundaries on complete intersection Calabi-Yau threefolds. Complemented by the unoriented worldsheet contribution, the annulus functions encode the genus one BPS invariants.

preprint · 2022 · arXiv

Open Topological String Amplitudes on Calabi-Yau Threefolds by Extended Holomorphic Anomaly Equation

In this paper, we study the open topological string amplitudes on Calabi-Yau threefolds by the extended holomorphic anomaly equation. The disk two-point function determined by the domainwall tension, together with the Yukawa couplings, solves the amplitudes with high genus and boundaries recursively. The BPS invariants encoded in the amplitudes are extracted by mirror symmetry.

preprint · 2022 · arXiv

Robust single-sideband-modulated Raman light generation for atom interferometry by FBG-based optical rectangular filtration

Low-phase-noise and pure-spectrum Raman light is vital for high-precision atom interferometry by two-photon Raman transition. A preferred and prevalent solution for Raman light generation is electro-optic phase modulation. However, phase modulation inherently brings in double sidebands, resulting in residual sideband effects of multiple laser pairs beside Raman light in atom interferometry. Based on a well-designed rectangular fiber Bragg grating and an electro-optic modulator, optical single-sideband modulation has been realized at 1560 nm with a stable suppression ratio better than -25 dB despite intense temperature variations. After optical filtration and frequency doubling, a robust phase-coherent Raman light at 780 nm is generated with a stable SNR of better than -19 dB and facilitates measuring the local gravity successfully. This proposed all-fiber single-sideband-modulated Raman light source, characterized as robust, compact and low-priced, is practical and promising for field applications of portable atom interferometry.

preprint · 2022 · arXiv

Superpotentials of D-branes in Calabi-Yau Manifolds with Several Moduli by Mirror Symmetry and Blown-up

We study B-brane superpotentials depending on several closed and open moduli on Calabi-Yau hypersurfaces and complete intersections. By blowing up the ambient space along a curve wrapped by B-branes in a Calabi-Yau manifold, we obtain a new blown-up manifold and the period integral satisfying the GKZ-system. Via mirror symmetry to the A-model, we calculate the superpotentials and extract Ooguri-Vafa invariants for concrete examples of several open-closed moduli in Calabi-Yau manifolds.

preprint · 2022 · arXiv

Topology Optimization with Frictional Self-Contact

Contact-aware topology optimization faces challenges in robustness, accuracy, and applicability to internal structural surfaces under self-contact. This work builds on the recently proposed barrier-based Incremental Potential Contact (IPC) model and presents a new self-contact-aware topology optimization framework. A combination of SIMP, adjoint sensitivity analysis, and the IPC frictional-contact model is investigated. Numerical examples for optimizing varying objective functions under contact are presented. The proposed algorithm solves topology optimization for large deformation and complex frictionally contacting scenarios with accuracy and robustness.
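
For readers unfamiliar with SIMP (Solid Isotropic Material with Penalization), the sketch below shows the commonly used modified SIMP stiffness interpolation. The abstract does not state which interpolation or penalty exponent the authors use, so the function name `simp_modulus`, the exponent of 3, and the void stiffness of `1e-9` are assumptions for illustration only.

```python
def simp_modulus(density, e_solid=1.0, e_void=1e-9, penalty=3.0):
    """Modified SIMP interpolation of Young's modulus.

    density: design variable in [0, 1] (0 = void, 1 = solid).
    e_void:  small positive stiffness that keeps void elements non-singular.
    penalty: exponent that makes intermediate densities structurally inefficient.
    """
    if not 0.0 <= density <= 1.0:
        raise ValueError("density must lie in [0, 1]")
    return e_void + density ** penalty * (e_solid - e_void)


# Half-dense material contributes far less than half the solid stiffness.
half_density_modulus = simp_modulus(0.5)
```

Because stiffness grows with the cube of density in this sketch, an optimizer gets little benefit from intermediate densities, which pushes designs toward clear solid or void regions.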

preprint · 2022 · arXiv

Variational determination of arbitrarily many eigenpairs in one quantum circuit

State-of-the-art quantum computing hardware has entered the noisy intermediate-scale quantum (NISQ) era. Although constrained by the limited number of qubits and shallow circuit depth, NISQ devices have nevertheless demonstrated the potential of applications on various subjects. One example is the variational quantum eigensolver (VQE) that was first introduced for computing ground states. Although VQE has now been extended to the study of excited states, the algorithms previously proposed involve a recursive optimization scheme which requires many extra operations with significantly deeper quantum circuits to ensure the orthogonality of different trial states. Here we propose a new algorithm to determine many low energy eigenstates simultaneously. By introducing ancillary qubits to purify the trial states so that they remain orthogonal to each other throughout the whole optimization process, our algorithm allows these states to be efficiently computed in one quantum circuit. Our algorithm significantly reduces the complexity of circuits and the readout errors, and enables flexible post-processing on the eigen-subspace from which the eigenpairs can be accurately determined. We demonstrate this algorithm by applying it to the transverse Ising model. By comparing the results obtained using this variational algorithm with the exact ones, we find that the eigenvalues of the Hamiltonian converge quickly with the increase of the circuit depth. The accuracies of the converged eigenvalues are of the same order, which implies that the difference between any two eigenvalues can be more accurately determined than the eigenvalues themselves.

preprint · 2021 · arXiv

BFEMP: Interpenetration-Free MPM-FEM Coupling with Barrier Contact

This paper introduces BFEMP, a new approach for monolithically coupling the Material Point Method (MPM) with the Finite Element Method (FEM) through barrier energy-based particle-mesh frictional contact using a variational time-stepping formulation. The fully implicit time integration of the coupled system is recast into a barrier-augmented unconstrained nonlinear optimization problem. A modified line-search Newton's method is adopted to strictly prevent material points from penetrating the FEM domain, ensuring convergence and feasibility regardless of the time step size or the mesh resolutions. The proposed coupling scheme also reduces to a new approach for imposing separable frictional kinematic boundaries for MPM when all nodal displacements in the FEM domain are prescribed with Dirichlet boundary conditions. Compared to standard implicit time integration, the extra algorithmic components associated with the contact treatment only depend on simple point-segment (or point-triangle in 3D) geometric queries which robustly handle arbitrary FEM mesh boundaries represented with codimension-1 simplices. Experiments and analyses are performed to demonstrate the robustness and accuracy of the proposed method.

preprint · 2021 · arXiv

Gapless Spin Liquid Behavior in A Kagome Heisenberg Antiferromagnet with Randomly Distributed Hexagons of Alternate Bonds

We demonstrate that the new single crystal of YCu$_3$[OH(D)]$_{6.5}$Br$_{2.5}$ (YCOB) is a kagome Heisenberg antiferromagnet (KHA) without evident orphan spins ($\ll$ 0.8\%). The site mixing between polar OH$^-$ and non-polar Br$^-$ causes local distortions of Cu-O-Cu exchange paths, and gives rise to 70(2)\% of randomly distributed hexagons of alternate bonds ($\sim$ $J_1-ΔJ$ and $J_1+ΔJ$) and the rest of almost uniform hexagons ($\sim$ $J_1$) on the kagome lattice. Simulations of the random exchange model with $ΔJ$/$J_1$ = 0.7(1) show good agreement with the experimental observations, including the weak upturn seen in susceptibility and the slight polarization in magnetization. Despite the average antiferromagnetic coupling of $J_1$ $\sim$ 60 K, no conventional freezing is observed down to $T$ $\sim$ 0.001$J_1$, and the raw specific heat exhibits a nearly quadratic temperature dependence below 1 K $\sim$ 0.02$J_1$, phenomenologically consistent with a gapless (spin gap $\leq$ 0.025$J_1$) Dirac quantum spin liquid (QSL). Our result sheds new light on the theoretical understanding of the randomness-relevant gapless QSL behavior in YCOB, as well as in other relevant materials.

preprint · 2021 · arXiv

Heavy Flavor and Jet Studies for the Future Electron-Ion Collider to Explore the Hadronization Process

Heavy flavor production at the future Electron-Ion Collider (EIC) will allow us to precisely determine the quark/gluon fragmentation processes in vacuum and the nuclear medium, especially within the poorly constrained kinematic region. Heavy flavor hadron and jet reconstruction with the recent EIC detector design have been studied in simulation. Results of corresponding physics projections such as the flavor dependent hadron nuclear modification factor $R_{eA}$ in electron+nucleus collisions will be shown. The statistical precision obtained by these proposed heavy flavor measurements for the future EIC provides a strong discriminating power in separating different theoretical predictions.

preprint · 2021 · arXiv

Self-Amplification of Coherent Energy Modulation in Seeded Free-Electron Lasers

The spectroscopic techniques for time-resolved fine analysis of matter require coherent X-ray radiation with femtosecond duration and high average brightness. Seeded free-electron lasers (FELs), which use the frequency up-conversion of an external seed laser to improve temporal coherence, are ideal for providing fully coherent soft X-ray pulses. However, it is difficult to operate seeded FELs at a high repetition rate due to the limitations of present state-of-the-art laser systems. Here, we report a novel self-modulation method for enhancing laser-induced energy modulation, thereby significantly reducing the requirement of an external laser system. Driven by this scheme, we experimentally realize high harmonic generation in a seeded FEL using an unprecedentedly small energy modulation. An electron beam with a laser-induced energy modulation as small as 1.8 times the slice energy spread is used for lasing at the 7th harmonic of a 266-nm seed laser in a single-stage high-gain harmonic generation (HGHG) setup and the 30th harmonic of the seed laser in a two-stage HGHG setup. The results mark a major step towards a high-repetition-rate, fully coherent X-ray FEL.

preprint · 2020 · arXiv

A New Heavy Flavor Program for the Future Electron-Ion Collider

The proposed high-energy and high-luminosity Electron-Ion Collider (EIC) will provide one of the cleanest environments to precisely determine the nuclear parton distribution functions (nPDFs) in a wide $x$-$Q^{2}$ range. Heavy flavor production at the EIC provides access to nPDFs in the poorly constrained high Bjorken-$x$ region, allows us to study the quark and gluon fragmentation processes, and constrains parton energy loss in cold nuclear matter. Scientists at the Los Alamos National Laboratory are developing a new physics program to study heavy flavor production, flavor tagged jets, and heavy flavor hadron-jet correlations in the nucleon/nucleus going direction at the future EIC. The proposed measurements will provide a unique way to explore the flavor dependent fragmentation functions and energy loss in a heavy nucleus. They will constrain the initial-state effects that are critical for the interpretation of previous and ongoing heavy ion measurements at the Relativistic Heavy Ion Collider and the Large Hadron Collider. We show an initial conceptual design of the proposed Forward Silicon Tracking (FST) detector at the EIC, which is essential to carry out the heavy flavor measurements. We further present initial feasibility studies/simulations of heavy flavor hadron reconstruction using the proposed FST.

preprint · 2020 · arXiv

A Proposed Forward Silicon Tracker for the Future Electron-Ion Collider and Associated Physics Studies

The future Electron-Ion Collider (EIC) will explore several fundamental questions in a broad Bjorken-x ($x_{BJ}$) and $Q^{2}$ phase space. Heavy flavor and jet products are ideal probes to precisely study the tomography of nucleon/nuclei structure, help solve the proton spin puzzle and understand the hadronization processes in vacuum or in the QCD medium. Due to the asymmetric collisions at the EIC, most of the final state hadrons are produced in the nucleon/nuclei beam going (forward) direction. A silicon vertex/tracking detector is critical to precisely measure these forward hadrons at the EIC. Details of different conceptual designs of the proposed Forward Silicon Tracker (FST) and the relevant detector performance are presented in this technical note. The associated heavy flavor and jet studies with the evaluated FST performance are discussed as well.

preprint · 2020 · arXiv

An End-to-End Dialogue State Tracking System with Machine Reading Comprehension and Wide & Deep Classification

This paper describes our approach in DSTC 8 Track 4: Schema-Guided Dialogue State Tracking. The goal of this task is to predict the intents and slots in each user turn to complete the dialogue state tracking (DST) based on the information provided by the task's schema. Different from traditional stage-wise DST, we propose an end-to-end DST system to avoid error accumulation between the dialogue turns. The DST system consists of a machine reading comprehension (MRC) model for non-categorical slots and a Wide & Deep model for categorical slots. As far as we know, this is the first time that MRC and Wide & Deep models are applied to the DST problem in a fully end-to-end way. Experimental results show that our framework achieves an excellent performance on the test dataset including 50% zero-shot services with a joint goal accuracy of 0.8652 and a slot tagging F1-Score of 0.9835.
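
Joint goal accuracy, the headline metric quoted above, counts a turn as correct only when every slot in the predicted dialogue state matches the reference. The sketch below is a simplified version under the assumption of strict equality on slot values; the official challenge scorer may differ in details (for example, in how string values are matched), and the slot names and values are made up for illustration.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose full predicted state equals the gold state.

    Each state is a dict mapping slot name to value; a single wrong,
    missing, or extra slot makes the whole turn count as incorrect.
    """
    if len(predicted_states) != len(gold_states):
        raise ValueError("need one predicted state per gold state")
    if not gold_states:
        return 0.0
    exact_matches = sum(
        1 for predicted, gold in zip(predicted_states, gold_states) if predicted == gold
    )
    return exact_matches / len(gold_states)


# Hypothetical three-turn dialogue with made-up slots.
gold = [
    {"city": "Paris"},
    {"city": "Paris", "date": "Friday"},
    {"city": "Paris", "date": "Friday", "party_size": "2"},
]
predicted = [
    {"city": "Paris"},
    {"city": "Paris", "date": "Saturday"},  # one wrong slot fails the turn
    {"city": "Paris", "date": "Friday", "party_size": "2"},
]

score = joint_goal_accuracy(predicted, gold)
```

The all-or-nothing scoring per turn is why joint goal accuracy is typically much lower than per-slot metrics such as slot tagging F1.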

preprint · 2020 · arXiv

CASIA-SURF CeFA: A Benchmark for Multi-modal Cross-ethnicity Face Anti-spoofing

Ethnic bias has been shown to negatively affect the performance of face recognition systems, and it remains an open research problem in face anti-spoofing. In order to study the ethnic bias for face anti-spoofing, we introduce the largest up-to-date CASIA-SURF Cross-ethnicity Face Anti-spoofing (CeFA) dataset (briefly named CeFA), covering $3$ ethnicities, $3$ modalities, $1,607$ subjects, and 2D plus 3D attack types. Four protocols are introduced to measure the effect under varied evaluation conditions, such as cross-ethnicity, unknown spoofs or both of them. To the best of our knowledge, CeFA is the first dataset including explicit ethnic labels among currently published/released datasets for face anti-spoofing. Then, we propose a novel multi-modal fusion method as a strong baseline to alleviate these biases, namely, the static-dynamic fusion mechanism applied in each modality (i.e., RGB, Depth and infrared image). Later, a partially shared fusion strategy is proposed to learn complementary information from multiple modalities. Extensive experiments demonstrate that the proposed method achieves state-of-the-art results on the CASIA-SURF, OULU-NPU, SiW and the CeFA dataset.

preprint · 2020 · arXiv

Cross-ethnicity Face Anti-spoofing Recognition Challenge: A Review

Face anti-spoofing is critical to prevent face recognition systems from a security breach. The biometrics community has achieved impressive progress recently due to the excellent performance of deep neural networks and the availability of large datasets. Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research problem in face anti-spoofing. Recently, a multi-ethnic face anti-spoofing dataset, CASIA-SURF CeFA, has been released with the goal of measuring the ethnic bias. It is the largest up-to-date cross-ethnicity face anti-spoofing dataset covering $3$ ethnicities, $3$ modalities, $1,607$ subjects, 2D plus 3D attack types, and the first dataset including explicit ethnic labels among the recently released datasets for face anti-spoofing. We organized the Chalearn Face Anti-spoofing Attack Detection Challenge which consists of single-modal (e.g., RGB) and multi-modal (e.g., RGB, Depth, Infrared (IR)) tracks around this novel resource to boost research aiming to alleviate the ethnic bias. Both tracks have attracted $340$ teams in the development stage, and finally 11 and 8 teams have submitted their codes in the single-modal and multi-modal face anti-spoofing recognition challenges, respectively. All the results were verified and re-run by the organizing team, and the results were used for the final ranking. This paper presents an overview of the challenge, including its design, evaluation protocol and a summary of results. We analyze the top ranked solutions and draw conclusions derived from the competition. In addition we outline future work directions.

preprint · 2020 · arXiv

Hierarchical Context Enhanced Multi-Domain Dialogue System for Multi-domain Task Completion

Task 1 of the DSTC8-track1 challenge aims to develop an end-to-end multi-domain dialogue system to accomplish complex users' goals under tourist information desk settings. This paper describes our submitted solution, Hierarchical Context Enhanced Dialogue System (HCEDS), for this task. The main motivation of our system is to comprehensively explore the potential of hierarchical context for sufficiently understanding complex dialogues. More specifically, we apply BERT to capture token-level information and employ the attention mechanism to capture sentence-level information. The results listed in the leaderboard show that our system achieves first place in automatic evaluation and the second place in human evaluation.

preprint · 2020 · arXiv

Out-of-Distribution Detection for Skin Lesion Images with Deep Isolation Forest

In this paper, we study the problem of out-of-distribution detection in skin disease images. Publicly available medical datasets normally have a limited number of lesion classes (e.g. HAM10000 has 8 lesion classes). However, there exist a few thousand clinically identified diseases. Hence, it is important that lesions not in the training data can be differentiated. Toward this goal, we propose DeepIF, a non-parametric Isolation Forest based approach combined with deep convolutional networks. We conduct comprehensive experiments to compare our DeepIF with three baseline models. Results demonstrate state-of-the-art performance of our proposed approach on the task of detecting abnormal skin lesions.

preprint · 2020 · arXiv

Parallel convolution processing using an integrated photonic tensor core

With the proliferation of ultra-high-speed mobile networks and internet-connected devices, along with the rise of artificial intelligence, the world is generating exponentially increasing amounts of data - data that needs to be processed in a fast, efficient and smart way. These developments are pushing the limits of existing computing paradigms, and highly parallelized, fast and scalable hardware concepts are becoming progressively more important. Here, we demonstrate a computationally specific integrated photonic tensor core, the optical analog of an ASIC, capable of operating at Tera-Multiply-Accumulate per second (TMAC/s) speeds. The photonic core achieves parallelized photonic in-memory computing using phase-change memory arrays and photonic chip-based optical frequency combs (soliton microcombs). The computation is reduced to measuring the optical transmission of reconfigurable and non-resonant passive components and can operate at a bandwidth exceeding 14 GHz, limited only by the speed of the modulators and photodetectors. Given recent advances in hybrid integration of soliton microcombs at microwave line rates, ultra-low loss silicon nitride waveguides, and high speed on-chip detectors and modulators, our approach provides a path towards full CMOS wafer-scale integration of the photonic tensor core. While we focus on convolution processing, more generally our results indicate the major potential of integrated photonics for parallel, fast, and efficient computational hardware in demanding AI applications such as autonomous driving, live video processing, and next generation cloud computing services.
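
To make the link between convolution and multiply-accumulate (MAC) operations concrete, the sketch below recasts a small 2D convolution as a series of dot products between flattened image patches and a flattened kernel. This is a plain software illustration, not a model of the photonic hardware; the function name `conv2d_as_dot_products` and the sample data are assumptions. As is conventional in neural-network code, the kernel is not flipped (strictly speaking, this is cross-correlation).

```python
def conv2d_as_dot_products(image, kernel):
    """Valid-mode 2D convolution expressed as per-patch dot products.

    Each output pixel is one dot product between a flattened image patch
    and the flattened kernel, i.e. kernel_height * kernel_width MACs.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    weights = [w for row in kernel for w in row]  # flattened kernel
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            patch = [
                image[i + di][j + dj]
                for di in range(kernel_h)
                for dj in range(kernel_w)
            ]
            out_row.append(sum(p * w for p, w in zip(patch, weights)))
        output.append(out_row)
    return output


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]  # simple diagonal-difference filter

result = conv2d_as_dot_products(image, kernel)  # [[-4, -4], [-4, -4]]
mac_count = len(result) * len(result[0]) * len(kernel) * len(kernel[0])  # 16
```

Every dot product here is independent of the others, which is the property that lets parallel hardware evaluate many of them at once.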

preprint · 2020 · arXiv

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

A scene text image contains two levels of content: visual texture and semantic information. Although previous scene text recognition methods have made great progress over the past few years, research on mining semantic information to assist text recognition has attracted less attention, and only RNN-like structures have been explored to implicitly model semantic information. However, we observe that RNN based methods have some obvious shortcomings, such as a time-dependent decoding manner and one-way serial transmission of semantic context, which greatly limit the help of semantic information and the computation efficiency. To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission. The state-of-the-art results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method. In addition, the speed of SRN has significant advantages over the RNN based methods, demonstrating its value in practical use.