Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
26 works
0 followers
21 topics
4 close collaborators

Published work

26 published item(s)

preprint, 2026, arXiv

BioHuman: Learning Biomechanical Human Representations from Video

Understanding human motion beyond surface kinematics is crucial for motion analysis, rehabilitation, and injury risk assessment. However, progress in this domain is limited by the lack of large-scale datasets with biomechanical annotations, and by existing approaches that cannot directly infer internal biomechanical states from visual observations. In this paper, we introduce a simulation-based framework for estimating muscle activations from existing motion capture datasets, resulting in BioHuman10M, a large-scale dataset with synchronized video, motion, and activations. Building on BioHuman10M, we propose BioHuman, an end-to-end model that takes monocular video as input and jointly predicts human motion and muscle activations, effectively bridging visual observations and internal biomechanical states. Extensive experiments demonstrate that BioHuman enables accurate reconstruction of both kinematic motion and muscle activity, and generalizes across diverse subjects and motions. We believe our approach establishes a new benchmark for video-based biomechanical understanding and opens up new possibilities for physically grounded human modeling.

preprint, 2025, arXiv

OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions

Existing feedforward subject-driven video customization methods mainly study single-subject scenarios due to the difficulty of constructing multi-subject training data pairs. Another challenging problem, how to use signals such as depth, mask, camera, and text prompts to control and edit the subject in the customized video, remains underexplored. In this paper, we first propose a data construction pipeline, VideoCus-Factory, to produce training data pairs for multi-subject customization from raw videos without labels and control signals such as depth-to-video and mask-to-video pairs. Based on our constructed data, we develop an Image-Video Transfer Mixed (IVTM) training with image editing data to enable instructive editing for the subject in the customized video. Then we propose a diffusion Transformer framework, OmniVCus, with two embedding mechanisms, Lottery Embedding (LE) and Temporally Aligned Embedding (TAE). LE enables inference with more subjects by using the training subjects to activate more frame embeddings. TAE encourages the generation process to extract guidance from temporally aligned control signals by assigning the same frame embeddings to the control and noise tokens. Experiments demonstrate that our method significantly surpasses state-of-the-art methods in both quantitative and qualitative evaluations. Video demos are at our project page: https://caiyuanhao1998.github.io/project/OmniVCus/. Our code, models, and data are released at https://github.com/caiyuanhao1998/Open-OmniVCus

preprint, 2023, arXiv

A Programmable Spatiotemporal Quantum Parametric Mode Sorter

We experimentally demonstrate a programmable parametric mode sorter of high-dimensional signals in a composite spatiotemporal Hilbert space through mode-selective quantum frequency up-conversion. As a concrete example and with quantum communication applications in mind, we consider the Laguerre-Gaussian and Hermite-Gaussian modes as the spatial and temporal state basis for the signals, respectively. By modulating the spatiotemporal profiles of the up-conversion pump, we demonstrate the faithful selection of single photons in those modes and their superposition modes. Our results show an improvement in the quantum mode-sorting performance by coupling the up-converted light into a single-mode fiber and/or operating the upconversion at the edge of phase matching. By optimizing pump temporal profiles only, we achieve more than 12 dB extinction for mutually unbiased basis (MUB) sets of the spatiotemporal modes. This fully programmable and efficient system could serve as a viable resource for quantum communications, quantum computation, and quantum metrology.

preprint, 2023, arXiv

Metrics for Software Process Simulation Modeling

Background: Software Process Simulation (SPS) has become an effective tool for software process management and improvement. However, its adoption in industry is less than what the research community expected due to the burden of measurement cost and the high demand for domain knowledge. The difficulty of extracting appropriate metrics with real data from process enactment is one of the great challenges. Objective: We aim to provide evidence-based support of the process metrics for software process (simulation) modeling. Method: A systematic literature review was performed by extending our previous review series to draw a comprehensive understanding of the metrics for process modeling following a meta-model of ontology of metrics in SPS. Results: We identified 145 process modeling studies that collectively involve 2130 metrics and classified them using the coding technique. Two diagrams that illustrate the high-frequency causal relationships used between metrics are proposed in terms of two hierarchical levels of modeling purposes. We revisited the data issues encountered in the SPS data preparation phases and identified the corresponding strategies. Conclusion: The results of this study provide process modelers with an evidence-based reference of the identification and the use of metrics in SPS modeling, and further contribute to the development of the body of knowledge on software metrics in the context of process modeling. Furthermore, this study is not limited to process simulation but can be extended to software process modeling in general. Taking simulation metrics as standards and references can further motivate and guide software developers to improve the collection, governance, and application of process data in practice.

preprint, 2022, arXiv

A Cross-Company Ethnographic Study on Software Teams for DevOps and Microservices: Organization, Benefits, and Issues

Context: DevOps and microservices are acknowledged to be important new paradigms to tackle contemporary software demands and provide capabilities for rapid and reliable software development. Industrial reports show that they are quickly adopted together in massive software companies. However, because of the technical and organizational requirements, many difficulties in implementing both efficiently emerge in real software teams. Objectives: This study aims to discover the organization, benefits and issues of software teams using DevOps & microservices from an immersive perspective. Method: An ethnographic study was carried out in three companies that differ in business, size, products, customers, and degree of globalization. All three companies claimed to have adopted DevOps and microservices. Seven months (cumulative) of participant observations and nine interviews with practitioners were conducted to collect the data of software teams related to DevOps and microservices. A cross-company empirical investigation using grounded theory was done by analyzing the archive data. Results: The adoption of DevOps and microservices brings benefits to rapid delivery, ability improvements and burden reduction, while high cost and a lack of practical guidance emerged as issues. Moreover, our observations and interviews reflect that in software teams, the relationship between DevOps and microservices is not significant, which differs from the relationship described in the previous studies. Four lessons for practitioners and four implications for researchers were discussed based on our findings. Conclusion: Our findings contribute to the understanding of the organization, benefits and issues of adopting DevOps and microservices from an immersive perspective of software teams.

preprint, 2022, arXiv

A Flexible Diffusion Model

Diffusion (score-based) generative models have been widely used for modeling various types of complex data, including images, audio, and point clouds. Recently, the deep connection between forward-backward stochastic differential equations (SDEs) and diffusion-based models has been revealed, and several new variants of SDEs are proposed (e.g., sub-VP, critically-damped Langevin) along this line. Despite the empirical success of the hand-crafted fixed forward SDEs, a great quantity of proper forward SDEs remain unexplored. In this work, we propose a general framework for parameterizing the diffusion model, especially the spatial part of the forward SDE. An abstract formalism is introduced with theoretical guarantees, and its connection with previous diffusion models is leveraged. We demonstrate the theoretical advantage of our method from an optimization perspective. Numerical experiments on synthetic datasets, MNIST and CIFAR10 are also presented to validate the effectiveness of our framework.

preprint, 2022, arXiv

A predictor-corrector deep learning algorithm for high dimensional stochastic partial differential equations

In this paper, we present a deep learning-based numerical method for approximating high dimensional stochastic partial differential equations (SPDEs). At each time step, our method relies on a predictor-corrector procedure. More precisely, we decompose the original SPDE into a degenerate SPDE and a deterministic PDE. Then in the prediction step, we solve the degenerate SPDE with the Euler scheme, while in the correction step we solve the second-order deterministic PDE by deep neural networks via its equivalent backward stochastic differential equation (BSDE). Under standard assumptions, error estimates and the rate of convergence of the proposed algorithm are presented. The efficiency and accuracy of the proposed algorithm are illustrated by numerical examples.

preprint, 2022, arXiv

An Industrial Experience Report on Retro-inspection

To reinforce the quality of code delivery, especially to improve future coding quality, one global Information and Communication Technology (ICT) enterprise has institutionalized a retrospective style inspection (namely retro-inspection), which is similar to Fagan inspection but differs in terms of stage, participants, etc. This paper reports an industrial case study that aims to investigate the experiences and lessons from this software practice. To this end, we collected and analyzed various empirical evidence for data triangulation. The results reflect that retro-inspection distinguishes itself from peer code review by identifying more complicated and underlying defects, providing more indicative and suggestive comments. Many experienced inspectors indicate defects together with the rationale behind them and offer suggestions for correction and prevention. As a result, retro-inspection can benefit not only quality assurance (like Fagan inspection), but also internal audit, inter-division communication, and competence promotion. On the other hand, we identify several lessons of retro-inspection at this stage, e.g., developers' acceptance and organizers' predicament, for next-step improvement of this practice. To be specific, some recommendations are discussed for retro-inspection, e.g., more adequate preparation and more careful publicity. This study concludes that most of the expected benefits of retro-inspection can be empirically confirmed in this enterprise and its value on the progress to continuous maturity can be recognized organization-wide. The experiences on executing this altered practice in a large enterprise provide reference value on code quality assurance to other software organizations.

preprint, 2022, arXiv

Automorphisms and representations of quasi Laurent polynomial algebras

We study automorphisms and representations of quasi polynomial algebras (QPAs) and quasi Laurent polynomial algebras (QLPAs). For any QLPA defined by an arbitrary skew symmetric integral matrix, we explicitly describe its automorphism groups at generic $q$ and at roots of unity. Any QLPA is isomorphic to the tensor product of copies of the QLPA of degree $2$ at different powers of $q$ and the centre, thus the study of representations of QPAs and QLPAs largely reduces to that of ${\mathcal L}_q(2)$ and ${\mathcal A}_q(2)$, the QLPA and QPA of degree $2$. We study a category of ${\mathcal A}_q(2)$-modules which have finite covers by submodules with natural local finiteness properties and satisfy some condition under localisation, determining its blocks, classifying the simple objects and providing two explicit constructions for the simples. One construction produces the simple ${\mathcal A}_q(2)$-modules from ${\mathcal L}_q(2)$-modules via monomorphisms composed of the natural embedding of ${\mathcal A}_q(2)$ in ${\mathcal L}_q(2)$ and automorphisms of ${\mathcal L}_q(2)$, and the other explores a class of holonomic ${\mathcal D}_q$-modules for the algebra ${\mathcal D}_q$ of $q$-differential operators.

preprint, 2022, arXiv

Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation

Deep image matting methods have achieved increasingly better results on benchmarks (e.g., Composition-1k/alphamatting.com). However, the robustness, including robustness to trimaps and generalization to images from different domains, is still under-explored. Although some works propose to either refine the trimaps or adapt the algorithms to real-world images via extra data augmentation, none of them has taken both into consideration, not to mention the significant performance deterioration on benchmarks when using those data augmentations. To fill this gap, we propose an image matting method which achieves higher robustness (RMat) via multilevel context assembling and strong data augmentation targeting matting. Specifically, we first build a strong matting framework by modeling ample global information with transformer blocks in the encoder, and focusing on details in combination with convolution layers as well as a low-level feature assembling attention block in the decoder. Then, based on this strong baseline, we analyze current data augmentation and explore simple but effective strong data augmentation to boost the baseline model and contribute a more generalizable matting method. Compared with previous methods, the proposed method not only achieves state-of-the-art results on the Composition-1k benchmark (11% improvement on SAD and 27% improvement on Grad) with smaller model size, but also shows more robust generalization results on other benchmarks, on real-world images, and also on varying coarse-to-fine trimaps with our extensive experiments.

preprint, 2022, arXiv

Controllable Shadow Generation Using Pixel Height Maps

Shadows are essential for realistic image compositing. Physics-based shadow rendering methods require 3D geometries, which are not always available. Deep learning-based shadow synthesis methods learn a mapping from the light information to an object's shadow without explicitly modeling the shadow geometry. Still, they lack control and are prone to visual artifacts. We introduce pixel height, a novel geometry representation that encodes the correlations between objects, ground, and camera pose. The pixel height can be calculated from 3D geometries, manually annotated on 2D images, and can also be predicted from a single-view RGB image by a supervised approach. It can be used to calculate hard shadows in a 2D image based on the projective geometry, providing precise control of the shadows' direction and shape. Furthermore, we propose a data-driven soft shadow generator to apply softness to a hard shadow based on a softness input parameter. Qualitative and quantitative evaluations demonstrate that the proposed pixel height significantly improves the quality of the shadow generation while allowing for controllability.

preprint, 2022, arXiv

DoubleField: Bridging the Neural Surface and Radiance Fields for High-fidelity Human Reconstruction and Rendering

We introduce DoubleField, a novel framework combining the merits of both surface field and radiance field for high-fidelity human reconstruction and rendering. Within DoubleField, the surface field and radiance field are associated together by a shared feature embedding and a surface-guided sampling strategy. Moreover, a view-to-view transformer is introduced to fuse multi-view features and learn view-dependent features directly from high-resolution inputs. With the modeling power of DoubleField and the view-to-view transformer, our method significantly improves the reconstruction quality of both geometry and appearance, while supporting direct inference, scene-specific high-resolution finetuning, and fast rendering. The efficacy of DoubleField is validated by the quantitative evaluations on several datasets and the qualitative results in a real-world sparse multi-view system, showing its superior capability for high-quality human model reconstruction and photo-realistic free-viewpoint human rendering. Data and source code will be made public for the research purpose. Please refer to our project page: http://www.liuyebin.com/dbfield/dbfield.html.

preprint, 2022, arXiv

HVTR: Hybrid Volumetric-Textural Rendering for Human Avatars

We propose a novel neural rendering pipeline, Hybrid Volumetric-Textural Rendering (HVTR), which synthesizes virtual human avatars from arbitrary poses efficiently and at high quality. First, we learn to encode articulated human motions on a dense UV manifold of the human body surface. To handle complicated motions (e.g., self-occlusions), we then leverage the encoded information on the UV manifold to construct a 3D volumetric representation based on a dynamic pose-conditioned neural radiance field. While this allows us to represent 3D geometry with changing topology, volumetric rendering is computationally heavy. Hence we employ only a rough volumetric representation using a pose-conditioned downsampled neural radiance field (PD-NeRF), which we can render efficiently at low resolutions. In addition, we learn 2D textural features that are fused with rendered volumetric features in image space. The key advantage of our approach is that we can then convert the fused features into a high-resolution, high-quality avatar by a fast GAN-based textural renderer. We demonstrate that hybrid rendering enables HVTR to handle complicated motions, render high-quality avatars under user-controlled poses/shapes and even loose clothing, and most importantly, be efficient at inference time. Our experimental results also demonstrate state-of-the-art quantitative results.

preprint, 2022, arXiv

Interactive Portrait Harmonization

Current image harmonization methods consider the entire background as the guidance for harmonization. However, this may limit the user's ability to choose a specific object or person in the background to guide the harmonization. To enable flexible interaction between user and harmonization, we introduce interactive harmonization, a new setting where the harmonization is performed with respect to a selected region in the reference image instead of the entire background. A new flexible framework that allows users to pick certain regions of the background image and use them to guide the harmonization is proposed. Inspired by professional portrait harmonization users, we also introduce a new luminance matching loss to optimally match the color/luminance conditions between the composite foreground and the selected reference region. This framework provides more control to the image harmonization pipeline, achieving visually pleasing portrait edits. Furthermore, we also introduce a new dataset carefully curated for validating portrait harmonization. Extensive experiments on both synthetic and real-world datasets show that the proposed approach is efficient and robust compared to previous harmonization baselines, especially for portraits. Project webpage: https://jeya-maria-jose.github.io/IPH-web/

preprint, 2022, arXiv

Leveraging Cross Feedback of User and Item Embeddings with Attention for Variational Autoencoder based Collaborative Filtering

Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of the priors and the likelihood. Variational autoencoders (VAE) can address this issue by capturing complex mappings between the posterior parameters and the data. However, current research on VAEs for collaborative filtering only considers the mappings based on the explicit data information while the implicit embedding information is overlooked. In this paper, we first derive evidence lower bounds (ELBO) for Bayesian MF models from two viewpoints: user-oriented and item-oriented. Based on the ELBOs, we propose a VAE-based Bayesian MF framework. It leverages not only the data but also the embedding information to approximate the user-item joint distribution. As suggested by the ELBOs, the approximation is iterative with cross feedback of user and item embeddings into each other's encoders. More specifically, user embeddings sampled at the previous iteration are fed to the item-side encoders to estimate the posterior parameters for the item embeddings at the current iteration, and vice versa. The estimation also attends to the cross-fed embeddings to further exploit useful information. The decoder then reconstructs the data via the matrix factorization over the currently re-sampled user and item embeddings.

preprint, 2022, arXiv

Representations of Quantum Coordinate Algebras at Generic $q$ and Wiring Diagrams

This paper is devoted to the representation theory of the quantum coordinate algebra $\mathbb{C}_q[G]$, for a semisimple Lie group $G$ and a generic parameter $q$. By inspecting the actions of normal elements on tensor modules, we generalize a result of Levendorski and Soibelman in [22] for highest weight modules. For a double Bruhat cell $G^{w_1,w_2}$, we describe the primitive spectra $\mathrm{prim}\,\mathbb{C}_q[G]_{w_1,w_2}$ in a new fashion, and construct a bundle of $(w_1,w_2)$ type simple modules onto $\mathrm{prim}\,\mathbb{C}_q[G]_{w_1,w_2}$, provided $\mathrm{Supp}(w_1)\cap\mathrm{Supp}(w_2)=\varnothing$ or enough pivot elements. The fibers of the bundle are shown to be products of the spectra of simple modules of the 2-dimensional quantum torus $L_q(2)$. As an application of our theory, we deduce an equivalent condition for the tensor module to be simple, and construct some simple modules for each primitive ideal when $G=SL_3(\mathbb{C})$. This completes Dixmier's program for $\mathbb{C}_q[SL_3]$. The wiring diagrams, introduced by Fomin and Zelevinsky in their study of total positivity (cf. [3,9]), are the main tool to compute the action of generalized quantum minors on tensor modules in the type A case. We obtain a quantum version of Lindström's lemma, which plays an important role in transforming representation problems into combinatorial ones of wiring diagrams.

preprint, 2022, arXiv

Threshold solutions for nonlocal reaction diffusion equations

We study the Cauchy problem for nonlocal reaction diffusion equations with bistable nonlinearity in a 1D spatial domain and investigate the asymptotic behaviors of solutions with a one-parameter family of monotonically increasing and compactly supported initial data. We show that for small values of the parameter the corresponding solutions decay to 0, while for large values the related solutions converge to 1 uniformly on compacts. Moreover, we prove that the transition from extinction (converging to 0) to propagation (converging to 1) is sharp. Numerical results are provided to verify the theoretical results.

preprint, 2021, arXiv

A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN

In autonomous driving, perceiving the driving behaviors of surrounding agents is important for the ego-vehicle to make a reasonable decision. In this paper, we propose a neural network model based on trajectory information for driving behavior recognition. Unlike existing trajectory-based methods that recognize the driving behavior using hand-crafted features or by directly encoding the trajectory, our model involves a Multi-Scale Convolutional Neural Network (MSCNN) module to automatically extract the high-level features which are supposed to encode the rich spatial and temporal information. Given a trajectory sequence of an agent as the input, the Bi-directional Long Short Term Memory (Bi-LSTM) module and the MSCNN module first process the input separately, generating two features, and then the two features are fused to classify the behavior of the agent. We evaluate the proposed model on the public BLVD dataset, achieving satisfactory performance.

preprint, 2020, arXiv

Single-Pixel Pattern Recognition with Coherent Nonlinear Optics

We propose and experimentally demonstrate a nonlinear-optics approach to pattern recognition with single-pixel imaging and a deep neural network. It employs mode selective image up-conversion to project a raw image onto a set of coherent spatial modes, whereby its signature features are extracted nonlinear-optically. With 40 projection modes, the classification accuracy reaches a high value of 99.49% for the MNIST handwritten digit images, and up to 95.32% even when they are mixed with strong noise. Our experiment harnesses rich coherent processes in nonlinear optics for efficient machine learning, with potential applications in online classification of large size images, fast lidar data analyses, complex pattern recognition, and so on.

preprint, 2020, arXiv

SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression

Obtaining training data for multi-document summarization (MDS) is time-consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to a sentence graph, taking both linguistic and deep representation into account, then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary. Experiments on Multi-News and DUC-2004 datasets show that our method is competitive with previous unsupervised methods and is even comparable to the neural supervised approaches. In addition, human evaluation shows our system produces consistent and complete summaries compared to human written ones.

preprint, 2020, arXiv

Synergy between Machine/Deep Learning and Software Engineering: How Far Are We?

Since 2009, the deep learning revolution, which was triggered by the introduction of ImageNet, has stimulated the synergy between Machine Learning (ML)/Deep Learning (DL) and Software Engineering (SE). Meanwhile, critical reviews have emerged that suggest that ML/DL should be used cautiously. To improve the quality (especially the applicability and generalizability) of ML/DL-related SE studies, and to stimulate and enhance future collaborations between SE/AI researchers and industry practitioners, we conducted a 10-year Systematic Literature Review (SLR) on 906 ML/DL-related SE papers published between 2009 and 2018. Our trend analysis demonstrated the mutual impacts that ML/DL and SE have had on each other. At the same time, however, we also observed a paucity of replicable and reproducible ML/DL-related SE studies and identified five factors that influence their replicability and reproducibility. To improve the applicability and generalizability of research results, we analyzed what ingredients in a study would facilitate an understanding of why a ML/DL technique was selected for a specific SE problem. In addition, we identified the unique trends of impacts of DL models on SE tasks, as well as five unique challenges that needed to be met in order to better leverage DL to improve the productivity of SE tasks. Finally, we outlined a road-map that we believe can facilitate the transfer of ML/DL-based SE research results into real-world industry practices.

preprint, 2020, arXiv

ThreshKnot: Thresholded ProbKnot for Improved RNA Secondary Structure Prediction

RNA structure prediction is a challenging problem, especially with pseudoknots. Recently, there has been a shift from the classical minimum free energy-based methods (MFE) to partition function-based ones that assemble structures using base-pairing probabilities. Two examples of the latter group are the popular maximum expected accuracy (MEA) method and the ProbKnot method. ProbKnot is a fast heuristic that pairs nucleotides that are reciprocally most probable pairing partners, and unlike MEA, can also predict structures with pseudoknots. However, ProbKnot's full potential has been largely overlooked. In particular, when introduced, it did not have an MEA-like hyperparameter that can balance between positive predictive value (PPV) and sensitivity. We show that a simple thresholded version of ProbKnot, which we call ThreshKnot, leads to more accurate overall predictions by filtering out unlikely pairs whose probabilities fall under a given threshold. We also show that on three widely-used folding engines (RNAstructure, Vienna RNAfold, and CONTRAfold), ThreshKnot always outperforms the much more involved MEA algorithm in (1) its higher structure prediction accuracy, (2) its capability to predict pseudoknots, and (3) its faster runtime and easier implementation. This suggests that ThreshKnot should replace MEA as the default partition function-based structure prediction algorithm. ThreshKnot is already available in the widely used RNAstructure software package version 6.2 (released November 27, 2019): https://rna.urmc.rochester.edu/RNAstructure.html
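The pairing rule described in this abstract can be illustrated with a minimal sketch. It assumes the base-pairing probabilities are already available as a symmetric matrix; the function name `threshknot`, the list-of-lists input, and the default threshold value are hypothetical choices for illustration and are not taken from the paper or from the RNAstructure package.

```python
from typing import List, Optional, Tuple


def threshknot(probs: List[List[float]], threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Pair positions that are each other's most probable partner.

    probs[i][j] is an assumed, precomputed base-pairing probability for
    positions i and j (symmetric). Pairs whose probability falls under
    `threshold` are discarded. The default of 0.3 is an illustrative value.
    """
    n = len(probs)

    # For every position, find its single most probable partner.
    best: List[Optional[int]] = []
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        best.append(max(candidates, key=lambda j: probs[i][j]) if candidates else None)

    # Keep a pair only if the choice is reciprocal and passes the threshold.
    pairs: List[Tuple[int, int]] = []
    for i in range(n):
        j = best[i]
        if j is not None and i < j and best[j] == i and probs[i][j] >= threshold:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Toy 4-position example: (0, 3) is a strong pair, (1, 2) a weak one.
    toy = [
        [0.0, 0.0, 0.0, 0.9],
        [0.0, 0.0, 0.2, 0.0],
        [0.0, 0.2, 0.0, 0.0],
        [0.9, 0.0, 0.0, 0.0],
    ]
    print(threshknot(toy, threshold=0.3))
    print(threshknot(toy, threshold=0.0))
```

Because each position is matched independently of any nesting constraint, nothing in this rule rules out crossing pairs, which is consistent with the abstract's point that the heuristic can predict pseudoknots. Setting the threshold to zero keeps every reciprocal pair, which corresponds to the unthresholded behavior the abstract attributes to ProbKnot.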

preprint, 2020, arXiv

Traffic Agent Trajectory Prediction Using Social Convolution and Attention Mechanism

Trajectory prediction is important for the decision-making of autonomous driving vehicles. In this paper, we propose a model to predict the trajectories of target agents around an autonomous vehicle. The main idea of our method is to consider the history trajectories of the target agent and the influence of surrounding agents on the target agent. To this end, we encode the target agent history trajectories as an attention mask and construct a social map to encode the interactive relationship between the target agent and its surrounding agents. Given a trajectory sequence, the LSTM networks are first utilized to extract the features for all agents, based on which the attention mask and social map are formed. Then, the attention mask and social map are fused to get the fusion feature map, which is processed by the social convolution to obtain a fusion feature representation. Finally, this fusion feature is taken as the input of a variable-length LSTM to predict the trajectory of the target agent. We note that the variable-length LSTM enables our model to handle the case in which the number of agents in the sensing scope is highly dynamic in traffic scenes. To verify the effectiveness of our method, we compare it extensively with several methods on a public dataset, achieving a 20% error decrease. In addition, the model satisfies the real-time requirement at 32 fps.

preprint 2020 arXiv

Wide-field, high-resolution lensless on-chip microscopy via near-field blind ptychographic modulation

We report a novel lensless on-chip microscopy platform based on near-field blind ptychographic modulation. In this platform, we place a thin diffuser between the object and the image sensor for light wave modulation. By blindly scanning the unknown diffuser to different x-y positions, we acquire a sequence of modulated intensity images for quantitative object recovery. Unlike previous ptychographic implementations, we employ a unit magnification configuration with a Fresnel number of ~50,000, which is orders of magnitude higher than in previous ptychographic setups. The unit magnification configuration allows us to use the entire sensor area, 6.4 mm by 4.6 mm, as the imaging field of view. The ultra-high Fresnel number enables us to directly recover the positional shift of the diffuser in the phase retrieval process, addressing the positioning accuracy issue that plagues regular ptychographic experiments. In our implementation, we use a low-cost, DIY scanning stage to perform blind diffuser modulation. Precise mechanical scanning, which is critical in conventional ptychography experiments, is no longer needed in our setup. We further employ an up-sampling phase retrieval scheme to bypass the resolution limit set by the imager pixel size and demonstrate a half-pitch resolution of 0.78 micron. We validate the imaging performance on in vitro cell cultures, transparent and stained tissue sections, and a thick biological sample. We show that the recovered quantitative phase map can be used to perform effective cell segmentation of a dense yeast culture. We also demonstrate 3D digital refocusing of the thick biological sample based on the recovered wavefront. The reported platform provides a cost-effective and turnkey solution for large field-of-view, high-resolution, and quantitative on-chip microscopy.

preprint 2019 arXiv

LinearFold: linear-time approximate RNA folding by 5'-to-3' dynamic programming and beam search

Motivation: Predicting the secondary structure of an RNA sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Results: We present a novel alternative $O(n^3)$-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in $O(n)$ time and $O(n)$ space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models. Availability: Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100,000nt).
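
The left-to-right scan with beam pruning described in this abstract can be illustrated with a toy sketch. It is not the LinearFold code. It scores a structure by simply counting base pairs instead of using an energy or learned model, and it omits the merging of equivalent states that the dynamic program relies on. The name `beam_fold` and the parameters `beam` and `min_loop` are assumptions made for illustration.

```python
from typing import Tuple

# Watson-Crick and wobble pairs; the scoring (pair counting) is a toy assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def beam_fold(seq: str, beam: int = 100, min_loop: int = 3) -> Tuple[str, int]:
    """Scan seq 5'-to-3', keeping at most `beam` partial structures per step."""
    # A state is (score, stack of unmatched '(' positions, structure so far).
    states = [(0, (), "")]
    for j, base in enumerate(seq):
        candidates = []
        for score, stack, struct in states:
            # skip: leave position j unpaired
            candidates.append((score, stack, struct + "."))
            # push: open a bracket at j, to be matched later
            candidates.append((score, stack + (j,), struct + "("))
            # pop: close the most recent open bracket if the bases can pair
            if stack:
                i = stack[-1]
                if (seq[i], base) in PAIRS and j - i > min_loop:
                    candidates.append((score + 1, stack[:-1], struct + ")"))
        # Beam pruning: keep only the highest-scoring states.
        candidates.sort(key=lambda s: s[0], reverse=True)
        states = candidates[:beam]
    score, stack, struct = states[0]
    # Brackets that were opened but never closed become unpaired.
    chars = list(struct)
    for i in stack:
        chars[i] = "."
    return "".join(chars), score


print(beam_fold("GGGAAACCC", beam=100))  # ('(((...)))', 3)
```

With a fixed beam the number of states kept per position is bounded, which is where the linear behaviour comes from. This toy version copies the structure string at every step, so it is not itself linear-time. A beam that is too narrow can discard the best structure: with `beam=1` the same input yields no pairs at all.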

preprint 2019 arXiv

Polarimetric Thermal to Visible Face Verification via Attribute Preserved Synthesis

Thermal to visible face verification is a challenging problem due to the large domain discrepancy between the modalities. Existing approaches either attempt to synthesize visible faces from thermal faces or extract robust features from these modalities for cross-modal matching. In this paper, we take a different approach in which we make use of the attributes extracted from the visible image to synthesize the attribute-preserved visible image from the input thermal image for cross-modal matching. A pre-trained VGG-Face network is used to extract the attributes from the visible image. Then, a novel Attribute Preserved Generative Adversarial Network (AP-GAN) is proposed to synthesize the visible image from the thermal image guided by the extracted attributes. Finally, a deep network is used to extract features from the synthesized image and the input visible image for verification. Extensive experiments on the ARL Polarimetric face dataset show that the proposed method achieves significant improvements over the state-of-the-art methods.