Researcher profile

Junpeng Wang

Junpeng Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
9 works
0 followers
12 topics
4 close collaborators

Published work

9 published items

Preprint, 2026, arXiv

TabKDE: Simple and Scalable Tabular Data Generation with Kernel Density Estimates

Tabular data generation considers a large table with multiple columns, each composed of numerical, categorical, or sometimes ordinal values. The goal is to produce new rows for the table that replicate the distribution of rows from the original data without just copying those initial rows. The last 4 years have seen enormous progress on this problem, mostly using computationally expensive methods that employ one-hot encoding, VAEs, and diffusion. This paper describes a new approach to the problem of tabular data generation. By employing copula transformations and modeling the distribution as a kernel density estimate, we can nearly match the accuracy and leakage-avoidance achievements of the previous methods, but with almost no training time. Our method is very scalable, and can be run on data sets orders of magnitude larger than prior state-of-the-art on a simple laptop. Moreover, because we employ kernel density estimates, we can store the model as a coreset of the original data (we believe the first for generative modeling) and, as a result, require significantly less space as well. Our code is available here: https://github.com/tabkde/tabkde-main
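The combination of a copula-style transform and a kernel density estimate described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it handles numeric columns only, uses a rank-based normal-score transform as a stand-in for the copula step, and all function names and the `bandwidth` value are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist

def to_gaussian_scores(column):
    """Map a numeric column to normal scores via its empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        scores[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return scores

def from_gaussian_score(z, sorted_column):
    """Map a normal score back to the data scale via the empirical quantile."""
    n = len(sorted_column)
    u = NormalDist().cdf(z)
    idx = min(n - 1, max(0, int(u * n)))
    return sorted_column[idx]

def sample_rows(rows, n_samples, bandwidth=0.1, seed=0):
    """Draw new rows from a Gaussian KDE fitted in normal-score space."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    scores = [to_gaussian_scores(list(c)) for c in columns]
    sorted_cols = [sorted(c) for c in columns]
    out = []
    for _ in range(n_samples):
        i = rng.randrange(len(rows))  # pick one data row as the kernel center
        new_row = []
        for j in range(len(columns)):
            # Sampling from a Gaussian KDE: center plus Gaussian noise.
            z = scores[j][i] + rng.gauss(0.0, bandwidth)
            new_row.append(from_gaussian_score(z, sorted_cols[j]))
        out.append(tuple(new_row))
    return out

rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
generated = sample_rows(rows, n_samples=5)
```

There is no training loop in this sketch: the "model" is the transformed data plus a bandwidth, which is consistent with the abstract's point about near-zero training time.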

Preprint, 2022, arXiv

3D-TSV: The 3D Trajectory-based Stress Visualizer

We present the 3D Trajectory-based Stress Visualizer (3D-TSV), a visual analysis tool for the exploration of the principal stress directions in 3D solids under load. 3D-TSV provides a modular and generic implementation of key algorithms required for a trajectory-based visual analysis of principal stress directions, including the automatic seeding of space-filling stress lines, their extraction using numerical schemes, their mapping to an effective renderable representation, and rendering options to convey structures with special mechanical properties. In the design of 3D-TSV, several perceptual challenges have been addressed when simultaneously visualizing three mutually orthogonal stress directions via lines. We present a novel algorithm for generating a space-filling and evenly spaced set of mutually orthogonal lines. The algorithm further considers the locations of lines to obtain a more regular appearance, and enables the extraction of a level-of-detail representation with adjustable sparseness of the trajectories along a certain stress direction. To convey ambiguities in the orientation of the principal stress directions, the user can select a combined visualization of two principal directions via oriented ribbons. Additional depth cues improve the perception of the spatial relationships between trajectories. 3D-TSV is accessible to end users via a C++- and OpenGL-based rendering frontend that is seamlessly connected to a MATLAB-based extraction backend. The code (BSD license) of 3D-TSV as well as scripts to make ANSYS and ABAQUS simulation results accessible to the 3D-TSV backend are publicly available.

Preprint, 2022, arXiv

A Streamline-guided De-Homogenization Approach for Structural Design

We present a novel de-homogenization approach for efficient design of high-resolution load-bearing structures. The proposed approach builds upon a streamline-based parametrization of the design domain, using a set of space-filling and evenly spaced streamlines in the two mutually orthogonal direction fields that are obtained from homogenization-based topology optimization. Streamlines in these fields are converted into a graph, which is then used to construct a quad-dominant mesh whose edges follow the direction fields. In addition, the edge width is adjusted according to the density and anisotropy of the optimized orthotropic cells. In a number of numerical examples, we demonstrate the mechanical performance and regular appearance of the resulting structural designs, and compare them with those from classic and contemporary approaches.

Preprint, 2022, arXiv

Embedding Compression with Hashing for Efficient Representation Learning in Large-Scale Graph

Graph neural networks (GNNs) are deep learning models designed specifically for graph data, and they typically rely on node features as the input to the first layer. When applying such a network to a graph without node features, one can extract simple graph-based node features (e.g., node degree) or learn the input node representations (i.e., embeddings) when training the network. While the latter approach, which trains node embeddings, is more likely to lead to better performance, the number of parameters associated with the embeddings grows linearly with the number of nodes. It is therefore impractical to train the input node embeddings together with GNNs within graphics processing unit (GPU) memory in an end-to-end fashion when dealing with industrial-scale graph data. Inspired by the embedding compression methods developed for natural language processing (NLP) tasks, we develop a node embedding compression method where each node is compactly represented with a bit vector instead of a floating-point vector. The parameters utilized in the compression method can be trained together with GNNs. We show that the proposed node embedding compression method achieves superior performance compared to the alternatives.
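To make the idea of representing each node by a bit vector concrete, the sketch below shows one generic way such a code can be decoded into a dense embedding using small shared codebooks. This is an assumption-laden illustration, not the paper's method: the SHA-256 hashing of the node ID, the chunked codebook decoder, and every name and size here are hypothetical, and the codebook entries are random values standing in for parameters that would be trained with the GNN.

```python
import hashlib
import random

def node_code(node_id, n_bits=16):
    """Derive a fixed bit vector for a node by hashing its ID (illustrative)."""
    digest = hashlib.sha256(str(node_id).encode()).digest()
    value = int.from_bytes(digest[:4], "big")
    return [(value >> k) & 1 for k in range(n_bits)]

def make_codebooks(n_bits=16, chunk=4, dim=8, seed=0):
    """One small table per bit chunk; these stand in for trainable parameters."""
    rng = random.Random(seed)
    n_books = n_bits // chunk
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2 ** chunk)]
        for _ in range(n_books)
    ]

def decode(code, codebooks, chunk=4):
    """Sum the codebook rows selected by each chunk of the bit vector."""
    dim = len(codebooks[0][0])
    embedding = [0.0] * dim
    for b, book in enumerate(codebooks):
        bits = code[b * chunk:(b + 1) * chunk]
        index = int("".join(map(str, bits)), 2)
        embedding = [e + r for e, r in zip(embedding, book[index])]
    return embedding

codebooks = make_codebooks()
embedding = decode(node_code(42), codebooks)
n_params = sum(len(book) * len(book[0]) for book in codebooks)
```

In this sketch the number of decoder parameters depends only on the code length and embedding size, not on the number of nodes, which is the property the abstract motivates.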

Preprint, 2022, arXiv

Learning-From-Disagreement: A Model Comparison and Visual Analytics Framework

With the fast-growing number of classification models being produced every day, numerous model interpretation and comparison solutions have also been introduced. For example, LIME and SHAP can interpret what input features contribute more to a classifier's output predictions. Different numerical metrics (e.g., accuracy) can be used to easily compare two classifiers. However, few works can interpret the contribution of a data feature to a classifier in comparison with its contribution to another classifier. This comparative interpretation can help to disclose the fundamental difference between two classifiers, select classifiers in different feature conditions, and better ensemble two classifiers. To accomplish this, we propose a learning-from-disagreement (LFD) framework to visually compare two classification models. Specifically, LFD identifies data instances with disagreed predictions from two compared classifiers and trains a discriminator to learn from the disagreed instances. As the two classifiers' training features may not be available, we train the discriminator through a set of meta-features proposed based on certain hypotheses of the classifiers to probe their behaviors. Interpreting the trained discriminator with the SHAP values of different meta-features, we provide actionable insights into the compared classifiers. Also, we introduce multiple metrics to profile the importance of meta-features from different perspectives. With these metrics, one can easily identify meta-features with the most complementary behaviors in two classifiers, and use them to better ensemble the classifiers. We focus on binary classification models in the financial services and advertising industries to demonstrate the efficacy of our proposed framework and visualizations.
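The first step of the framework, collecting instances on which two binary classifiers disagree, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the fixed `threshold`, and the labeling convention (1 when only classifier A predicts positive, 0 when only classifier B does) are hypothetical, and the discriminator training and SHAP analysis described in the abstract are not shown.

```python
def disagreement_set(scores_a, scores_b, threshold=0.5):
    """Return (index, label) pairs for instances where two classifiers disagree.

    label is 1 if only classifier A predicts positive, 0 if only B does.
    """
    picked = []
    for i, (a, b) in enumerate(zip(scores_a, scores_b)):
        pred_a = a >= threshold
        pred_b = b >= threshold
        if pred_a != pred_b:
            picked.append((i, 1 if pred_a else 0))
    return picked

scores_a = [0.9, 0.2, 0.7, 0.1]
scores_b = [0.8, 0.6, 0.3, 0.2]
disagreed = disagreement_set(scores_a, scores_b)
```

The resulting labels could then serve as training targets for a discriminator over meta-features, so that its feature attributions describe where one classifier behaves differently from the other.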

Preprint, 2021, arXiv

Stress Topology Analysis for Porous Infill Optimization

The optimization of porous infill structures via local volume constraints has become a popular approach in topology optimization. In some design settings, however, the iterative optimization process converges only slowly, or not at all even after several hundred or thousands of iterations. This leads to regions in which a distinct binary design is difficult to achieve. Interpreting intermediate density values by applying a threshold results in large solid or void regions, leading to sub-optimal structures. We find that this convergence issue relates to the topology of the stress tensor field that is simulated when applying the same external forces on the solid design domain. In particular, slow convergence is observed in regions around so-called trisector degenerate points. Based on this observation, we propose an automatic initialization process that prescribes the topological skeleton of the stress field into the material field as solid simulation elements. These elements guide the material deposition around the degenerate points, but can also be remodelled or removed during the optimization. We demonstrate significantly improved convergence rates in a number of use cases with complex stress topologies. The improved convergence is demonstrated for infill optimization under homogeneous as well as spatially varying local volume constraints.

Preprint, 2020, arXiv

CNNPruner: Pruning Convolutional Neural Networks with Visual Analytics

Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many computer vision tasks. The increasing size of CNN models, however, prevents them from being widely deployed to devices with limited computational resources, e.g., mobile/embedded devices. The emerging topic of model pruning strives to address this problem by removing less important neurons and fine-tuning the pruned networks to minimize the accuracy loss. Nevertheless, existing automated pruning solutions often rely on a numerical threshold of the pruning criteria, lacking the flexibility to optimally balance the trade-off between model size and accuracy. Moreover, the complicated interplay between the stages of neuron pruning and model fine-tuning makes this process opaque and therefore difficult to optimize. In this paper, we address these challenges through a visual analytics approach, named CNNPruner. It considers the importance of convolutional filters through both instability and sensitivity, and allows users to interactively create pruning plans according to a desired goal on model size or accuracy. Also, CNNPruner integrates state-of-the-art filter visualization techniques to help users understand the roles that different filters play and refine their pruning plans. Through comprehensive case studies on CNNs with real-world sizes, we validate the effectiveness of CNNPruner.

Preprint, 2020, arXiv

Multi-stream RNN for Merchant Transaction Prediction

Recently, digital payment systems have significantly changed people's lifestyles. New challenges have surfaced in monitoring and guaranteeing the integrity of payment processing systems. One important task is to predict the future transaction statistics of each merchant. These predictions can then be used to steer other tasks, ranging from fraud detection to recommendation. This problem is challenging as we need to predict not only multivariate time series but also multiple steps into the future. In this work, we propose a multi-stream RNN model for multi-step merchant transaction predictions tailored to these requirements. The proposed multi-stream RNN summarizes transaction data at different granularities and makes predictions for multiple steps in the future. Our extensive experimental results have demonstrated that the proposed model is capable of outperforming existing state-of-the-art methods.