Source author record

Jiajia Zhang

Jiajia Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher. Unclaimed source record.

Artificial Intelligence, Machine Learning, Applications, Computer Science and Game Theory, eess.SY, math.CO, math.ST, Methodology, Multiagent Systems, Robotics, Statistics Theory, Systems and Control

Catalog footprint

What is connected

7 works

12 topics

4 close collaborators

preprint 2022 arXiv

D2CFR: Minimize Counterfactual Regret with Deep Dueling Neural Network

Counterfactual Regret Minimization (CFR) is a popular method for finding approximate Nash equilibria in two-player zero-sum games with imperfect information. CFR solves games by traversing the full game tree iteratively, which limits its scalability in larger games. Previously, applying CFR to a large-scale game required three steps: first, the large-scale game is abstracted into a small-scale game; second, CFR is used to solve the abstract game; finally, the solution strategy is mapped back to the original large-scale game. However, this process requires considerable expert knowledge, and the accuracy of the abstraction depends closely on that knowledge. In addition, abstraction loses information, which ultimately affects the accuracy of the solution strategy. To address this problem, a recent method, Deep CFR, alleviates the need for abstraction and expert knowledge by applying deep neural networks directly to CFR in full games. In this paper, we introduce Neural Network Counterfactual Regret Minimization (NNCFR), an improved variant of Deep CFR that converges faster by constructing a dueling network as the value network. Moreover, an evaluation module is designed by combining the value network and Monte Carlo, which reduces the approximation error of the value network. In addition, a new loss function is designed for training the policy network in the proposed NNCFR, which helps make the policy network more stable. Extensive experiments show that NNCFR converges faster and performs more stably than Deep CFR, and outperforms Deep CFR with respect to exploitability and head-to-head performance on test games.
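For readers unfamiliar with CFR, the step it repeats at every decision point is regret matching: actions are played in proportion to their positive cumulative regret. The following is a minimal sketch of that step only, not of NNCFR or its dueling value network; the function name and the uniform fallback are illustrative choices, not taken from the paper.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy (a probability distribution).

    Hypothetical helper for illustration: each action gets probability
    proportional to its positive cumulative regret.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regrets)
    return [1.0 / n] * n


strategy = regret_matching([2.0, -1.0, 6.0])
```

In tabular CFR the regrets are stored per information set; neural variants such as Deep CFR instead approximate these quantities with a network, which is the component the abstract describes replacing with a dueling architecture.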

preprint 2022 arXiv

Design and experimental investigation of a vibro-impact self-propelled capsule robot with orientation control

This paper presents a novel design and experimental investigation of a self-propelled capsule robot that can be used for painless colonoscopy during a retrograde progression from the patient's rectum. The steerable robot is driven forward and backward via its internal vibration and impact, with orientation control provided by an electromagnetic actuator. The actuator contains four sets of coils and a shaft made of a permanent magnet. The shaft can be excited linearly at a controllable, tilted angle so as to guide the progression orientation of the robot. Two control strategies are studied in this work and compared via simulation and experiment. Extensive results are presented to demonstrate the progression efficiency of the robot and its potential for robotic colonoscopy.

preprint 2022 arXiv

Efficient Distributed Framework for Collaborative Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) for incomplete-information environments has attracted extensive attention from researchers. However, because of slow sample collection and poor sample exploration, multi-agent reinforcement learning still suffers from problems such as unstable model iteration and low training efficiency. Moreover, most existing distributed frameworks are proposed for single-agent reinforcement learning and are not suitable for the multi-agent setting. In this paper, we design a distributed MARL framework based on the actor-work-learner architecture. In this framework, multiple asynchronous environment interaction modules can be deployed simultaneously, which greatly improves the sample collection speed and sample diversity. Meanwhile, to make full use of computing resources, we decouple the model iteration from environment interaction, and thus accelerate the policy iteration. Finally, we verify the effectiveness of the proposed framework in the MaCA military simulation environment and the SMAC 3D real-time strategy gaming environment, both of which have incomplete-information characteristics.
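The decoupling described in this abstract, where several environment-interaction modules collect samples while a separate learner consumes them, can be sketched with threads and a shared queue. This is only an assumed toy layout: the names `actor`, `learner` and `run`, the placeholder samples, and the batch size are all hypothetical and do not come from the paper's framework.

```python
import queue
import threading


def actor(actor_id, n_steps, sample_queue):
    """Stand-in for an environment-interaction module: emits placeholder samples."""
    for step in range(n_steps):
        sample_queue.put((actor_id, step))


def learner(sample_queue, n_total, batch_size, updates):
    """Consumes samples independently of the actors and groups them into batches."""
    batch = []
    for _ in range(n_total):
        batch.append(sample_queue.get())
        if len(batch) == batch_size:
            updates.append(list(batch))  # a real learner would run a gradient step here
            batch.clear()
    if batch:
        updates.append(list(batch))


def run(n_actors=4, n_steps=25, batch_size=10):
    sample_queue = queue.Queue(maxsize=64)
    updates = []
    learner_thread = threading.Thread(
        target=learner, args=(sample_queue, n_actors * n_steps, batch_size, updates)
    )
    actor_threads = [
        threading.Thread(target=actor, args=(i, n_steps, sample_queue))
        for i in range(n_actors)
    ]
    learner_thread.start()
    for t in actor_threads:
        t.start()
    for t in actor_threads:
        t.join()
    learner_thread.join()
    return updates


updates = run()
```

Because the actors never wait for a model update, adding more of them raises sample throughput and mixes samples from different actors in each batch, which is the effect the abstract attributes to asynchronous collection.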

preprint 2022 arXiv

Empirical likelihood inference for longitudinal data with covariate measurement errors: An application to the LEAN study

Measurement errors usually arise during the longitudinal data collection process. Ignoring the effects of measurement errors will lead to invalid estimates. The Lifestyle Education for Activity and Nutrition (LEAN) study was designed to assess the effectiveness of intervention for enhancing weight loss over nine months. The covariates systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured at baseline, month 4, and month 9. At each assessment time, there were two replicate measurements for SBP and DBP. The replicate measurement errors of SBP follow different distributions, as do those of DBP. To account for the distributional difference of replicate measurement errors, a new method for analyzing longitudinal data with replicate covariate measurement errors is developed based on the empirical likelihood method. The asymptotic properties of the proposed estimator are established under some regularity conditions. The confidence region for the parameters of interest can be constructed based on the chi-squared approximation without estimating the covariance matrix. Additionally, the proposed empirical likelihood estimator is asymptotically more efficient than the estimator of Lin et al. (2018). Extensive simulations demonstrate that the proposed method can eliminate the effects of measurement errors in the covariate and has a high estimation efficiency. The proposed method indicates the significant effect of the intervention on BMI in the LEAN study.
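To illustrate how a confidence region can come from a chi-squared approximation without a covariance estimate, here is a sketch of the classical empirical likelihood ratio for a scalar mean. It is a textbook simplification, not the paper's estimator for longitudinal data with replicate measurement errors; the function name, the bisection solver and the toy data are assumptions made for the example.

```python
import math


def el_log_ratio(xs, mu):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for a mean.

    Solves sum(z_i / (1 + lam * z_i)) = 0 for the Lagrange multiplier lam,
    where z_i = x_i - mu, by bisection on the interval where all weights
    stay positive.
    """
    z = [x - mu for x in xs]
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # mu lies outside the convex hull of the data
    lo = -1.0 / max(z)
    hi = -1.0 / min(z)
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps  # stay strictly inside the open interval

    def g(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # g is strictly decreasing in lam, so bisection finds its unique root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)


data = [1.0, 2.0, 3.0, 4.0, 6.0]  # toy values, not LEAN study data
CHI2_95_DF1 = 3.841  # approximate 95% quantile of chi-squared with 1 degree of freedom
in_region = el_log_ratio(data, 3.2) <= CHI2_95_DF1
```

A candidate value of the mean belongs to the approximate 95% confidence region when the statistic is at most the chi-squared quantile, so no variance or covariance matrix has to be estimated separately.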

preprint 2020 arXiv

RLCFR: Minimize Counterfactual Regret by Deep Reinforcement Learning

Counterfactual regret minimization (CFR) is a popular method for decision-making problems in two-player zero-sum games with imperfect information. Unlike existing studies, which mostly focus on solving larger-scale problems or improving solution efficiency, we propose a framework, RLCFR, which aims at improving the generalization ability of the CFR method. In RLCFR, the game strategy is solved by CFR within a reinforcement learning framework, and the dynamic procedure of iterative, interactive strategy updating is modeled as a Markov decision process (MDP). Our method, RLCFR, then learns a policy to select the appropriate way of regret updating in the process of iteration. In addition, a stepwise reward function, proportional to how good the iteration strategy is at each step, is formulated to learn the action policy. Extensive experimental results on various games have shown that the generalization ability of our method is significantly improved compared with existing state-of-the-art methods.
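The idea of choosing among ways of updating regret can be pictured as selecting one rule per iteration from a small set. The abstract does not list the candidate rules, so the sketch below assumes two well-known ones, the plain CFR accumulation and the CFR+ update that clips cumulative regret at zero; the dictionary and function names are hypothetical.

```python
def update_cfr(cumulative, instantaneous):
    """Plain CFR: add the instantaneous regret to the running total."""
    return [c + r for c, r in zip(cumulative, instantaneous)]


def update_cfr_plus(cumulative, instantaneous):
    """CFR+: accumulate, then clip negative totals to zero."""
    return [max(c + r, 0.0) for c, r in zip(cumulative, instantaneous)]


# Assumed action set for the selector; a learned policy would pick the key.
UPDATE_RULES = {"cfr": update_cfr, "cfr_plus": update_cfr_plus}


def apply_selected_update(rule_name, cumulative, instantaneous):
    """Apply the regret update chosen for this iteration."""
    return UPDATE_RULES[rule_name](cumulative, instantaneous)


plain = apply_selected_update("cfr", [1.0, -2.0], [-3.0, 1.0])
clipped = apply_selected_update("cfr_plus", [1.0, -2.0], [-3.0, 1.0])
```

In an MDP formulation like the one described, the rule name would be the action, and the reward would reflect the quality of the resulting iteration strategy.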

preprint 2015 arXiv

Modeling county level breast cancer survival data using a covariate-adjusted frailty proportional hazards model

Understanding the factors that explain differences in survival times is an important issue for establishing policies to improve national health systems. Motivated by breast cancer data arising from the Surveillance Epidemiology and End Results program, we propose a covariate-adjusted proportional hazards frailty model for the analysis of clustered right-censored data. Rather than incorporating exchangeable frailties in the linear predictor of commonly-used survival models, we allow the frailty distribution to flexibly change with both continuous and categorical cluster-level covariates and model them using a dependent Bayesian nonparametric model. The resulting process is flexible and easy to fit using an existing R package. The application of the model to our motivating example showed that, contrary to intuition, those diagnosed during a period of time in the 1990s in more rural and less affluent Iowan counties survived breast cancer better. Additional analyses showed the opposite trend for earlier time windows. We conjecture that this anomaly has to be due to increased hormone replacement therapy treatments prescribed to more urban and affluent subpopulations.

preprint 2012 arXiv

On the spectral moments of trees with a given bipartition

For two given positive integers $p$ and $q$ with $p\leqslant q$, we denote $\mathscr{T}_n^{p, q}=\{T: T$ is a tree of order $n$ with a $(p, q)$-bipartition$\}$. For a graph $G$ with $n$ vertices, let $A(G)$ be its adjacency matrix with eigenvalues $\lambda_1(G), \lambda_2(G), \ldots, \lambda_n(G)$ in non-increasing order. The number $S_k(G):=\sum_{i=1}^{n}\lambda_i^k(G)\,(k=0, 1, \ldots, n-1)$ is called the $k$th spectral moment of $G$. Let $S(G)=(S_0(G), S_1(G),\ldots, S_{n-1}(G))$ be the sequence of spectral moments of $G$. For two graphs $G_1$ and $G_2$, one has $G_1\prec_s G_2$ if for some $k\in \{1,2,\ldots,n-1\}$, $S_i(G_1)=S_i(G_2)\ (i=0,1,\ldots,k-1)$ and $S_k(G_1)<S_k(G_2)$ holds. In this paper, the last four trees, in the $S$-order, among $\mathscr{T}_n^{p, q}\ (4\leqslant p\leqslant q)$ are characterized.
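The definitions in this abstract can be checked on small trees. Since $S_k(G)$ equals the trace of $A(G)^k$, the moments can be computed exactly with integer matrix powers, and the $S$-order is a lexicographic comparison of the moment sequences. The sketch below uses two trees on five vertices that both have a $(2,3)$-bipartition; these examples and all function names are illustrative and are not taken from the paper, whose result concerns $4\leqslant p\leqslant q$.

```python
def adjacency(n, edges):
    """Adjacency matrix of a simple graph on vertices 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1
    return a


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(n, edges):
    """Return (S_0, ..., S_{n-1}), using S_k = trace(A^k)."""
    a = adjacency(n, edges)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0
    moments = []
    for _ in range(n):
        moments.append(sum(power[i][i] for i in range(n)))
        power = mat_mul(power, a)
    return tuple(moments)


def precedes_s(moments_1, moments_2):
    """G_1 precedes G_2 in the S-order: lexicographic comparison of moments."""
    return moments_1 < moments_2


path_5 = [(0, 1), (1, 2), (2, 3), (3, 4)]   # path on 5 vertices
chair = [(0, 1), (0, 2), (0, 3), (3, 4)]    # degree-3 vertex with one arm extended

s_path = spectral_moments(5, path_5)
s_chair = spectral_moments(5, chair)
path_first = precedes_s(s_path, s_chair)
```

Both trees share $S_0=5$, $S_2=8$ (twice the number of edges) and vanishing odd moments, so the order is decided at $k=4$, where the path has the smaller moment.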
