Researcher profile

Francis K. C. Hui

Francis K. C. Hui contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified
Verification L1
Unclaimed author
3 works
0 followers
6 topics
4 close collaborators

Published work

3 published items

Preprint, 2022, arXiv

Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.
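For orientation, a GLLVM of the kind the abstract describes is commonly written as below. The notation is an assumption chosen for illustration (unit index i, response index j, link function g, covariates x_i, latent factors z_i, loadings \lambda_j); the paper itself may use a different parametrization.

```latex
% Sketch of a standard GLLVM formulation (notation assumed, not taken from the paper)
g\bigl(\mathbb{E}[\,y_{ij} \mid z_i\,]\bigr)
  = \beta_{0j} + x_i^{\top}\beta_j + z_i^{\top}\lambda_j,
\qquad z_i \sim \mathcal{N}(0, I_d),
\qquad i = 1,\dots,n,\quad j = 1,\dots,m .
```

With a Gaussian response and identity link this reduces to a classical factor model, which is the sense in which GLLVMs generalize factor analysis to non-Gaussian responses.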

Preprint, 2019, arXiv

Symbolic Formulae for Linear Mixed Models

A statistical model is a mathematical representation of an often simplified or idealised data-generating process. In this paper, we focus on a particular type of statistical model, called linear mixed models (LMMs), that is widely used in many disciplines, e.g. agriculture, ecology, econometrics, psychology. Mixed models, also commonly known as multi-level, nested, hierarchical or panel data models, incorporate a combination of fixed and random effects, with LMMs being a special case. The inclusion of random effects in particular gives LMMs considerable flexibility in accounting for many types of complex correlated structures often found in data. This flexibility, however, has given rise to a number of ways by which an end-user can specify the precise form of the LMM that they wish to fit in statistical software. In this paper, we review the software design for specification of the LMM (and its special case, the linear model), focusing in particular on the use of high-level symbolic model formulae and two popular but contrasting R-packages in lme4 and asreml.
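As a reference point, the LMM discussed in this abstract is usually stated in matrix form as follows. The symbols (X, Z, \beta, u, G, R) follow common textbook convention and are an assumption here, not a quotation from the paper.

```latex
% Standard matrix form of a linear mixed model (textbook notation, assumed)
y = X\beta + Zu + e,
\qquad u \sim \mathcal{N}(0, G),
\qquad e \sim \mathcal{N}(0, R),
```

where X\beta collects the fixed effects and Zu the random effects. For a hypothetical response `y`, covariate `x` and grouping factor `group`, a random-intercept model would typically be written `y ~ x + (1 | group)` in lme4, and as `fixed = y ~ x, random = ~ group` in asreml, which illustrates the contrast in formula design that the paper reviews.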

Preprint, 2012, arXiv

A Nonparametric Measure of Local Association for two-way Contingency Tables

In contingency table analysis, the odds ratio is a commonly applied measure used to summarize the degree of association between two categorical variables, say R and S. Suppose now that for each individual in the table, a vector of continuous variables X is also observed. It is then vital to analyze whether and how the degree of association varies with X. In this work, we extend the classical odds ratio to the conditional case, and develop nonparametric estimators of this "pointwise odds ratio" to summarize the strength of local association between R and S given X. To allow for maximum flexibility, we make this extension using kernel regression. We develop confidence intervals based on these nonparametric estimators. We demonstrate via simulation that our pointwise odds ratio estimators can outperform model-based counterparts from logistic regression and GAMs, without the need for a linearity or additivity assumption. Finally, we illustrate its application to a dataset of patients from an intensive care unit (ICU), offering a greater insight into how the association between survival of patients admitted for emergency versus elective reasons varies with the patients' ages.
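The idea of a kernel-smoothed "pointwise odds ratio" can be illustrated with a minimal sketch. This is not the authors' estimator: it assumes binary R and S, a scalar X, a Gaussian kernel and a fixed bandwidth, and the function name `pointwise_odds_ratio` is hypothetical. It simply replaces the four cell counts of the 2x2 table with kernel-weighted counts centred at the point of interest.

```python
import numpy as np


def pointwise_odds_ratio(x0, x, r, s, bandwidth):
    """Kernel-weighted odds ratio between binary r and s at covariate value x0.

    Illustrative sketch only (hypothetical helper, not the paper's estimator).
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=int)
    s = np.asarray(s, dtype=int)

    # Gaussian kernel weights: observations with x near x0 count the most.
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)

    # Kernel-weighted versions of the four cell counts of the 2x2 table.
    n11 = np.sum(w * ((r == 1) & (s == 1)))
    n10 = np.sum(w * ((r == 1) & (s == 0)))
    n01 = np.sum(w * ((r == 0) & (s == 1)))
    n00 = np.sum(w * ((r == 0) & (s == 0)))

    # Cross-product ratio; the result is undefined if an off-diagonal
    # weighted count is zero.
    return (n11 * n00) / (n10 * n01)


# Toy data with cell counts n11=4, n10=2, n01=1, n00=3.
r = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
s = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])
x = np.linspace(0.0, 1.0, 10)

# A very large bandwidth gives near-equal weights, so the estimate
# approaches the classical odds ratio (4 * 3) / (2 * 1).
global_or = pointwise_odds_ratio(0.5, x, r, s, bandwidth=1e6)

# A smaller bandwidth gives a local estimate around x0 = 0.5.
local_or = pointwise_odds_ratio(0.5, x, r, s, bandwidth=0.3)
```

Letting the bandwidth grow recovers the classical odds ratio, while shrinking it localises the estimate around `x0`, which is the trade-off any kernel-based version of this measure has to manage.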