Source author record

Łukasz Kidziński

Łukasz Kidziński appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · math.ST · Statistics Theory · Artificial Intelligence · Computation · Machine Learning · Quantitative Methods

Catalog footprint

What is connected

6 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

Curated AI beats frontier LLMs at pharma asset discovery

General-purpose LLMs with web search are increasingly used to scout the competitive landscape of pharmaceutical pipelines. We benchmark Gosset -- an AI platform with a chat interface backed by curated target-, modality-, and indication-level drug-asset annotations -- against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets where most of the pipeline lives in the long tail of preclinical and Asian-developed assets. All five systems receive the same natural-language query and the same JSON output schema. Across 10 targets Gosset returns 3.2x more verified drugs per query than the best frontier system, at perfect precision and 100% recall against the cross-system union of verified drugs. The same curated index is exposed as a Gosset MCP server that any frontier model can call as a tool, suggesting that each of these systems can close most of the recall gap by swapping generic web search for a curated index behind the same chat interface.
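As a rough illustration of the scoring described in this abstract, precision and recall against a cross-system union of verified drugs could be computed as sketched below. The function name, the system labels and the toy drug names are all hypothetical; the benchmark's actual verification procedure and data are not given in this record.

```python
from typing import Dict, Set


def score_against_union(
    returned: Dict[str, Set[str]], verified: Set[str]
) -> Dict[str, Dict[str, float]]:
    """Score each system against the pooled set of verified drugs.

    `returned` maps a system name to the drugs it listed for one target.
    `verified` holds the drugs confirmed as real assets (assumed to be given).
    """
    # Reference set: every verified drug that at least one system returned.
    union: Set[str] = set()
    for drugs in returned.values():
        union |= drugs & verified

    scores: Dict[str, Dict[str, float]] = {}
    for system, drugs in returned.items():
        hits = drugs & verified
        scores[system] = {
            # Share of the system's answers that were verified.
            "precision": len(hits) / len(drugs) if drugs else 0.0,
            # Share of the pooled verified drugs that the system found.
            "recall": len(hits) / len(union) if union else 0.0,
            "verified": len(hits),
        }
    return scores


# Made-up toy data, for illustration only.
returned = {
    "curated": {"drug_a", "drug_b", "drug_c", "drug_d"},
    "web_llm": {"drug_a", "drug_x"},
}
verified = {"drug_a", "drug_b", "drug_c", "drug_d"}
scores = score_against_union(returned, verified)
```

Under this definition, recall is relative to what the systems found collectively, not to the full and unknown set of assets that exist for a target.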

preprint · 2022 · arXiv

Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays

Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method to a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.
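The Fisher scoring update mentioned in this abstract can be illustrated on a much simpler model. The sketch below fits an ordinary Poisson regression with a log link, intercept and one covariate, using only the standard library. It is not the authors' GLLVM algorithm and includes no latent factors or penalized quasi-likelihood; the function name and the toy data are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def fisher_scoring_poisson(
    x: List[float], y: List[float], iters: int = 25, tol: float = 1e-10
) -> Tuple[float, float]:
    """Fit log E[y] = b0 + b1 * x by Fisher scoring (hypothetical helper)."""
    # Start from the intercept-only fit.
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Expected Fisher information, X^T diag(mu) X, as a 2x2 matrix.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Solve information * step = score in closed form.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data generated without noise from b0 = 0.5, b1 = 0.3.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
b0, b1 = fisher_scoring_poisson(x, y)
```

With the canonical log link the expected and observed information coincide, so in this simple case Fisher scoring and Newton's method take the same steps.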

preprint · 2016 · arXiv

Principal component analysis of periodically correlated functional time series

Within the framework of functional data analysis, we develop principal component analysis for periodically correlated time series of functions. We define the components of this analysis, including periodic, operator-valued filters, score processes and the inversion formulas. We show that these objects are defined via convergent series under a simple condition requiring summability of the Hilbert-Schmidt norms of the filter coefficients, and that they possess optimality properties. We explain how the Hilbert space theory reduces to an approximate finite-dimensional setting which is implemented in a custom-built R package. A data example and a simulation study show that the new methodology is superior to existing tools if the functional time series exhibit periodic characteristics.

preprint · 2015 · arXiv

Dynamic Functional Principal Components

In this paper, we address the problem of dimension reduction for time series of functional data $(X_t\colon t\in\mathbb{Z})$. Such functional time series frequently arise, e.g., when a continuous-time process is segmented into some smaller natural units, such as days. Then each $X_t$ represents one intraday curve. We argue that functional principal component analysis (FPCA), though a key technique in the field and a benchmark for any competitor, does not provide an adequate dimension reduction in a time-series setting. FPCA indeed is a static procedure which ignores the essential information provided by the serial dependence structure of the functional data under study. Therefore, inspired by Brillinger's theory of dynamic principal components, we propose a dynamic version of FPCA, which is based on a frequency-domain approach. By means of a simulation study and an empirical illustration, we show the considerable improvement the dynamic approach entails when compared to the usual static procedure.

preprint · 2015 · arXiv

Functional Time Series

The continuous advances in data collection and storage techniques allow us to observe and record real-life processes in great detail. Examples include financial transaction data, fMRI images, satellite photos, and the Earth's pollution distribution over time. Due to the high dimensionality of such data, classical statistical tools become inadequate and inefficient. The need for new methods emerges, and one of the most prominent techniques in this context is functional data analysis (FDA). The main objective of this article is to present techniques for the analysis of temporal dependence in FDA. Such dependence occurs, for example, if the data consist of a continuous-time process which has been cut into segments, days for instance. We are then in the context of so-called functional time series.

preprint · 2014 · arXiv

A note on estimation in Hilbertian linear models

We study estimation and prediction in linear models where the response and the regressor variable both take values in some Hilbert space. Our main objective is to obtain consistency of a principal-components-based estimator for the regression operator under minimal assumptions. In particular, we avoid some inconvenient technical restrictions that have been used throughout the literature. We develop our theory in a time-dependent setup which comprises as an important special case the autoregressive Hilbertian model.