Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
16 works
0 followers
16 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Published work

16 published items

preprint · 2026 · arXiv

Video Active Perception: Effective Inference-Time Long-Form Video Understanding with Vision-Language Models

Large vision-language models (VLMs) have advanced multimodal tasks such as video question answering (QA). However, VLMs face the challenge of selecting frames effectively and efficiently, as standard uniform sampling is expensive and its performance may plateau. Inspired by active perception theory, which posits that models gain information by acquiring data that differs from their expectations, we introduce Video Active Perception (VAP), a training-free method to enhance long-form video QA using VLMs. Our approach treats keyframe selection as data acquisition in active perception and leverages a lightweight text-conditioned video generation model to represent prior world knowledge. Empirically, VAP achieves state-of-the-art zero-shot results on long-form or reasoning video QA datasets such as EgoSchema, NExT-QA, ActivityNet-QA, IntentQA, and CLEVRER, with up to 5.6× higher frame efficiency (measured in frames per question) than standard GPT-4o, Gemini 1.5 Pro, and LLaVA-OV. Moreover, VAP shows stronger reasoning abilities than previous methods and effectively selects keyframes relevant to questions. These findings highlight the potential of leveraging active perception to improve the frame effectiveness and efficiency of long-form video QA.

preprint · 2022 · arXiv

Adaptive Traffic Signal Control for Developing Countries Using Fused Parameters Derived from Crowd-Source Data

Advances in mobile technology have enabled economical collection, storage, processing, and sharing of traffic data. These data are made accessible to intended users through various application programming interfaces (APIs) and can be used to recognize and mitigate congestion in real time. In this paper, quantitative (time of arrival) and qualitative (color-coded congestion levels) data were acquired from the Google traffic APIs. New parameters that reflect heterogeneous traffic conditions were defined and used for real-time control of traffic signals while maintaining the green-to-red time ratio. The proposed method applies a congestion-avoidance principle commonly used in computer networking. Adaptive congestion levels were observed at three intersections in Delhi, India, during peak hours; they showed substantial variation, making them sensitive enough for the control algorithm to act efficiently. A simulation study further shows that the proposed control algorithm decreases waiting time and congestion. The proposed method provides an inexpensive alternative to traffic sensing and tracking technologies.

preprint · 2022 · arXiv

Braitenberg Vehicles as Developmental Neurosimulation

Connecting brain and behavior is a longstanding issue in the areas of behavioral science, artificial intelligence, and neurobiology. As is standard among models of artificial and biological neural networks, an analogue of the fully mature brain is presented as a blank slate. However, this does not consider the realities of biological development and developmental learning. Our purpose is to model the development of an artificial organism that exhibits complex behaviors. We introduce three alternative approaches to demonstrate how developmental embodied agents can be implemented. The resulting developmental Braitenberg Vehicles (dBVs) generate behaviors ranging from stimulus responses to group behavior that resembles collective motion. We situate this work in the domain of artificial brain networks along with broader themes such as embodied cognition, feedback, and emergence. Our perspective is exemplified by three software instantiations that demonstrate how a BV-genetic algorithm hybrid model, a multisensory Hebbian learning model, and multi-agent approaches can be used to study BV development. We introduce use cases such as optimized spatial cognition (vehicle-genetic algorithm hybrid model), hinges connecting behavioral and neural models (multisensory Hebbian learning model), and cumulative classification (multi-agent approaches). In conclusion, we consider future applications of the developmental neurosimulation approach.

preprint · 2022 · arXiv

Diagonal State Spaces are as Effective as Structured State Spaces

Modeling long-range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video. While attention-based models are a popular and effective choice for modeling short-range interactions, their performance on tasks requiring long-range reasoning has been largely inadequate. In an exciting result, Gu et al. (ICLR 2022) proposed the $\textit{Structured State Space}$ (S4) architecture, delivering large gains over state-of-the-art models on several long-range tasks across various modalities. The core proposition of S4 is the parameterization of state matrices via a diagonal-plus-low-rank structure, which allows efficient computation. In this work, we show that one can match the performance of S4 even without the low-rank correction, that is, by assuming the state matrices to be diagonal. Our $\textit{Diagonal State Space}$ (DSS) model matches the performance of S4 on Long Range Arena tasks and on speech classification with the Speech Commands dataset, while being conceptually simpler and straightforward to implement.
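In symbols, the contrast can be sketched for the continuous-time system $x'(t) = Ax(t) + Bu(t)$, $y(t) = Cx(t)$, whose output is the convolution of $u$ with the kernel $K$. The notation below is an assumption for illustration: S4's actual normal-plus-low-rank form also involves a change of basis that is omitted here. Dropping the low-rank term leaves a diagonal $A$, so the kernel becomes a sum of $N$ complex exponentials:

```latex
\begin{aligned}
A_{\mathrm{S4}} &= \Lambda - P Q^{*}, \qquad \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_N),\quad P, Q \in \mathbb{C}^{N\times r},\ r \ll N,\\
A_{\mathrm{DSS}} &= \Lambda,\\
K(t) &= C\, e^{t A_{\mathrm{DSS}}} B = \sum_{n=1}^{N} C_n B_n\, e^{\lambda_n t}.
\end{aligned}
```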

preprint · 2022 · arXiv

Long Range Language Modeling via Gated State Spaces

State space models have been shown to be effective at modeling long-range dependencies, especially on sequence classification tasks. In this work we focus on autoregressive sequence modeling over English books, GitHub source code, and arXiv mathematics articles. Based on recent developments around the effectiveness of gated activation functions, we propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4 (i.e., DSS) on TPUs, is fairly competitive with several well-tuned Transformer-based baselines, and exhibits zero-shot generalization to longer inputs while being straightforward to implement. Finally, we show that leveraging self-attention to model local dependencies improves the performance of GSS even further.

preprint · 2022 · arXiv

Machine Learning-based Urban Canyon Path Loss Prediction using 28 GHz Manhattan Measurements

The large bandwidth available at mm-wave frequencies is crucial for 5G and beyond, but the high path loss (PL) requires highly accurate PL prediction for network planning and optimization. Statistical models with a slope-intercept fit fall short in capturing the large variations seen in urban canyons, whereas ray tracing, though capable of characterizing site-specific features, struggles to describe foliage and street clutter and the associated reflection/diffraction ray calculations. Machine learning (ML) is promising but faces three key challenges in PL prediction: 1) insufficient measurement data; 2) lack of extrapolation to new streets; 3) overwhelmingly complex features/models. We propose an ML-based urban canyon PL prediction model based on extensive 28 GHz measurements from Manhattan, where street clutter is modeled via a LiDAR point cloud dataset and buildings via a mesh-grid building dataset. We extract expert knowledge-driven street clutter features from the point cloud and aggressively compress 3D building information using a convolutional autoencoder. Using a new street-by-street training and testing procedure to improve generalizability, the proposed model using both clutter and building features achieves a prediction error (RMSE) of $4.8 \pm 1.1$ dB, compared to $10.6 \pm 4.4$ dB and $6.5 \pm 2.0$ dB for the 3GPP LOS and slope-intercept predictions, respectively, where the standard deviation indicates street-by-street variation. Using only the four most influential clutter features, an RMSE of $5.5 \pm 1.1$ dB is achieved.
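The slope-intercept baseline fits path loss as a straight line in log-distance, $PL(d) = \alpha + 10\beta\log_{10} d$, and models are compared by RMSE. A minimal NumPy sketch of that baseline and metric follows. The distances, coefficients, and 6 dB shadowing are synthetic values chosen for illustration, not the Manhattan measurements, and the paper's ML model is not reproduced.

```python
import numpy as np

def fit_slope_intercept(d_m, pl_db):
    """Least-squares fit of PL(d) = alpha + 10 * beta * log10(d), with d in meters and PL in dB."""
    X = np.column_stack([np.ones_like(d_m), 10.0 * np.log10(d_m)])
    (alpha, beta), *_ = np.linalg.lstsq(X, pl_db, rcond=None)
    return alpha, beta

def predict(alpha, beta, d_m):
    """Evaluate the fitted slope-intercept model."""
    return alpha + 10.0 * beta * np.log10(d_m)

def rmse(pred, target):
    """Root-mean-square error in dB."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Synthetic data (illustrative only): alpha = 61.4 dB, beta = 2.5, 6 dB Gaussian shadowing.
rng = np.random.default_rng(0)
d = rng.uniform(10.0, 200.0, size=500)
pl = 61.4 + 25.0 * np.log10(d) + rng.normal(0.0, 6.0, size=d.size)

alpha, beta = fit_slope_intercept(d, pl)
err = rmse(predict(alpha, beta, d), pl)
print(f"alpha={alpha:.1f} dB, beta={beta:.2f}, RMSE={err:.1f} dB")
```

Because a single line cannot follow street-specific clutter, the residual RMSE of such a fit stays near the shadowing level, which is the gap the abstract's clutter and building features aim to close.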

preprint · 2022 · arXiv

On the Parameterization and Initialization of Diagonal State Space Models

State space models (SSMs) have recently been shown to be very effective as a deep learning layer and a promising alternative to sequence models such as RNNs, CNNs, or Transformers. The first version to show this potential was the S4 model, which is particularly effective on tasks involving long-range dependencies because it uses a prescribed state matrix called the HiPPO matrix. While this has an interpretable mathematical mechanism for modeling long dependencies, it introduces a custom representation and algorithm that can be difficult to implement. On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix. This work seeks to systematically understand how to parameterize and initialize such diagonal state space models. While it follows from classical results that almost all SSMs have an equivalent diagonal form, we show that the initialization is critical for performance. We explain mathematically why DSS works by showing that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension. We also systematically describe various design choices in parameterizing and computing diagonal SSMs, and perform a controlled empirical study ablating the effects of these choices. Our final model, S4D, is a simple diagonal version of S4 whose kernel computation requires just 2 lines of code and performs comparably to S4 in almost all settings, with state-of-the-art results in the image, audio, and medical time-series domains, and an average of 85% on the Long Range Arena benchmark.
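For a diagonal state matrix, the discretized convolution kernel reduces to a Vandermonde-style product, which is why it can be computed in very few lines. A minimal NumPy sketch of that idea follows, assuming zero-order-hold discretization and random complex parameters; the initialization and the real-part convention are illustrative choices, not necessarily S4D's. It computes the kernel and checks it against the equivalent step-by-step recurrence.

```python
import numpy as np

def discretize_zoh(lam, B, dt):
    """Zero-order-hold discretization of x'(t) = diag(lam) x(t) + B u(t); elementwise because A is diagonal."""
    lam_bar = np.exp(lam * dt)
    B_bar = (lam_bar - 1.0) / lam * B
    return lam_bar, B_bar

def ssm_kernel(lam_bar, B_bar, C, L):
    """K[l] = Re(sum_n C_n * B_bar_n * lam_bar_n**l) for l = 0..L-1, via a Vandermonde matrix."""
    vandermonde = lam_bar[:, None] ** np.arange(L)[None, :]  # shape (N, L)
    return ((C * B_bar) @ vandermonde).real

def ssm_recurrent(lam_bar, B_bar, C, u):
    """Reference path: x_k = lam_bar * x_{k-1} + B_bar * u_k, y_k = Re(C . x_k)."""
    x = np.zeros_like(lam_bar)
    ys = []
    for u_k in u:
        x = lam_bar * x + B_bar * u_k
        ys.append((C @ x).real)
    return np.array(ys)

def causal_conv(K, u):
    """y_k = sum_{j <= k} K[j] * u[k - j] (direct convolution; an FFT would be used at scale)."""
    return np.convolve(u, K)[: len(u)]

rng = np.random.default_rng(0)
N, L, dt = 8, 64, 0.1
lam = -rng.uniform(0.1, 1.0, N) + 1j * rng.uniform(0.0, 3.0, N)  # negative real parts keep it stable
B = np.ones(N, dtype=complex)
C = rng.normal(size=N) + 1j * rng.normal(size=N)
u = rng.normal(size=L)

lam_bar, B_bar = discretize_zoh(lam, B, dt)
y_conv = causal_conv(ssm_kernel(lam_bar, B_bar, C, L), u)
y_rec = ssm_recurrent(lam_bar, B_bar, C, u)
print("max |conv - recurrent|:", np.max(np.abs(y_conv - y_rec)))
```

The kernel route costs one $N \times L$ Vandermonde product followed by a convolution, whereas the recurrence is inherently sequential; that trade-off is what makes the convolutional view attractive during training.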

preprint · 2022 · arXiv

Stochastic filtering for multiscale stochastic reaction networks based on hybrid approximations

In the past few decades, the development of fluorescent technologies and microscopic techniques has greatly improved scientists' ability to observe real-time single-cell activities. In this paper, we consider the filtering problem associated with these advanced technologies, i.e., how to estimate latent dynamic states of an intracellular multiscale stochastic reaction network from time-course measurements of fluorescent reporters. A good solution to this problem can further improve scientists' ability to extract information about intracellular systems from time-course experiments. A straightforward approach to this filtering problem is to use a particle filter in which particles are generated by simulating the full model and weighted according to the observations. However, exact simulation of the full dynamic model usually takes an impractical amount of computational time, which prevents this type of particle filter from being used in real-time applications, such as transcription regulation networks. Inspired by the recent development of hybrid approximations to multiscale chemical reaction networks, we approach the filtering problem in an alternative way. We first prove that accurate solutions to the filtering problem can be constructed by solving the filtering problem for a reduced model that represents the dynamics as a hybrid process. The model reduction exploits the time-scale separations in the original network and can therefore greatly reduce the computational effort required to simulate the dynamics. As a result, we are able to develop efficient particle filters for the original model by applying particle filters to the reduced model. We illustrate the accuracy and computational efficiency of our approach using several numerical examples.
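The "straightforward approach" described above is a bootstrap particle filter: simulate each particle forward, weight it by the likelihood of the new observation, and resample. A minimal NumPy sketch follows. The toy linear-Gaussian model is an assumption for illustration only; it is neither a reaction network nor the paper's hybrid reduced model. In the paper's framing, a cheaper reduced-model simulator would take the place of `simulate`.

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles, simulate, log_likelihood, init, rng):
    """Propagate particles by simulation, weight by observation likelihood, then resample."""
    particles = init(n_particles, rng)
    means = []
    for y_k in y:
        particles = simulate(particles, rng)          # predict: run the dynamic model forward
        logw = log_likelihood(y_k, particles)         # update: score particles against y_k
        w = np.exp(logw - logw.max())                 # subtract the max for numerical stability
        w /= w.sum()
        means.append(np.sum(w * particles))           # weighted posterior-mean estimate
        idx = rng.choice(n_particles, size=n_particles, p=w)  # multinomial resampling
        particles = particles[idx]
    return np.array(means)

# Toy model (assumption): x_k = 0.9 x_{k-1} + N(0, 1), y_k = x_k + N(0, 1).
A, Q_STD, R_STD = 0.9, 1.0, 1.0

def simulate(x, rng):
    return A * x + rng.normal(0.0, Q_STD, size=x.shape)

def log_likelihood(y_k, x):
    return -0.5 * ((y_k - x) / R_STD) ** 2  # Gaussian log-density up to an additive constant

def init(n, rng):
    return rng.normal(0.0, 1.0, size=n)

rng = np.random.default_rng(1)
xs, ys, x = [], [], 0.0
for _ in range(200):
    x = A * x + rng.normal(0.0, Q_STD)
    xs.append(x)
    ys.append(x + rng.normal(0.0, R_STD))
xs, ys = np.array(xs), np.array(ys)

est = bootstrap_particle_filter(ys, 1000, simulate, log_likelihood, init, rng)
print("filter RMSE:", np.sqrt(np.mean((est - xs) ** 2)))
print("raw-observation RMSE:", np.sqrt(np.mean((ys - xs) ** 2)))
```

Every step calls `simulate` once per particle, so the filter's cost is dominated by model simulation; this is why replacing an expensive exact simulator with a reduced model pays off.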

preprint · 2020 · arXiv

Break It Down: A Question Understanding Benchmark

Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning Representation (QDMR) for questions. QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question. We develop a crowdsourcing pipeline, showing that quality QDMRs can be annotated at scale, and release the Break dataset, containing over 83K pairs of questions and their QDMRs. We demonstrate the utility of QDMR by showing that (a) it can be used to improve open-domain question answering on the HotpotQA dataset, and (b) it can be deterministically converted to a pseudo-SQL formal language, which can alleviate annotation in semantic parsing applications. Finally, we use Break to train a sequence-to-sequence model with copying that parses questions into QDMR structures, and show that it substantially outperforms several natural baselines.
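Because QDMR is an ordered list of steps, later steps can refer to the results of earlier ones; the sketch below assumes these references are written `#k`. It is a small Python helper that extracts the references and checks that every step only points backwards. The example question and its decomposition were written for illustration and are not taken from the Break dataset.

```python
import re

REF = re.compile(r"#(\d+)")  # assumed reference syntax: "#k" points at step k (1-based)

def step_dependencies(steps):
    """Map each 1-based step index to the sorted earlier steps it references."""
    deps = {}
    for i, step in enumerate(steps, start=1):
        refs = sorted({int(k) for k in REF.findall(step)})
        if any(k < 1 or k >= i for k in refs):
            raise ValueError(f"step {i} references a step that is not earlier: {step!r}")
        deps[i] = refs
    return deps

# Hypothetical QDMR for "What color is the biggest cube?" (illustrative, not from Break).
qdmr = [
    "return cubes",
    "return size of #1",
    "return #1 where #2 is highest",
    "return color of #3",
]
print(step_dependencies(qdmr))  # {1: [], 2: [1], 3: [1, 2], 4: [3]}
```

The backwards-only check is what makes the steps executable in order, which is the property a deterministic conversion to a formal language relies on.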

preprint · 2020 · arXiv

GMAT: Global Memory Augmentation for Transformers

Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism and high performance. The contextualizing component of a Transformer block is the $\textit{pairwise dot-product}$ attention, which has a large $\Omega(L^2)$ memory requirement for length-$L$ sequences, limiting its ability to process long documents. This has been the subject of substantial recent interest, and multiple approximations have been proposed to reduce the quadratic memory requirement using sparse attention matrices. In this work, we propose to augment sparse Transformer blocks with a dense attention-based $\textit{global memory}$ of length $M$ ($\ll L$), which provides an aggregate global view of the entire input sequence to each position. Our augmentation has a manageable $O(M\cdot(L+M))$ memory overhead and can be seamlessly integrated with prior sparse solutions. Moreover, global memory can also be used for sequence compression, by representing a long input sequence with the memory representations only. We empirically show that our method leads to substantial improvement on a range of tasks, including (a) synthetic tasks that require global reasoning, (b) masked language modeling, and (c) reading comprehension.
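To make the asymptotics concrete, the sketch below compares the $L^2$ term with the $M\cdot(L+M)$ term for a few sample sizes. It counts attention-score entries only, ignoring constants, heads, batch size, and the sparse-attention part, so it illustrates orders of growth rather than GMAT's exact memory accounting; the sample values of $L$ and $M$ are arbitrary.

```python
def dense_scores(L: int) -> int:
    """Entries in a full L x L pairwise attention-score matrix."""
    return L * L

def memory_overhead(L: int, M: int) -> int:
    """Order-of-growth term M * (L + M) for a global memory of length M."""
    return M * (L + M)

for L, M in [(512, 64), (4096, 64), (16384, 128)]:
    ratio = dense_scores(L) / memory_overhead(L, M)
    print(f"L={L:>6}, M={M:>4}: L^2={dense_scores(L):>12,}  M(L+M)={memory_overhead(L, M):>10,}  ratio={ratio:.1f}x")
```

For $M \ll L$ the ratio grows roughly like $L/M$, so the relative saving of the memory term over dense attention increases with sequence length.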

preprint · 2020 · arXiv

HeartFit: An Accurate Platform for Heart Murmur Diagnosis Utilizing Deep Learning

Cardiovascular disease (CD) is the leading cause of death worldwide, accounting for more than 17 million deaths in 2015. Critical indicators of CD include heart murmurs, intense sounds emitted by the heart during periods of irregular blood flow. Current diagnosis of heart murmurs relies on echocardiography (ECHO), which costs thousands of dollars and requires medical professionals to analyze the results, making it unsuitable for areas with inadequate medical facilities. Thus, there is a need for an accessible alternative. Based on a simple interface and deep learning, HeartFit allows users to administer diagnoses themselves. An inexpensive, custom-designed stethoscope, used with a mobile application, allows users to record audio of their heart and upload it to a database. A deep learning model then classifies the audio and returns the diagnosis to the user. The model is a deep recurrent convolutional neural network trained on 300 prelabeled heartbeat audio samples. Validated on a previously unseen set of 100 heartbeat audio samples, it achieved an F-beta score of 0.9545 and an accuracy of 95.5 percent. This exceeds the accuracy of clinical examination, which is around 83 to 91 percent, at a cost orders of magnitude lower than ECHO, demonstrating the effectiveness of the HeartFit platform. Through the platform, users can obtain an immediate, accurate diagnosis of heart murmurs without any professional medical assistance, revolutionizing how we combat CD.
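The abstract reports an F-beta score without stating which β was used. For reference, the standard definition combines precision $P$ and recall $R$ as $F_\beta = (1+\beta^2)PR/(\beta^2 P + R)$. A small Python helper that computes it from confusion counts, with β left as a parameter, is:

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from confusion counts; beta > 1 weights recall more, beta < 1 weights precision more."""
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined), so report 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(1, 1, 0))            # F1 with precision 0.5 and recall 1.0
print(f_beta(1, 1, 0, beta=2.0))  # F2 weights the perfect recall more heavily
```

With β = 1 this is the usual F1 score; the first call evaluates to 2/3 and the second to 5/6, showing how a larger β pulls the score toward recall.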

preprint · 2020 · arXiv

Injecting Numerical Reasoning Skills into Language Models

Large pre-trained language models (LMs) are known to encode substantial amounts of linguistic information. However, high-level reasoning skills, such as numerical reasoning, are difficult to learn from a language-modeling objective alone. Consequently, existing models for numerical reasoning have used specialized architectures with limited flexibility. In this work, we show that numerical reasoning is amenable to automatic data generation, so this skill can be injected into pre-trained LMs by generating large amounts of data and training in a multi-task setup. We show that pre-training our model, GenBERT, on this data dramatically improves performance on DROP (49.3 $\rightarrow$ 72.3 F1), reaching performance that matches state-of-the-art models of comparable size, while using a simple and general-purpose encoder-decoder architecture. Moreover, GenBERT generalizes well to math word problem datasets, while maintaining high performance on standard reading comprehension (RC) tasks. Our approach provides a general recipe for injecting skills into large pre-trained LMs whenever the skill is amenable to automatic data augmentation.

preprint · 2020 · arXiv

Stochastic filters based on hybrid approximations of multiscale stochastic reaction networks

We consider the problem of estimating the dynamic latent states of an intracellular multiscale stochastic reaction network from time-course measurements of fluorescent reporters. We first prove that accurate solutions to the filtering problem can be constructed by solving the filtering problem for a reduced model that represents the dynamics as a hybrid process. The model reduction is based on exploiting the time-scale separations in the original network, and it can greatly reduce the computational effort required to simulate the dynamics. This enables us to develop efficient particle filters to solve the filtering problem for the original model by applying particle filters to the reduced model. We illustrate the accuracy and the computational efficiency of our approach using a numerical example.

preprint · 2019 · arXiv

A study of topological structures on equi-continuous mappings

Function space topologies are developed for EC(Y,Z), the class of equi-continuous mappings from a topological space Y to a uniform space Z. Properties such as splittingness and admissibility are defined for such spaces. Net-theoretic investigations are carried out to characterize splittingness and admissibility of function spaces on EC(Y,Z). The open-entourage topology is shown to be admissible, and the point-transitive-entourage topology is shown to be splitting. Dual topologies are defined, and a topology on EC(Y,Z) is found to be admissible (resp. splitting) if and only if its dual is.

preprint · 2019 · arXiv

User-Interactive Machine Learning Model for Identifying Structural Relationships of Code Features

Traditional machine-learning-based intelligent systems assist users by learning patterns in data and making recommendations. However, these systems are limited in that the user has little means of understanding the rationale behind the system's suggestions, communicating their own understanding of patterns, or correcting system behavior. In this project, we outline a model for intelligent software based on a human-computer feedback loop. The machine learning (ML) system's recommendations are reviewed by the user, and in turn, this information shapes the system's decision making. Our model was applied to developing an HTML editor that integrates ML with user interaction to ascertain structural relationships between HTML document features and apply them for code completion. The editor uses the ID3 algorithm to build decision trees, sequences of rules for predicting the code the user will type. The editor displays the decision trees' rules in the Interactive Rules Interface System (IRIS), which allows developers to prioritize, modify, or delete them. These interactions alter the data processed by ID3, giving the developer some control over the autocomplete system. Validation indicates that, absent user interaction, the ML model is able to predict tags with 78.4 percent accuracy, attributes with 62.9 percent accuracy, and values with 12.8 percent accuracy. The results of the user study indicate that user interaction with the rules interface corrects feature relationships missed or mistaken by the automated process, enhancing autocomplete accuracy and developer productivity. Additionally, interaction is shown to help developers work with greater awareness of code patterns. Our research demonstrates the viability of a software integration of machine intelligence with human feedback.
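ID3 grows a decision tree by repeatedly splitting on the feature with the highest information gain. A minimal Python sketch of that split criterion follows; the HTML-context features and training rows are hypothetical, not the editor's actual feature set.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """ID3's greedy step: pick the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical training data: context at the cursor -> the tag the developer typed next.
rows = [
    {"parent": "ul", "prev_tag": "li"},
    {"parent": "ul", "prev_tag": "none"},
    {"parent": "table", "prev_tag": "tr"},
    {"parent": "table", "prev_tag": "none"},
]
labels = ["li", "li", "tr", "tr"]

for attr in ("parent", "prev_tag"):
    print(attr, information_gain(rows, labels, attr))
print("split on:", best_attribute(rows, labels, ["parent", "prev_tag"]))
```

A full ID3 implementation would recurse on each branch until the labels are pure or the attributes run out. Per the abstract, IRIS edits change the data that such a loop consumes, which in turn changes the rules it produces.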

preprint · 2010 · arXiv

A Decentralized Approach for Service Discovery & Availability in P-Grids

The widespread emergence of the Internet as a platform for electronic data distribution and the advent of structured information have revolutionized our ability to deliver information to any corner of the world. Service Oriented Architecture (SOA) is a paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains and implemented using various technology stacks, but not every organization is geared up for it. To harness software and service resources placed on various systems, we have proposed and implemented a model that establishes resource discovery and sharing in a load-balanced P-Grid environment. The experimental results show that the proposed approach dramatically lowers network traffic, to a nearly negligible level, while achieving load balancing in P2P grid systems. The model also supports the discovery and sharing of resources.