Source author record

Matloob Khushi

Matloob Khushi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; q-fin.ST; Computation and Language; Computational Engineering, Finance, and Science; Artificial Intelligence; Databases; eess.IV; Machine Learning; q-fin.PM

Catalog footprint

What is connected

9 works

9 topics

4 close collaborators


preprint 2022 arXiv

Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model

User-generated text on social media enables health workers to keep track of information, identify possible outbreaks, forecast disease trends, monitor emergency cases, and ascertain disease awareness and response to official health correspondence. This exchange of health information on social media has been regarded as an attempt to enhance public health surveillance (PHS). Despite its potential, the technology is still in its early stages and is not ready for widespread application. Advancements in pretrained language models (PLMs) have facilitated the development of several domain-specific PLMs and a variety of downstream applications. However, there are no PLMs for social media tasks involving PHS. We present and release PHS-BERT, a transformer-based PLM, to identify tasks related to public health surveillance on social media. We compared and benchmarked the performance of PHS-BERT on 25 datasets from different social media platforms related to 7 different PHS tasks. Compared with existing PLMs that are mainly evaluated on limited tasks, PHS-BERT achieved state-of-the-art performance on all 25 tested datasets, showing that our PLM is robust and generalizable in the common PHS tasks. By making PHS-BERT available, we aim to help the community reduce the computational cost and introduce new baselines for future works across various PHS-related tasks.

preprint 2021 arXiv

Event-Driven LSTM For Forex Price Prediction

The majority of studies in the field of AI-guided financial trading focus purely on applying machine learning algorithms to continuous historical price and technical analysis data. However, due to the non-stationary and highly volatile nature of the Forex market, most algorithms fail when put into real practice. We developed novel event-driven features which indicate a change of trend in direction. We then build deep learning models to predict a retracement point providing a perfect entry point to gain maximum profit. We use a simple recurrent neural network (RNN) as our baseline model and compare it with long short-term memory (LSTM), bidirectional long short-term memory (BiLSTM) and gated recurrent unit (GRU) models. Our experiment results show that the proposed event-driven feature selection together with the proposed models can form a robust prediction system which supports accurate trading strategies with minimal risk. Our best model on 15-minute interval data for the EUR/GBP currency pair achieved RME 0.006x10^(-3), RMSE 2.407x10^(-3), MAE 1.708x10^(-3), MAPE 0.194%, outperforming previous studies.
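For readers unfamiliar with the error measures quoted in this abstract, the sketch below computes RMSE, MAE and MAPE in their common textbook forms. The function name `error_metrics` is hypothetical, and the paper's own evaluation code may define these quantities differently; RME is left out because the abstract does not define it.

```python
import math


def error_metrics(actual, predicted):
    """Return RMSE, MAE and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    # Root mean squared error: penalises large misses more than small ones.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Mean absolute error: average size of a miss, in price units.
    mae = sum(abs(e) for e in errors) / n
    # Mean absolute percentage error: undefined if any actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"rmse": rmse, "mae": mae, "mape": mape}
```

RMSE and MAE are expressed in the units of the price series, while MAPE is relative, which is why the abstract reports it as a percentage.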

preprint 2021 arXiv

Wavelet Denoised-ResNet CNN and LightGBM Method to Predict Forex Rate of Change

Foreign Exchange (Forex) is the largest financial market in the world. The daily trading volume of the Forex market is much higher than that of stock and futures markets. Therefore, it is of great significance for investors to establish a foreign exchange forecast model. In this paper, we propose a Wavelet Denoised-ResNet with LightGBM model to predict the rate of change of Forex price after five time intervals to allow enough time to execute trades. All the prices are denoised by wavelet transform, and a matrix of 30 time intervals is formed by calculating technical indicators. Image features are obtained by feeding the matrix into a ResNet. Finally, the technical indicators and image features are fed to LightGBM. Our experiments on 5-minute USDJPY data demonstrate that the model outperforms baseline models with MAE: 0.240977x10^(-3), MSE: 0.156x10^(-6) and RMSE: 0.395185x10^(-3). An accurate price prediction 25 minutes into the future provides a window of opportunity for hedge fund algorithmic trading. The code is available from https://mkhushi.github.io/
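The prediction target described here, the rate of change after five time intervals, can be illustrated with a short sketch. It assumes the rate of change is the relative price difference over the horizon; the abstract does not give the exact formula, and the function name `rate_of_change_targets` is hypothetical.

```python
def rate_of_change_targets(prices, horizon=5):
    """Relative price change `horizon` steps ahead, for each step that has a future value."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    # The last `horizon` prices have no future value to compare with, so they are skipped.
    return [
        (prices[t + horizon] - prices[t]) / prices[t]
        for t in range(len(prices) - horizon)
    ]
```

With 5-minute bars and a horizon of five, each target looks 25 minutes ahead, which matches the trading window mentioned in the abstract.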

preprint 2020 arXiv

Benchmarking database performance for genomic data

Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations is a common research need. The data can be saved in a database system for easy management; however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm, pair-wise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1).
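To make the idea of an SQL-based overlap query concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the RegMap algorithm itself: the table layout, the column names and the use of SQLite in place of PostgreSQL or MySQL are all assumptions chosen for illustration. Two regions on the same chromosome overlap when each one starts no later than the other ends.

```python
import sqlite3


def find_overlaps(regions_a, regions_b):
    """Return (name_a, name_b) pairs whose regions overlap on the same chromosome.

    Each region is a (name, chrom, start, end) tuple with inclusive coordinates.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for table in ("regions_a", "regions_b"):
            conn.execute(
                f"CREATE TABLE {table} "
                "(name TEXT, chrom TEXT, region_start INTEGER, region_end INTEGER)"
            )
        conn.executemany("INSERT INTO regions_a VALUES (?, ?, ?, ?)", regions_a)
        conn.executemany("INSERT INTO regions_b VALUES (?, ?, ?, ?)", regions_b)
        # Interval overlap test: a starts before b ends AND b starts before a ends.
        rows = conn.execute(
            """
            SELECT a.name, b.name
            FROM regions_a AS a
            JOIN regions_b AS b ON a.chrom = b.chrom
            WHERE a.region_start <= b.region_end
              AND b.region_start <= a.region_end
            ORDER BY a.name, b.name
            """
        ).fetchall()
    finally:
        conn.close()
    return rows
```

A plain join like this compares every pair of regions on a chromosome, so on large datasets its speed depends heavily on indexing and on the database engine, which is the kind of difference the benchmark in this abstract measures.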

preprint 2020 arXiv

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

In recent years, with the growing amount of biomedical documents, coupled with advancement in natural language processing algorithms, the research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER research is challenging because NER in the biomedical domain: (i) is often restricted by the limited amount of training data, (ii) must handle entities that can refer to multiple types and concepts depending on their context, and (iii) relies heavily on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results. We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), BioALBERT, an effective domain-specific language model trained on large-scale biomedical corpora and designed to capture biomedical context-dependent NER. We adopted a self-supervised loss used in ALBERT that focuses on modelling inter-sentence coherence to better learn context-dependent representations, and incorporated parameter reduction techniques to lower memory consumption and increase the training speed in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets with four different entity types. We trained four different variants of BioALBERT models, which are available for the research community to use in future research.

preprint 2020 arXiv

GA-MSSR: Genetic Algorithm Maximizing Sharpe and Sterling Ratio Method for RoboTrading

Foreign exchange is the largest financial market in the world, and it is also one of the most volatile markets. Technical analysis plays an important role in the forex market and trading algorithms are designed utilizing machine learning techniques. Most literature used historical price information and technical indicators for training. However, the noisy nature of the market affects the consistency and profitability of the algorithms. To address this problem, we designed trading rule features that are derived from technical indicators and trading rules. The parameters of technical indicators are optimized to maximize trading performance. We also proposed a novel cost function that computes the risk-adjusted return, Sharpe and Sterling Ratio (SSR), in an effort to reduce the variance and the magnitude of drawdowns. An automatic robotic trading (RoboTrading) strategy is designed with the proposed Genetic Algorithm Maximizing Sharpe and Sterling Ratio (GA-MSSR) model. The experiment was conducted on intraday data of 6 major currency pairs from 2018 to 2019. The results consistently showed significant positive returns, and the performance of the trading system is superior using the optimized rule-based features. The highest return obtained was 320% annually using 5-minute AUDUSD data. In addition, the proposed model achieves the best performance on risk factors, including maximum drawdowns and variance in return, compared with benchmark models. The code can be accessed at https://github.com/zzzac/rule-based-forextrading-system
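The two ingredients named in this cost function can be sketched in their common textbook forms. This is not the paper's SSR formula, which the abstract does not spell out: the sketch assumes a Sharpe ratio with no risk-free rate and no annualisation, and a Sterling-style ratio defined as compounded return over maximum drawdown. All function names are hypothetical.

```python
import statistics


def sharpe_ratio(returns):
    """Mean per-period return divided by its sample standard deviation."""
    # Assumes at least two returns that are not all identical.
    return statistics.mean(returns) / statistics.stdev(returns)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def sterling_style_ratio(returns):
    """Compounded return over the period divided by the maximum drawdown."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    # Assumes the series contains at least one drawdown, so the divisor is non-zero.
    return (total - 1.0) / max_drawdown(returns)
```

The Sharpe ratio penalises variance in returns and the Sterling-style ratio penalises deep drawdowns, which corresponds to the two risk factors the abstract says the cost function is meant to reduce.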

preprint 2020 arXiv

Open source tools for management and archiving of digital microscopy data to allow integration with patient pathology and treatment information

Virtual microscopy includes digitisation of histology slides and the use of computer technologies for complex investigation of diseases such as cancer. However, automated image analysis, or website publishing of such digital images, is hampered by their large file sizes. We have developed two Java-based open source tools: Snapshot Creator and NDPI-Splitter. Snapshot Creator converts a portion of a large digital slide into a desired quality JPEG image. The image is linked to the patient's clinical and treatment information in a customised open source cancer data management software (Caisis) in use at the Australian Breast Cancer Tissue Bank (ABCTB) and then published on the ABCTB website www.abctb.org.au using Deep Zoom open source technology. Using the ABCTB online search engine, digital images can be searched by defining various criteria such as cancer type or biomarkers expressed. NDPI-Splitter splits a large image file into smaller sections of TIFF images so that they can be easily analysed by image analysis software such as Metamorph or Matlab. NDPI-Splitter also has the capacity to filter out empty images. Snapshot Creator and NDPI-Splitter are novel open source Java tools. They convert digital slides into files of smaller size for further processing. In conjunction with other open source tools such as Deep Zoom and Caisis, this suite of tools is used for the management and archiving of digital microscopy images, enabling digitised images to be explored and zoomed online. Our online image repository also has the capacity to be used as a teaching resource. These tools also enable large files to be sectioned for image analysis.

preprint 2020 arXiv

Portfolio Optimization with 2D Relative-Attentional Gated Transformer

Portfolio optimization is one of the fields most actively researched with machine learning approaches. Many researchers have attempted to solve this problem using deep reinforcement learning because it is inherently well suited to handling the properties of financial markets. However, most of these approaches are hardly applicable to real-world trading since they ignore or extremely simplify the realistic constraints of transaction costs. These constraints have a significantly negative impact on portfolio profitability. In our research, a conservative level of transaction fees and slippage is considered for a realistic experiment. To enhance the performance under those constraints, we propose a novel Deterministic Policy Gradient with 2D Relative-attentional Gated Transformer (DPGRGT) model. By applying learnable relative positional embeddings for the time and assets axes, the model better understands the peculiar structure of the financial data in the portfolio optimization domain. Also, gating layers and layer reordering are employed for stable convergence of Transformers in reinforcement learning. In our experiment using 20 years of U.S. stock market data, our model outperformed baseline models and demonstrated its effectiveness.

preprint 2020 arXiv

Wavelet Denoising and Attention-based RNN-ARIMA Model to Predict Forex Price

Every change of trend in the forex market presents a great opportunity as well as a risk for investors. Accurate forecasting of forex prices is a crucial element in any effective hedging or speculation strategy. However, the complex nature of the forex market makes the prediction problem challenging, which has prompted extensive research from various academic disciplines. In this paper, a novel approach that integrates wavelet denoising, an Attention-based Recurrent Neural Network (ARNN), and an Autoregressive Integrated Moving Average (ARIMA) model is proposed. The wavelet transform removes the noise from the time series to stabilize the data structure. The ARNN model captures the robust and non-linear relationships in the sequence, and ARIMA can fit the linear correlation of the sequential information well. By hybridization of the three models, the methodology is capable of modelling dynamic systems such as the forex market. In our experiments on USD/JPY five-minute data, the hybrid approach outperforms the baseline methods. The Root-Mean-Squared-Error (RMSE) of the hybrid approach was found to be 1.65 with a directional accuracy of ~76%.
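Directional accuracy, the second figure reported here, is commonly taken as the share of time steps at which the forecast moves in the same direction as the actual price. The sketch below follows that common reading; the paper's exact definition is not given in the abstract, and the function name `directional_accuracy` is hypothetical.

```python
def directional_accuracy(actual, predicted):
    """Share of steps where the predicted move has the same sign as the actual move."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    hits = 0
    for t in range(1, len(actual)):
        # Both moves are measured from the last known actual price.
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        # Count a hit only when both moves are up or both are down.
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(actual) - 1)
```

Under this definition, steps where the price or the forecast does not move count as misses, so a model can have a low RMSE and still score poorly on direction.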
