In this project, we provide a framework/pipeline for high-frequency trading using machine learning and deep learning techniques. More advanced feature engineering (with depth, trade, and quote data) and models (such as pre-trained models) can be applied within this framework.
- Extract trading signals from multi-level orderbook data
- Replicate well-designed high frequency trading (HFT) strategies using machine learning and deep learning techniques
The data used are tick-level depth data for the SGX FTSE CHINA A50 INDEX Futures.
We use level-3 deep orderbook data to develop trading signals, including Depth Ratio, Rise Ratio, and Orderbook Imbalance (OBI).
- Simple average depth ratio and OBI:
- Weighted average depth ratio, OBI, and rise ratio:
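The formulas for these signals are not reproduced in this text, so the following is only a minimal sketch under assumed, commonly used definitions: OBI as `(B - A) / (B + A)` and depth ratio as `B / A`, where `B` and `A` are the (weighted) bid and ask volumes across the orderbook levels, and rise ratio as the relative price change over a lag. The function names, the example weights, and these definitions are illustrative assumptions, not necessarily the ones used in this project.

```python
import numpy as np

def _aggregate(volumes, weights=None):
    """Collapse per-level volumes of shape (ticks, levels) into one value per tick."""
    volumes = np.asarray(volumes, dtype=float)
    if weights is None:
        # Simple average: every level counts equally.
        weights = np.ones(volumes.shape[1]) / volumes.shape[1]
    return volumes @ np.asarray(weights, dtype=float)

def orderbook_imbalance(bid_vol, ask_vol, weights=None):
    """Assumed definition: (B - A) / (B + A), bounded in [-1, 1]."""
    b, a = _aggregate(bid_vol, weights), _aggregate(ask_vol, weights)
    return (b - a) / (b + a)

def depth_ratio(bid_vol, ask_vol, weights=None):
    """Assumed definition: aggregated bid depth divided by aggregated ask depth."""
    return _aggregate(bid_vol, weights) / _aggregate(ask_vol, weights)

def rise_ratio(price, lag):
    """Assumed definition: relative price change versus `lag` ticks ago (NaN until known)."""
    price = np.asarray(price, dtype=float)
    out = np.full(price.shape, np.nan)
    out[lag:] = price[lag:] / price[:-lag] - 1.0
    return out

# Two ticks of a three-level book (made-up volumes).
bid_vol = [[10, 20, 30], [5, 5, 5]]
ask_vol = [[30, 20, 10], [5, 10, 15]]
level_weights = [0.5, 0.3, 0.2]  # hypothetical: more weight on the top of the book

obi_simple = orderbook_imbalance(bid_vol, ask_vol)
obi_weighted = orderbook_imbalance(bid_vol, ask_vol, level_weights)
dr_simple = depth_ratio(bid_vol, ask_vol)
rr = rise_ratio([100.0, 101.0, 102.0, 104.0], lag=2)
```

With the simple average, the first tick is balanced (OBI of 0, depth ratio of 1), while the weighted version tilts negative because the ask side is heavier at the top level.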
Models:
- RandomForestClassifier
- ExtraTreesClassifier
- AdaBoostClassifier
- GradientBoostingClassifier
- Support Vector Machines
- Other classifiers: Softmax, KNN, MLP, LSTM, etc.
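As a rough illustration of how the listed scikit-learn classifiers could be compared, the sketch below fits each one on synthetic data and reports a cross-validated mean accuracy. The synthetic features, the settings such as `n_estimators=50`, and the use of `TimeSeriesSplit` (chosen here so that each fold trains only on earlier samples) are assumptions made for the example, not the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for signal features (e.g. depth ratio, OBI) and up/down labels.
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

models = {
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "AdaBoostClassifier": AdaBoostClassifier(n_estimators=50, random_state=0),
    "GradientBoostingClassifier": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "SVC": SVC(kernel="rbf"),
}

# Each split trains on an initial segment and validates on the segment that follows it.
cv = TimeSeriesSplit(n_splits=5)
cv_mean_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
best_model = max(cv_mean_accuracy, key=cv_mean_accuracy.get)

for name, acc in cv_mean_accuracy.items():
    print(f"{name}: {acc:.3f}")
print("best:", best_model)
```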
Hyperparameters:
- Training window: 30min
- Test window: 10sec
- Prediction label: 15min forward
- Prediction accuracy (the result figures are not included in this text):
  - Prediction accuracy series
  - Cross-validation mean accuracy
  - Best model
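The training window, test window, and forward label listed above can be sketched as a walk-forward split. This is a minimal illustration, assuming timestamps in seconds and a binary label that is 1 when the price 15 minutes ahead is higher than the current price; the helper names `forward_label` and `rolling_windows` and the label definition are hypothetical.

```python
import numpy as np

TRAIN_WINDOW = 30 * 60   # training window: 30 min, in seconds
TEST_WINDOW = 10         # test window: 10 sec
LABEL_HORIZON = 15 * 60  # prediction label: 15 min forward, in seconds

def forward_label(times, prices, horizon=LABEL_HORIZON):
    """1 if the price `horizon` seconds ahead is higher than now, 0 if not, -1 if unknown."""
    times = np.asarray(times, dtype=float)
    prices = np.asarray(prices, dtype=float)
    # Index of the first tick at or after t + horizon, for every tick t.
    ahead = np.searchsorted(times, times + horizon, side="left")
    labels = np.full(len(times), -1)
    known = ahead < len(times)
    labels[known] = (prices[ahead[known]] > prices[known]).astype(int)
    return labels

def rolling_windows(times, train=TRAIN_WINDOW, test=TEST_WINDOW):
    """Yield (train_idx, test_idx) pairs, stepping forward one test window at a time."""
    times = np.asarray(times, dtype=float)
    start = times[0] + train
    while start + test <= times[-1]:
        train_idx = np.flatnonzero((times >= start - train) & (times < start))
        test_idx = np.flatnonzero((times >= start) & (times < start + test))
        if len(test_idx):
            yield train_idx, test_idx
        start += test

# One hour of synthetic one-second ticks.
rng = np.random.default_rng(1)
times = np.arange(0, 3600, 1.0)
prices = 100 + np.cumsum(rng.normal(scale=0.05, size=times.size))

labels = forward_label(times, prices)
windows = list(rolling_windows(times))
print(len(windows), len(windows[0][0]), len(windows[0][1]))
```

Note that labels for the most recent 15 minutes of any training window depend on prices observed after that window ends, so a live setup would need to drop or delay those samples. The sketch leaves that choice open.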
Feature Engineering
Many potentially powerful signals become available when both trade and quote data are at hand, such as:
- volume imbalance signal
- trade imbalance signal
- technical indicators of bid and ask series (RSI, MACD...)
- weighted average price (WAP/WPR)
- .....
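A few of the signals listed above can be sketched with NumPy. The definitions here are assumptions for illustration: WAP follows the top-of-book form commonly used with level-1 quotes (each side's price is weighted by the opposite side's size), volume imbalance is the normalized size difference, and trade imbalance assumes each trade carries a signed volume (positive for buyer-initiated, negative for seller-initiated).

```python
import numpy as np

def wap(bid_price, bid_size, ask_price, ask_size):
    """Weighted average price: each side's price weighted by the opposite side's size."""
    bid_price, bid_size = np.asarray(bid_price, float), np.asarray(bid_size, float)
    ask_price, ask_size = np.asarray(ask_price, float), np.asarray(ask_size, float)
    return (bid_price * ask_size + ask_price * bid_size) / (bid_size + ask_size)

def volume_imbalance(bid_size, ask_size):
    """Normalized difference between resting bid and ask size, in [-1, 1]."""
    bid_size, ask_size = np.asarray(bid_size, float), np.asarray(ask_size, float)
    return (bid_size - ask_size) / (bid_size + ask_size)

def trade_imbalance(signed_volume, window):
    """Rolling (buy - sell) / (buy + sell) volume over the last `window` trades."""
    v = np.asarray(signed_volume, float)
    kernel = np.ones(window)
    net = np.convolve(v, kernel, mode="valid")
    gross = np.convolve(np.abs(v), kernel, mode="valid")
    return net / gross  # assumes every trade has non-zero volume

# Two quote updates and four trades (made-up numbers).
wap_values = wap([99.0, 99.5], [10, 30], [101.0, 100.5], [30, 10])
vi = volume_imbalance([10, 30], [30, 10])
ti = trade_imbalance([5, -3, 2, -4], window=2)
print(wap_values, vi, ti)
```

In the first quote the ask side is larger, so the WAP (99.5) sits below the mid price of 100; in the second the bid side is larger and the WAP (100.25) sits above it.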
Derived versions of these signals can also be generated using techniques such as:
- apply different weights to different levels of orderbook data for a particular signal
- consider moving average with period n (hyperparameter)
- consider weighted average of signals, such as weighted average of trade imbalance and orderbook imbalance
- .....
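The three derivation techniques above can be sketched as small helpers. The function names, the geometric decay used for level weights, and the blend weights are hypothetical choices for this example; in practice they would be treated as hyperparameters.

```python
import numpy as np

def level_weighted(per_level_signal, decay=0.5):
    """Combine a per-level signal of shape (ticks, levels) with geometrically decaying weights."""
    s = np.asarray(per_level_signal, dtype=float)
    w = decay ** np.arange(s.shape[1])  # level 1 gets weight 1, level 2 gets `decay`, ...
    return s @ (w / w.sum())

def moving_average(x, n):
    """Trailing mean over the last n ticks; the first n - 1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    out[n - 1:] = np.convolve(x, np.ones(n) / n, mode="valid")
    return out

def blend(signals, weights):
    """Weighted average of several equally long signals, given as rows."""
    signals = np.asarray(signals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ signals / weights.sum()

# Made-up orderbook imbalance and trade imbalance series.
obi = [0.2, -0.4, 0.6, 0.0]
trade_imb = [0.0, 0.4, -0.2, 0.4]

obi_ma = moving_average(obi, n=2)
combined = blend([obi, trade_imb], weights=[0.75, 0.25])
per_level = level_weighted([[0.4, 0.0, -0.4]], decay=0.5)
print(obi_ma, combined, per_level)
```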
Models
More advanced classifiers are welcome, including but not limited to:
- CNN
- GRU/LSTM
- XGBoost, AdaBoost, GBDT, LightGBM
- Attention, Auto-encoder
- TabNet
- GNN
- Pre-trained models
- .....
Performance Metrics
The performance metrics are subject to amendment, including the PnL calculation, commission fee consideration, etc.
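One possible shape for a PnL calculation with commission is sketched below. It assumes a position series where `positions[t]` is the number of contracts held from tick `t` to tick `t + 1`, a flat fee per contract traded, and a starting position of zero. The function name, the `commission_per_contract` and `multiplier` parameters, and all numbers are illustrative assumptions, not actual contract specifications or the project's method.

```python
import numpy as np

def strategy_pnl(prices, positions, commission_per_contract=0.0, multiplier=1.0):
    """Net PnL per interval: position times price change, minus fees on contracts traded."""
    prices = np.asarray(prices, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Gross PnL of holding positions[t] over the move from prices[t] to prices[t + 1].
    gross = positions[:-1] * np.diff(prices) * multiplier
    # Contracts traded at each tick, including the opening trade from a flat book.
    traded = np.abs(np.diff(positions, prepend=0.0))
    fees = traded[:-1] * commission_per_contract
    return gross - fees

prices = [100.0, 101.0, 100.5, 102.0]
positions = [1, 1, -1, 0]  # long, hold, flip short, then flat at the last tick

net = strategy_pnl(prices, positions, commission_per_contract=0.1)
print(net, net.sum())
```

In this example the long position earns 1.0 before a 0.1 fee, and flipping from long to short trades two contracts, so that interval pays a 0.2 fee on top of its 1.5 loss.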
Many promising features remain to be explored with trade data and depth orderbook data, and the same holds for classifiers. In the Kaggle Optiver volatility competition, the training data include both trade and quote/orderbook data at level 2. The top solutions reveal many insightful feature engineering techniques and models, which can also be applied in this framework.