In a previous post on Numerai, I described very basic code to get into the world of machine learning competitions. This one is a continuation, so if you haven’t read it, I recommend doing so here. In this post, we will add a little more complexity to the whole process: we will split out 20% of the training data as a validation set, so we can train different models and compare their performance, and we will dive into deep neural nets as the predictive model.
Ok, let’s do some machine learning…
Let’s start by importing what will be required; this step is similar to what we did in the first model. Apart from Pandas, we import StandardScaler to preprocess the data before feeding it into the neural net. We will use train_test_split to split out 20% of the data as a test set, and roc_auc_score as a useful metric to check and compare model performance. We will also need the neural net itself – the Classifier from scikit-neuralnetwork (sknn).
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sknn.mlp import Classifier, Layer
With all required imports in place, we can load the data from the CSV files (remember to update the system paths to the downloaded files):
train = pd.read_csv("/home/m/Numerai/numerai_datasets/numerai_training_data.csv")
test = pd.read_csv("/home/m/Numerai/numerai_datasets/numerai_tournament_data.csv")
sub = pd.read_csv("/home/m/Numerai/numerai_datasets/example_predictions.csv")
Some basic data manipulation is required:
sub["t_id"] = test["t_id"]
test.drop("t_id", axis=1, inplace=True)
labels = train["target"]
train.drop("target", axis=1, inplace=True)
train = train.values
labels = labels.values
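The pattern above – pull out the label column, drop it from the features, then convert to a plain numpy array – can be sketched on a tiny made-up frame (the column names and values here are hypothetical, just mimicking the Numerai file layout):

```python
import pandas as pd

# Toy frame standing in for the Numerai training file (hypothetical values).
df = pd.DataFrame({"feature1": [0.1, 0.2],
                   "feature2": [0.3, 0.4],
                   "target":   [0, 1]})

labels = df["target"]            # keep the label column separately
df = df.drop("target", axis=1)   # remove it from the feature set
X = df.values                    # plain numpy array for the model

print(X.shape)  # (2, 2) -- two rows, two feature columns, target gone
```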
The next four lines do what is called standardization. The result of standardization (or Z-score normalization) is that the features are rescaled so that they have the properties of a standard normal distribution with μ=0 and σ=1.
scaler = StandardScaler()
scaler.fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
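Under the hood, StandardScaler just applies the z-score formula column by column; a minimal numpy sketch with made-up values:

```python
import numpy as np

# Toy feature matrix standing in for the Numerai training data (hypothetical values).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Z-score normalization: subtract the column mean, divide by the column std --
# the same computation StandardScaler's fit/transform pair performs.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_scaled = (X - mu) / sigma

# Each column now has mean ~0 and std ~1.
print(X_scaled.mean(axis=0))  # ≈ [0. 0.]
print(X_scaled.std(axis=0))   # ≈ [1. 1.]
```

Note that we fit the scaler on the training data only and reuse those statistics to transform the tournament data, so both sets are on the same scale.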
The next line of code splits the downloaded training set into train and test subsets; we set aside 20% of the original training data so we can check out-of-sample performance and avoid overfitting.
X_train, X_test, y_train, y_test = train_test_split(train, labels, test_size=0.2, random_state=35)
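The effect of test_size=0.2 is easy to verify on dummy data (the shapes below use 100 made-up rows, not the real Numerai files):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 hypothetical rows with 5 features each, and binary labels.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = rng.integers(0, 2, size=100)

# test_size=0.2 reserves 20% of rows; random_state fixes the shuffle
# so the split is reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=35)

print(X_train.shape, X_test.shape)  # (80, 5) (20, 5)
```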
With all the data preprocessed, we are ready to define the model: the number of layers in the neural network and the number of neurons in each layer. The few lines of code below do it:
nn = Classifier(
    layers=[
        Layer("Tanh", units=50),
        Layer("Tanh", units=200),
        Layer("Tanh", units=200),
        Layer("Tanh", units=50),
        Layer("Softmax")],
    learning_rule='adadelta',
    learning_rate=0.01,
    n_iter=5,
    verbose=1,
    loss_type='mcc')
“units=50” – sets the number of neurons in a layer; the number of neurons in the first layer is determined by the number of features in the data we feed in.
“Tanh” – the activation function; you can use others as well, e.g. Rectifier, ExpLin, Sigmoid, or Convolution. In the last layer the activation function is Softmax – the usual output-layer function for classification tasks. Our network has five layers with different numbers of neurons; there are no strict rules about the number of neurons and layers, so it is more art than science – you just need to try different versions and check what works best.
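To make the two activations concrete, here is a small numpy sketch of what each one computes (standalone functions for illustration, not sknn internals):

```python
import numpy as np

def tanh(z):
    # Hidden-layer activation: squashes inputs into (-1, 1).
    return np.tanh(z)

def softmax(z):
    # Output-layer activation: exponentiate and normalize so the outputs
    # are positive and sum to 1 -- interpretable as class probabilities.
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
probs = softmax(scores)
print(probs.sum())  # 1.0 -- a valid probability distribution
```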
“learning_rule=’adadelta'” – sets the learning algorithm to adadelta; more choices are available: sgd, momentum, nesterov, adagrad, or rmsprop. Just try them and check what works best – you can even mix them for different layers.
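For intuition, adadelta adapts the step size per parameter from running averages of squared gradients and squared updates. A stdlib-only sketch of the update rule (following Zeiler's 2012 paper) minimizing the toy function f(x) = x²; the rho and eps values are common defaults, not necessarily what sknn uses internally:

```python
import math

rho, eps = 0.95, 1e-6
x = 5.0               # starting point, far from the minimum at 0
avg_sq_grad = 0.0     # running average of squared gradients
avg_sq_delta = 0.0    # running average of squared updates

for _ in range(2000):
    grad = 2.0 * x                       # derivative of x^2
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    # The ratio of the two running RMS values sets the step size,
    # so no hand-tuned global learning rate is needed.
    delta = -(math.sqrt(avg_sq_delta + eps) /
              math.sqrt(avg_sq_grad + eps)) * grad
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta ** 2
    x += delta

print(x)  # close to the minimum at 0
```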
“learning_rate=0.01” – the learning rate; as a rule of thumb you start with the “default” value of 0.01, but other values can be used, mostly anything from 0.001 to 0.1.
“n_iter=5” – the number of iterations (epochs); the higher the number, the longer learning will take. 5 is an example only – you need to watch the error after each epoch, as at some point it will stop dropping. I have seen anything from 50 to 5000, so feel free to play with it.
“verbose=1” – this parameter lets us see progress on screen.
“loss_type=’mcc’” – the loss function; mcc is typical for classification tasks.
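A classification loss of this family boils down to penalizing the network for assigning low probability to the true class. A stdlib-only sketch of categorical cross-entropy for a single sample (hypothetical numbers):

```python
import math

# Network output after softmax for a 3-class problem (hypothetical).
probs = [0.7, 0.2, 0.1]
true_class = 0

# Cross-entropy for one sample: minus the log of the probability
# assigned to the true class. Confident correct answers cost ~0;
# confident wrong answers cost a lot.
loss = -math.log(probs[true_class])
print(round(loss, 4))  # 0.3567
```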
As the model is set up, we can feed in the data and train it; depending on how powerful your PC is, this can take from seconds to days. GPU computing is recommended for training neural networks.

nn.fit(X_train, y_train)
The line below validates the model against the 20% of data we set aside earlier:
print('Overall AUC:', roc_auc_score(y_test, nn.predict_proba(X_test)[:, 1]))
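AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one: 0.5 means random guessing, 1.0 is perfect ranking. A tiny worked example with made-up labels and probabilities:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and predicted probabilities for five rows.
y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.9]

# Of the 3x2 = 6 positive/negative pairs, 5 are ranked correctly
# (only 0.35 vs 0.4 is wrong), so AUC = 5/6.
print(roc_auc_score(y_true, y_prob))  # 0.8333...
```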
Using the above code we can play around with different settings and neural network architectures, checking the performance. After finding the best settings, they can be applied to the prediction to be uploaded to Numerai – just run the last three lines (remember to update the system path where the file is saved):
y_pred = nn.predict_proba(test)
sub["probability"] = y_pred[:, 1]
sub.to_csv("/home/m/Numerai/numerai_datasets/Prediction.csv", index=False)
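The submission step just writes a two-column CSV; a self-contained sketch using hypothetical ids and probabilities, writing to an in-memory buffer instead of the real path:

```python
import io
import pandas as pd

# Hypothetical ids and probabilities standing in for the real output.
sub = pd.DataFrame({"t_id": [1, 2, 3],
                    "probability": [0.51, 0.49, 0.62]})

buf = io.StringIO()
# index=False keeps pandas' row index out of the file, which is what
# the tournament upload format expects.
sub.to_csv(buf, index=False)
print(buf.getvalue().splitlines()[0])  # t_id,probability
```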
I hope the above was useful and that you can now start playing around with deep learning for trading predictions on Numerai. If you have any comments or questions, please feel free to contact me.
Full code below:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sknn.mlp import Classifier, Layer

train = pd.read_csv("/home/m/Numerai/numerai_datasets/numerai_training_data.csv")
test = pd.read_csv("/home/m/Numerai/numerai_datasets/numerai_tournament_data.csv")
sub = pd.read_csv("/home/m/Numerai/numerai_datasets/example_predictions.csv")

sub["t_id"] = test["t_id"]
test.drop("t_id", axis=1, inplace=True)
labels = train["target"]
train.drop("target", axis=1, inplace=True)
train = train.values
labels = labels.values

scaler = StandardScaler()
scaler.fit(train)
train = scaler.transform(train)
test = scaler.transform(test)

X_train, X_test, y_train, y_test = train_test_split(train, labels, test_size=0.2, random_state=35)

nn = Classifier(
    layers=[
        Layer("Tanh", units=50),
        Layer("Tanh", units=200),
        Layer("Tanh", units=200),
        Layer("Tanh", units=50),
        Layer("Softmax")],
    learning_rule='adadelta',
    learning_rate=0.01,
    n_iter=5,
    verbose=1,
    loss_type='mcc')

nn.fit(X_train, y_train)

print('Overall AUC:', roc_auc_score(y_test, nn.predict_proba(X_test)[:, 1]))

y_pred = nn.predict_proba(test)
sub["probability"] = y_pred[:, 1]
sub.to_csv("/home/m/Numerai/numerai_datasets/Prediction.csv", index=False)
Was the above useful? Please share with others on social media.
If you want to look for more information on Python or Trading, check online courses available at udemy.com.
Recommended reading list:
|Pairs Trading: Quantitative Methods and Analysis
The first in-depth analysis of pairs trading
Pairs trading is a market-neutral strategy in its most simple form. The strategy involves being long (or bullish) one asset and short (or bearish) another. If properly performed, the investor will gain if the market rises or falls. Pairs Trading reveals the secrets of this rigorous quantitative analysis program to provide individuals and investment houses with the tools they need to successfully implement and profit from this proven trading methodology. Pairs Trading contains specific and tested formulas for identifying and investing in pairs, and answers important questions such as what ratio should be used to construct the pairs properly.
Ganapathy Vidyamurthy (Stamford, CT) is currently a quantitative software analyst and developer at a major New York City hedge fund.
|Machine Trading: Deploying Computer Algorithms to Conquer the Markets (Wiley Trading)
Dive into algo trading with step-by-step tutorials and expert insight
Machine Trading is a practical guide to building your algorithmic trading business. Written by a recognized trader with major institution expertise, this book provides step-by-step instruction on quantitative trading and the latest technologies available even outside the Wall Street sphere. You'll discover the latest platforms that are becoming increasingly easy to use, gain access to new markets, and learn new quantitative strategies that are applicable to stocks, options, futures, currencies, and even bitcoins. The companion website provides downloadable software codes, and you'll learn to design your own proprietary tools using MATLAB. The author's experiences provide deep insight into both the business and human side of systematic trading and money management, and his evolution from proprietary trader to fund manager contains valuable lessons for investors at any level.
Algorithmic trading is booming, and the theories, tools, technologies, and the markets themselves are evolving at a rapid pace. This book gets you up to speed, and walks you through the process of developing your own proprietary trading operation using the latest tools.
Utilize the newer, easier algorithmic trading platforms
Access markets previously unavailable to systematic traders
Adopt new strategies for a variety of instruments
Gain expert perspective into the human side of trading
The strength of algorithmic trading is its versatility. It can be used in any strategy, including market-making, inter-market spreading, arbitrage, or pure speculation; decision-making and implementation can be augmented at any stage, or may operate completely automatically. Traders looking to step up their strategy need look no further than Machine Trading for clear instruction and expert solutions.
|Applied Quantitative Methods for Trading and Investment
This much-needed book, from a selection of top international experts, fills a gap by providing a manual of applied quantitative financial analysis. It focuses on advanced empirical methods for modelling financial markets in the context of practical financial applications.
Data, software and techniques specifically aligned to trading and investment will enable the reader to implement and interpret quantitative methodologies covering various models.
The unusually wide-ranging methodologies include not only the 'traditional' financial econometrics but also technical analysis systems and many nonparametric tools from the fields of data mining and artificial intelligence. However, for those readers wishing to skip the more theoretical developments, the practical application of even the most advanced techniques is made as accessible as possible.
The book will be read by quantitative analysts and traders, fund managers, and risk managers, as well as graduate students in finance and MBA courses.
|Quantitative Technical Analysis: An integrated approach to trading system development and trading management
This book, the fifth by Dr. Howard Bandy, discusses an integrated approach to trading system development and trading management.
It begins with a discussion and quantification of the several aspects of risk.
1. The trader's personal tolerance for risk.
2. The risk inherent in the price fluctuations of the issue to be traded.
3. The risk added by the trading system rules.
4. The trade-by-trade risk experienced during trading.
An original objective function, called "CAR25," based on risk-normalized profit potential is developed and explained. CAR25 is as near a universal objective function as I have found.
The importance of recognizing the non-stationary characteristics of financial data, and techniques for handling it, are discussed.
There is a general discussion of trading system development, including design, testing, backtesting, optimization, and walk forward analysis. That is followed by two parallel development paths -- one using traditional trading system development platform and the second machine learning.
Recognizing the importance of position sizing in managing trading, an original technique based on empirical Bayesian analysis, called "dynamic position sizing" and quantified in a metric called "safe-f," is introduced. Computer code implementing dynamic position sizing is included in the book.
56 fully disclosed, ready-to-run, and downloadable programs are included.
|Finding Alphas: A Quantitative Approach to Building Trading Strategies
Design more successful trading systems with this practical guide to identifying alphas
Finding Alphas seeks to teach you how to do one thing and do it well: design alphas. Written by experienced practitioners from WorldQuant, including its founder and CEO Igor Tulchinsky, this book provides detailed insight into the alchemic art of generating trading signals, and gives you access to the tools you need to practice and explore. Equally applicable across regions, this practical guide provides you with methods for uncovering the hidden signals in your data. A collection of essays provides diverse viewpoints to show the similarities, as well as unique approaches, to alpha design, covering a wide variety of topics, ranging from abstract theory to concrete technical aspects. You'll learn the dos and don'ts of information research, fundamental analysis, statistical arbitrage, alpha diversity, and more, and then delve into more advanced areas and more complex designs. The companion website, www.worldquantchallenge.com, features alpha examples with formulas and explanations. Further, this book also provides practical guidance for using WorldQuant's online simulation tool WebSim® to get hands-on practice in alpha design.
Alpha is an algorithm which trades financial securities. This book shows you the ins and outs of alpha design, with key insight from experienced practitioners.
Learn the seven habits of highly effective quants
Understand the key technical aspects of alpha design
Use WebSim® to experiment and create more successful alphas
Finding Alphas is the detailed, informative guide you need to start designing robust, successful alphas.
|Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading
New edition of book that demystifies quant and algo trading
In this updated edition of his bestselling book, Rishi K Narang offers, in a straightforward, nontechnical style—supplemented by real-world examples and informative anecdotes—a reliable resource that takes you on a detailed tour through the black box. He skillfully sheds light upon the work that quants do, lifting the veil of mystery around quantitative trading and allowing anyone interested in doing so to understand quants and their strategies. This new edition includes information on High Frequency Trading.
Offers an update on the bestselling book for explaining in non-mathematical terms what quant and algo trading are and how they work
Provides key information for investors to evaluate the best hedge fund investments
Explains how quant strategies fit into a portfolio, why they are valuable, and how to evaluate a quant manager
This new edition of Inside the Black Box explains quant investing without the jargon and goes a long way toward educating investment professionals.
|Automated Trading with R: Quantitative Research and Platform Development
Learn to trade algorithmically with your existing brokerage, from data management, to strategy optimization, to order execution, using free and publicly available data. Connect to your brokerage’s API, and the source code is plug-and-play.
Automated Trading with R explains automated trading, starting with its mathematics and moving to its computation and execution. You will gain a unique insight into the mechanics and computational considerations taken in building a back-tester, strategy optimizer, and fully functional trading platform.
The platform built in this book can serve as a complete replacement for commercially available platforms used by retail traders and small funds. Software components are strictly decoupled and easily scalable, providing opportunity to substitute any data source, trading algorithm, or brokerage. This book will:
Provide a flexible alternative to common strategy automation frameworks, like Tradestation, Metatrader, and CQG, to small funds and retail traders
Offer an understanding of the internal mechanisms of an automated trading system
Standardize discussion and notation of real-world strategy optimization problems
What You Will Learn
Understand machine-learning criteria for statistical validity in the context of time-series
Optimize strategies, generate real-time trading decisions, and minimize computation time while programming an automated strategy in R and using its package library
Best simulate strategy performance in its specific use case to derive accurate performance estimates
Understand critical real-world variables pertaining to portfolio management and performance assessment, including latency, drawdowns, varying trade size, portfolio growth, and penalization of unused capital
Who This Book Is For
Traders/practitioners at the retail or small fund level with at least an undergraduate background in finance or computer science; graduate level finance or data science students
|Quantitative Trading with R: Understanding Mathematical and Computational Tools from a Quant's Perspective
Quantitative Trading with R offers a winning strategy for devising expertly-crafted and workable trading models using the R open source programming language, providing readers with a step-by-step approach to understanding complex quantitative finance problems and building functional computer code.
|Quantitative Momentum: A Practitioner's Guide to Building a Momentum-Based Stock Selection System (Wiley Finance)
The individual investor's comprehensive guide to momentum investing
Quantitative Momentum brings momentum investing out of Wall Street and into the hands of individual investors. In his last book, Quantitative Value, author Wes Gray brought systematic value strategy from the hedge funds to the masses; in this book, he does the same for momentum investing, the system that has been shown to beat the market and regularly enriches the coffers of Wall Street's most sophisticated investors. First, you'll learn what momentum investing is not: it's not 'growth' investing, nor is it an esoteric academic concept. You may have seen it used for asset allocation, but this book details the ways in which momentum stands on its own as a stock selection strategy, and gives you the expert insight you need to make it work for you. You'll dig into its behavioral psychology roots, and discover the key tactics that are bringing both institutional and individual investors flocking into the momentum fold.
Systematic investment strategies always seem to look good on paper, but many fall down in practice. Momentum investing is one of the few systematic strategies with legs, withstanding the test of time and the rigor of academic investigation. This book provides invaluable guidance on constructing your own momentum strategy from the ground up.
Learn what momentum is and is not
Discover how momentum can beat the market
Take momentum beyond asset allocation into stock selection
Access the tools that ease DIY implementation
The large Wall Street hedge funds tend to portray themselves as the sophisticated elite, but momentum investing allows you to 'borrow' one of their top strategies to enrich your own portfolio. Quantitative Momentum is the individual investor's guide to boosting market success with a robust momentum strategy.
|Quantitative Trading: Algorithms, Analytics, Data, Models, Optimization
The first part of this book discusses institutions and mechanisms of algorithmic trading, market microstructure, high-frequency data and stylized facts, time and event aggregation, order book dynamics, trading strategies and algorithms, transaction costs, market impact and execution strategies, risk analysis, and management. The second part covers market impact models, network models, multi-asset trading, machine learning techniques, and nonlinear filtering. The third part discusses electronic market making, liquidity, systemic risk, recent developments and debates on the subject.