In this post, I want to share how simple it is to start competing in a machine learning tournament: Numerai. I will go step by step, line by line, explaining what each line does and why it is required.
Numerai is a global artificial intelligence competition to predict the behavior of the stock market. Numerai is somewhat similar to Kaggle, but its datasets are already clean, so we can skip the long data-cleansing process. You just download the data, build a model, and upload your predictions; that's it. To get the most out of the data you would normally do some feature engineering first, but for the simplicity of this intro we will skip that step. We will also skip splitting out a validation set: the main aim of this exercise is to fit a machine learning model to the training dataset and then use the fitted model to generate predictions. Altogether it takes no more than 14 simple lines of Python code, which you can run as one piece or part by part in interactive mode.
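The validation step skipped above is worth knowing about once the basics work. The sketch below is a minimal illustration, not part of the 14-line solution: it uses synthetic data from scikit-learn's make_classification in place of the Numerai files, holds out 20% of the rows with train_test_split, and scores the held-out probabilities with log loss, which is assumed here as one reasonable metric for probability predictions.

```python
# A minimal sketch of holding out a validation set, which this tutorial skips.
# Synthetic data from make_classification stands in for the Numerai training file.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# 1,000 rows, 21 numeric features, binary target (sizes chosen arbitrarily)
X, y = make_classification(n_samples=1000, n_features=21, random_state=0)

# Keep 20% of the rows aside; the model never sees them during fit
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Probability of class 1 for each held-out row, scored with log loss
valid_proba = model.predict_proba(X_valid)[:, 1]
valid_logloss = log_loss(y_valid, valid_proba)
print(f"validation log loss: {valid_logloss:.4f}")
```

A lower log loss on the held-out rows suggests the model generalises beyond the data it was fitted on.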
Let’s go, let’s do some machine learning…
The first thing to do is go to numer.ai, click ‘Download Training Data’ and download the datasets. After unzipping the archive you will have a few files; we are mainly interested in three of them. Make a note of the path to the folder, as we will need it later.
I assume you have installed Python and the required libraries; if not, there are plenty of online tutorials on how to do it, and I recommend installing the Anaconda distribution. It is time to open whatever IDE you use and start coding. The first few lines just import what we will use later, that is pandas and scikit-learn.
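Since the folder path is needed for all three files, one option is to store it once and build each file path from it. This is a minimal sketch assuming the folder layout used in this post; DATA_DIR and the *_PATH names are hypothetical, so point DATA_DIR at your own extraction folder. pd.read_csv accepts these Path objects directly.

```python
# A minimal sketch: store the unzipped folder location once and build each
# file path from it. DATA_DIR is a hypothetical location; change it to
# wherever you extracted numerai_datasets.zip.
from pathlib import Path

DATA_DIR = Path("/home/m/Numerai/numerai_datasets")

TRAIN_PATH = DATA_DIR / "numerai_training_data.csv"
TOURNAMENT_PATH = DATA_DIR / "numerai_tournament_data.csv"
EXAMPLE_PATH = DATA_DIR / "example_predictions.csv"

for path in (TRAIN_PATH, TOURNAMENT_PATH, EXAMPLE_PATH):
    # Report whether each expected file is actually there before reading it
    print(path, "found" if path.exists() else "MISSING")
```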
```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
```
pandas is used to import data from the CSV files and do some basic data manipulation; GradientBoostingClassifier, part of scikit-learn, is the model we will fit and then use for predictions. With the required libraries imported, let's use them: in the next three lines we load the data from CSV into memory with pandas' read_csv function. All you need to do is amend the full path to each file to wherever you extracted numerai_datasets.zip.
```python
train = pd.read_csv("/home/m/Numerai/numerai_datasets/numerai_training_data.csv")
test = pd.read_csv("/home/m/Numerai/numerai_datasets/numerai_tournament_data.csv")
sub = pd.read_csv("/home/m/Numerai/numerai_datasets/example_predictions.csv")
```
The code above creates three data frames from the CSV files we previously extracted from the downloaded numerai_datasets.zip.
‘train’: this dataset contains everything needed to train our model, both ‘features’ and ‘labels’; you could say it holds both the questions and the answers that our model will ‘learn’.
‘test’: this one contains the features but not the ‘labels’; you could say it holds the questions, and our model will deliver the answers.
‘sub’: just a template for uploading our predictions.
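To see the difference between the three data frames for yourself, you can compare their columns. The sketch below uses tiny hand-made stand-ins rather than the real files; the feature column names and the ‘probability’ column in ‘sub’ are assumptions for illustration, while ‘t_id’ and ‘target’ are the names used in this post's code.

```python
import pandas as pd

# Tiny stand-ins for the three files; feature names are assumed for illustration
train = pd.DataFrame({"feature1": [0.1, 0.7, 0.4],
                      "feature2": [0.9, 0.2, 0.5],
                      "target": [1, 0, 1]})
test = pd.DataFrame({"t_id": [101, 102],
                     "feature1": [0.3, 0.8],
                     "feature2": [0.6, 0.1]})
sub = pd.DataFrame({"t_id": [101, 102], "probability": [0.5, 0.5]})

# Columns present in train but not in test: the labels (answers)
only_in_train = sorted(set(train.columns) - set(test.columns))
# Columns present in test but not in train: the row id
only_in_test = sorted(set(test.columns) - set(train.columns))

print("only in train:", only_in_train)  # ['target']
print("only in test:", only_in_test)    # ['t_id']
```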
Let's move on. In the next line, `sub["t_id"]=test["t_id"]`, we copy all the unique row ids from ‘test’ to ‘sub’ to make sure each predicted value is assigned to the right set of features. Think of it as writing the question number next to each answer so whoever checks the test knows which is which.
Now that the ids are copied to ‘sub’, we no longer need them in ‘test’ (all rows stay in the same order), so we drop that column with `test.drop("t_id", axis=1,inplace=True)`.
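One detail worth knowing: pandas assigns a column by matching index labels, not by position. Because read_csv gives both ‘sub’ and ‘test’ the default 0..n-1 index, the copy lines up row by row, but if the example file had fewer rows than the tournament file, the extra ids would be silently dropped. A minimal alternative sketch, using hypothetical toy data, builds the submission frame directly from the tournament ids:

```python
import pandas as pd

# Toy tournament data: "t_id" plus two assumed feature columns
test = pd.DataFrame({"t_id": [101, 102, 103],
                     "feature1": [0.3, 0.8, 0.5],
                     "feature2": [0.6, 0.1, 0.4]})

# Build the submission frame straight from the ids instead of copying them
# into the example file; both frames then share the same 0..n-1 index
sub = pd.DataFrame({"t_id": test["t_id"]})

# Remove the id so only feature columns are passed to the model
test = test.drop(columns="t_id")

print(sub["t_id"].tolist())   # [101, 102, 103]
print(list(test.columns))     # ['feature1', 'feature2']
```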
In the next two lines, we separate the ‘labels’, or target values, from the train dataset.
```python
labels = train["target"]
train.drop("target", axis=1, inplace=True)
```
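The two lines above can also be collapsed into one with pandas' DataFrame.pop, which removes a column from the frame and returns it as a Series. A minimal sketch on a toy frame (the feature names are assumptions):

```python
import pandas as pd

train = pd.DataFrame({"feature1": [0.1, 0.7, 0.4],
                      "feature2": [0.9, 0.2, 0.5],
                      "target": [1, 0, 1]})

# DataFrame.pop removes the column from train and returns it as a Series,
# doing the work of the select-then-drop pair in one step
labels = train.pop("target")

print(labels.tolist())       # [1, 0, 1]
print(list(train.columns))   # ['feature1', 'feature2']
```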
With the ‘train’ dataset prepared, we can get our model to learn from it. First we choose the model: GradientBoostingClassifier from scikit-learn. There is no specific reason for this choice; you can use whatever classifier you like, e.g. a random forest or logistic regression.
```python
grd = GradientBoostingClassifier()
```
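Swapping the model really is a one-line change, because scikit-learn classifiers share the same fit and predict_proba methods. A minimal sketch on synthetic data from make_classification (standing in for the Numerai files), with hyperparameters chosen arbitrarily:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=21, random_state=0)

# Any scikit-learn classifier exposing fit() and predict_proba() can be used
candidates = {
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

probas = {}
for name, model in candidates.items():
    model.fit(X, y)
    # Column 1 holds the predicted probability of class 1
    probas[name] = model.predict_proba(X)[:, 1]
    print(name, probas[name][:3].round(3))
```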
With the model defined, let's have it learn from the ‘train’ data by calling `grd.fit(train,labels)`.
Our model is now trained and ready to make predictions. As this task is a ‘classification’ problem, we predict the probability that each set of features belongs to each of the two classes, ‘0’ and ‘1’.
```python
y_pred = grd.predict_proba(test)
```
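predict_proba returns one column per class, in the order given by the fitted model's classes_ attribute, and each row sums to 1. A minimal sketch on synthetic data (not the Numerai files) that makes this structure visible:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=21, random_state=0)
grd = GradientBoostingClassifier(random_state=0).fit(X, y)

y_pred = grd.predict_proba(X[:5])

print(grd.classes_)                     # [0 1]: column order of y_pred
print(y_pred.shape)                     # (5, 2): one row per sample, one column per class
print(np.round(y_pred.sum(axis=1), 6))  # each row sums to 1, up to float rounding
print(y_pred[:, 1])                     # probability of class 1, the value uploaded
```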
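We now have an array of predicted probabilities called ‘y_pred’, with one column per class. Let's attach the probability of class ‘1’ to the ‘t_id’ values we separated earlier: `sub["probability"]=y_pred[:,1]`.
Then save it in CSV format, ready to upload: `sub.to_csv("C:/Users/Downloads/numerai_datasets/SimplePrediction.csv", index=False)`.
The last thing to do is go back to the numer.ai website and click ‘Upload Predictions’… Good luck.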
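The last two steps, attaching the class-1 probabilities and writing the CSV, can be checked on toy values. In this sketch the ids and probabilities are made up, and an in-memory io.StringIO buffer stands in for the output file:

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-ins for the ids and the predict_proba output
sub = pd.DataFrame({"t_id": [101, 102, 103]})
y_pred = np.array([[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]])

# Keep only the class-1 column: one probability per id
sub["probability"] = y_pred[:, 1]

# index=False stops pandas writing its row index as an extra unnamed column
buffer = io.StringIO()
sub.to_csv(buffer, index=False)
print(buffer.getvalue())
```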
This was a very simplistic, introductory example to start playing with numer.ai competitions and machine learning. I will try to come back with gradually more complicated versions. If you have any questions, suggestions or comments, please go to the ‘About’ section and contact me directly.
The full code below:
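```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Load the three files; amend the paths to wherever you extracted numerai_datasets.zip
train = pd.read_csv("C:/Users/Downloads/numerai_datasets/numerai_training_data.csv")
test = pd.read_csv("C:/Users/Downloads/numerai_datasets/numerai_tournament_data.csv")
sub = pd.read_csv("C:/Users/Downloads/numerai_datasets/example_predictions.csv")

# Copy the row ids to the submission, then remove them from the features
sub["t_id"] = test["t_id"]
test.drop("t_id", axis=1, inplace=True)

# Separate the labels (answers) from the training features (questions)
labels = train["target"]
train.drop("target", axis=1, inplace=True)

# Define and fit the model
grd = GradientBoostingClassifier()
grd.fit(train, labels)

# Predict class probabilities and keep the probability of class 1
y_pred = grd.predict_proba(test)
sub["probability"] = y_pred[:, 1]

# Save without the pandas index, ready to upload
sub.to_csv("C:/Users/Downloads/numerai_datasets/SimplePrediction.csv", index=False)
```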
The first part of this book discusses institutions and mechanisms of algorithmic trading, market microstructure, high-frequency data and stylized facts, time and event aggregation, order book dynamics, trading strategies and algorithms, transaction costs, market impact and execution strategies, risk analysis, and management. The second part covers market impact models, network models, multi-asset trading, machine learning techniques, and nonlinear filtering. The third part discusses electronic market making, liquidity, systemic risk, recent developments and debates on the subject.