A Portfolio of my Work with Analytics, Finance and Technology

Saturday, 24 June 2017

Financial Analytics with R (Part 1) - Upside and Downside Capture Ratio

Introduction

I will be taking a bit of a break from Oil & Gas research. Recently, I was going through the performance of several hedge funds when I noticed an interesting metric: the Upside and Downside Capture Ratio. The Upside and Downside captures an investment's performance against a reference. This reference is usually an index. The goal of this tutorial is to quantify how a common stock, say Apple, performs against the market. The index chosen to represent the market, Wilshire 5000, is the broadest measure of the U.S economy.

Unlike a linear correlation, the ratios gives better perspective of how dependent an investment is on the market's rise or fall. For instance, Algonquin Capital, a Toronto-based hedge fund, had an downside capture of -0.51 against the S&P 500 (Algonquin Capital, 2017). This means that, since inception, the value of the investment has grown significantly even in a bear market. The downside capture for a hedge fund should be as low as possible.

Background

Figure 1, As far as mathematics go, the Upside and Downside Capture Ratio is surprisingly intuitive

The Upside and Downside Capture Ratio looks at cumulative return over during market gain and decline respectively (Figure 1). The exponential term gives greater weight to gains/losses reaped over a shorter period of time.

Figure 2, A good hedge fund should be able to deliver above-average returns in both a bull and bear economy. The Upside and Downside Capture Ratio successfully quantifies its ability to do so

Technique

I broke the calculations into a series of functions with R. In my next post, I will upload all coding into Github.

library("Quandl")

library("quantmod")

#generates data from ticker, calls monthlyReturn and year_sep

cfa<-function(symbol)

{

getSymbols(symbol)

testa<-monthlyReturn(get(paste(symbol)))

year_sep(testa,2016)

}

#divide index data into specified range and call the bull and bear functions respectively

year_sep_index<-function(x,year,year2)

{

annual_x<<- (x[substr(index(x),1,4)==year | substr(index(x),1,4)==year2])

bull_market(annual_x)

bear_market(annual_x)

}

#divide investment data into specified range and call the bull and bear functions respectively

year_sep<-function(y,year,year2)

{

annual_y<<-(y[substr(index(y),1,4)==year | y[substr(index(y),1,4)==year2])

test<<-annual_y[index(annual_y)%in%index(bull)]+1

upside(test)

downside(test)

}

#isolate periods of monthly cumulative gain

bull_market<-function(x)

{

bull<<-x[x[,1]>=0]+1

annualized_bullreturn<<-((prod(bull))^(12/nrow(bull)))-1

}

#isolate periods of monthly cumulative loss

bear_market<-function(x)

{

bear<<-x[x[,1]<=0]+1

annualized_bearreturn<<-((prod(bear))^(12/nrow(bear)))-1

}

#calculate upside gain

upside<-function(y)

{

upside_return<<-(annualized_bullreturn)/((prod(y)^(12/nrow(bull)))-1)

}

#calculate downside gain

downside<-function(y)

{

downside_return<<-(annualized_bearreturn)/((prod(y)^(12/nrow(bear)))-1)

}

The Wilshire 5000 Total Market Index was chosen because it provided the broadest measure of the U.S economy. Therefore, it gave a very reasonable approximation of the direction the market was moving overall.

getSymbols("^W5000",from = '2014-01-01',to = '2016-12-31')
#clean data with na.omit

W5000_monthly<-(monthlyReturn(na.omit(W5000)))

year_sep_index(W5000_monthly,2016,2015)

Extracting financial data from Apple was not difficult. I called the cfa method with "AAPL" (Figure 3). The upside return and downside return are also global variables.

Results

Figure 3, Screenshot of my output. Simple, easy and fast.

The results show that the upside and downside capture ratio is 334.9% and -183% respectively across 2015 and 2016. Small sample size for the index was not an issue. 9 months, out of 24, saw a fall in cumulative return for Wilshire 5000 Total Market Index. The remaining months marked a period of growth. Altogether, an investment made in January 2015 on the Wilshire 5000 would yeild 8.1% by December 2016.

Even during a weak market, Apple Inc. (AAPL) demonstrates profitability (Figure 3). The stock price continues to grow - but at approximately half the rate as it would in a rising market. Future steps include cycling through a large database of stocks to rank securities that perform under different circumstances.

Conclusions

The Upward and Downward Capture Ratio quantifies a company's susceptibility to changes in the market. In creating a portfolio robust enough to withstand market fluctuations, the success of an investment can be attributed to shrewd security selection rather than from outside forces as the market. In the example performed with Apple Inc., I established that the stock will grow even when the market is experiencing a downturn. However, it can not be considered fully independent from the Wilshire 5000 Total Market Index.

References

Morning Side Investing Glossary. Upside and Downside Capture Ratio. Retrieved 06/23/2017, from http://www.morningstar.com/InvGlossary/upside-downside-capture-ratio.aspx

Algonquin Capital. Funds Performance. Retrived 06/21/2017
http://www.algonquincap.com/funds/performance/

Monday, 15 May 2017

Looking at Oil with R (Part 4) - Dollar-Cost Averaging of Oil & Gas Companies

Introduction

On the fourth part of my Oil&Gas series, I wanted to go deeper into formula investing using O&G firms only as an example. Dollar-Cost Averaging (DCA) is the term given to the process of buying a fixed dollar amount of stocks every week, month or quarter. DCA averages decline and upturn in stocks, building a portfolio more robust against fluctuations.

I first came across this concept in The Wealthy Barber, a book I found lying in my basement and put to good use. Today, I will be using R to perform DCA analysis of Suncor. The analysis will be performed from a worst-case perspective: buying stock as it increases in value - only to have oil crash and plateau at a price much lower than what you began with.

Suncor (SU) stocks will be brought on the first Monday of each month beginning on May 2014, when Suncor was showing a strong growth at $38, till May 2017, when it plateaued around $31. I assumed a monthly investment of USD $1,000 into Suncor with a $30 commission per trade. I found that using DCA yielded an average price per share of $30.18 by May 2017. If sold on May 15th, The net profit is $1,810. With commission, the profit is a modest $700. With dividends, we can expect a greater yeild.

If spread over time, even a loss can be profitable. Or, at the very least, safe.

2015's Oil Glut caused a drop on Suncor's stock, which it is still recovering from

Technique

All coding is done in R, a statistical programming language. First, I loaded the necessary packages. I used Quandl to extract historical financial data.

#load necessary libraries
library("Quandl")
library("sqldf")

Quandl captures financial data from a range of companies, as Suncor

I defined necessary variables. Feel free to experiment with different values.

#define variables
start_date = "01-05-2014"
end_date = "15-05-2017"
ticker = "GOOG/NYSE_SU"

To simplify repetitive simulations, I put most of the computation in its own function. My secret ingredient was sqldf. Sqldf is a package that allows me to code in SQL within a R environment, opening the way for powerful querying and analysis.

#Dollar Cost Averaging Function
DCA_calc <- function(x){
names(x)[1]="date"
names(x)[2]="price"
x$date = as.character(x$date)
a<-sqldf("select min(substr(date,9,2)), date, price from x group by substr(date,1,7)")
b<-sqldf("select sum(1000/price) d,(select count(price) from a) e from a")
c<-sqldf("select (e*1000)/(d) AveragePricePerShare, d TotalNumberofStocks, e*1000 TotalCapitalSpent from b")
return (c)
}

Finally call the function.

#call Quandl function
Suncor_price=data.frame(Quandl(ticker,trim_start=start_date,trim_end=end_date,type="raw"))
result=DCA_calc(Suncor_price)

Running my code yields:

Conclusions

DCA only works when the stock has spent a considerable time at a much lower value than what it had originally began. Thus, even a slight increase in price can yeild a profit or dramatically cut losses. Additionally, this post does not factor in the dividends gained from the company. Factoring dividends can yeild higher profits.

References

Chilton, D. (1989). The Wealthy Barber . Toronto: Stoddart Publishing Co. Limited.

Wednesday, 1 February 2017

Looking at Oil with R (Part 3) - Portfolio Optimization of Oil and Gas Companies

Introduction

In my previous post, I compared oil futures against current OPEC oil price. In this post, I want to take advantage of R to optimize a portfolio of oil and gas companies. Portfolio optimization is a method of diversifying investments to to reach a specified level of risk and return. The portfolio I am building is comprised of companies specializing in oil and gas. Specifically, those in the extraction and processing stage.

My target portfolio will be minimizing variance - or risk. By minimizing risk, I am accepting less return on my investment.

Goal

To identify O&G firms from NASDAQ with the least variance in stock price after January 2014. These firms would be the best choice for a low-risk portfolio focusing on petrochemicals.

Technique

I learned about the basics of portfolio optimization from an online thesis (Engels, 2004). The first two chapters cover the fundamentals of optimization theory.

The names of companies came from the NASDAQ website. NASDAQ has their data available in a .csv format which I downloaded and imported into R. I filtered the data to show only those in Oil & Gas.

NASDAQ = read.csv("companylist.csv",header=T) #import data
NASDAQ_oil = subset(NASDAQ, grepl("Oil",as.character(Industry),ignore.case = TRUE) & grepl("Energy",as.character(Sector),
ignore.case = TRUE)) #filter data

oil_list = as.vector(as.matrix(NASDAQ_oil)[,1]) #extract ticker symbol from dataset

Next, I got stock prices for each ticker symbol. I chose to 'train' the portfolio algorithm with data from January 2014 to present. I chose this period because it included the 2015 oil glut - meaning a more holistic view.

library (quantmod)
library(PerformanceAnalytics)
getSymbols(oil_list,from = '2014-01-01')
return_matrix = NULL
for( symbol in oil_list)
{
return_matrix = merge.xts(return_matrix,Return.calculate(Ad(get(paste(symbol))),method='discrete'))
}

The 'quantmod' package is useful because it extracts data from multiple sources, including Google Finance and Yahoo Finance. The getSymbol() function extracts stock prices for the array of ticker symbols. I used the 'PerformanceAnalytics' package to calculate cumulative returns matrix. The 'PerformanceAnalytics' package can create some aesthetic graphs as well.

chart.CumReturns(return_matrix, main = "Cumulative Returns of Oil & Gas Companies on NASDAQ",ylab = "Percent Increase (%)",ylim=c(-2,2),colorset=redmono)

(Figure 1) Cumulative Returns of Various O&G Companies

Since October 2014, the performance of many O&G firms had a downward trend. As explained in my previous post, the trend came from uncontrolled overproduction from OPEC. Around July 2016, the downward trend plateaus. The plateau could come from recent news that OPEC, and other countries, would agree to cut production (Cunningham, 2017).

The 'quantmod' package is useful because it extracts data from multiple sources, including Google Finance and Yahoo Finance. I took this from an amazing tutorial by Kyle Balkissoon (Balkissoon, 2016). Kyle's tutorial builds on portfolio optimization in greater detail.

The getSymbol() function extracts stock prices for the array of ticker symbols. I used the 'PerformanceAnalytics' package to calculate cumulative returns matrix. The 'PerformanceAnalytics' package can create some aesthetic graphs as well.
The next step is to construct the actual portfolio.

library(PortfolioAnalytics)

minSD.opt=portfolio.spec(

assets=colnames(return_matrix))

minSD.opt=add.objective(

portfolio=minSD.opt

type='risk',

name='StdDev')

minSD.opt=add.constraint(

portfolio = minSD.opt,

type="full_investment")

minSD.opt=add.constraint(

portfolio = minSD.opt,

type="long_only")

minSD.opt=add.constraint(

portfolio = minSD.opt,

type="box",

min=0,max=0.2)

What is essentially happening here is that I am creating an object called minSD.opt. This object is essentially a list of user-defined constraints. My minimum variance portfolio reduces the risk to within the defined standard deviation of 0 to 0.3. The "full_investment" constraint means the sum of portfolio weights should add to 1 - meaning the whole portfolio is currently invested. I assume the "long_only" constraint means that the portfolio size does not change. Further clarification on these constraints would be appreciated.

library(DEoptim)
OptimizedPortfolioMinVariance=optimize.portfolio(
R=return_matrix,
portfolio=MinimumVariancePortfolio,
trace=TRUE)

At this point, I attempt to optimize my portfolio with the list of constraints in the MinimumVariancePortfolio object. The optimization should take no more than a minute max.

x<-extractWeights(OptimizedPortfolioMinVariance) #extract weights of portfolio
x=x[order(-x)] #order list based on proportion
z1<-x[x>0] #remove companies that have zero weights
z<-as.data.frame(z1) #turn into data frame
r<-sub('.Adjusted', '', rownames(z)) #by default, name has ".Adjusted" concatenated to it. Remove the ".adjusted" suffix
b = (data.frame(r,as.vector(z1))) #turn list into dataframe

barplot(b$as.vector.z1.,names.arg = r, las = 2, ylab = "Portfolio Proportion",border = NA,main = "Oil & Gas Portfolio with Minimum Variance",col = 'brown1') #get a barplot

Finally, I extract the proportion of firms in my portfolio as a list. I clean and order the data to make it look more appealing. In the end, I create a barplot to visually illustrate the O&G firms with the lowest risk since January 2014 (Figure 2).

(Figure 2) Weights of Minimum Variance O&G Portfolio

The top three least riskiest companies, Centennial Resource Development, Dorchester Minerals and Viper Energy Partners, comprise 49.8% of our portfolio. I was surprised that all three companies came from Texas.

After further research, I learned that Texas is naturally rich in previously inaccessible reserves of oil and natural gas in the form of shale formations. Advances in extraction technologies as hydraulic fracking has significantly increased production (Klare, 2016). As NASDAQ has many international oil and gas companies, it is safe to assume that petrochemical companies from Texas are the least riskiest to invest in.

Resources

Balkissoon, K. (2016). Portfolio Construction and Optimization with R. Retrieved 1 10, 2017, from http://kkb.io/2016/10/26/portfolio-optimization/

Cunningham, N. (2017, 1 19). Despite OPEC Deal Oil Prices Could Fall Sharply From Here. Retrieved 1 20, 2017, from http://oilprice.com/Energy/Energy-General/Despite-OPEC-Deal-Oil-Prices-Could-Fall-Sharply-From-Here.html

Klare, M. T. (2016, 3 18). The Future of Oil Is Here—and It Doesn’t Look Pretty. Retrieved 1 15, 2017, from https://www.thenation.com/article/the-future-of-oil-is-here-and-it-doesnt-look-pretty/

Thursday, 17 November 2016

Looking at Oil with R (Part 2) - Moving Averages, Continuously Compounded Returns and Oil Futures.

Introduction

In my previous post, I looked at techniques R offers to forecast stock prices. I learned how to decompose time series data to separate cyclical and long-term trends. I decided to further my research by investigating the relationship between oil futures and the current oil price. I proved that when current oil quotes are low, oil futures will tend to be consistently higher than the current quote. When oil prices are high, there is a greater degree of inconsistency. For the most part, the expectation was that the price would drop.

I also proved that the correlation between oil futures and the current oil price is a good indicator of the public's confidence that oil prices will continue in its current path. The average correlation between 2005 - 2016 is 0.71. An indicator of crude oil surplus is a decrease in correlation between the continuously compounded returns of oil and oil futures. The correlation was around 0.5 during 2007 financial crisis. Confidence in the oil market has been steadily rising since 2015. The increase in positive correlation while oil prices are decreasing means that the public is confident that the prices of oil will continue to drop.

Technique

Like my previous post, all my data comes from Quandl API. For oil quotes, I relied on the OPEC Basket Price. My values for the daily crude oil futures came from Quandl. As approximation, I averaged the open and close price.

library("Quandl")

OilFuture = Quandl("CHRIS/CME_CL1",trim_start="01-02-2004", trim_end="10-01-2016", type="zoo")

Oil = Quandl("OPEC/ORB", trim_start="01-02-2004", trim_end="10-01-2016", type="zoo")

AvgOilFuture = ((OilFuture[,2]+OilFuture[,1])/2)

Figure 1, Oil Futures vs. Oil

First, I wanted a high level view of what I wanted to accomplish. I wanted to compare OPEC prices against Oil Futures. By comparing oil futures against oil prices from 2004 to 2016, I was able to gain some back-of-the-envelope insights (Figure 1). The blue represents the expectation that future oil prices will equal the current price.

At low prices (<$80), the expectation was that the prices will increase.
At high prices (>$80), the expectation had greater degree of uncertainty. For the most part, the expectation was that the price would drop.

I got the idea of converting the price to continuously compounded returns from an excellent piece by Guangming Lang. Converting the price to returns meant looking at the rate of increase as opposed to the prices themselves.

plot(MergedOil[,1],MergedOil[,2],pch=".",col="red",main = "Oil Futures vs. Oil Price",xlab = "OPEC Crude Oil Basket Price",ylab = "Crude Oil Futures")

lines(MergedOil[,1],MergedOil[,1],col="blue")

Figure 2, Experimenting with Moving Averages

I ran into some difficulty trying to grasp the concept of moving average. Simply put, for a certain time point, moving averages would take a segment of values around it and average them. Then, it would move onto the next point. Increasing the 'width' of the segment can minimize outliers, reduce the impact of short-term panic and remove random noise that are not relevant to the underlying trend. Increasing 'width' can erase useful information too. It is a balancing game.

I experimented with different widths in Figure 2. At 5 points, short-term fluctuations are not smoothed out. At 500 points, long-term trends are erased. 100 points would be the optimal width.

plot(rollapply(ccret[,1],5,mean),col='red',xlab = "Year",ylab="Continuously Compounded Returns",main = "Returns from OPEC Basket Price with Moving Average")
legend('top', c("5 Points","50 Points","100 Points","500 Points"), lty=1, col=c('red','black', 'blue', 'green',' brown'), bty='n', cex=.9)
lines(rollapply(ccret[,1],50,mean),col='black')
lines(rollapply(ccret[,1],100,mean),col='blue')
lines(rollapply(ccret[,1],500,mean),col='green')

Figure 3, Comparison between Oil and Oil Futures

The next step was to compare the actual crude oil price and its future - with the moving average applied to remove any noise. Figure 3 shows that returns from both datasets are pretty much equal: when the future's quote increase, the current oil's quote increases proportionally. There is actually a way to quantify the correlation between both datasets.

plot(movingavg[,1], xlab="Time", ylab="Continuously Compounded Returns",main="Comparison between Crude Oil Current Prices and Futures")
lines(movingavg[,2],col="red")
legend('topleft', c("OPEC Crude Oil Basket Price","Crude Oil Futures"), lty=1, col=c('black','red'), bty='n', cex=.9)

Figure 4, Correlation between Oil and Oil Futures

Figure 4 illustrates the correlation between Oil and Oil Futures. I can determine that that there is an overall moderate positive correlation between the two datasets. Additionally, I can see that the weakest relationship were during the 2007-2008 financial crisis and 2014-2015 oil glut. A weak relationship means that values are more spread around the expected value.

Simply put, a weak positive relationship means that the investors are uncertain that the oil market will continue in its current trajectory. I can also determine that the 2007 recession took the investing public by surprise (judging by the steep fall) but confidence was decreasing well before the 2014-2015 oil glut

Also, the correlation has been growing stronger since 2015. This means that the belief that the price of oil will stay in its current trajectory has been growing steadily stronger. Unfortunately, the current trajectory for the price of oil is decreasing.

correlate_oil= function(x) cor(x)[1, 2]
oil_cor= rollapply(ccret, width=100, FUN=correlate_oil, by.column=FALSE, align="right")
plot(oil_cor,ylab="Correlation",xlab="Time",main="Correlation between Continuously Compounded Returns of Oil and Oil Futures")

I was pleasantly surprised about my conclusions. I learned that during major economic crisis, the public will lose confidence that the oil price will increase. I also learned that the 2014-2015 oil glut could have been predicted as confidence started to dip around 2013. This is surprising as major oil companies should have prepared for this by reducing production instead of later laying off employees. In July 2016, Calgary, a major city dependent on the petroleum industry, had an unemployment rate close to 9% because of the layoffs.

Next steps into my research will be to look at commodities to forecast oil prices - such as gold, copper and even rice.

Resources

Freaknomics . (2008, July ). Forecasting Oil Prices: It’s Easy to Beat the Experts. Retrieved November 2016 , from Freaknomics : http://freakonomics.com/2008/07/21/forecasting-oil-prices-its-easy-to-beat-the-experts/

Lang, G. (2014, October ). Analyze Stock Price Data Using R (Part 3) . Retrieved November 2016, from Master R : http://masterr.org/r/analyze-stock-price-data-using-r-part3/

Ron Alquist, L. K. (2011). Forecasting the Price of Oil. Board of Governors of the Federal Reserve System.