
The State of Machine Learning in Python

by Matt Kirk

Matt Kirk is a data architect, software engineer, and entrepreneur based in Seattle, WA. For years, he struggled to piece together his quantitative finance background with his passion for building software. Then he discovered his affinity for solving problems with data. Now, he helps multi-million dollar companies with their data projects. From diamond recommendation engines to marketing automation tools, he loves educating engineering teams about methods to start their big data projects. To learn more about how you can get started with your own big data project (beyond reading this article), check out matthewkirk.com for tips.

Stock prices.

Temperature.

Web app interactions.

At this very moment, I can pull data on any of these three things down to the millisecond. And that’s pretty amazing.

The possibilities seem endless. But so what?

What exactly are you supposed to do with it? Having data doesn’t mean that you’ll end up with insights that solve real problems. Instead, data can become a huge distraction and maintenance cost for any project.

In this article, I’ll walk through how to find insight in data using machine learning in Python. I’ll focus on the three classes of machine learning algorithms, how they apply to different problems, and also give you something to try out in your console.

By the end of this article, you won’t have a Ph.D. in machine learning – that would take seven years – but you will at least have enough information to get started learning about this fascinating subject. Before we get to algorithms, though, let’s discuss what machine learning aims to do: transform data into insight.

Machine learning transforms data into insight

Our goal is to achieve insight, but to take it a bit further, let’s look at something called the Data, Information, and Insight pyramid. This structure serves as a hierarchy of needs for knowledge.

At the very bottom, you have data, which is the cornerstone of all of our knowledge. Data could be stock prices throughout the day or temperature measurements. These points aren’t interesting on their own, and looking through raw data is sometimes an impossible task: you can’t read through a billion records of temperature data.

In the middle is information. Information is where you take some data and aggregate it into something more useful. For instance, this could be a max, a min, or an average of data points. If you’ve ever tuned into CNBC, you have most likely seen stock price highs and lows, or the average return for the day. That aggregation is information.
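To make the distinction concrete, here is a tiny sketch (with made-up numbers) of turning raw data into information by aggregating it:

```python
# Raw data: one day of (made-up) stock prices.
prices = [101.2, 102.7, 99.8, 103.4, 100.9]

# Information: aggregate the raw data into a high, a low, and an average.
print("High: %.2f" % max(prices))
print("Low: %.2f" % min(prices))
print("Average: %.2f" % (sum(prices) / len(prices)))
```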

Lastly, and most useful, is insight. Insight is subjective. For instance, if the weather has been cold for an extended period, then most likely it is winter. As humans, we are naturally gifted at pattern matching and, therefore, at finding insight.

How do we find insights: deduction vs. induction?

There are lots of ways to go about finding insights. The most obvious way is to explicitly code the insight: for example, we could match terms or select specific instances in the data, as in the sketch below.
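For example, a hand-coded spam check might look like the following sketch (the term list is invented for illustration):

```python
# Explicitly coded insight: flag an email as spam if it
# contains any term from a hand-picked (and invented) list.
SPAM_TERMS = ["prince", "lottery", "wire transfer"]

def looks_spammy(email_text):
    text = email_text.lower()
    return any(term in text for term in SPAM_TERMS)

print(looks_spammy("I am a prince and you have won a lottery"))  # True
```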

But there’s a limit to that: humans can only find so much. And while we’re good at matching patterns, we can have difficulty with statistical reasoning. We are almost always biased.

Explicit programming is a deductive approach to finding insight. A lot of artificial intelligence algorithms spend computing power “planning” or “searching” based on heuristics. However, deductive reasoning is difficult to apply to data, since we often don’t know what to look for.

Another option is to put the data at the center of attention. If we can take data and derive commonalities from it, then we can build our software off of that. Using data to drive programming is inductive, and induction is at the center of machine learning.

Inductive reasoning: Machine Learning

Machine Learning is a collection of algorithms that learn from data without being explicitly programmed.

This point is important: machine learning is a group of algorithms that don’t require human intervention; they can simply be set to run on specific data.

Inside machine learning, though, there are subclasses of problems, or different classes of learning: supervised, unsupervised, and reinforcement learning.

To explain the differences between the learning classes, I like to use the following table. It outlines, conceptually, what the goal of each class is:

| Class | Function | Goal | Algorithms | Packages |
|---|---|---|---|---|
| Supervised | f(x) = y | Map inputs to outputs | Naive Bayes, K-Nearest Neighbors, Support Vector Machines, Neural Nets, Decision Trees (both classifiers and regressors) | scikit-learn, Turi/GraphLab, TensorFlow, Theano |
| Unsupervised | f(x) = x | Map input onto itself | Clustering, data transformation, autoencoders | TensorFlow, GraphLab, scikit-learn |
| Reinforcement | max R | Maximize long-term reward | Q-Learning, TD-Lambda, Multi-Armed Bandits | OpenAI Gym |

Supervised Learning: Map Inputs to Outputs

This class of machine learning algorithms is the most popular. Most people want to take data inputs and outputs and build a model from them, whether it’s finding the optimal portfolio, filtering spam, analyzing churn, or anything else.

In this section, I’ll explain the intuitive goal of supervised learning, what algorithms exist, and what Python libraries are available, along with their pros and cons.

The Goal

Supervised learning is probably the most intuitive of each of the learning problems. It asks the following question:

Based on previous data, can we build a model that predicts new data?

Supervised Learning can come in two different styles: regression or classification.

Regression is where you want to predict a number given some data. For instance, say we want to find the predicted return of a stock investment. A regression bases its prediction on the factors we have available: we take that information, feed it through a model, and receive a number.

Classification is when we want a particular class label as the answer. A lot of classification problems are binary classification (true/false) where you are just looking for a yes or no answer. Spam classification is a pretty classic example where you want to determine whether a given email is spammy or not.

The Algorithms

The traditional algorithms for supervised learning are:

  • Decision Trees
  • K-Nearest Neighbors
  • Linear Regression
  • Logistic Regression
  • Naive Bayes
  • Neural Networks

Here’s an overview of each of them.

Decision Trees
Decision trees are one of the most intuitive algorithms: they create a tree that branches on conditions, similar to how if/else works inside most programming languages (like Python). An example I used in my book Thoughtful Machine Learning with Python is using attributes of mushrooms to classify whether they are poisonous or not. Decision tree implementations are available in scikit-learn and Turi.

```python
from sklearn import datasets
from sklearn import tree

# Fit a decision tree to the iris dataset and predict on the same data.
iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier()
y_pred = clf.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))

# Number of mislabeled points out of a total 150 points : 0
```

K-Nearest Neighbors
K-Nearest Neighbors is a simple algorithm that takes a query point and determines a class or numeric value based on the “k” nearest points (or neighbors). This simple algorithm is used for estimating the value of real estate properties and can be quite good as a simple classifier or regressor.

```python
from sklearn import datasets
from sklearn import neighbors

# Classify each iris point by a vote among its nearest neighbors.
iris = datasets.load_iris()
knn = neighbors.KNeighborsClassifier()
y_pred = knn.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))

# Number of mislabeled points out of a total 150 points : 5
```

Linear Regression
Linear regression is a class of algorithms that attempts to fit a line to data points. Linear regressions work exceptionally well and have applications in finance, as well as in mapping movie reviews to specific factors like actors or directors.

```python
from sklearn import datasets
from sklearn import linear_model
from sklearn import metrics
import math

# Fit a linear regression, treating the iris class labels as the target number.
iris = datasets.load_iris()
lm = linear_model.LinearRegression()
y_pred = lm.fit(iris.data, iris.target).predict(iris.data)

print("Root Mean Squared Error : %f"
      % math.sqrt(metrics.mean_squared_error(iris.target, y_pred)))

# Root Mean Squared Error : 0.215372
```

Logistic Regression
Logistic regression is a take on regression that attaches a probability to a classification by using something called a sigmoid function (an S-shaped curve). Logistic regression is used a lot because of how fast it is; Google uses it all the time to train models for simple classifications.

```python
from sklearn import datasets
from sklearn import linear_model

# Fit a logistic regression classifier to the iris dataset.
iris = datasets.load_iris()
lr = linear_model.LogisticRegression()
y_pred = lr.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))

# Number of mislabeled points out of a total 150 points : 6
```

Naive Bayesian Classifier
The Naive Bayes classifier is one of the most famous algorithms. It takes a bunch of features and probabilistically determines which class those features point to. The “naive” part comes from the assumption that the features are independent of one another. So, for instance, with spam classification, the word “prince” inside an e-mail might show up more often in spammy e-mails.

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# Fit a Gaussian Naive Bayes classifier to the iris dataset.
iris = datasets.load_iris()
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))

# Number of mislabeled points out of a total 150 points : 6
```

Neural Networks
Neural networks are the area of supervised learning with the most future right now, thanks to the popularity of deep learning (which is, at heart, bigger and more exciting neural networks). Neural nets are a class of algorithms that wire together nodes performing simple calculations; together, those nodes can do things like language classification, image detection, and much, much more.

```python
from sklearn import datasets
from sklearn import neural_network

# Fit a small multi-layer perceptron (one hidden layer of 20 nodes).
iris = datasets.load_iris()
mlp = neural_network.MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000)
y_pred = mlp.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))

# Number of mislabeled points out of a total 150 points : 4
```

The Packages

For supervised learning problems, there are a lot of good Python packages out there, like Turi (GraphLab), scikit-learn, Theano, and TensorFlow.

| Algorithm | Packages Available |
|---|---|
| Decision Trees | scikit-learn, Turi |
| K-Nearest Neighbors | scikit-learn, Turi |
| Linear Regression | scikit-learn, Turi |
| Logistic Regression | scikit-learn, Turi |
| Naive Bayes | scikit-learn, Turi |
| Neural Networks | TensorFlow, scikit-learn, Turi, Theano |
| Support Vector Machines | LIBSVM, scikit-learn, Turi |

Unsupervised Learning: Map Data onto Itself

The Goal

Unsupervised learning is a little peculiar in that it’s trying to take inputs and build a model that predicts itself. This might not make a lot of sense on first reading, but the idea is to build a representation of what exists instead of trying to build some prediction mechanism.

The Algorithms

Unsupervised learning methods fall into a few subcategories: clustering, dimensionality reduction, and deep learning.

Clustering takes a set of data and tries to build a cluster mapping for each data point. The idea is to take some vector x, which might have many different elements, and attach a label to it. Those labels can then be used in visualization, either by coloring or by splitting up the data.

Dimensionality reduction is used as a way of overcoming the curse of dimensionality, a problem in many machine learning algorithms. The idea here is to take dimensions, which can number in the thousands, and represent the same data with fewer of them. As you can imagine, the simpler the data is, the more stable the results will be.

It might surprise you to find deep learning here. Deep learning has become quite popular lately, and part of what it does is detect features without any intervention from a human. Some deep learning algorithms have an amazing ability to take an image, identify features in it, and feed those into another model (like a supervised one).

Some popular algorithms in this category are:

| Algorithm | Subclass |
|---|---|
| K-Means Clustering | Clustering |
| EM Clustering | Clustering |
| Principal Component Analysis | Dimensionality Reduction |
| Independent Component Analysis | Dimensionality Reduction |
| Autoencoders / Convolutional Neural Nets | Deep Learning |

Again, I could spend an entire article on each one of these individually, but instead I want you to get a good sense of what they are. In general, there are two classes: clustering and dimension transformations.

Clustering

Clustering algorithms aim to take an unlabeled dataset and put it into labeled categories. So, for instance, taking a dataset like iris and putting it into three categories. These categories might not have any information (or a name) attached to them, but they are still distinct groups.

K-Means clustering is one of the simplest clustering algorithms. The idea is simply to state that you want K clusters and to find the K centroids that define them. In practice, this will take a dataset like the iris dataset and put it into K clusters.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

from sklearn import cluster
from sklearn import datasets

np.random.seed(5)

# Cluster the iris dataset into three groups.
iris = datasets.load_iris()
k_means = cluster.KMeans(n_clusters=3)
k_means.fit(iris.data)

# Plot the points in 3D, colored by cluster label.
fig = plt.figure(1, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)

plt.cla()

labels = k_means.labels_

ax.scatter(iris.data[:, 3], iris.data[:, 0], iris.data[:, 2],
           c=labels.astype(float))

ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])
ax.set_xlabel('Petal width')
ax.set_ylabel('Sepal length')
ax.set_zlabel('Petal length')

plt.show()
```

EM clustering is an extension of K-Means that looks for clusters that aren’t circular in shape (scikit-learn implements it as a Gaussian mixture model). Non-circular clusters can be quite useful for datasets that don’t follow a simple spherical distribution. Related Bayesian variants can even infer a suitable number of clusters, though the example below fixes it at three.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

from sklearn import mixture
from sklearn import datasets

np.random.seed(5)

# Fit a three-component Gaussian mixture (EM clustering) to the iris data.
iris = datasets.load_iris()
em = mixture.GaussianMixture(n_components=3, covariance_type='full').fit(iris.data)

# Plot the points in 3D, colored by the predicted component.
fig = plt.figure(1, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)

plt.cla()

labels = em.predict(iris.data)

ax.scatter(iris.data[:, 3], iris.data[:, 0], iris.data[:, 2],
           c=labels.astype(float))

ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])
ax.set_xlabel('Petal width')
ax.set_ylabel('Sepal length')
ax.set_zlabel('Petal length')

plt.show()
```

Dimension Transformations

Dimension transformations aim to take a dataset with n dimensions and transform it into a new dataset with either fewer or more dimensions. Famous transformations are Principal Component Analysis, Independent Component Analysis, and deep learning methods like autoencoders or convolutional neural nets.

Principal Component Analysis (PCA) is a dimension reduction algorithm that takes numerical data and turns it into a matrix factorization of the same. The idea is to take a linear component space and determine the most important vectors in it. An excellent visualization is something called Eigenfaces, which reduces a set of human face images to their most significant shared components, each resembling a sort of ghostly, average-looking face.

Independent Component Analysis (ICA), unlike PCA, aims to pull out independent components, like a “nose” or “ears” in face images. So instead of averaging all the features together, it detects very specific ones. ICA can be useful for reducing noise in data.

Lastly, we have the frontier: deep learning. Autoencoders and convolutional neural nets aim to take something like an image and derive new features from it. Autoencoders try to find a compact version of the original information, whereas convolutional neural networks learn layered features (edges, shapes, and so on) directly from raw data. A small sketch follows below.
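As a minimal sketch of dimension reduction in scikit-learn, here is PCA compressing the four iris measurements down to two components; swapping `decomposition.PCA` for `decomposition.FastICA` gives you an ICA variant of the same idea:

```python
from sklearn import datasets
from sklearn import decomposition

iris = datasets.load_iris()

# Reduce the four iris dimensions down to two principal components.
pca = decomposition.PCA(n_components=2)
reduced = pca.fit_transform(iris.data)

print(reduced.shape)                  # (150, 2)
print(pca.explained_variance_ratio_)  # how much variance each component keeps
```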

The Packages

In general, there are quite a few tools at our disposal for unsupervised learning, whether through scikit-learn, Turi, or TensorFlow.

| Algorithm | Package Available In |
|---|---|
| K-Means Clustering | scikit-learn, Turi |
| EM Clustering | scikit-learn, Turi |
| Principal Component Analysis | scikit-learn |
| Independent Component Analysis | scikit-learn |
| Autoencoders | TensorFlow |
| Convolutional Neural Nets | TensorFlow |

Reinforcement Learning: Win the game

Lastly, my favorite category is reinforcement learning. This category is about winning over time instead of at a particular moment. Think of this class of algorithms as not worrying about losing a battle as long as it wins the war.

The Goal

Reinforcement learning is entirely different from supervised and unsupervised learning. Instead of trying to map a function to some data at a particular point in time, reinforcement learning algorithms work to maximize a long-term reward.

So, for instance, think of a game like chess. You want to win. Reinforcement learning attempts to take into consideration the actions you can take, as well as the state you are currently in, to determine the best move or policy.

The Algorithms

Reinforcement learning is a bit newer than supervised and unsupervised learning, and as such doesn’t have nearly as many algorithms, but there are still some highly useful ones:

  • Q-Learning
  • TD-Lambda
  • Multi armed bandits

Each of these algorithms is intriguing in its own right and could easily fill an article apiece. I highly recommend you read up on the Sutton book about reinforcement learning, as well as the Python examples linked at the end of this section.

Q-Learning

Q-Learning, which is closely related to Value Iteration and Policy Iteration, attempts to solve something called the Bellman equation. In the 1950s, Bellman was studying optimal control theory and wanted to maximize a long-term discounted reward. He came up with a recursive function now called the Bellman equation. Q-Learning takes that a bit further to solve for a particular value Q (for “quality”). The higher the Q value, the more likely you are to have a good policy for playing whatever game it is.
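To give a flavor of the update at the heart of Q-Learning, here is a minimal tabular sketch on a toy problem; the states, rewards, transitions, and parameters are all invented for illustration:

```python
import random

# Toy problem (invented): 2 states, 2 actions, hand-picked rewards.
n_states, n_actions = 2, 2
rewards = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

state = 0
for _ in range(1000):
    # Epsilon-greedy: mostly take the best-known action, sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = Q[state].index(max(Q[state]))
    reward = rewards[(state, action)]
    next_state = (state + action) % n_states  # made-up transition rule
    # The Q-Learning update, derived from the Bellman equation.
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print(Q)  # higher Q values mark the better action in each state
```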

TD Lambda

TD Lambda, or temporal difference learning, can be explained using weather. While you could try to predict the weather based on years of data, the more practical solution is to look at the last week’s worth. TD Lambda takes this further by regressing on data points in a weighted fashion that favors what is newer; so today’s weather is mostly influenced by what the weather was yesterday.

Multi-armed Bandits

Finally, multi-armed bandits (or n-armed bandits) are highly useful for things like A/B tests. Imagine you are running an A/B test and want to determine the winner. Naively, we all think of splitting the traffic 50/50 between A and B. But that, unfortunately, doesn’t take into consideration that perhaps A is better than B. What a multi-armed bandit does is split the traffic using the information it has collected so far, trading off exploitation against exploration until it converges on a good enough answer.
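Here is a sketch of the simplest bandit strategy, epsilon-greedy, on an imaginary A/B test; the conversion rates are invented:

```python
import random

# Imaginary true conversion rates for variants A and B.
true_rates = [0.05, 0.08]
counts = [0, 0]      # times each variant was shown
values = [0.0, 0.0]  # running average conversion rate per variant
epsilon = 0.1

for _ in range(10000):
    # Mostly exploit the best-looking variant, occasionally explore.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = values.index(max(values))
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(counts)  # traffic shifts toward the better variant over time
print(values)  # estimated conversion rates for A and B
```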

Together, these algorithms are the new frontier, with things like AlphaGo and deep learning taking off. There aren’t a lot of open source packages for reinforcement learning yet, but I hope that will change soon.

The Packages

While you can’t find many off-the-shelf packages for programming reinforcement learning algorithms, there is the OpenAI Gym, which serves as a way to test out different reinforcement learning algorithms. I also recommend checking out the great work by Shangtong Zhang, who converted Sutton’s original book into Python examples (https://github.com/ShangtongZhang/reinforcement-learning-an-introduction).
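As a minimal sketch of the Gym loop (this assumes you have installed the `gym` package and uses the classic CartPole environment), an agent that just acts randomly looks like this:

```python
import gym

env = gym.make("CartPole-v0")

for episode in range(3):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        # A placeholder policy: sample a random action each step.
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print("Episode %d reward: %.0f" % (episode, total_reward))
```

Replacing the random action with one chosen by Q-Learning or a bandit strategy is where the learning comes in.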

| Algorithm | Package |
|---|---|
| Q-Learning | No packages |
| TD-Lambda | No packages |
| Multi-armed bandits | slots |

How it all relates

Together, all these algorithms make up the space of machine learning. We’ve talked about supervised, unsupervised, and reinforcement learning, as well as how they relate to the Python language and the packages you can use for each.

Machine learning is a fascinating subject with all kinds of applications, from classifying spam to playing chess. As you’ve seen in this article, it is well suited to deriving insight from data. The added benefit is that, with Python, it is quite simple to implement these algorithms using packages like scikit-learn or TensorFlow.

If you want to read some books on machine learning, I recommend checking out Thoughtful Machine Learning with Python, Machine Learning by Peter Flach, and Python Machine Learning. Of course, you can take the Coursera course from Andrew Ng as well.

I hope you have followed along with the examples and found something useful in them. For a more in-depth treatment, I highly recommend checking out one of the many machine learning books out there.

If you found this article interesting, check out my e-mail list that talks about starting a successful machine learning project https://matthewkirk.com/?ml or follow me on Twitter (https://www.twitter.com/mjkirk). I’d love to hear from you.

September 14, 2017
