Monday, October 8, 2018

What are Machine Learning Algorithms?

In this post you will learn about some of the most widely used machine learning algorithms. If I had to choose one, I'd say my favorite algorithm is the ensemble, which I consider my own “Master Algorithm”. Whatever algorithm you start from, you can always use an ensemble to improve it. Ensembles won the Netflix Prize and routinely show great performance, but they are also relatively easy to understand, optimize, and inspect. My second favorite is Logistic Regression (LR), a very simple but efficient and flexible algorithm that can be used for many applications, notably classification, but also ranking.
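To make the ensemble idea concrete, here is a minimal sketch using scikit-learn's VotingClassifier; the library, dataset, and base models are my own illustrative choices, not something the post prescribes:

```python
# A minimal ensemble sketch: combine a few simple classifiers by
# majority vote and compare against a single model via cross-validation.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier()),
])

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("ensemble", ensemble)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

The point is simply that the voting ensemble wraps the very same base models you would have used anyway, which is why "whatever algorithm you start from" applies.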

The following are some of the commonly used ML Algorithms:


  1. Back-propagation algorithm - a relatively simple idea that made neural networks useful.
  2. Dropout - if we can consider it an algorithm (I like to think of it as a useful idea for neural networks). It lets you create an implicit ensemble of classifiers without training many separate models.
  3. PCA - mathematics at its best (or thereabouts). Very useful in many situations, although probably a bit overused.
  4. Perceptrons - just to study the convergence theorem, a thing of beauty.
  5. Linear regression - the mathematics behind the derivation of the cost function is so nice.
  6. Logistic regression - well, we want a simple but powerful model to do classification.
  7. KNN - why not? Dumb, but very useful.
  8. K-means - one of the most famous algorithms ever. There are many more sophisticated clustering algorithms, and yet k-means is still as useful as it ever was (see the sketch after this list).
  9. Replicator dynamics - because game theory is interesting and fun. And because it can be used as a building block for a lot of completely different things, like graph matching or clustering (dominant sets).
  10. Szemeredi’s regularity lemma - not a machine learning algorithm, but something that can be used in machine learning (and in any graph problem we want to reduce). And more importantly, because I worked on it in my master’s thesis, and wrote my first ever paper on it.
  11. Support Vector Machines - because the maths there is stunning. And because all those hours of learning functional analysis are finally put to work.
  12. Regularization - because nothing works without it.
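As promised in item 8, here is a minimal NumPy sketch of k-means (Lloyd's algorithm); the random initialization and stopping rule are simple choices for illustration, not the only options:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels
```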



Most elegant: The Perceptron algorithm. Developed back in the 50s by Rosenblatt and colleagues, this extremely simple algorithm can be viewed as the foundation for some of the most successful classifiers today, including support vector machines and logistic regression, solved using stochastic gradient descent. The convergence proof for the Perceptron algorithm is one of the most elegant pieces of math I’ve seen in ML.
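The whole algorithm fits in a few lines. Here is a minimal NumPy sketch (labels in {-1, +1} are assumed; the epoch cap is just a safety net for non-separable data):

```python
import numpy as np

def perceptron(X, y, n_epochs=100):
    """Rosenblatt's perceptron on (n_samples, n_features) data, labels in {-1, +1}.

    If the data is linearly separable, the convergence theorem guarantees
    the mistake-driven loop below terminates after finitely many updates.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the boundary)
                w += yi * xi            # nudge the hyperplane toward xi
                b += yi
                mistakes += 1
        if mistakes == 0:               # training data fully separated
            break
    return w, b
```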

Most useful: Boosting, especially boosted decision trees. This intuitive approach allows you to build highly accurate ML models by combining many simple ones. Boosting is one of the most practical methods in ML: it’s widely used in industry, can handle a wide variety of data types, and can be implemented at scale. I recommend checking out XGBoost for a really scalable implementation of boosted trees. Boosting also lends itself to very elegant proofs.
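For a feel of how little code this takes, here is a minimal sketch with XGBoost (assuming the xgboost package is installed; the dataset and hyperparameters are arbitrary illustrations):

```python
# Boosted decision trees with XGBoost: each shallow tree is fit to
# correct the mistakes of the trees that came before it.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```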

Biggest comeback: Convolutional neural networks (deep learning). This type of neural network has been around since the early 80s. Although interest in them declined from the late nineties to the late 2000s, they have seen an amazing comeback in the last 5 years. In particular, convolutional neural networks form the core of the deep learning models that have been having a huge impact, especially in computer vision and speech recognition.

Most beautiful algorithm: Dynamic programming (e.g., the Viterbi, forward-backward, variable elimination, and belief propagation algorithms). Dynamic programming is one of the most elegant algorithmic techniques in computer science, since it allows you to search through an exponentially large space to find the optimal solution. This idea has been applied in various ways in ML, especially for graphical models, such as hidden Markov models, Bayesian networks and Markov networks.
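To make that concrete, here is a minimal NumPy sketch of Viterbi for a hidden Markov model (the interface is my own; all probabilities are assumed strictly positive so the logs stay finite):

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path in an HMM, computed in log space.

    obs:     length-T sequence of observation indices
    start_p: (S,) initial state probabilities
    trans_p: (S, S) transition probabilities, rows = "from" state
    emit_p:  (S, O) emission probabilities
    """
    T = len(obs)
    S = len(start_p)
    log_delta = np.log(start_p) + np.log(emit_p[:, obs[0]])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # The DP step: the best score for each state at time t reuses the
        # best scores at t-1 instead of enumerating all S**T paths.
        scores = log_delta[:, None] + np.log(trans_p)   # (from, to)
        back[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(emit_p[:, obs[t]])
    # Trace the best path backwards from the best final state.
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

The exponential-to-polynomial trick is in that single loop: at each step we only keep the best score per state, which is exactly what makes the search tractable.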

Unbeatable baseline: Nearest-neighbor algorithm. Often, when you are trying to write a paper, you want to show that “your curve is better than my curve”. :) One way to do that is to introduce a baseline approach and show that your method is more accurate. Well… nearest-neighbor is the simplest baseline to implement, so folks will often try it first, thinking they’ll easily beat it and show their method is awesome. To their surprise, nearest-neighbor can be extremely hard to beat! In fact, if you have enough data, nearest neighbor is extremely powerful! And this method is really useful in practice.
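Setting up this baseline really is a few lines. A minimal scikit-learn sketch (the dataset is chosen purely for illustration):

```python
# A 1-nearest-neighbor baseline: hard to beat despite its simplicity.
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
baseline = KNeighborsClassifier(n_neighbors=1)
print("1-NN mean accuracy:", cross_val_score(baseline, X, y, cv=5).mean())
```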
