I strongly advise everybody with a free day (and nothing much better to do) to implement a basic fully connected feedforward neural network (the classical stuff, basically) and try it against the MNIST handwritten digits database. It's a relatively simple project that teaches you the basics, and once you have those, the more complex stuff becomes much more approachable. To me this is the parallel of implementing a basic interpreter in order to understand how higher-level languages and compilers work. You don't normally need to write compilers, just as you don't need to write your own AI stack, but it's the only path to fully understanding the basics.
You'll see it learning to recognize the digits. You can print the digits that it misses, and you'll see that sometimes they are genuinely hard even for humans, and sometimes you'll see exactly why it can't recognize a digit that's trivial for you (for instance, it's an 8 but the lower circle is very small).
Also, backpropagation is an algorithm that's simple to develop an intuition about. Even if you forget the details N years later, the idea is one of those things you'll never forget.
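For anyone curious what that core loop looks like, here's a toy sketch in numpy: a one-hidden-layer network with backpropagation, trained on XOR instead of MNIST so it stays self-contained (the MNIST version is the same code with 784 inputs and 10 outputs). All the names and hyperparameters here are my own choices, not anything canonical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8))   # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1))   # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)        # hidden activations
    out = sigmoid(h @ W2 + b2)      # network output

    # backward pass: chain rule applied layer by layer
    d_out = (out - y) * out * (1 - out)     # gradient at output pre-activation
    d_h = (d_out @ W2.T) * h * (1 - h)      # gradient at hidden pre-activation

    # gradient descent update
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

loss = float(((out - y) ** 2).mean())
```

The forward pass, the layer-by-layer gradient, and the update rule are the whole algorithm; scaling up to MNIST mostly means bigger matrices and a softmax output.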
I fully agree. In the AI class I took, the first two projects were implementing feedforward neural networks with backpropagation. The math may take some time to understand, but the concept is very natural. Definitely my favorite part of that class.
I would highly recommend Andrej Karpathy's Hacker's Guide to Neural Networks: a step-by-step guide with very little math that covers the basics of NNs.
http://karpathy.github.io/neuralnets/
neuralnetworksanddeeplearning.com is a free online book by Michael Nielsen. I'm reading it right now, and it's wonderful. In the first chapter, he walks you through implementing a basic neural network in Python that works with the MNIST data and uses no libraries (except numpy). He's really good at explaining a lot of the math for a more programmer-ish audience that might not have studied multivariate calculus.
Not sure; I think it's better to use older documentation that is not so centered on RNNs and convolutional networks. The older material describes a basic net with backpropagation very well.
This is well-written, and I applaud any step toward demystifying the sometimes scary-sounding concepts that drive many ML algorithms.
Knowing you can pretty quickly whip up a KNN or ANN in a few hundred lines of code or fewer is one of the more eye-opening parts of delving in. For the most part, supervised learning follows a pretty reliable path, and each algorithm obviously varies in approach, but I know I originally thought "deep learning? ugh, sounds abstract and complicated" before realizing it was all just a deep ANN.
Long story short: dig in. It's unlikely to be as complex as you think. And if you've ever had an algorithms class (or worked as a professional software dev) none of it should be too daunting. Your only problem will be keeping up the charade if people around you think ML/AI is some sort of magic.
I was also thinking ML was a dark art and only a select few were able to understand it, but then I started to look into it more, took the Coursera ML course, and now I'm working through some other lectures, and it's easier than I imagined. Like you said, you just need to start working with it.
As mentioned, k-nearest neighbors, an n-dimensional similarity-search algorithm (you can write this in Python in like 25 lines, honest to god).
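For reference, here's roughly what those 25 lines might look like in pure Python (my own sketch; the function and variable names are illustrative):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs, where each point is a
    sequence of numbers of the same dimension as `query`.
    """
    def dist(a, b):
        # plain Euclidean distance in however many dimensions you have
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # sort the training set by distance to the query, keep the k closest
    neighbors = sorted(train, key=lambda pl: dist(pl[0], query))[:k]

    # majority vote over the neighbors' labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, `knn_classify([((0, 0), 'a'), ((0, 1), 'a'), ((5, 5), 'b'), ((6, 5), 'b')], (1, 0))` classifies the query as `'a'`. There's no training step at all: the "model" is just the stored data.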
ANN, artificial neural network, which is what things like TensorFlow were born from. Neural networks were kind of passé before they were revived by "deep learning"; they're essentially a self-correcting, self-referencing set of logic gates.
k-nearest neighbors (the k is just the number of nearby data points you consider when classifying a point), a classification algorithm, and artificial neural network
Hi folks, authors here in case you have questions.
This is actually part 3 in a series. For developers who are still getting oriented around machine learning, you might enjoy the first two articles, too. Part 1 shows how the machine learning process is fundamentally the same as the scientific thinking process. Part 2 explains why MNIST is a good benchmark task. Future parts will show how to extend the simple model into the more sophisticated stuff we see in research papers.
We intend to continue as long as there are useful things to show & tell. If there are particular topics you'd like to see sooner rather than later, please leave a note!
Reinforcement learning is definitely on the docket! But we probably need to cover more fundamental topics like optimizers, loss functions, distributed representations, attention, and maybe memory states first. RL seems to have all the things ;)
I took Andrew Ng's ML class on Coursera. It was certainly interesting to see how ML works, but I'm not sure what to do with this. In particular, I'm still unsure how to tell beforehand whether a problem is too complex to be considered, how much data it'll require, or what computing power is needed.
Are there a lot of problems that fall between the very hard and the very easy ones, and for which enough data can be found?
There are lots of interesting problems, but what was really enlightening to me was to look at solved contests and see what the winning solutions implemented.
So this may be as good a place as any -- I've got a decent math background, and I'm teaching myself ML while waiting for work to come in.
I'm working on understanding CNNs, and I can't seem to find the answer (read: I don't know what terms to look for) that explains how you train the convolutional weights.
But in practice, I assume you would want to have these actual weights themselves trained, no?
But in CNNs, the same convolutional step is executed over the entire input to that step; you just move around where you take your "inputs".
How do you do the training, then? Do you just do backprop into each weight of the convolution kernel from its output, with a really small learning rate, then repeat after shifting over to the next output?
Sorry if this seems like a poorly thought out question, I'm definitely not phrasing this perfectly.
It turns out a convolution can be thought of as multiplication by a special type of matrix, a Toeplitz matrix:
https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convo...
You could imagine doing the backprop through the matrix multiply: you're going to get several terms with the same coefficient, since the diagonals are the same, and you can just sum up all the gradients from the relevant terms and use that.
Huh, this almost got it for me after ten seconds of looking at it. Still have questions about the nitty gritty, but huge matrix multiplication clearly isn't how this is done in production.
That's right, you basically want to add the contributions of each gradient of the output volume to get the full picture of how a weight affected the entire output.
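To make the summing concrete, here's a tiny 1D sketch of my own (illustrative names, squared-error loss assumed): the forward pass slides a shared kernel over the input, and in the backward pass each kernel weight accumulates a gradient contribution from every output position it touched. A numerical gradient check confirms the sum is the right thing.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=6)          # input signal
w = rng.normal(size=3)          # shared kernel weights
target = rng.normal(size=4)     # one target per output position

def forward(x, w):
    # "valid" sliding-window correlation: same weights reused at every position
    n = len(x) - len(w) + 1
    return np.array([x[i:i + len(w)] @ w for i in range(n)])

out = forward(x, w)
d_out = 2 * (out - target)      # dLoss/d_out for squared error

# gradient of each shared weight = sum of contributions over output positions
d_w = np.array([sum(d_out[i] * x[i + j] for i in range(len(out)))
                for j in range(len(w))])

# sanity check against a central-difference numerical gradient
eps = 1e-6
num = np.zeros_like(w)
for j in range(len(w)):
    wp, wm = w.copy(), w.copy()
    wp[j] += eps
    wm[j] -= eps
    num[j] = (np.sum((forward(x, wp) - target) ** 2)
              - np.sum((forward(x, wm) - target) ** 2)) / (2 * eps)
```

Real frameworks don't materialize the Toeplitz matrix; they compute this same summed gradient directly (typically as another convolution), but the math is identical.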
There have been a couple of times when I needed to classify a large set of web pages, and I used a Bayes classifier.
I would start to get misclassified pages, and it was so difficult to diagnose why these misclassifications were occurring. Bad examples? Bad counter-examples? Wrong algorithm for the job? Ugh.
I ended up writing a set of rules. It wasn't fancy, but at the end of the day I understood the exact criteria for each classification, and they were easily adjustable.
The best thing to do is compare your results to a widely known dataset with information about the accuracy rate for what you implemented. That will give you a baseline to figure out what, if anything, is going wrong.
> Wrong algorithm?
Almost certainly not; Bayes is a very solid classification technique.
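And it needs very little machinery. Here's a minimal naive Bayes text classifier with add-one smoothing (my own sketch, not from the article; class and method names are illustrative):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, docs):
        """Train on `docs`, a list of (list_of_words, label) pairs."""
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts = defaultdict(Counter)
        for words, label in docs:
            self.word_counts[label].update(words)
        self.vocab = {w for words, _ in docs for w in words}
        self.total = len(docs)

    def predict(self, words):
        """Return the label maximizing log P(label) + sum log P(word|label)."""
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)   # log prior
            # add-one (Laplace) smoothing so unseen words don't zero things out
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

The debugging pain described above is real, though: all the "reasoning" lives in those per-word counts, so when a page is misclassified, the explanation is smeared across thousands of tiny probabilities rather than a rule you can point at.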