This installment of our explainer series centers on a technique that is nearly synonymous with modern-day AI: Deep Learning.
00:00 – Intro
00:38 – Level 1: Kindergartener
01:42 – Level 2: Teenager
02:56 – Level 3: Non-expert adult
06:00 – Level 4: Computer science major
08:41 – Level 5: Machine learning expert
A 5-year-old
I’m sure you’ve seen movies with robots, and you might have wondered how it is that they’re so smart when the robots in your house, that is, your appliances and your toys, are just not as smart.
Well, the good news is that scientists are working really hard to make them smarter and smarter, and it turns out the best way to do that is to make them more human-like, that is, to give them human intelligence.
How do we do that? Well, we use something called Deep Learning to try to model the human brain. Now, the bad news is that even in this day and age, doctors don’t fully understand how the human brain works or why it makes the decisions that it does.
This makes it an extremely challenging problem for us, and doctors’ input can only help so much, because we still don’t understand how most things in the mind work.
Our brains are very good at processing information and recognizing patterns, whether it’s predicting the next word of a sentence when someone is speaking or detecting an object from afar. Our brains simply understand these patterns and recognize these objects.
At a very high level, the human brain is composed of neurons, stacked neurons, and these neurons allow us to go from recognizing simpler patterns to recognizing more complex ones. Take, for example, a jigsaw puzzle. Let’s say you don’t have the box the puzzle came in, but you start building it anyway and notice some patterns starting to emerge.
There’s a crowd of people, and then from the crowd of people you recognize that they’re at a park. That’s exactly how the human brain works: you start from simpler patterns, move to more complex ones, and build context from there.
This is what motivates Machine Learning scientists to study the human brain and emulate it using neural networks. Like I mentioned before, the human brain is composed of neurons, stacked neurons, and what makes Deep Learning deep is the fact that you have many layers of these stacked neurons.
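To make “stacked neurons” a little more concrete, here is a minimal sketch in plain Python of a tiny network whose layers are stacked so that each layer’s output becomes the next layer’s input. The layer sizes, the hand-picked weights, and the ReLU activation are illustrative assumptions, not anything described in the video; in a real network the weights are learned from data.

```python
def relu(x):
    """Rectified linear unit: keep positive values, zero out negatives."""
    return max(0.0, x)

def dense_layer(inputs, weights, biases):
    """One layer of neurons: each neuron takes a weighted sum of all inputs plus a bias, then applies ReLU."""
    return [
        relu(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the input through each layer in turn; stacking many layers is what makes a network deep."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical hand-picked weights for a 2 -> 3 -> 2 -> 1 network.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),  # layer 1: 3 neurons
    ([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], [0.0, 0.5]),          # layer 2: 2 neurons
    ([[1.0, 1.0]], [0.0]),                                       # layer 3: 1 neuron
]

print(forward([1.0, 2.0], layers))  # prints [3.0]
```

Adding depth here just means appending another `(weights, biases)` pair to `layers`; nothing else in the code changes.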
A non-expert adult
One of the things I personally find most interesting about Deep Learning is that the mathematical framework behind it has been around for a really long time. In fact, it’s been around for more than 50 years.
So how come it became so popular only recently? Well, Deep Learning is actually a ridiculously compute-hungry and data-hungry technology, so it’s only once we had computers capable of collecting enough data and training these models that it became possible for data scientists to use them in practice.
What differentiates Deep Learning models from other Machine Learning models is mostly the fact that they don’t attempt to make sense of an entire record, like an entire image, all at once. Instead, they have an interesting layered architecture that enables them to progressively convert raw information into something more sophisticated.
For example, when a Deep Learning model tries to make sense of image data, it first combines several pixels into an edge. In the next layer, it combines several edges into a small object, and later on it combines several small objects into more sophisticated objects, until it can finally make sense of the entire image and the entire scene. This has enabled us to do things we could never do before. Take natural language processing, the task of trying to make sense of human language: until the advent of Deep Learning, data scientists did not have a lot of options.
In fact, most of them used something called bag of words. This approach basically counts the number of occurrences of specific terms or words in a sentence, a paragraph, or a document to try to make sense of its topic.
As you can imagine, this approach made it extremely hard to differentiate a sentence such as “This bagel is so bad” from a sentence such as “I want a bagel so bad!”, because the wording is so similar. But today, with Deep Learning, we can differentiate these two sentences with no problem at all.
In fact, a Deep Learning model would be able to tell you that these sentences are almost polar opposites of one another. This is why data scientists are so excited about Deep Learning, and they’re only just starting to scratch the surface of what can be done with it.
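A minimal bag-of-words sketch shows the problem with the bagel sentences. It assumes a deliberately simple tokenizer (lowercase the text, keep runs of letters and apostrophes), which is a hypothetical choice for illustration. Once word order is thrown away, the two sentences share “bagel”, “so”, and “bad”, and the count for “bad” is identical, so nothing in the counts says which way “so bad” is meant.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase, keep only letters and apostrophes, and count each word; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

complaint = bag_of_words("This bagel is so bad")
craving = bag_of_words("I want a bagel so bad!")

shared = complaint & craving  # words both sentences contain (minimum of the two counts)
print(sorted(shared))                    # prints ['bad', 'bagel', 'so']
print(complaint["bad"], craving["bad"])  # prints 1 1
```

Any topic or sentiment score built only on these counts sees “bad” exactly once in each sentence, which is why the approach struggles to tell the two apart.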
A CS Student
If you’ve taken a deep learning class, you probably realize that image classification and natural language processing are two fields in which Deep Learning has had a tremendous impact. In the field of image classification, you’ve probably heard of the AlexNet convolutional neural network. It’s one of the canonical examples, representing one of the first times that a deep stack of convolutional layers, together with pooling and dense layers, was used at scale to predict the label of an image, making use of the natural properties of the images we see in the real world.
However, AlexNet is by no means a silver bullet for all of our image classification problems. It was really just a starting point; since then, we’ve seen further progress in image classification with the advent of new convolutional neural networks and related networks inspired by fields such as biophysics, computer graphics, and even human perception and cognition.
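The two building blocks named above, convolution and pooling, can be sketched in plain Python. This is a minimal illustration, not AlexNet: it assumes a tiny 5x5 grayscale image stored as nested lists, a single hand-written 2x2 vertical-edge kernel (real networks learn many kernels from data), a ReLU, and 2x2 max pooling with stride 2. It also mirrors the pixels-to-edges idea from the non-expert section: the convolution turns raw pixels into edge responses, and pooling keeps the strongest response in each region.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation, which deep learning libraries call 'convolution'."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # Weighted sum of the kh x kw patch of pixels under the kernel.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def relu_map(feature_map):
    """Zero out negative responses, as a ReLU layer would."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            row.append(max(feature_map[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out

# Hypothetical 5x5 image: a bright vertical stripe in columns 2-3 on a dark background.
image = [[0, 0, 9, 9, 0] for _ in range(5)]

# Hand-picked kernel that responds where brightness increases from left to right.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)   # each row: [0, 18, 0, -18]
activated = relu_map(features)     # ReLU drops the bright-to-dark (negative) response
pooled = max_pool(activated)       # prints [[18, 0], [18, 0]] below
print(pooled)
```

AlexNet itself used overlapping pooling (3x3 windows with stride 2), which in this sketch would be `max_pool(activated, size=3, stride=2)`.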
A similar process has also occurred in the field of natural language processing. Take, for example, the RNN (recurrent neural network). It is one of the first neural networks to exploit a kind of time invariance in language: a word in a given context behaves in a similar way whether it appears early or late in a sequence, so the same weights can be reused at every time step. RNNs use this property to predict what comes next in a sequence, such as the next word. This idea has been taken and built upon by several other models, including the LSTM and the transformer, both of which use context in different ways, mirroring the way we believe humans perceive and process language. But by no means should we trivialize the amount of work that has gone into making these models work and making them deployable in the real world. A tremendous amount of interdisciplinary research goes into figuring out which parts of other sciences and fields of study should be combined with Deep Learning machinery to make the most effective model.
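The weight sharing behind that time invariance can be sketched with a scalar recurrent cell: the same weights are applied at every time step, and a hidden state carries context forward from one step to the next. The scalar weights (`w_x`, `w_h`, `b`) and the tanh activation below are hypothetical choices for illustration; real RNNs use learned weight matrices over vector-valued word representations.

```python
import math

def rnn_forward(inputs, w_x, w_h, b):
    """Run a scalar RNN over a sequence, reusing the same weights at every time step."""
    h = 0.0  # initial hidden state: no context yet
    states = []
    for x in inputs:
        # The new state depends on the current input and on the previous state (the context).
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is nonzero, yet later states are still nonzero:
# the hidden state carries that early context forward, fading over time.
states = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)
print([round(s, 3) for s in states])
```

Because `w_x`, `w_h`, and `b` do not depend on the position in the sequence, the same cell can process sequences of any length.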
There are also many hyperparameters, such as the number of layers in the Deep Learning network, the types of layers used, and even the connectivity between those layers, that really have to be thought through in order to create a model that has high accuracy and trains efficiently. When we want to train our own models, it’s important to think carefully about the architecture and the different ways in which the individual components come together to create the larger model, because if we don’t, our model might learn biases that we would be unaware of. So if we can’t reason about these components both individually and holistically, as parts of the larger model, it might be best to start with a simpler model and see where we can go from there.
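One quick way to see how architecture choices such as the number and width of layers change a model is to count its trainable parameters. This sketch covers fully connected layers only and assumes each layer has a weight matrix plus one bias per output neuron; the layer sizes are hypothetical examples, not taken from the video.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters in a stack of fully connected layers.

    Each layer mapping n_in -> n_out has n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

shallow = [784, 10]           # hypothetical: 28x28 input mapped straight to 10 classes
deeper = [784, 128, 64, 10]   # hypothetical: the same task with two hidden layers added

print(dense_param_count(shallow))  # prints 7850
print(dense_param_count(deeper))   # prints 109386
```

Two extra hidden layers multiply the parameter count by roughly 14 here, which is one reason depth and width are worth choosing deliberately rather than by default.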
An ML Expert
As an ML practitioner, I’m sure you’re familiar with the concept of Deep Learning, and you know that Deep Learning is something scientists designed to mimic the cognitive ability of the human brain. Since their inception in 1958 as the perceptron, introduced by Frank Rosenblatt, neural networks have made significant progress, taking giant strides in terms of architectures, loss functions, and even their applicability to different tasks.
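The perceptron is simple enough to sketch in a few lines. This is a minimal version of the classic perceptron learning rule, assuming 0/1 labels, a step activation, and a learning rate of 1; the training data (the logical AND function, which is linearly separable) and the epoch count are illustrative choices, not details from the video.

```python
def predict(weights, bias, x):
    """Step activation: fire (1) if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(data, epochs=10, lr=1.0):
    """Perceptron learning rule: on each mistake, nudge the weights toward the correct label."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in data:
            error = label - predict(weights, bias, x)  # 0 if correct, +1 or -1 if wrong
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_data)
print([predict(weights, bias, x) for x, _ in and_data])  # prints [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (a hyperplane), which is part of why later work moved to the stacked, multi-layer networks described earlier.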
You’ve probably seen neural networks go from classifying a single object in an image to detecting or localizing multiple objects in the same image. You’ve also seen neural network architectures grow from deep fully connected networks to single-stage detectors, two-stage detectors, encoder-decoder architectures, and so on, along with different types of loss functions being applied to different sets of tasks.
That being said, as an ML practitioner, I’m sure you’ve faced your own share of difficulties when applying state-of-the-art models to your own datasets. Even after some hyperparameter optimization, you may still end up with suboptimal results on your dataset. One reason is that some state-of-the-art models have an inherent bias toward the dataset they were built on. You may also get suboptimal performance even when training on a seemingly similar dataset, simply because your data is out of distribution.
Check out the link below to learn more about some of the biases in neural networks for certain datasets. That being said, as an ML researcher myself, I’m particularly excited about the progress being made in this space right now. But there is certainly a huge set of problems that need to be solved as soon as possible. Explainability, for instance, is one problem we would want to address: we’d want to build interpretable models so that we can diagnose their weaknesses and make them better. Extending the research you see in papers, or applying it to your own dataset, is also a problem, because a lot of us don’t really know how to improve on the research described in a paper.
As such, spending significant time on this would help us build even better models. As a machine learning researcher yourself, I’m sure you know how the gradient descent algorithm works, but as long as you don’t know which loss function to use for your own use case, you won’t really know how your model works or how it could work better with a better loss function. So I think more research on how and when to apply different loss functions would also help you build better models yourself.
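A small sketch shows why the loss function matters even when the optimizer stays the same. It fits a single constant prediction `c` to a handful of numbers with plain gradient descent. Under mean squared error the fit moves to the mean, while under mean absolute error it moves to the median, so one outlier affects the two very differently. The data, learning rate, and step count are illustrative assumptions.

```python
def fit_constant(data, grad_fn, lr=0.1, steps=2000):
    """Plain gradient descent on a single parameter c (the constant prediction)."""
    c = 0.0
    for _ in range(steps):
        c -= lr * grad_fn(c, data)
    return c

def mse_grad(c, data):
    """Gradient of mean((c - y)^2) with respect to c."""
    return sum(2 * (c - y) for y in data) / len(data)

def mae_grad(c, data):
    """Subgradient of mean(|c - y|) with respect to c: the average sign of (c - y)."""
    return sum((c > y) - (c < y) for y in data) / len(data)

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data with one large outlier

print(round(fit_constant(data, mse_grad), 1))  # prints 22.0 (the mean)
print(round(fit_constant(data, mae_grad), 1))  # prints 3.0 (the median)
```

With a fixed learning rate, the absolute-error fit ends up oscillating within a few hundredths of the median rather than landing on it exactly, which is why the output is rounded.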