Explaining What’s up with AI (Part 1 – What is Machine Learning?)

A friend of mine asked me to explain what’s going on with all this AI stuff. What should they know about it?

This person is very smart but not technical in the way we computer people like to use that word, so I wanted to create an explanation that requires almost no math or knowledge of computer systems. I’m not trying to turn them into an ML coder, just to give them an intuitive understanding of what is going on in the field.

I’m going to blur and skip some details, so this won’t be for everyone. Again, the goal is intuitive understanding rather than complete accuracy.

Let’s get going:

Machine Learning

Before trying to understand the latest developments in AI, it is important to start with the concept of Machine Learning.

Computers are good at “computing” things they are told to compute. People with specialized skills write computer programs that consist of steps for the computer to take. We call this sequence of steps an algorithm. These algorithms can transform inputs (like the prices of all the things you bought and your bank account balance) into outputs (the final resolved balance). Computers are very fast and usually reliable, so we are pretty confident that they will give the right answers.
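For the curious, here is roughly what such an algorithm looks like when written down in Python. This is only a small sketch of the bank balance example: the function name and the numbers are made up for illustration.

```python
def resolve_balance(starting_balance, prices):
    """Subtract each purchase from the balance, one step at a time."""
    balance = starting_balance
    for price in prices:
        balance -= price
    return balance


# Inputs: a starting balance and the prices of three purchases (invented numbers).
final_balance = resolve_balance(100.00, [12.50, 30.00, 7.25])
print(final_balance)  # prints 50.25
```

Every step here was spelled out by a person ahead of time. The computer just follows the steps.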

This kind of software has transformed almost every part of commerce and even our social relationships. We have continued to expand the scale and scope of these systems. As a result, the systems have become very large and complex, in some cases so complex that it is difficult for one human to understand all the parts. But at least you can take such a system apart bit by bit, examine each piece individually, and over time reason about its behavior.

During this same time, computer scientists realized that this sort of system has its limits. It works best when the rules for what you want it to do are really well understood and specific, but not everything is like that.

What if you want the computer to write an essay or paint a landscape? How do you program this?

Let’s begin with something even simpler.

If you want to make a computer program that can tell you if a photo has a cat in it, then it’s not clear how you would proceed. What would you ask the computer to do? How do you tell it how to identify a cat vs a dog? What about a cat with only 3 legs? What about a stuffed animal or a cartoon cat? People tried to make systems based on a bunch of rules about what they thought makes something a cat. This didn’t work out very well. We couldn’t always determine what the rules should even be. We didn’t have a good definition of Cat-ness that we could program into the computer.
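To make the problem concrete, here is a deliberately naive sketch of the rule-based approach. The rule and the descriptions (legs, whiskers, pointy ears) are invented for illustration. Real attempts worked on actual image pixels and were far more elaborate, but they ran into the same kind of trouble.

```python
def looks_like_cat(animal):
    """A hand-written rule: four legs, whiskers, and pointy ears means cat."""
    return (
        animal["legs"] == 4
        and animal["has_whiskers"]
        and animal["has_pointy_ears"]
    )


# Hypothetical descriptions of what is in three photos.
typical_cat = {"legs": 4, "has_whiskers": True, "has_pointy_ears": True}
three_legged_cat = {"legs": 3, "has_whiskers": True, "has_pointy_ears": True}
small_dog = {"legs": 4, "has_whiskers": True, "has_pointy_ears": True}

print(looks_like_cat(typical_cat))       # True, as hoped
print(looks_like_cat(three_legged_cat))  # False, but it is a cat
print(looks_like_cat(small_dog))         # True, but it is not a cat
```

You can keep patching the rule, but each patch tends to break some other case.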

The answer turned out to be something called machine learning. The basic concept of machine learning is pretty simple. You don’t start with a set of steps for determining if there is a cat in a photo. What you start with is a lot of photos that have been labeled ahead of time as Cat or NoCat, depending on whether there is a cat in the photo. All of this input (the photos and the labels) is used by the computer to learn over time how to tell which photos have cats and which do not.

We still use an algorithm (or set of steps) to guide this learning, but here the algorithm isn’t trying to compute the answer directly. Instead it goes through a set of steps to build something which can predict the answer. The algorithm processes this set of training images and constructs what we call a model that detects cats. This model is typically a combination of data and computer code that we say has been trained to predict something. We’ll explain this more in the next post but, once the model has been trained, we can then evaluate it by giving it a new photo and having it predict whether or not there is a cat in the image.
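Here is a toy sketch of that idea in Python. It rests on some big assumptions: each “photo” has already been boiled down to two made-up numbers (how pointy the ears are and how long the snout is), and the learning method is a very simple one (average the examples for each label, then pick the label whose average is closest). Real image models work from raw pixels and use different techniques, so treat this only as an illustration of learning from labeled examples.

```python
def train(examples):
    """Build a 'model': the average of the numbers seen for each label."""
    sums = {}
    counts = {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model, features):
    """Pick the label whose average is closest to the new example."""
    def distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: distance(model[label]))


# Invented training data: ([ear_pointiness, snout_length], label)
training_examples = [
    ([0.9, 0.2], "Cat"),
    ([0.8, 0.3], "Cat"),
    ([0.7, 0.1], "Cat"),
    ([0.2, 0.9], "NoCat"),
    ([0.3, 0.8], "NoCat"),
    ([0.1, 0.7], "NoCat"),
]

model = train(training_examples)
print(predict(model, [0.85, 0.25]))  # a new, unseen example: prints Cat
print(predict(model, [0.15, 0.90]))  # another unseen example: prints NoCat
```

Notice that nobody wrote a rule saying what a cat is. The `train` step worked that out from the labeled examples, and the resulting `model` is just some numbers plus a little code that uses them.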

In general, the more images you have in your training set, the better the model will be. If you only had 100 images (and maybe 30 with cats), it’s possible the model will only be a little better than random at guessing if there is a cat in an image. If you have 100,000 images, it will likely be pretty good.

When people train models they usually keep the input data in at least two sets. The first set, which is usually larger, is called the training set. Only this data is used to train the model. The second set is generally referred to as the evaluation set. The evaluation set is used to check how good the model is at making predictions for things it has never seen before. Maybe our model correctly predicts whether there is a cat in the images of the evaluation set 86% of the time. This evaluation is important because it tells us the model hasn’t just memorized the inputs it has seen. Since those images weren’t in the training set, a good score indicates that it has learned something about cats that can generalize. We’ll see more about this in the next post.
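The split between the two sets can be sketched like this. Again, this is a toy under stated assumptions: each “photo” is reduced to a single invented “cat-likeness” score, the model is just a cutoff value learned from the training set, and the 80/20 split is a common choice rather than a rule.

```python
import random


def split(examples, eval_fraction=0.2, seed=0):
    """Shuffle the examples, then hold some back for evaluation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    eval_size = round(len(shuffled) * eval_fraction)
    return shuffled[eval_size:], shuffled[:eval_size]


def train(training_set):
    """Learn a cutoff halfway between the average Cat and NoCat scores."""
    cat_scores = [s for s, label in training_set if label == "Cat"]
    other_scores = [s for s, label in training_set if label == "NoCat"]
    cat_average = sum(cat_scores) / len(cat_scores)
    other_average = sum(other_scores) / len(other_scores)
    return (cat_average + other_average) / 2


def predict(cutoff, score):
    return "Cat" if score > cutoff else "NoCat"


def accuracy(cutoff, examples):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(1 for score, label in examples if predict(cutoff, score) == label)
    return correct / len(examples)


# Invented data: 10 Cat examples and 10 NoCat examples.
examples = [(0.55 + 0.04 * i, "Cat") for i in range(10)]
examples += [(0.09 + 0.04 * i, "NoCat") for i in range(10)]

training_set, evaluation_set = split(examples)
cutoff = train(training_set)  # the model only ever sees the training set
eval_accuracy = accuracy(cutoff, evaluation_set)
print(f"Correct on {eval_accuracy:.0%} of the evaluation set")
```

The important part is that `train` never touches `evaluation_set`, so the accuracy number reflects how the model does on examples it has never seen.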

Some Key Points to take away

  • Machine Learning refers to a process where we ask a computer system to learn to make predictions about data by giving it examples of what we want.
  • More data, along with high quality labeling of that data, generally gives more accurate predictions.
  • A well trained model does a good job of predicting the output even on inputs it has never seen before. It will never be perfect at this, but then neither are people.
  • Training the model requires a lot of computer resources and can be very expensive for large models based on large datasets.
  • Using the model to predict a result is far less expensive than creating the model in the first place.

The next post will focus on a particular type of machine learning system: neural networks.





