I’ve decided to use the free time I have to learn more about AI. I took the Andrew Ng ML course a long time ago, so I know what some of the words mean, but I wanted a better understanding of the things people are actually doing now.

First, I wanted to at least have some understanding of what people are doing with neural nets and how they work. I found the Neural Networks: Zero to Hero series by Andrej Karpathy to be really good. The first and last videos are the best, but you’ll need to decide how much of the material in the middle you need in order to understand the last one.

Micrograd is a nice, simple introduction to neural net programming.
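To give a flavor of it, here’s a stripped-down toy version of micrograd’s core idea (this is my own sketch, not micrograd’s actual code): a scalar `Value` that remembers how it was computed, so gradients can be backpropagated through the graph.

```python
class Value:
    """A scalar that records its computation graph for backprop (toy version)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order, then push gradients from the output back.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a
c.backward()
print(a.grad, b.grad)  # dc/da = b + 1 = 4.0, dc/db = a = 2.0
```

The real micrograd adds a few more operations and a tiny neural net library on top, but this is essentially the whole trick: every operation records a closure that knows how to send gradients to its inputs.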

In addition to the low-level, bottom-up approach, I’ve also been looking at some other documents and projects to see what kinds of things people are doing with ChatGPT and other LLMs.

Some good resources I have found here are:

  • The GPT-4 technical report, which is short on details but is a good overview that helps you get past a lot of the announcement hype.
  • The Alpaca blog post, which gives a good discussion of base models and fine-tuning.
  • The documentation for LangChain, a system which, among other things, helps you build agents that can issue complex chained prompts.
  • And yes, there are a lot of YouTube videos out there about all of this stuff.

The mental model I am working with looks something like this:

At the lowest level are the base LLMs like GPT-3 or LLaMA, which are trained to predict the next word over a large corpus of text. These models are very large and expensive to train. Out of the box they don’t really make great chat systems.
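The training objective is easy to state: given the words so far, predict the next one. Here’s a toy illustration using bigram counts (real LLMs use transformers over tokens rather than word counts, but the flavor of the objective is the same; the corpus here is made up):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in training."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

Scale that idea up to billions of parameters and a trillion words, and you get something that can continue any text plausibly, which is not the same thing as answering questions helpfully.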

At the next level up, ChatGPT was basically created by fine-tuning an LLM to respond to a specific kind of input (questions), rewarding it for giving “good” answers. This fine-tuning step is much more tractable, and we are seeing more experimentation here, including Alpaca.

Above this is prompt engineering, which involves two parts:

  • Experimentation with what sorts of prompts get the best outcome. This can involve prepending the query with an example of the sort of response you want, followed by the specific query you have.
  • Grounding – adding additional information to the prompt based on information from other sources. This could be from a database of user data, or even a service that pulls in other information from the internet.
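Both parts ultimately boil down to assembling a string before it goes to the model. A quick sketch of what that looks like (the example Q&A and the facts here are invented for illustration):

```python
def build_prompt(question, context_facts):
    """Combine a worked example (few-shot) and retrieved facts (grounding)
    into a single prompt string."""
    parts = []
    # Few-shot part: show the model the kind of answer we want.
    parts.append("Example:\nQ: What is the capital of France?\nA: Paris.")
    # Grounding part: inject outside information the model can't know.
    if context_facts:
        parts.append("Known facts:\n" + "\n".join(f"- {f}" for f in context_facts))
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "What plan is the user on?",
    context_facts=["User signed up 2021-04-02", "User is on the Pro plan"],
)
print(prompt)
```

Libraries like LangChain mostly give you nicer machinery for this kind of templating, plus the plumbing to fetch the grounding facts automatically.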

At the top level is something like an agent, which can use multiple prompts to achieve a final result.

I’m starting to get a clearer picture of all of these and am trying to think of some good projects that might be fun and tractable. Some early experiments with just asking ChatGPT to decide which tags should apply to which documents gave promising results.
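That experiment is basically one prompt per document: list the allowed tags, paste in the document, and ask for a comma-separated answer. A sketch of the prompt-building and answer-parsing side (the tag list is invented, and the model call itself is left out — with a real API you’d send `tagging_prompt(doc)` and parse the reply):

```python
ALLOWED_TAGS = ["cooking", "travel", "programming"]

def tagging_prompt(document):
    """Build a prompt asking the model to pick tags from a fixed list."""
    return (
        f"Choose the tags that apply to the document below, "
        f"from this list only: {', '.join(ALLOWED_TAGS)}.\n"
        f"Answer with a comma-separated list.\n\n"
        f"Document:\n{document}"
    )

def parse_tags(reply):
    """Keep only tags from the allowed list, ignoring any extra chatter."""
    candidates = [t.strip().lower() for t in reply.split(",")]
    return [t for t in candidates if t in ALLOWED_TAGS]

# Demonstrate the parsing on a plausible (made-up) model reply.
print(parse_tags("Programming, travel, maybe cooking?"))  # ['programming', 'travel']
```

The parsing step matters more than you’d think: models happily add commentary around the answer, so restricting to a known tag list keeps the output usable.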


