LLMs w/o Supervision

It’s pretty amazing what these LLMs can do in the role of an assistant to a human user. As long as a human is there to check the results, we will see more and more places where LLMs, and the systems built on top of them, improve the productivity of their users.

What’s harder to think about is which applications can simply trust the LLM to get the right answer. Right now I think these are pretty limited. This old tweet from 2014 is still true:

I’ve seen some systems where we ask the LLM itself to check its work, or evaluate how it did, as a way to avoid errors. I think these systems might be able to push the 87.56% above up a few more points, but it’s still hard to feel good about turning control over to an LLM inside an embedded process where the outcomes matter.
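The "ask the LLM to check its own work" pattern can be sketched as a simple generate-then-critique loop. Everything here is hypothetical: `generate` and `critique` are stubs standing in for real model calls, just to show the control flow, not any particular system's implementation.

```python
# A minimal, hypothetical sketch of an LLM self-check loop.
# generate() and critique() are stubs standing in for real model calls.

def generate(prompt: str) -> str:
    # Stub: a real system would call a model here.
    return "draft answer for: " + prompt

def critique(answer: str) -> bool:
    # Stub: a real system would ask the model (or a second model)
    # whether the answer passes; here we accept anything non-empty.
    return bool(answer.strip())

def answer_with_self_check(prompt: str, max_attempts: int = 3):
    """Generate, let the model judge its own output, retry on failure."""
    for _ in range(max_attempts):
        answer = generate(prompt)
        if critique(answer):
            return answer
    return None  # give up; a human has to take over from here

print(answer_with_self_check("What is 2 + 2?"))
```

Note that the critique step is itself an LLM call, so this pushes the error rate down without ever driving it to zero, which is the point above: a few more points, not a guarantee.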

It goes beyond just making sure there is someone looking for mistakes. There’s another thing that makes these LLMs shine, and helps us see them as “intelligent,” that wouldn’t be true if we removed the humans from the loop. Most of the time when I see an LLM do something really interesting, it’s doing it as part of a back-and-forth with a user, or at the very least the user started with the right question. This coaching is hard to make scalable and/or repeatable.

Maybe you could use this tech w/o human intervention in cases where something is better than nothing. For example, GPT-4 does a good job of describing images, so maybe it could be used to generate alt text when there isn’t any available. I think I’m the wrong person to evaluate whether this is an overall win, but I can definitely see why people could think so.
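A minimal sketch of that idea: scan HTML for `<img>` tags with no alt text, which are the spots where a model like GPT-4 could fill in a generated description. The `describe_image` function here is a hypothetical stand-in for the actual vision-model call; only the scanning part is real, using Python's standard-library HTML parser.

```python
# Sketch: find <img> tags missing alt text, the candidates for
# machine-generated descriptions. describe_image() is a hypothetical
# stand-in for a call to a vision-capable model.
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = []  # srcs of images with no alt text

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            d = dict(attrs)
            if not d.get("alt"):
                self.missing.append(d.get("src", ""))

def describe_image(src: str) -> str:
    # Hypothetical: call a vision-capable model on the image here.
    return f"[generated description of {src}]"

html = '<p><img src="cat.jpg" alt="a cat"><img src="chart.png"></p>'
finder = MissingAltFinder()
finder.feed(html)
for src in finder.missing:
    print(src, "->", describe_image(src))
# prints: chart.png -> [generated description of chart.png]
```

The "something is better than nothing" framing shows up directly in the design: the generated text only ever fills gaps, and never replaces alt text a human already wrote.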

In the meantime I’m still looking for other good examples.
