What can LLMs do?

I wrote this up a while ago for a friend but I decided to update it and post it here.

LLMs are capable of doing a lot of interesting and surprising things. They also sometimes fail in surprising ways.

Most things I read online about LLMs are either people trying to say either:

“Well they aren’t really as clever as people think. “

or

“OMG, We are going to hit the singularity any day now.”

I am not smart enough to know how smart LLMs are. I don’t even know really what that question means. I think can try to reason about what kinds of tasks they can actually do.

So what can they do?

There’s lots of ways to potentially break this down but the following is my list of things which they seem to be really good at.

  • General knowledge queries – For topics which are well understood and not very controversial they can answer basic to almost expert level questions about a variety of topics in their training set. In my experience if it’s a pretty well understood and well covered topic and you don’t push too hard, hallucinations are rare and the overall quality of answers is very good. This is pretty scary for a general search engine like google. It’s not perfect of course but good enough for lots of things.
  • Text Transformation – Rewrite this text in french or in the voice of a pirate etc. They have learned various high level representations of text that enable them to manipulate words very effectively even better than most people.
  • Text Formalization/Coding –  This is in a sense a special form of text transformation. Because LLMs are trained on large amounts of formal structured information and computer code, they are able to do a very nice job of writing and editing formal systems such as code or configuration files. There’s a lot of excitement here about eliminating the need for software developers etc. I’m not so convinced about that. Maybe I’ll write more about that some other time. Still, they definitely can perform or assist in tasks that are at least some of what developers do. Note: For now I am keeping the idea of Tool Use under this category but it might make sense as its own category. I need to think about it some more.
  • Text Summarization – Another strong ability is taking a larger text and summarizing into to smaller one. While this results look great it is kind of hard to verify that an LLM won’t sometimes miss a key point. Good prompting is key here but even lazy attempts do ok.
  • Text Expansion – Given a short description of something, expand this out into a longer document. I have done some experiments with systems like taking notes from a medical provider and expanding this into a full report. The results are really good but it can be hard to get the LLM to expand things in the way you want.
  • Use Previous Results – Its worth considering separately the fact that LLMs are able to look at the whole context of a session to come up with the next results. This hack of asking it to “think step by step” make it essentially write out the steps and then it can work through them. I am not sure I would describe this as “planning”. It does construct a plan but at least in my mind there’s a difference between planning and creating a thing which looks like a plan. It could be that I have a negative bias from working with lousy project managers in the past. Still this ability to make notes and then use them later is some kind of “Memory” or state that is part of the abilities of these systems.

LLMs + Images

The stuff above is just for text mode LLMs. When you add in images then the list of things becomes more complicated. I am still poking the boundaries and trying to understand better, but it at least includes:

OCR Text from Images – Maybe its overkill for this task but in my Experience GPT4V does a great job of extracting text from an image. It seems to always get this right and it makes it work nicely with the “Texty” part of GPT. It easily directly answers questions about the text in an image

Scene Description – Give it a photo and it does a good job of saying what’s in it. Seems comparable at least to the average alt-text for an image and while I sure its possible to trick it, there’s definitely some real utility here.

Object Identification – Tell me what are the specific objects in the scene. Although I have had some mixed results. I am in general impressed with what it can do here.

Yeah, a lot more testing is needed to see what’s possible with images but maybe someday there will be a photos app where I can actually find the photos I want.

Utility

What I’ve tried to do here is focus on the abilities of LLMs without getting into a debate about intelligence; to me the interesting questions are really about utility anyway. What can we really use these tools for? I’ll try to update the items above as I learn more.

In any event, as I have written before, I think they will find broad usage as assistants to human workers because they provide real utility w/o much effort and I think this will only improve over time.

The part that is tricker is to understand what kind of systems these tools can be embedded in where we just accept the response without any human review of the result. Here I think its tricker and not just because of randomness or hallucinations, but I will save that for a future post.


Posted

in

by

Tags:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *