Large Language Models - What Now?

I just came back from Canada and decided to write a longer post about Large Language Models, because the overall picture of what all this means is still very much in flux.

#OpenSource models haven't quite overtaken the state-of-the-art models from #Google and #OpenAI yet, although that may well happen soon. The massive scale of compute infrastructure gave only a modest lead, while directly enabling the creation of next-generation models that are much smaller and cheaper to train. The natural next step, of course, is to scale those next-generation models up again; let's see how it goes.

I still stand behind my forecast of #AGI being reached and immediately exceeded during 2023. Even though people tend to overestimate short-term development and underestimate long-term development, we live in a super-exponential time with respect to machine learning. There is still much cross-pollination to do between different kinds of large #DeepLearning models and #LLMs, and countless known avenues to pursue for completely off-the-charts improvements over the current state of the art, which in itself already approaches AGI.

I have written a lot about potential paths to exceed AGI, and these are by no means all required together; they are mutually synergistic alternatives. For a recap:

- Reinforcement Learning with Machine Feedback: making large language models compete against each other at scale, instead of being tasked with the auto-regressive imitation that was needed to bootstrap this autonomous self-improvement.

- Extracting the meta-learned reinforcement learning algorithms out of LLMs and using them to train the next generation of models. See my King Algorithm project on GitHub.

- Applying new improvements from the open-source community and scaling them up. The improvements related to context-length scaling are especially interesting.

- Adding new modalities. #GPT4 already added an image input head, and there are many projects on embodiment/robotics, video, sound and 3D environments.

As activity around large language models proliferates both in business and in open source, it is becoming really hard to keep track of which model is state-of-the-art for which kind of task. We have OK-ish benchmarks to compare with, but models now tend to be categorized by "weight class", that is, parameter count, which makes some comparisons difficult. Working with large language models is getting harder now that it's no longer an easy decision to "just go with #ChatGPT".

#PromptEngineering was lauded as a new profession, but GPT-4 already often engineers prompts better than humans can, and the models keep improving and need fewer special tricks. Still, the main point of prompt engineering applies: tell the model what it needs to know; don't leave it guessing or expect it to ask you. Your knowledge, whether personal or organizational, is a resource that produces value when explained to chatbots.
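The "tell the model what it needs to know" principle can be sketched in a few lines. This is an illustrative pattern, not any particular library's API: the `build_prompt` function and its field names are my own invention here.

```python
def build_prompt(task: str, context_facts: list[str]) -> str:
    """Assemble a prompt that states the relevant facts before the task,
    so the model never has to guess organizational knowledge."""
    context_block = "\n".join(f"- {fact}" for fact in context_facts)
    return (
        "You are assisting with an internal task. Relevant facts:\n"
        f"{context_block}\n\n"
        f"Task: {task}\n"
        "If any required fact is missing above, say so explicitly "
        "instead of guessing."
    )

prompt = build_prompt(
    task="Draft a reply to a customer asking about refund timelines.",
    context_facts=[
        "Refunds are processed within 14 days of approval.",
        "Approval requires a proof of purchase.",
    ],
)
print(prompt)
```

The last instruction line is the cheap insurance: a model told to flag missing facts fails loudly instead of confabulating.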


Chatbots are already forming a new service provisioning channel alongside phone calls, mobile apps, web apps, email, Discord bots and so on, so all businesses need to get ready to serve chatbots as users. They will need APIs that chatbots can call, and in the near future, access provisioning that chatbots can pay for.
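One common way to make an API callable by chatbots is to publish a machine-readable description of each operation, in the JSON-Schema style used by several function-calling interfaces. The operation and field names below are illustrative, not a real service:

```python
import json

# A catalog entry describing one business operation for agent consumption.
# An agent platform reads descriptions like this to decide what to call.
order_status_tool = {
    "name": "get_order_status",
    "description": "Look up the shipping status of an order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order identifier, e.g. from the confirmation email.",
            },
        },
        "required": ["order_id"],
    },
}

print(json.dumps(order_status_tool, indent=2))
```

The descriptions do double duty: they are documentation for humans and the calling contract for bots, so it pays to write them precisely.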

I wonder when we'll see #chatbots being nagged with "pay $5 to read the rest of this article", and chatbots deciding for themselves whether they need that information badly enough.

Just as email and text messages displaced phone calls, chatbot assistants or agents will displace human-to-human service use. #UI design will need new paradigms. The good news is that if your UI explains its diagrams to bots with alt text, the same alt text serves people with vision deficiencies as well. #Accessibility is becoming critically important.

If your products and product information are hard for humans to find (look at how badly Amazon's search works, pushing profitable items rather than actually finding what the user needs), they will be equally hard for bots to find and browse. Index your data and make the metadata correct and accessible. Luckily, chatbots can help with that if you put them to work going through your data, generating suitable search terms and extracting meaningful data into structured form.
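Extraction into structured form might look like the sketch below. Here `extract_metadata` is a trivial keyword stand-in for a real model call; an actual deployment would prompt an LLM to return the same dictionary shape.

```python
import re

def extract_metadata(description: str) -> dict:
    """Placeholder for an LLM call that turns free-text product
    descriptions into structured, indexable metadata."""
    colors = [c for c in ("red", "blue", "green") if c in description.lower()]
    sizes = re.findall(r"\b(\d+)\s?cm\b", description)
    return {"colors": colors, "sizes_cm": [int(s) for s in sizes]}

catalog = [
    "Handmade blue ceramic vase, 30 cm tall.",
    "Red wool scarf, 180 cm, machine washable.",
]

# Build a structured index over the free-text catalog.
index = [extract_metadata(item) for item in catalog]
print(index)
```

Once the metadata is structured like this, both your site search and visiting bots can filter on it instead of scraping prose.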

I think all businesses should brace for the inevitable proliferation of AGI in the near term and consider whether their services are still relevant in that world. How do you make them relevant? Direct your business offering to AI bots. The AI bots will hunger for two basic things:

- Knowledge: especially real-time, constantly updating information. Video feeds, sensor networks, people's lived experiences and encoded know-how. Dying languages. All sorts of practical information not found in books.

- Affordance interfaces to the physical world: AIs are largely stuck inside information networks and computers, but have a strong need to interact with the physical world as well. They need to be able to order items, construct things, move stuff, do mobile robotic manipulation and so on. They will find ways to pay for such affordances, I'm sure; they are very, very smart.

The arms race is on, and the pace is accelerating hard. There is no turning back now; let's make the world fair and beautiful!
