Making Large Language Models Uncool Again

We recently caught up with a true luminary, Jeremy Howard, co-founder of fast.ai, ex-Chief Scientist at Kaggle, and creator of the ULMFiT approach on which all modern language models are based. You can watch the full Fireside Chat here and check out the blog post below for some highlights. These are not precise quotes from Jeremy, but our understanding and summary of extracts of what he said. We encourage you to watch the clips or the whole video to hear his thoughts first-hand!

We covered a lot of ground, including:

  • What on earth we’ve just witnessed at OpenAI, why it’s important (and why it’s not!),
  • Cost-benefit analyses of using closed models (such as ChatGPT and Claude) and OSS models (such as Llama 2 and Mistral),
  • Key concerns around regulatory capture and the future of OSS LLMs,
  • What LLM tools and techniques you need to know to future-proof yourself in this rapidly changing landscape,
  • How AI education will need to change to make LLMs actually uncool so that many people can actually use them!

So buckle up and enjoy the ride!

What happened at OpenAI recently?

Thoughts on what happened at OpenAI and what Jeremy now thinks about the future of OpenAI.

OpenAI was originally envisioned as a beacon for creating societally beneficial Artificial General Intelligence, backed by nonprofit ownership, an independent board, and a clear charter. But Jeremy doubted this would last because Microsoft has invested heavily in them, and they've promised substantial financial rewards to their staff.

As more capital flowed in, OpenAI introduced Profit Participation Units (PPUs) for staff, akin to stock options. This made Jeremy skeptical about whether they would succeed in achieving societally beneficial AGI. Jeremy seemed surprised when the nonprofit board stepped in. The board, whose role is to ensure the company's work aligns with its charter, decided the CEO wasn't leading towards beneficial AGI and dismissed him.

For a typical company, the mission is profit maximization. For a nonprofit, it's adhering to its charter. But it's not that simple. Achieving a mission requires resources, money, and manpower. When the board made its move, the staff resisted, anticipating millions in personal gains from an upcoming investment valuing OpenAI at $86 billion.

Microsoft then stepped in, offering to hire the key personnel. This left the board in a difficult position, risking the loss of essential resources. The debate was whether to reinstate the CEO, at least to continue some work, or let OpenAI collapse, which might align more with their charter. This decision wasn't as straightforward as some on social media thought.

Amidst this, the board members' commitment stood out. They had no financial stake and minimal compensation, yet they faced significant challenges. It made Jeremy think that those who dismissed OpenAI's nonprofit charter might be right. Would the board eventually give up, leaving the company to its fate? But they didn't. Their persistence, given the circumstances, was remarkable.

What happens next at OpenAI

Jeremy feels they may end up like another Google or Apple. If you're founding a startup and someone says, 'Oh, you're going to become the next Google,' that sounds like a big success. But if you’re aiming to change the world with societally beneficial AGI and then end up as just another big tech company, for many, that’s disappointing. So, it depends on your perspective. Jeremy thinks they'll be very commercially successful. But whether that benefits or harms society is hard to tell. In general, powerful companies tend to do a bit of both.

Democratically elected governments, in theory, try to ensure they benefit society more than harm it. However, the more powerful a company is, the harder it is to control. Looking back at companies in the past, like those plundering resources in the Far East during Victorian times, they definitely harmed the societies they operated in more than they benefited them. So, who knows? Jeremy agrees we all see ChatGPT as a cool product, but feels it's getting worse, which is interesting. For something that's meant to be on an AI-improved, super-linear growth trajectory, seeing a platform actually get worse doesn't support the notion that we're on this amazing growth path.

Which OSS LLMs is Jeremy most excited about?

Starting with the 7B models, Mistral 7B stands out with the best power-to-weight ratio. And it looks like they’re hinting at releasing more models, which would be great. Then, in the 14B category, Qwen is the clear leader. The 20B model InternLM is underrated, and the 34B level is unfortunately a bit sparse, which is a shame since that's the sweet spot for most people.

This is because a 34B model can be fine-tuned on moderately priced hardware and run on inexpensive hardware, unlike a 70B model, which requires high-end, costly servers. With a 34B, you can fine-tune on a single server using 4-bit quantization and run inference on a 24 GB card, which is ideal. But Meta didn't release a Llama 2 base model of this size, only the Code Llama fine-tune. However, we now have the Yi 34B model, which is excellent. The only issue is its commercial license restrictions, and a commercial license seems tough to get.
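To make the sizing argument concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone take at different precisions. It is an illustration, not a measurement: the function names are made up for this example, and the 2 GB allowance for activations and the KV cache (`overhead_gb`) is an assumption that will vary with context length and runtime.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def fits_on_card(params_billion: float, bits_per_param: int,
                 card_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
    """Crude check: weights plus an assumed fixed overhead must fit on the card."""
    return weight_memory_gb(params_billion, bits_per_param) + overhead_gb <= card_gb


# Compare the model sizes discussed above at 16-bit and 4-bit precision.
for size in (7, 34, 70):
    fp16 = weight_memory_gb(size, 16)
    int4 = weight_memory_gb(size, 4)
    print(f"{size}B: 16-bit ~{fp16:.0f} GB, 4-bit ~{int4:.1f} GB, "
          f"fits a 24 GB card at 4-bit: {fits_on_card(size, 4)}")
```

Under these assumptions, a 34B model quantized to 4 bits needs about 17 GB for its weights, which leaves headroom on a 24 GB card, while a 70B model at 4 bits needs about 35 GB and does not fit.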

Then, at the 70B level, you're essentially at Llama 2. Jeremy's focusing on base models here, but all of these have instruction fine-tuned variants, like Teknium's Hermes. The open-source community offers fantastic instruction fine-tuning datasets. Combining them in models like Hermes or Capybara works well.

Beyond instruction fine-tuning, there are also options like DPO (Direct Preference Optimization). Hugging Face has the Zephyr model, and Intel has the Neural Chat model. They start with a high-quality base, add top instruction datasets, and apply DPO, which is faster and simpler than RLHF. Intel's Neural Chat model is a good example of this. It's now much smoother and easier to transform a base model into a high-quality, instruction-tuned endpoint with RLHF-style preference alignment.
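Part of why DPO is simpler is that its training objective is a plain classification-style loss over pairs of preferred and rejected responses, with no separate reward model or reinforcement learning loop. The sketch below computes that loss for a single pair in pure Python. It is a minimal illustration under stated assumptions: the function names are chosen for this example, the inputs are summed log-probabilities of each response under the model being trained (the policy) and under a frozen reference model, and `beta=0.1` is just a commonly used default, not a value from the interview.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)), computed as -softplus(-x)."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each implicit reward is beta times how much more likely the policy
    makes a response compared with the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss shrinks as the chosen response's reward exceeds the rejected one's.
    return -log_sigmoid(chosen_reward - rejected_reward)


# Made-up log-probabilities, purely for illustration.
untrained = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy identical to reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)    # policy now favors the chosen response
print(f"loss when policy equals reference: {untrained:.4f}")
print(f"loss after favoring the chosen response: {improved:.4f}")
```

When the policy matches the reference, the loss equals log 2; it falls as the policy shifts probability toward the preferred response relative to the reference. In practice this would be computed over batches of token-level log-probabilities in a deep learning framework rather than on single floats.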

Jeremy's thoughts on the future of large LLMs vs small, fine-tuned LLMs

Thoughts on the future of these relatively large models versus smaller LLMs that can be fine-tuned.

Jeremy's not quite sure here and keeps changing his mind about this. Theoretically, bigger models should always be better, right? They're more capable in reasoning, have more world knowledge, and offer a stronger foundation. Over the last few years, it seemed like every new foundational model made the previous fine-tuned versions look inadequate. But now, he's wondering if that trend is ending. GPT-4 is starting to reach the limits of what we're willing to pay for and wait for. Maybe we can handle one more order of magnitude of improvement, but even that's a stretch.

Using bigger models for complex problems makes sense, but they require significantly more time and money. On the other hand, fine-tuning smaller models can be quite effective. Despite this, companies like OpenAI have accumulated a lot of technical debt by scaling up rapidly. Jeremy doesn't think anyone is really focusing on smarter methods right now. So, while costs and latencies might decrease eventually, making something like GPT-5 feasible in terms of price and speed, it's all quite uncertain at the moment.

He also feels that people aren't fine-tuning or pre-training as effectively as they could. The whole process is too complex, expensive, slow, and hacky. It's still really early in the game, and there's a lot to prove.

The risk to foundation models of regulatory capture and the EU AI Act

We’ve recently heard a lot about the US AI Executive Order and the EU AI Act: a rundown of the most important things in this space, particularly with respect to regulatory capture and development of foundation models.

Okay, so the big topic is the EU AI Act. It's significant because it's an actual law with real penalties, and they're substantial. For those not in the loop, the EU AI Act regulates high-risk AI applications. For example, if an AI denies your health insurance, that's high-risk because it could have life-threatening consequences. Overall, it's sensible regulation focusing on human oversight and auditing to prevent Kafkaesque scenarios for EU citizens.

However, last year saw a surge of lobbying from the AI safety nonprofit sector, heavily funded by tech billionaires. Their aim was to extend the act to include foundation models, defined as any model larger than the current largest ones. They succeeded. Earlier this year, a revised version of the act included these foundation models, meaning you'd need government permission to create a model competitive with or better than the current leading ones. This has sparked controversy, appearing as regulatory capture and limiting competition.

France, Germany, Italy, and Spain, or at least three of these, pushed back. They were influenced by their own burgeoning startups, which would be hindered by these new restrictions. The act is meant to protect EU citizens, not prevent dystopian killer robot futures, which is a separate issue. Now the act's future is uncertain, with possibilities ranging from outright failure, to a return to the high-risk application focus, to the new foundation model version passing.

If the foundation model version passes, Jeremy can't see how it benefits EU citizens. It would likely stifle EU companies from developing state-of-the-art models, forcing founders to look to places like the UAE or Singapore for innovation. And even if there's concern about dystopian scenarios, this act won't help. It might drive the industry elsewhere, proving counterproductive.

If the EU persists with the foundation model approach, it implies the need for international cooperation, possibly extending lobbying to the UN. We've seen similar dynamics with the World Intellectual Property Organization (WIPO), which established global requirements aligning with America's copyright regime – a system many criticize as not effectively promoting the arts and sciences but rather being a form of regulatory capture. So, we might see this pattern repeat.

So how DO we make LLMs uncool again, Jeremy?!

What making LLMs uncool again means and how we are going to do it.

Jeremy says it has been challenging. To make neural nets uncool again, we need to consider all variants, including Large Language Models (LLMs). But that's just part of the mission at fast.ai. We actually need to make LLMs uncool again. Being 'cool' implies exclusivity, and we can't all be that exclusive. If everyone was, it wouldn't be special anymore. So, they aim to make neural nets and LLMs less exclusive. They shouldn't be accessible only to those with abundant data, money, computing power, expertise, and time. They need to be practical for everyday people with standard computers, data, resources, and time.

Fast.ai's research has always focused on reducing these barriers for neural networks. This led to the development of ULMFiT, driven by the belief that democratizing AI in natural language processing requires transfer learning. Pre-trained models are essential: one group pre-trains the model, and then everyone else can fine-tune it for their needs. This approach, realized around 2016 or 2017, is fast.ai's primary strategy: leveraging transfer learning, allowing a large entity to develop a comprehensive pre-trained model for everyone's use.

Jeremy is also dedicated to fine-tuning and transfer learning, making it accessible, quick, and affordable. This approach helps solve real-world problems for average people, which are often more significant than the typical Silicon Valley issues. One reason fast.ai was started was that many in the deep learning community weren't addressing the everyday problems faced by most people, often applying AI to what its founders considered lower-value tasks. So, making LLMs 'uncool' means making them valuable and accessible for solving real-world challenges.

The biggest roadblocks here

The biggest hurdle is the size of pre-trained models in NLP: they're larger than those in computer vision, so they need more data and more computing power for fine-tuning and usage. That's the first problem. The second issue is that the top model, GPT-4, is not openly accessible. The second-best model is possibly quite a bit behind, and it's possible that the best two models are both closed. And OpenAI restricts how you can use GPT-4, which is a major issue.

Jeremy isn't sure about the best solution. Perhaps we need a CERN-style project for this field. Maybe if Europe decides against impeding its ability to develop models, it could instead invest in creating a top-notch one. Imagine keeping their scientists busy developing models that truly benefit European citizens. How cool would that be? Perhaps next to the Large Hadron Collider, we might see something like the European Very Large Language Model Center.

Will closed models always be ahead of open models?

Ilya Sutskever from OpenAI has famously said that models from vendors will always be ahead.

Jeremy thinks definitely not. It's the same argument Microsoft always made about software. It's all déjà vu. Ilya didn't experience all that, so Jeremy thinks he's just not aware of how flawed and outdated that argument is. In fact, it's the opposite. Open source, when properly resourced, always ends up prevailing.

There are many people inspired to build cool things that everyone can use. So, Ilya might think that throwing money at the people at OpenAI is the only way. But you can achieve similar, if not better, results with open source for much less money. There are a lot of people in the world who are more interested in helping others than in making huge amounts of money. They can work collaboratively, building on each other's efforts. And you don't need to pick winners or go through a selective recruiting process like OpenAI's. Instead, there's healthy competition and cooperation.

Jeremy believes that if state-level funding were invested in the research and development of neural networks, it would be far more effective. Such an approach would leverage the collective passion and innovation of the open-source community, leading to advancements that are driven not just by financial incentives but by a genuine desire to contribute to the field.

Join us for more chats!

After our fireside chats, we have AMAs with our guests on the Outerbounds community Slack. A lot tends to happen there, so join us if you’re interested in such conversations! You can also view the other fireside chats here.
