After the Gold Rush

The commercial future of AI will soon be bigger than the actions of any firm, individual, research team, or open-source community. In the near term, a massive tailwind of potential use cases and nearly completed projects will determine the rate of progress in creating value for users and suppliers. Still, the source of the next frontier commercial breakthrough and the distribution of profits look undetermined and, for reasons to be explained, underdetermined.

Do not get me wrong. The world of commercial AI is all aflutter over the new governance of OpenAI, but understanding the impact of those events requires distinguishing between a general gain and its distribution among anchor firms and regional clusters of entrepreneurs. We can predict general but not specific gains because the gold rush in Generative AI is still too young. As a reminder, commercial gold rushes arise after a prototype creates a surprise, and the new assessment spreads quickly. As more participants recognize a novel market opportunity, potential suppliers accelerate plans and act impatiently due to a general perception – rightly or wrongly – that quick action leads to high reward. In other words, OpenAI could get that gold as of this writing, but so could many others. 

More concretely, while we all have OpenAI to thank for its catalytic demonstration, its long-term prospects remain undetermined. Will it go down in history as the organization that set off the rush but did not profit from it? Or will it be that rare organization that catalyzes change, thrives, survives, and becomes a profit center? Is it another Netscape, Napster, Microsoft, or Cisco?

We could ask the same question about many other firms. How do we address such a question? Market activities change during a rush because the commercial prospects for many participants change simultaneously. Activities change after the rush because commercial participants learn similar lessons and resolve related crucial open questions, simultaneously changing perceptions about prospects for many. That observation suggests a way to analyze general economic prospects even if it does not lead to precise predictions about the prospects of specific firms.  

Speculation and investment

In the last decade, generative AI models have become more complicated and too difficult to take apart and quickly diagnose. The complexity of generative AI models with hundreds of billions of parameters makes them more challenging to interpret, analyze, and fine-tune through iteration.

Try explaining to a non-expert the mechanisms that created improvements in generative AI. More parameters in the models and better GPUs have produced better results, and that trend should continue at the frontier. Yet, that is stating an association, not explaining why the new models cause such excitement. Even experts who can explain the mechanics at a high level struggle to explain specific features of the results.

This lack of explainable AI is partly due to a shortage of reliable tools to forecast when a Large Language Model (LLM) will hallucinate in response to a specific question. A small industry of beta testers has grown adept at eliciting forbidden information with clever prompts or causing biased answers at undesirable moments. Unexpected behavior has also interfered with deployments, with some models spewing toxic language or bias.

Bias is hard to root out because models are trained on human conversation, and human history is filled with a mix of offensive and harmless stereotypes. For example, I recently took a class of non-technical students through an exercise in frontier generative text-to-visualization models and demos. Even the most unsophisticated students could quickly find gender and racial stereotypes reflected in answers with occupational prompts. Try it on any of the prominent models.

Has that held back progress? Yes, but not entirely. While progress is exciting and confusing to observe, it is maddeningly risky for firms with commercial aspirations. Allegedly, despite developing frontier models in the lab, Google delayed deploying its experiments with LLMs because it had not resolved these issues and wanted to avoid embarrassment. Only ChatGPT’s prominence forced its hand, and even now, it still treats Bard as a beta service.

Believe it or not, there is historical precedent in old industrial technologies for making progress in commercial products and services before experts understood the underlying determinants. When the US steel industry first adopted the Bessemer process, for example, steel mills produced high-grade steel at a large scale even though nobody knew the chemistry to explain why ore from some North American locations worked so well. The science of chemistry had not caught up with the industrial processes. As a result, workers had to learn to fine-tune performance by recognizing errors without knowing the underlying causes. The early US steel industry lived for decades with such uncertainty – and furnaces blew up occasionally – but there was too much money to be made.

Like the old steel mills, many firms today perceive that too much is at stake to let the lack of complete understanding interfere with exploring new deployments. Slowly and inexorably, numerous scientists and engineers have designed better ways to install guardrails within algorithms, and a miniature industry of filters and word dictionaries has emerged to anticipate toxic answers.

The open questions are not easy to address. If an application stood alone, would a mistake in a new generative AI application lead to diminished value for an existing brand? How much will buyers pay for a service that occasionally errs when the stakes are low? How about when they are high? Would a service sell better if guardrails prevented bias but slowed delivery of desirable features? Would federal regulations be beneficial or harmful here? Those types of questions hang over many commercial efforts today.

Use cases and revenue.

There are many approaches to reducing commercial risk. The most common involves direct intervention in a product’s design. Most large firms maintain a list of sensitive words – hundreds of thousands of them, across many languages – that can trigger an offensive or biased response. Most have also developed filters to detect offensive sentences. In the recent past, only the most prominent and wealthiest firms had access to such approaches, but these are becoming more standardized and inexpensive. Adoption of a mainstream filter should break open barriers for many new applications.
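In spirit, the simplest of these interventions is a lookup against a curated blocklist before a draft answer ships. The sketch below is purely illustrative – the tiny `BLOCKLIST` and the `flag_response` helper are invented placeholders, and production systems combine far larger multi-language lists with trained classifiers:

```python
import re

# Placeholder entries; real deployments curate hundreds of thousands
# of sensitive terms across many languages.
BLOCKLIST = {"badterm", "slurword", "insultword"}

def flag_response(text: str) -> bool:
    """Return True if a draft response should be blocked or rewritten."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in BLOCKLIST for token in tokens)

print(flag_response("A perfectly harmless sentence."))   # False
print(flag_response("A draft that contains slurword."))  # True
```

The appeal for smaller firms is that a shared, standardized filter of this kind can be adopted off the shelf rather than built in-house.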

Relatedly, many firms adopt the approach of “wait and imitate.” Once one firm shows that an approach to resolving a common issue works and scales well, everybody does the same. Because many firms face similar technical risks, it pays to watch and wait for a pioneer to resolve an issue. Many firms with similar commercial goals imitate the solution while enhancing their business in ways others cannot replicate.

A third approach leans into a specific application. This can be tricky to pull off. Consider, for example, the lessons from the IBM Watson machine that beat the Jeopardy champions. The design team learned to mask Watson’s hallucinations by assigning probabilities to potential answers, then offering the answer with the highest probability – and not speaking when all the probabilities were too low. Training Watson on past Jeopardy episodes, the team of researchers lowered the visibility and costliness of the errors.
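Watson’s masking trick amounts to a simple decision rule: answer only when the top candidate’s estimated probability clears a threshold, and otherwise stay silent. A minimal sketch, with an invented `choose_answer` helper and an illustrative threshold (the real system’s confidence machinery was far more elaborate):

```python
def choose_answer(candidates: dict[str, float], threshold: float = 0.5):
    """Return the highest-probability answer, or None to stay silent.

    `candidates` maps each potential answer to an estimated probability;
    the 0.5 threshold is illustrative, not Watson's actual setting.
    """
    answer, prob = max(candidates.items(), key=lambda kv: kv[1])
    return answer if prob >= threshold else None

print(choose_answer({"Toronto": 0.2, "Chicago": 0.7}))  # Chicago
print(choose_answer({"Toronto": 0.3, "Chicago": 0.3}))  # None (abstain)
```

The rule does not make the hallucinations go away; it only hides the low-confidence ones, which is exactly why the drawbacks described next surfaced in higher-stakes settings.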

But that approach also has drawbacks, as the Jeopardy team learned when it did not generalize to medical applications as IBM’s management had hoped. In medical applications, doctors wanted to know how an option earned its assigned probability. Watson lacked the tools to justify its choices, which frustrated physicians, who typically wanted the reasons behind an answer, not only the answer itself. Even worse, they wanted it in less than a minute. Tailoring the technology to this use case was too hard to do cheaply.

Copilot and its many imitators provide a more recent example of how to work around the limitations of AI. Many use cases for generative AI are “not fully automated,” retaining an essential role for user discretion. Oversimplifying for brevity, these tools suggest blocks of code and autocomplete programming steps for common languages, such as Python or C++. The typical use case saves a coder some typing and reduces the time required to finish routine tasks. Especially for everyday coding, it can be much faster to type a comment and a first line and let the tool generate the remaining code. Coders usually recognize when an autocompletion satisfies their needs or fails in minor ways. Editing fixes errors.
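To make the workflow concrete, here is the kind of exchange described above – the coder supplies only the comment and the function signature, and the assistant drafts the body. The completion below is invented for illustration, not an actual Copilot transcript:

```python
# The coder types this comment and the def line; the rest is the kind
# of body an autocompletion tool might suggest for review.

def word_frequencies(text):
    """Return a dict mapping each word to how often it appears."""
    freqs = {}
    for word in text.split():
        freqs[word] = freqs.get(word, 0) + 1
    return freqs

# The coder checks the suggestion at a glance; editing fixes any errors.
print(word_frequencies("to be or not to be"))
```

Because the output is short, inspectable, and low-stakes, human discretion absorbs the residual error risk – the same trick that runs through the use cases below.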

Some of the most publicized use cases employ a similar trick. For example, LLMs have been helpful as a writing tool (e.g., where do the commas go?) and, similarly, for translating professional documents, such as product manuals (e.g., what verb tense captures the meaning?). A user examines and edits what the tool does. Another everyday use case for generative AI is standard correspondence, allowing users to edit. The same goes for researchers: a well-chosen prompt elicits a set of categories and explanations, which the user can reorganize in their own voice. (To be sure, it has also sparked a cheating crisis among those who grade high school and college essays.)

Look for clues.

How will we know when the gold rush has ended? Look for clues that shared commercial risks have been resolved and the solutions widely imitated. Shared filters and similar approaches to human discretion are likely signs of success.

Also, assess the stakes of the applications. Expect firms first to deploy applications where errors can be managed. How far are we from inexpensive AI for fully automated high-stakes settings? That is a big open question.

Fully automated customer service is a holy grail, but it is not feasible yet at a low cost. We are, however, not far away from generative AI improving searches among small inventories of products, limited by guardrails and reduced vocabularies.

More to the point, you will know the future has arrived when OpenAI’s soap opera no longer matters. That means prosperity is so general and widespread that it does not rest on one firm’s fate.

Copyright held by IEEE Micro

February 2024