ChatGPT: Post-ASU+GSV Reflections on Generative AI

e-Literate 2023-04-23

The one question I heard over and over again in hallway conversations at ASU+GSV was “Do you think there will be a single presentation that doesn’t mention ChatGPT, Large Language Models (LLMs), and generative AI?”

Nobody I met said “yes.” AI seemed to be the only thing anybody talked about.

And yet the discourse sounded a little bit like GPT-2 trying to explain the uses, strengths, and limitations of GPT-5. It was filled with a lot of empty words, peppered in equal parts with occasional startling insights and ghastly hallucinations. 

That lack of clarity is not a reflection of the conference or its attendees. Rather, it underscores the magnitude of the change that is only beginning. Generative AI is at least as revolutionary as the graphical user interface, the personal computer, the touch screen, or even the internet. Of course we don’t understand the ramifications yet.

Still, lessons learned from GPT-2 enabled the creation of GPT-3 and so on. So today, I reflect on some of the lessons I am learning so far regarding generative AI, particularly in EdTech.

Generative AI will destroy so we can create

Most conversations on the topic of generative AI have the words “ChatGPT” and “obsolete” in the same sentence. “ChatGPT will make writing obsolete.” “ChatGPT will make programmers obsolete.” “ChatGPT will make education obsolete.” “ChatGPT will make thinking and humans obsolete.” While some of these predictions will be wrong, the common theme behind them is right. Generative AI is a commoditizing force. It is a tsunami of creative destruction.

Consider the textbook industry. As long-time e-Literate readers know, I’ve been thinking a lot about how its story will end. Because of its unusual economic moats, it is one of the last media product categories to be decimated or disrupted by the internet. But those moats have been drained one by one. Its army of sales reps physically knocking on campus doors? Gone. The value of those expensive print production and distribution capabilities? Gone. Brand reputation? Long gone. 

Just a few days ago, Cengage announced a $500 million cash infusion from its private equity owner:

“This investment is a strong affirmation of our performance and strategy by an investor who has deep knowledge of our industry and a track record of value creation,” said Michael E. Hansen, CEO, Cengage Group. “By replacing debt with equity capital from Apollo Funds, we are meaningfully reducing outstanding debt giving us optionality to invest in our portfolio of growing businesses.”

Cengage Group Announces $500 Million Investment From Apollo Funds (prnewswire.com)

That’s PR-speak for “our private equity owners decided it would be better to give us yet another cash infusion than to let us go through yet another bankruptcy.”

What will happen to this tottering industry when professors, perhaps with the help of on-campus learning designers, can use an LLM to spit out their own textbooks tuned to the way they teach? What will happen when the big online universities decide they want to produce their own content that’s aligned with their competencies and is tied to assessments that they can track and tune themselves? 

Don’t be fooled by the LLM hallucination fear. The technology doesn’t need to (and shouldn’t) produce a perfect, finished draft with zero human supervision. It just needs to lower the work required from expert humans enough that producing a finished, student-safe curricular product will be worth the effort. 

How hard would it be for LLM-powered individual authors to replace the textbook industry? A recent contest challenged AI researchers to develop systems that match human judgment in scoring free-text short-answer questions. “The winners were identified based on the accuracy of automated scores compared to human agreement and lack of bias observed in their predictions.” Six entrants met the challenge. All six were built on LLMs.

This is a harder test than generating anything in a typical textbook or courseware product today. 

The textbook industry has received ongoing investment from private equity because of its slow rate of decay. Publishers threw off enough cash that the slum lords who owned them could milk their thirty-year-old platforms, twenty-year-old textbook franchises, and $75 PDFs for cash. As the Cengage announcement shows, that model is already starting to break down. 

How long will it take before generative AI causes what’s left of this industry to visibly and rapidly disintegrate? I predict 24 months at most. 

EdTech, like many industries, is filled with old product categories and business models that are like blighted city blocks of condemned buildings. They need to be torn down before something better can be built in their place. We will get a better sense of the new models that will rise as we see old models fall. Generative AI is a wrecking ball.

“Chat” is conversation

I pay $20/month for a subscription to ChatGPT Plus. I don’t just play with it. I use it as a tool every day. And I don’t treat it like a magic information answer machine. If you want a better version of a search engine, use Microsoft Bing Chat. To get real value out of ChatGPT, you have to treat it less like an all-knowing Oracle and more like a colleague. It knows some things that you don’t and vice versa. It’s smart but can be wrong. If you disagree with it or don’t understand its reasoning, you can challenge it or ask follow-up questions. Within limits, it is capable of “rethinking” its answer. And it can participate in a sustained conversation that leads somewhere. 

For example, I wanted to learn how to tune an LLM so that it can generate high-quality rubrics by training it on a set of human-created rubrics. The first thing I needed to learn was how LLMs are tuned. What kind of magic computer-programming incantations would I need to get somebody to write for me?

As it turns out, the answer is none, at least generally speaking. LLMs are tuned using plain English. You give the model multiple pairs: the input a user might type into the text box and the desired output from the machine. For example, suppose you want to tune the LLM to provide cooking recipes. Your tuning “program” might look something like this:

  • Input: How do I make scrambled eggs?
  • Output: [Recipe]

Obviously, the recipe output example you give would have a number of structured components, like an ingredient list and steps for cooking. Given enough examples, the LLM begins to identify patterns. You teach it how to respond to a type of question or a request by showing it examples of good answers. 
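
To make that concrete, here is a minimal sketch of what such a tuning file might look like, using the JSON Lines prompt/completion format that OpenAI’s fine-tuning API accepted for earlier GPT models (the recipe text is just a placeholder):

    # A minimal sketch: writing tuning examples as JSON Lines in the
    # prompt/completion format OpenAI's fine-tuning API accepted for
    # earlier GPT models. The recipe text is a placeholder.
    import json

    examples = [
        {
            "prompt": "How do I make scrambled eggs?",
            "completion": (
                "Ingredients: 2 eggs, butter, a pinch of salt. "
                "Steps: 1. Whisk the eggs. 2. Melt the butter. ..."
            ),
        },
        # ...many more input/output pairs, one per desired behavior...
    ]

    with open("tuning_data.jsonl", "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")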

I know this because ChatGPT explained it to me. It also explained that the GPT-4 model can’t be tuned this way yet but other LLMs, including earlier versions of GPT, can. With a little more conversation, I was able to learn how LLMs are tuned, which ones are tunable, and that I might even have the “programming” skills necessary to tune one of these beasts myself. 

It’s a thrilling discovery for me. For each rubric, I can write the input. I can describe the kind of evaluation I want, including the important details I want it to address. I, Michael Feldstein, am capable of writing half the “program” needed to tune the algorithm for one of the most advanced AI programs on the planet. 

But the output I want, a rubric, is usually expressed as a table. LLMs speak English. They can create tables but have to express their meaning in English and then translate that meaning into table format. Much like I do. This is a funny sort of conundrum. Normally, I can express what I want in English but don’t know how to get it into another format. This time I have to figure out how to express what the table means in English sentences.

I have a conversation with ChatGPT about how to do this. First I ask it about what the finished product would look like. It explains how to express a table in plain English, using a rubric as an example. 

OK! That makes sense. Once it gives me the example, I get it. Since I am a human and understand my goal while ChatGPT is just a language model—as it likes to remind me—I can see ways to fine-tune what it’s given me. But it taught me the basic concept.

Now how do I convert many rubric tables? I don’t want to manually write all those sentences to describe the table columns, rows, and cells. I happen to know that, if I can get the table in a spreadsheet (as opposed to a word-processing document), I can export it as a CSV. Maybe that would help. I ask ChatGPT, “Could a computer program create those sentences from a CSV export?” 

“Why, yes! As long as the table has headings for each column, a program could generate these sentences from a CSV.” 

“Could you write a program for me that does this?” 

“Why, yes! If you give me the headings, I can write a Python program for you.” 

It warns me that a human computer programmer should check its work. It always says that. 

In this particular case, the program is simple enough that I’m not sure I would need that help. It also tells me, when I ask, that it can write a program that would import my examples into the GPT-3 model in bulk. And it again warns me that a human programmer should check its work. 
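
For a sense of scale, here is a minimal sketch of that kind of program, assuming the rubric CSV puts the criterion in the first column and one performance level in each remaining column (the headings and sentence template are my illustration):

    # A sketch of a program that turns a rubric CSV into plain-English
    # sentences. Assumes the first row holds headings: the first column
    # names the criterion; the remaining columns are performance levels.
    import csv

    def rubric_to_sentences(path):
        sentences = []
        with open(path, newline="") as f:
            reader = csv.reader(f)
            headings = next(reader)  # e.g., Criterion, Excellent, Good, Poor
            for row in reader:
                criterion = row[0]
                for level, cell in zip(headings[1:], row[1:]):
                    sentences.append(
                        f'For the criterion "{criterion}", work at the '
                        f'"{level}" level looks like this: {cell}'
                    )
        return sentences

    for sentence in rubric_to_sentences("rubric.csv"):
        print(sentence)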

ChatGPT taught me how I can tune an LLM to generate rubrics. By myself. Later, we discussed how to test and further improve the model, depending on how many rubrics I have as examples. How good would its results be? I don’t know yet. But I want to find out. 

Don’t you?

LLMs won’t replace the need for all knowledge and skills

Notice that I needed both knowledge and skills in order to get what I needed from ChatGPT. I needed to understand rubrics, what a good one looks like, and how to describe the purpose of one. I needed to think through the problem of the table format far enough that I could ask the right questions. And I had to clarify several aspects of the goal and the needs throughout the conversation in order to get the answers I wanted. ChatGPT’s usefulness is shaped and limited by my capabilities and limitations as its operator. 

This dynamic became more apparent when I explored with ChatGPT how to generate a courseware module. While this task may sound straightforward, it has several kinds of complexity to it. First, well-designed courseware modules have many interrelated parts from a learning design perspective. Learning objectives are related to assessments and specific content. Within even as simple an assessment as a multiple-choice question (MCQ), there are many interrelated parts. There’s the “stem,” or the question. There are “distractors,” which are wrong answers. Each answer may have feedback that is written in a certain way to support a pedagogical purpose. Each question may also have several successive hints, each of which is written in a particular way to support a particular pedagogical purpose. Getting these relationships—these semantic relationships—right will result in more effective teaching content. It will also contain structure that supports better learning analytics. 
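
To make those interrelationships concrete, here is a minimal sketch of how the parts of a single MCQ might be represented in code (the field names are my own illustration, not any standard):

    # A sketch of the interrelated parts of one multiple-choice question.
    # Field names are illustrative, not a standard.
    from dataclasses import dataclass, field

    @dataclass
    class AnswerChoice:
        text: str
        is_correct: bool
        feedback: str  # written to serve a pedagogical purpose

    @dataclass
    class MultipleChoiceQuestion:
        stem: str                    # the question itself
        choices: list[AnswerChoice]  # one correct answer plus distractors
        hints: list[str] = field(default_factory=list)  # successive hints
        learning_objective: str = ""  # ties the item back to the module design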

Importantly, many of these pedagogical concepts will be useful for generating a variety of different learning experiences. The relationships I’m trying to teach the LLM happen to come from courseware. But many of these learning design elements are necessary to design simulations and other types of learning experiences too. I’m not just teaching the LLM about courseware. I’m teaching it about teaching. 

Anyway, feeding whole modules into an LLM as output examples wouldn’t guarantee that the software would catch all of these subtleties and relationships. ChatGPT didn’t know about some of the complexities involved in the task I want to accomplish. I had to explain them to it. Once it “understood,” we were able to have a conversation about the problem. Together, we came up with three different ways to slice and dice content examples into input-output pairs. In order to train the system to catch as many of the relationships and subtleties as possible, it would be best to feed the same content to the LLM all three ways.

Most publicly available courseware modules are not consistently and explicitly designed in ways that would make this kind of slicing and dicing easy (or even possible). Luckily, I happen to know where I can get my hands on some high-quality modules that are marked up in XML. Since I know just a little bit about XML and how these modules use it, I was able to have a conversation with ChatGPT about which XML to strip out, the pros and cons of converting the rest into English versus leaving it as XML, how to use the XML Document Type Definition (DTD) to teach the software about some of the explicit and implicit relationships among the module parts, and how to write the software that would do the work of converting the modules into input-output pairs. 
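
For a flavor of that conversion work, here is a minimal sketch, assuming hypothetical element names (objective, question, stem, distractor) rather than the real DTD, and showing two of the possible slicings into input-output pairs:

    # A sketch of converting an XML-encoded module into input/output
    # tuning pairs. The element names are hypothetical stand-ins for
    # whatever the real DTD defines.
    import xml.etree.ElementTree as ET

    def module_to_pairs(path):
        pairs = []
        root = ET.parse(path).getroot()
        objective = root.findtext("objective", default="")
        for q in root.iter("question"):
            stem = q.findtext("stem", default="")
            distractors = [d.text or "" for d in q.findall("distractor")]
            # Slicing 1: learning objective -> question stem
            pairs.append(
                (f"Write an assessment question for this objective: {objective}",
                 stem)
            )
            # Slicing 2: question stem -> plausible wrong answers
            pairs.append(
                (f"Write plausible distractors for this question: {stem}",
                 "\n".join(distractors))
            )
        return pairs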

By the end of the exploratory chat, it was clear that the work I want to accomplish requires more software programming skill than I have, even with ChatGPT’s help. But now I can estimate how much time I need from a programmer. I also know the level of skill the programmer needs. So I can estimate the cost of getting the work done. 

To get this result, I had to draw on considerable prior knowledge. More importantly, I had to draw on significant language and critical thinking skills. 

Anyone who ever said that a philosophy degree like mine isn’t practical can eat my dust. Socrates was a prompt engineer. Most Western philosophers engage in some form of chain-of-thought prompting as a way of structuring their arguments. 

Skills and knowledge aren’t dead. Writing and thinking skills most certainly aren’t. Far from it. If you doubt me, ask ChatGPT, “How might teaching students about Socrates’ philosophy and method help them learn to become better prompt engineers?” See what it has to say. 

(For this question, I used the GPT-4 setting that’s available on ChatGPT Plus.)

Assessments aren’t dead either

Think about how either of the projects I described above could be scaffolded as a project-based learning assignment. Students could have access to the same tools I had: an LLM like ChatGPT and an LLM-enhanced search tool like Bing Chat. The catch is that they’d have to use the ones provided for them by the school. In other words, they’d have to show their work. If you add a discussion forum and a few relevant tutorials around it, you’d have a really interesting learning experience. 

This could work for writing too. My next personal project with ChatGPT is to turn an analysis paper I wrote for a client into a white paper (with their blessing, of course). I’ve already done the hard work. The analysis is mine. The argument structure and language style are mine. But I’ve been struggling with writer’s block. I’m going to try using ChatGPT to help me restructure it into the format I want and add some context for an external audience.

Remember my earlier point about generative AI being a commoditizing force? It will absolutely commoditize generic writing. I’m OK with that, just as I’m OK with students using calculators in math and physics once they understand the math that the calculator is performing for them. 

Students need to learn how to write generic prose for a simple reason. If they want to express themselves in extraordinary ways, whether through clever prompt engineering or beautiful art, they need to understand mechanics. The basics of generic writing are building blocks. The more subtle mechanics are part of the value that human writers can add to avoid being commoditized by generative AI. The differences between a comma, a semicolon, and an em-dash in expression are the kinds of fine-grained choices that expressive writers make. As are long sentences versus short ones, decisions about when and how often to use adjectives, choices between similar but not identical words, breaking paragraphs at the right place for clarity and emphasis, and so on. 

For example, while I would use an LLM to help me convert a piece I’ve already written into a white paper, I can’t see myself using it to write a new blog post. The value in e-Literate lies in my ability to communicate novel ideas with precision and clarity. While I have no doubt that an LLM could imitate my sentence structures, I can’t see a way that it could offer me a shortcut for the kind of expressive thought work at the core of my professional craft.

If we can harness LLMs to help students learn how to write…um…prosaic prose, then they can start using their LLM “calculators” in their communications “physics” classes. They can focus on their clarity of thought and truly excellent communication. We rarely get to teach this level of expressive excellence. Now maybe we can do it on a broader basis. 

In their current state of evolution, LLMs are like 3D printers for knowledge work. They shift the human labor from execution to design. From making to creating. From knowing more answers to asking better questions. 

We read countless stories about the threat of destruction to the labor force partly because our economy has needed the white-collar equivalent of early 20th-century assembly-line workers. People working full-time jobs writing tweets. Or updates of the same report. Or HR manuals. Therefore our education system is designed to train people for that work. 

We assume that masses of people will become useless, as will education, because we have trouble imagining an education system that teaches people—all people from all socio-economic strata—to become better thinkers rather than simply better knowers and doers. 

But I believe we can do it. The hard part is the imagining. We haven’t been trained at it. Maybe our kids will learn to be better at it than we are. If we teach them differently from how we were taught. 

Likely short-term evolution of the technology

Those of us who are not immersed in AI—including me—have been astonished at the rapid pace of change. I won’t pretend that I can see around corners. But certain short-term trends are already discernible to non-experts like me who are paying closer attention than we were two months ago. 

First, generative AI models are already proliferating and showing hints of coming commoditization around the edges. We’ve been given the impression that these programs will always be so big and so expensive to run that only giant cloud companies will come to the table with new models. That the battle will be OpenAI/Microsoft versus Google. GPT-4 is rumored to have over a trillion parameters. A model that large takes a lot of horsepower to build, train, and run. 

But researchers are already coming up with clever techniques to get impressive performance out of much smaller models. For example, Vicuña, a model developed by researchers at a few universities, is about 90% as good as ChatGPT by at least one test (with GPT-4 acting as the judge) and has only 13 billion parameters. To put that in perspective, Vicuña can run on a decent laptop. The whole thing. It cost about $300 to train (as opposed to the billions of dollars that have gone into ChatGPT and Google Bard). Vicuña is an early (though imperfect) example of the coming wave. Another LLM seems to pop up practically every week with new claims about being faster, smaller, smarter, cheaper, and more accurate. 

A similar phenomenon is happening with image generation. Apple has quickly moved to provide software support for optimizing the open-source Stable Diffusion model on its hardware. You can now run an image-generation program on your MacBook with decent performance. I’ve read speculation that the company will follow up with hardware acceleration on the next generation of its Apple Silicon microchips.

“Socrates typing on a laptop” as interpreted by Stable Diffusion

These models will not be equally good at all things. The corporate giants will continue to innovate and likely surprise us with new capabilities. Meanwhile, the smaller, cheaper, and open-source alternatives will be more than adequate for many tasks. Google has coined a lovely phrase: “model garden.” In the near term, there will be no one model to rule them all or even a duopoly of models. Instead, we will have many models, each of which is best suited for different purposes. 

The kinds of educational use cases I described earlier in this post are relatively simple. It’s possible that we’ll see improvements in the ability to generate those types of learning content over the next 12 to 24 months, after which we may hit a point of diminishing returns. We may be running our education LLMs locally on our laptops (or even our phones) without having to rely on a big cloud provider running an expensive (and carbon-intensive) model. 

One of the biggest obstacles to this growing diversity is not technological. It’s the training data. Questions regarding the use of copyrighted content to train these models are unresolved. Infringement lawsuits are popping up. It may turn out that the major short-term challenge to getting better LLMs in education may be access to reliable, well-structured training content that is unencumbered by copyright issues. 

So much to think about…

I find myself babbling a bit in this post. This trend has many, many angles to think about. For example, I’ve skipped over the plagiarism issue because so many articles have been written about it already. I’ve only touched lightly on the hallucination problem. To me, these are temporary obsessions that arise out of our struggle to understand what this technology is good for and how we will work and play and think and create in the future. 

One of the fun parts about this moment is watching so many minds at work on the possibilities, including ideas that are bubbling up from classroom educators and aren’t getting a lot of attention. For a fun sampling of that creativity, check out The ABCs of ChatGPT for Learning by Devan Walton. 

Do yourself a favor. Explore. Immerse yourself in it. We’ve landed on a new planet. Yes, we face dangers, some of which are unknown. Still. A new planet. And we’re on it.

Strap on your helmet and go.
