On Wednesday, I was part of a team that ran an evening workshop for our emerging executive community in Benelux. We had around 20 CxOs come together to connect, share ideas and build relationships.

Our European business unit lead, Simon Bostock, asked me to help create a workshop that would allow people to engage, “work” together and be playful. Here’s how it went.

The Pre-Planning

From the initial brief, it felt like the model I worked on with Simon White a few years ago could form a sound basis for a session on how technology might change the way an organisation operates. As to which organisation, the planning team settled on Schiphol Airport, an iconic Dutch institution and something about which most people have both experience and views.

We built the workshop format around using random input to help teams come up with ideas. From those ideas, they’d then select the “best” one, which in turn would be judged against those of their peers. We also thought that it would be fun to get the judging done by feeding the pitches into an LLM and asking it to decide.

A few nights before the event, I had the idea to run the entire event in simulation through an LLM before we did it for real. Initially, this was so that I could get a feel for how the workshop might run, but as I started that exercise, I realised that it could give me a generative AI-created pitch that, in turn, could compete against the humans.

The simulation

You can read the entire ChatGPT conversation here: https://chatgpt.com/share/673337b0-4108-8005-bc6e-29d3f189eda0

But here are the edited highlights…

I started by explaining what I was doing in a reasonable level of detail.

We are going to run a simulation of a workshop that I will be running tomorrow.

I’m doing this for a few reasons:

First of all, because it will help me check that I haven’t missed anything obvious in the various stages of the workshop. It’s using a technique I’ve used in the past, but the context is slightly different. The simulation will help me to spot any holes.

Secondly because I want to use the final output of the LLM simulation as a challenge to the human teams tomorrow – essentially, can they as a group beat an LLM in a challenge of creativity and innovation? I have no idea how this will pan out.

Thirdly, I’m thinking that the structure by which one shapes ideas is increasingly the most important thing. Early on I’ll ask the LLM to solve the “problem” without process to gently test that hypothesis.

And finally because something interesting might happen.

I will start by telling you about the workshop structure.

And then ChatGPT got weird on me with its very first response:

It’s important to remember that there is absolutely no sentience behind these machines – they are all just probabilities and randomness within boundaries. And when it tries to get personal and makes a mess of it, the uncanny valley becomes quite deep.

After the issue of my name, though, it became quite helpful, suggesting that we start by getting the output from simply asking the LLM to solve the “problem”, which we duly did.

With those baselines in place, we got on with the workshop simulation.

The workshop had four key stages, which were to be completed by participants split into three teams. Each team would focus on a different aspect of the work—one would examine the impact of technology on improving the overall passenger experience in the next 3 and 10 years, one on the retail experience in the same timeframes, and one on the sustainability of operations.

The first stage for the teams was to brainstorm a list of potential customer or passenger groups that could be fed into their work.

The second stage was to brainstorm a list of potential technologies. In both of these cases, we provided a list of six, and the teams had to provide another six.

The third stage was to generate some ideas. Each team had four 12-sided dice. They rolled the first die, and that selected a customer or passenger group. They rolled the second and third, and those selected two from the list of technologies (rolling again if they got a double). The fourth die selected a brand from a list of 12 that we had prepared earlier. You can see the playing board we created for them to work with below, and the items they brainstormed were written up on Artefact cards.

These selected items then became a seed for brainstorming, enabling them to develop some ideas for potential futures. Each team had a facilitator, and we also provided simple Pitch Canvases to help them structure their thinking. The aim was that in about 40 minutes, the teams should run the process two or three times.
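The dice mechanic above can be sketched in a few lines of Python. This is a minimal illustration rather than anything we used on the night: the function names `roll_seed` and `count_seeds` and the placeholder list entries are all hypothetical, and I have assumed that "rolling again on a double" means re-rolling only the second technology die.

```python
import random
from math import comb


def roll_seed(customers, technologies, brands, rng=random):
    """Roll four 12-sided dice to pick the seed for one brainstorming round."""
    if not (len(customers) == len(technologies) == len(brands) == 12):
        raise ValueError("each list needs exactly 12 entries, one per die face")

    customer = customers[rng.randint(1, 12) - 1]   # die 1: customer group

    first = rng.randint(1, 12)                     # die 2: first technology
    second = rng.randint(1, 12)                    # die 3: second technology
    while second == first:                         # a double, so roll again (assumed rule)
        second = rng.randint(1, 12)

    brand = brands[rng.randint(1, 12) - 1]         # die 4: brand

    return {
        "customer": customer,
        "technologies": (technologies[first - 1], technologies[second - 1]),
        "brand": brand,
    }


def count_seeds(faces=12):
    """Distinct seeds: customers x unordered technology pairs x brands."""
    return faces * comb(faces, 2) * faces


# Placeholder lists; on the night, six entries were provided and six brainstormed.
customers = [f"customer group {i}" for i in range(1, 13)]
technologies = [f"technology {i}" for i in range(1, 13)]
brands = [f"brand {i}" for i in range(1, 13)]

print(roll_seed(customers, technologies, brands))
print(count_seeds())
```

With 12 entries in each list there are 12 × 66 × 12 = 9,504 distinct seeds, so two or three rounds per team barely scratch the surface of what the board can produce.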

The final stage saw the teams selecting their best idea and then writing and presenting a 90-second pitch.

Within GPT, I prompted each stage to be run, adding to the list of the provided customers and technologies and then creating the ideas based on the random seeds. In the final stage, I prompted the machine to select its best response from each team and then select the best overall.

To close the simulation, I asked it to assess whether the ideas it had created without the process were better than the ones that came from the workshop. ChatGPT concluded that its very first idea was the best. This is the point at which one needs to remember that these generative machines have no power of reasoning: it’s all just output that looks like thinking has gone on. My personal take was that the initial, non-process responses were big, high-level vision statements that didn’t actually tell us very much about what the future could be like, whereas the final outputs were much more specific. Neither set of responses is more or less likely to be “right”.

This whole process allowed me to knock a few corners off it and also generate some tips for the teams on the night—notably how the LLM selected customer groups and technologies in the simulation that were quite specific to the team’s focus areas (passenger experience, retail experience, sustainability).

The actual event

CxOs tend to be quite competitive. I think the idea of the evening’s activities being a competition between humans and machines captured many of their imaginations.

Having had a chance to run through the entire event in chat form the day before also relaxed me somewhat, as I knew what sorts of outputs we might see from the teams. It obviously didn’t help with the interpersonal interactions on the evening, but I didn’t have to worry that there was something inherently flawed in the workshop format.

Framing the evening early on as an experiment also helped, as did my all-purpose experimental hypothesis: “Something interesting might happen”.

There were some nuances around language, particularly the use of English in a mixed audience, that will probably deserve a whole other article at some point. Suffice it to say, I had been very conscious in doing things like selecting a number of Dutch brands as part of the list of 12 in the exercise, and of course, we’d focused on Schiphol as the framing for the entire event.

But native English speakers, myself included, can sometimes forget that they are working with people for whom English is a second (or third, or fourth) language. Idioms popular in a mother tongue can often be completely confusing to people who haven’t been brought up in an English-speaking country.

For British English speakers, it’s also important to remember that International English is more likely to be a variant of American English these days.

Overall, though, there was a very high level of engagement throughout the evening, and we received some excellent feedback and some good suggestions for improvement too.

Who won?

On the evening, the final pitch stage was recorded using Google Recorder, and then the transcripts were loaded into NotebookLM alongside the 90-second pitch written by the LLM in the simulation the day before.

When asked which one was best, the response was “It depends on what Schiphol Airport’s strategy is”. Which was fair enough.

One of the benefits of Google’s NotebookLM is that you can point it at multiple specific sources. And so I pointed it at the Schiphol plan for 2050, and it then decided that the best pitch was…

… a combination of the LLM’s pitch and the human team’s pitch for sustainability.

Which felt like a very suitable message on which to end the evening (playing into Kasparov’s idea of Centaurs).

Key things I learned

  • Running a workshop in simulation through an LLM is a really useful way to road-test it if you can’t get to do it with actual people.
  • Humans v machines is a really interesting dynamic at the moment, as is getting an LLM to judge outputs. Both will probably become a bit passé quite soon though.
  • Don’t get suckered into thinking that LLMs actually reason. They don’t.
  • Getting people to do “work” with one another in a networking event is also really interesting. We probably needed to mix the groups up a bit more…
  • …but keeping the team dynamic/competition thing going at the same time is the balancing act.

Postscript

I asked NotebookLM again yesterday which idea was the best, and it said the human team’s idea about sustainability. This just goes to show that LLMs really are just randomisation machines on guardrails.
