Can the natural language models and image generators that are currently available write an interactive text story, create the imagery to bring the text to life and can we get the adventure to run online? A lot of the techniques used here are also relevant for other automated writing and image generation. Check out the result first.
While looking into this idea I found that someone had some fun getting ChatGPT do something similar by having it play the role of a classic text adventure game by giving it a very specific prompt.
A company focused on AI generated games has created a live AI Dungeon, where the AI creates open worlds based on prompts by users. Very impressive, right now it does seem to leave less room for storytelling, because of the open world it creates.
Lets see what we can come up with! What would we have our experimental program do?

Starting with a description of the place where the story takes place as an input we want to automate the creation of an online interactive story. Looks like it’s possible to code the idea sketched above using available open source examples, API’s and engines as building blocks. OpenAI provides an API for the natural language model GPT and their image generator DALL-E. There are existing engines for running a interactive story online, most of them use a formatted description file with all the locations, objects, persons and puzzles in the little game universe. I’ve chosen Text Engine by Benji Kay.
Looks like the minimum to run an interactive story is to have some connected locations with a description. The game world can be made a lot more interesting by providing puzzles, persons and objects to interact with, maybe even include live interactions with in game persons using GPT to provide the answers! But I’ll leave all that for later… maybe never.
First: provide GPT-3 with a prompt that gives us a response we can run as a game in Text Engine. The described locations will be used to create prompts for DALL-E.
We’ll retrieve GPT’s response using python so we can use the data in following steps. The people at Open AI have helped us do this by providing examples and a playground where we can work on our prompt and even have some code created. After playing around with the input Text Engine needs and looking at some of the details about GPT-3 prompts I tried the following prompt in the playground.
The text-davinci-003 model I used went above and beyond my expectations, it responded with a correctly formatted JSON including the correct ID’s for the ‘rooms’ and all the ‘exits’ needed to connect them! With some setting up in the engine it works! It’s very bare bones, but we have got an online adventure, running from Github.
The only thing that will change is the description of the game environment, in the prompt above thats an off-planet space station collecting energy from a star shared with different species. A modified example GTP python script got that job done.
Now we need the data that we want use as an input for image generator DALL-E. Maybe the same prompt can get us that as well… This idea failed. GPT cannot provide two formats in the same response, so asking it for a list or array we can use for image generation and the JSON we need for our text engine in the same prompt will not work. Probably because the response is written word for word, so to speak, without any knowledge or memory of what was written before. Easiest solution is to extract what we need from the original response, it already contains the info we need in JSON format.
Turned out that the JSON in our example prompt was not valid, python did not like that, while the javascript engine did not mind. After changing the prompt we get valid JSON from GPT. With some massaging of the response we get The room descriptions from it in the python script. We extend them with the game environment description and some specifics about the image style and size to make prompts for DALL-E. An example DALL-E python script provided the code. I added more code to create all the files we need to set up a new game.
Text Engine was meant for text adventures only, but I want it to show our generated images. A few lines hacked into the javascript code of the adventure engine inserts an image for each ‘room’ you visit in the game.
The first somewhat working game created takes place in A desert where danger lurks with imagery in a graphic novel style with a slightly yellowish tint. Because of the image prompts it looks a bit like a point-and-click style game. You can try it here.
Next test with the original setting on an off-planet space station collecting energy from a star shared with different species and styled in a sci-fi cyberpunk style cinematic, detailed, with a slightly blueish tint resulted in this game.
The final creation can be seen at the top of this post, or here.
You’ll find all the files here. Lots of stuff could be added and improved! Some ideas:
- Improve on the image prompt.
- Extend or separate GPT prompts to add persons or objects, or make the ones found in the text interactive.
- Add blocking objects or puzzles using a prompt, for instance some that require an object from another location.
- Interaction with a non playable character using the newly released API for ChatGPT!
- Play AI generated sounds or music.
- Voice control using a voice to text model.
- Multiplayer?
- Now that we’ve got an API for ChatGPT a live, open world, environment based on it might be possible. If it has something like the sessions you get in the web interface. Have ChatGPT play gamemaster and generate images on the go. Interesting to see what the costs will be and if a session uses lots of tokens.