
One of EverWorld’s core features is free text player chat, which doesn’t exist in many other games. That’s because it’s a hard problem for lots of reasons, and in this post I’ll dive into one of them: scale.
There are a lot of words in the English language. There doesn’t seem to be one definitive source for how many (in part because it’s changing all the time), but some estimates go as high as 1 million words including technical and scientific terms. Some words also have multiple meanings:
– Will you say hello?
– He will say hello.
– He hoped to have the will to say hello.
– I will myself to say hello.
When you take those words and put them next to each other in a sentence in different combinations, the numbers get even higher:
| Words in a sentence | Possible combinations | Equivalent to (roughly) |
| --- | --- | --- |
| 2 words | 10^12 (1,000,000,000,000) | Sun neutrinos passing through you right now |
| 3 words | 10^18 | Grains of sand on Earth |
| 4 words | 10^24 | Stars in the universe |
| 5 words | 10^30 | Molecules in a human |
| … | … | … |
| 14 words | 10^84 | Atoms in the universe |
These numbers are hard to comprehend. It’s pretty crazy to think that this sentence has more word variations than there are atoms in the universe (that was 19 words by the way). This is one of the reasons that LLMs and generative AI are so interesting: you don’t need to pre-calculate every single combination, you can ‘infer’ the meaning of new sentences based on diverse training data.
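The counts in the table come from raising the vocabulary size to the power of the sentence length. Here is a minimal Python sketch of that arithmetic, assuming any word can follow any other word and using the 1 million word estimate (the constant and function names are just illustrative):

```python
# Ordered word sequences of a given length: vocabulary_size ** length.
VOCABULARY_SIZE = 1_000_000  # the high-end estimate for English


def sentence_combinations(words: int, vocabulary: int = VOCABULARY_SIZE) -> int:
    """Count every ordered sequence of `words` words, repeats allowed."""
    return vocabulary ** words


for n in (2, 3, 4, 5, 14, 19):
    total = sentence_combinations(n)
    # For an exact power of ten, the digit count minus one is the exponent.
    print(f"{n:>2} words: 10^{len(str(total)) - 1}")
```

Passing a smaller vocabulary, for example `sentence_combinations(10, 1000)`, shows how quickly the total grows even with a limited word list.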
A problem like this is called a combinatorial explosion: the number of combinations grows rapidly, often exponentially, as more elements are added. These kinds of numbers are surprisingly easy to reach when you start to multiply different possibilities:
- The number of legal positions on a Go board is estimated to be around 2.1 x 10^170. A 19 x 19 board has 361 points, and 3 possibilities at each point (black / white / empty) gives an upper bound of 3^361 (about 1.7 x 10^172) arrangements, of which only some are legal.
- On a 1080p monitor, there are 2^49,766,400 different combinations of pixels: 1920 width x 1080 height = 2,073,600 total pixels, and 16,777,216 different colours per pixel from 24 bit colour (256 x 256 x 256) means 16,777,216^2,073,600, or 2^49,766,400.
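Both figures can be checked in a few lines. This is a rough Python sketch: the Go number counts every arrangement of stones whether or not it is legal, and the pixel total is far too large to print, so only its number of decimal digits is computed (the variable names are illustrative):

```python
import math

# Go: 361 points, each black, white or empty. This counts every
# arrangement, legal or not, so it is an upper bound on positions.
go_upper_bound = 3 ** 361
go_digits = len(str(go_upper_bound))

# 1080p monitor with 24 bit colour: each pixel independently takes
# one of 2**24 values, so there are (2**24) ** pixels possible images.
pixels = 1920 * 1080
bits_per_pixel = 24
exponent_of_two = pixels * bits_per_pixel  # images = 2 ** exponent_of_two

# Count the decimal digits via logarithms instead of building the number.
image_digits = math.floor(exponent_of_two * math.log10(2)) + 1

print(f"Go upper bound has {go_digits} digits")
print(f"Possible images: 2^{exponent_of_two:,}, about {image_digits:,} digits long")
```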
A million English words is also an extreme estimate; English speakers are estimated to know between 20,000 and 35,000 words, with ~3000 – 5000 words used in common language and only ~1000 used each day. Conversation in EverWorld leans more towards this, with a greater focus on general conversational dialogue rather than deeply technical scientific discourse. These numbers are a lot less than a million, but still add up to a lot; a 10 word sentence with 1000 different words still has 10^30 (1000^10) possible combinations.
In practice there are lots of ways that this number comes down:
- Only some sentences are grammatically correct (e.g. cat green thinking the book light quiet table upstairs downstairs)
- Of these, only some sentences are semantically meaningful (e.g. the cat is a cat which is also a cat)
When it comes to thinking about the meaning behind these sentences, the number comes down even more:
- Lots of sentences can have the same type of word swapped in and out (verbs, nouns, adjectives etc) and the base meaning remains the same, just with different instruments and targets (e.g. pick up the [noun])
- Lots of sentences have different words, but have the same underlying meaning (e.g. the cat went for a walk, the feline went for a stroll)
- Some sentences have repeated descriptive words that ultimately don’t change the meaning (e.g. the quiet cat quietly walked very quietly)
Another way to look at this is through higher level categories in speech annotation, like Part of Speech (POS) tags. A POS tag could be a [verb], [noun], [adjective], [determiner] and so on. Only some sequences of POS tags are grammatically valid; given 16 POS tags combined in a two word sentence, of the 256 (16 x 16) possible combinations, only ~60 are valid. Of these 16 POS tags, some will always form invalid sequences (e.g. verb verb verb verb) no matter how long the sentence is, and only some are actually relevant and make a difference in everyday conversation (around 7 POS tags for EverWorld). The numbers then start to look a little bit different when each additional POS tag in a sentence multiplies the total combinations by just 7:
| POS tags in a sentence | Possible combinations | Equivalent to (roughly) |
| --- | --- | --- |
| 2 POS tags | 60 | Seconds in a minute |
| 3 POS tags | 420 | Calling code for CZ |
| 4 POS tags | 2,940 | Days in 420 weeks |
| 5 POS tags | 20,580 | Washington zip code |
| … | … | … |
| 14 POS tags | 830,477,232,060 | Still a big number |
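The table follows a simple rule: start from ~60 valid two tag sequences and multiply by 7 for each extra tag. As a small Python sketch of that estimate (the constant and function names are illustrative, and the 60 and 7 are the rough figures quoted above):

```python
VALID_TWO_TAG_SEQUENCES = 60  # of the 16 x 16 = 256 possible pairs
RELEVANT_TAGS = 7             # tags that matter in everyday conversation


def pos_combinations(tags: int) -> int:
    """Estimated POS tag sequences for a sentence of `tags` tags."""
    if tags < 2:
        raise ValueError("the estimate starts at two tag sentences")
    return VALID_TWO_TAG_SEQUENCES * RELEVANT_TAGS ** (tags - 2)


for n in (2, 3, 4, 5, 14):
    print(f"{n:>2} POS tags: {pos_combinations(n):,}")
```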
That’s still a lot! But again, realistically the total number of combinations is even lower than this given:
- Most invalid two word POS tagged sentences will appear again in longer sentences, making that longer sentence invalid as well, e.g. ‘I went went to the shop’
- Some sequences of POS tags will always be invalid no matter the sentence length. A sentence will never end with a definite / indefinite article (‘I went to the’) or a coordinating conjunction (‘I went to the shop and’)
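These two rules can be sketched as a simple filter over tag sequences. The tag names and the specific invalid pairs below are assumptions chosen to match the examples, not EverWorld’s actual rule set:

```python
# Hypothetical tag names and rules, chosen only to illustrate the idea.
INVALID_PAIRS = {
    ("VERB", "VERB"),  # 'went went'
    ("DET", "DET"),    # 'the the'
}
INVALID_ENDINGS = {"DET", "CCONJ"}  # 'I went to the', 'I went to the shop and'


def is_plausible(tags: list[str]) -> bool:
    """Reject a tag sequence with an invalid ending or an invalid adjacent pair."""
    if not tags or tags[-1] in INVALID_ENDINGS:
        return False
    # An invalid pair anywhere makes the whole sentence invalid.
    return all(pair not in INVALID_PAIRS for pair in zip(tags, tags[1:]))


# 'I went to the shop' passes; 'I went went to the shop' does not.
print(is_plausible(["PRON", "VERB", "ADP", "DET", "NOUN"]))
print(is_plausible(["PRON", "VERB", "VERB", "ADP", "DET", "NOUN"]))
```

Because an invalid pair rules out every longer sentence that contains it, a small rule set like this removes a large share of the combinations counted earlier.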
There are also a few factors that actually increase the number of combinations when it comes to semantic meaning:
- Some POS tags need more granularity to understand the meaning e.g. ‘this dog’ has a different meaning to ‘that dog’, even though ‘this‘ and ‘that‘ are both considered a [determiner]
- POS tags alone miss the meaning carried by idioms and conventional phrases, e.g. “let’s go for a walk” doesn’t literally mean someone is asking for permission to go for a walk, it’s more of a suggestion
Another way to bring the number of possible sentences down to a more manageable level for a free text feature like EverWorld’s is to group them by type of speech act. These are things like declarations (making a statement), directives (a request or command), commissives (future commitments) etc. Linguistic studies have defined only a handful of these, maybe 10-20. We use a few more in EverWorld to cover things like social niceties, managing topics in conversation and so on, but that’s still far fewer than the trillions of longer sentence combinations.
So, what does all this mean?
It means that while the challenge of catering for free text speech based purely on sentence combinations is enormous, there are actually lots of different ways of saying not that many things.
Much of the development work of EverWorld’s dialogue system is mapping these sentences / phrases back to something with real game meaning and functionality. It’s one thing to be able to parse the sentence ‘give me a pickaxe’, but it’s another to have an in-game character actually do the action of giving an item based on this specific dialogue from the player. That needs game logic: consideration of whether a character has a pickaxe, or owns one, or can spare one, then transferring items across inventories, transferring item ownership etc.
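As a rough idea of what that game logic involves, here is a minimal Python sketch. `Character`, `can_spare` and `give_item` are hypothetical names rather than EverWorld’s actual code, and ownership is simplified to whatever is in a character’s inventory:

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    name: str
    inventory: dict[str, int] = field(default_factory=dict)  # item -> count
    needs: set[str] = field(default_factory=set)  # items they keep one of


def can_spare(giver: Character, item: str) -> bool:
    """True if the giver has the item and more than they need to keep."""
    count = giver.inventory.get(item, 0)
    keep = 1 if item in giver.needs else 0
    return count > keep


def give_item(giver: Character, receiver: Character, item: str) -> bool:
    """Move one item between inventories; return False if it is refused."""
    if not can_spare(giver, item):
        return False
    giver.inventory[item] -= 1
    if giver.inventory[item] == 0:
        del giver.inventory[item]
    receiver.inventory[item] = receiver.inventory.get(item, 0) + 1
    return True


miner = Character("Miner", {"pickaxe": 2}, needs={"pickaxe"})
player = Character("Player")
print(give_item(miner, player, "pickaxe"))  # the miner has a spare
print(give_item(miner, player, "pickaxe"))  # the miner keeps the last one
```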
But once a feature is in place it can scale enormously: asking for an item using 10 words is functionally the same as asking for an item using 4. The word representing the requested item can also be swapped out for any other noun that exists. Given ~30%-40% of words are nouns, being able to handle the single sentence ‘give me a [noun]’ also caters for 6000-10000 variations of that sentence in asking for different things. Add in some optional words like adverbs (‘quickly give me a pickaxe’) or polite interjections (‘please give me a pickaxe’), or items with specific properties (‘give me a good pickaxe’), and the number of sentence variations that are automatically handled goes even higher.
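To show how one pattern can cover many sentences, here is a small Python sketch of a ‘give me a [noun]’ matcher. The word lists and the `parse_give_request` function are made up for illustration, and a real system would need far larger vocabularies and proper POS tagging:

```python
import re
from typing import Optional

# Hypothetical word lists; a real vocabulary would be much larger.
OPTIONAL_WORDS = {"please", "quickly", "kindly"}
ADJECTIVES = {"good", "spare", "new"}
NOUNS = {"pickaxe", "shovel", "torch"}


def parse_give_request(sentence: str) -> Optional[dict]:
    """Match 'give me a [adjective...] [noun]', ignoring optional words."""
    words = [
        w for w in re.findall(r"[a-z']+", sentence.lower())
        if w not in OPTIONAL_WORDS
    ]
    if len(words) < 4 or words[:2] != ["give", "me"] or words[2] not in {"a", "an"}:
        return None
    *modifiers, noun = words[3:]
    if noun not in NOUNS or any(m not in ADJECTIVES for m in modifiers):
        return None
    return {"act": "request_item", "item": noun, "properties": modifiers}


print(parse_give_request("Give me a pickaxe"))
print(parse_give_request("Please quickly give me a good pickaxe"))
```

Every noun added to the list is another sentence handled by the same rule, which is where the scaling comes from.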
Some of this oversimplifies speech – realistically there is a difference between demanding ‘give me a pickaxe’ and asking politely ‘could you please give me a spare pickaxe’. There is a difference between asking ‘give me a pickaxe‘ and ‘give me a house‘, or even just a player repeating ‘give me a pickaxe‘ over and over to a character vs saying it once. These need to be considered as part of a broader conversation (what’s the tone of a sentence, it’s weird to ask for a house, it’s annoying to repeat things etc.) but don’t detract from how to deal with the wide variety of possible things a player can say.
This is somewhat unknown territory in games; there don’t seem to be any other games that have taken a dialogue system quite this far before in this particular way. It might be the case that a more traditional, linguistic approach can’t quite scale to meet EverWorld’s needs, at which point some kind of AI based solution might be the rest of the answer.
The next few stages in the roadmap for EverWorld dialogue are fairly simple: Handle all 4 word sentences, then all 5 word sentences, then all 6 word sentences… and so on.
Potential topics for next post:
– Game event propagation
– NPC knowledge
– EverWorld vision and design principles
– Lighting in a procedurally generated world
– Previous iterations of EverWorld + lessons learned
– Early access feedback
