Maven and its analytical discontents

I stumbled upon this article written by the late Pete Finlay about the Scrabble analysis program Maven. I’ve never owned or used Maven, and don’t know much about it. However, I have always regarded it as a significant development in computer analysis – to this day, there are players who still believe in the strength of Maven’s analysis vis-à-vis the more recent Quackle. Nevertheless, I’m blogging about this because several ideas in the article remain relevant to Scrabble play and analysis today. FWIW, I wrote a similar article about Quackle in this handbook.

This is a long piece of writing – to make it more bearable I have interspersed random pictures from the ongoing tournament at Marbella. Photos courtesy of John Chew.

Simulations and calculating equity loss

The article contains a short guide on simulation, and recommends running about 1000 iterations, or fewer if there is a clear gap between the top play and the other candidate choices. This is generally a good strategy for separating clear-cut choices from the rest (e.g. playing QI on most occasions). However, when the choices seem close (to the human player, not the software), experience with Quackle has shown me that it can be unwise to stop after 500 or even 1000 iterations, even where there is a sizable gap between the top play and the other choices. I tend to leave simulations running while I do other things on the computer, so I sometimes end up with 10000 iterations or more – 5000 is probably enough to tease out even the minutest of differences, but increasing the sample size further does occasionally change the picture.

An all-Scotland matchup between Ray Tate and Simon Gilliam.

More interesting, perhaps, is how the article suggests equity loss be calculated.

[I]f your move is not at the top of the list, subtract the number it achieved in the simulation result from the number that the top move achieved. The result of that subtraction is your equity loss for that move. For example, if the best move had a simulation result of 36.55 and your move had a simulation result of 32.44, then in theory you lost 4.11 points of equity on that move. For practical purposes, many players simply round each move loss to the nearest whole number.

With Quackle, there seem to be three ways of evaluating equity loss – by static evaluation, by valuation after simulation, and by Win%. The article’s approach mirrors the second, which is interesting – towards the endgame especially, plays with a lower valuation (but which are obviously better) can have a higher Win% than plays with higher valuations, likely because they are more defensive and thus less volatile. It may be best to calculate equity loss by valuation sometimes, and by Win% on other occasions. Kenji’s page on computers, which takes a much more technical approach than this article or mine, discusses the flaws of Win% under the ‘Bogowin’ section.
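
To make the arithmetic concrete, here is a minimal sketch in Python – the function, the move labels and the dictionary layout are my own, and the same subtraction applies whichever of the three figures you use:

```python
# A minimal sketch of the subtraction described above. The move labels and the
# dictionary layout are made up for illustration; the same arithmetic applies
# whether the numbers are static valuations, post-simulation valuations or Win%.

def equity_loss(sim_values, my_play):
    """Equity loss = top play's simulated value minus my play's simulated value."""
    top_value = max(sim_values.values())
    return top_value - sim_values[my_play]

# The example from the quoted passage: the best move sims at 36.55, mine at 32.44.
sims = {"best move": 36.55, "my move": 32.44}
print(round(equity_loss(sims, "my move"), 2))  # 4.11
```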

What equity loss means

It’s well known that top players lose an average of 20-30 equity points a game, though I’ve come to believe that this figure should nowadays be reduced to 10 per game, as standards have improved. However, it’s hard to use the absolute equity figures in any meaningful way, except to compare your performance against your own previous games or against other players.

This article suggests one way of looking at equity loss in a game, which is as ‘extra points of spread you could have gained over the course of the game’. This means that a game lost with -60 spread and an equity loss of 80 should theoretically be winnable by 20. I think this is a fairly reasonable way to look at equity, but it does have interesting implications – most significantly, that there are unwinnable games, where the losing spread is larger than the equity loss. Of course, the article recognises that this is ‘merely a theoretical assessment’ as the board positions could change entirely if a different play is made.

A completed board from round 6. SANTALOL is a chemical compound which constitutes the bulk of sandalwood oil.

Other ways of valuing mistakes

The problem I have with equity is that it doesn’t care about how many mistakes you made or how many better moves you missed. Equity loss also doesn’t account sufficiently for game-losing blunders: ‘playing badly’ could mean playing perfectly for most of the game, then making a silly endgame blunder that costs only a handful of equity points – but the whole game. Missing a high-scoring spot or hook could also cost a player an inordinate amount of equity, especially if the spot is neglected for several turns. For those reasons, I avoided talking about equity calculations in my article on Quackle. The article recognises that equity loss is not everything, and mentions how Ed Martin categorises his mistakes:

Some players are much more rigorous in their approach. Ed Martin, for example, divides his equity loss into three types: 

  • bad evaluation of moves (where he spotted the move but rejected it)
  • overlooking the best move (where he knew the word but didn’t spot the move)
  • not knowing the word for the move

I generally feel that this is a much better way to classify errors than looking at the absolute figure for equity loss. For one, a player using such a system would be able to tell where he is deficient and improve from there (e.g. if he were overlooking the best move very often, he could perhaps improve by spending more time on each move). Kenji’s page also covers this topic – this time under ‘Analyzing multiple games’ – and suggests how to reduce such errors.

Being a perfectionist, I have thought about valuing mistakes by counting how often you deviate from the top play and how far down the list your play ranks, with less emphasis on valuation, and more weight on late-game errors than on early-game errors. An algorithm for this would be something like:

  1. For each error made, calculate the rank difference between the top play and your play. For every 5 points in valuation your play is away from the top play, add 1 to this figure. For every 5% Win% your play is away from the top play, add 1 to this figure.
  2. Multiply this by 2 × ln(move number where the error was made + 1) – early-game mistakes should count less than endgame mistakes, but not by much, which is why the natural log is used. The factor of two magnifies the figure and, I believe, brings it closer to the range of equity losses players currently see.
  3. Sum the figures for all moves to get your ‘equity loss’ for the entire game.

This is just an arbitrary and untested algorithm, and is probably too tedious for a human to work out for all his games. Nevertheless, it can definitely be programmed – I will probably try it on some of my games in the future and see how the algorithm can be tweaked.
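
Here is a rough sketch in Python of what I have in mind – the field names and the sample errors are invented, and, as I said, the whole scheme is arbitrary and untested:

```python
import math

# A rough, untested sketch of the scheme above. The field names and the sample
# errors are my own invention.
#   rank_diff : how many places below the top play my play ranked
#   val_diff  : valuation gap (in points) between the top play and mine
#   win_diff  : Win% gap (in percentage points) between the top play and mine
#   move_no   : the move number on which the error was made

def error_score(rank_diff, val_diff, win_diff, move_no):
    base = rank_diff + val_diff // 5 + win_diff // 5  # step 1: +1 per 5 points / 5% away
    return base * 2 * math.log(move_no + 1)           # step 2: weight later errors more

def game_score(errors):
    return sum(error_score(**e) for e in errors)      # step 3: sum over all errors

# Hypothetical game with one small early error and one larger endgame error.
errors = [
    {"rank_diff": 2, "val_diff": 6,  "win_diff": 3,  "move_no": 3},
    {"rank_diff": 1, "val_diff": 12, "win_diff": 11, "move_no": 14},
]
print(round(game_score(errors), 1))
```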

Another board from the event, this time from round 8. Several uncommon words here – BRUXISM refers to the grinding of teeth (with relevant verbs BRUX, BRUXED and BRUXING). It is a common sleep disorder.

Phonies and unchallenged phonies

The article goes on to talk about how phonies and unchallenged phonies should be valued in terms of equity. The method for challenged phonies goes:

The best way to calculate equity loss for a challenged phony is to add an exchange of each letter on your rack to the kibitz list. When you have run the simulation, count the middle scoring tile exchange as your move and take the equity loss for that move. 

If there are fewer than seven tiles in the bag you cannot follow the procedure above as you are not allowed to change tiles in that situation. There is no really satisfactory procedure in these circumstances – my best suggestion is to penalise challenged phonies at this point by the actual points score for the move.

I’m not convinced that this is the ‘best’ way to do it, as this method seems more arbitrary than anything. The other method I know about is to add 20 points to the valuation of the best move, and count that as your equity loss – while it also sounds arbitrary, there is a basis behind this, which is to penalise you for the equity points foregone, and to penalise you further for losing your turn and keeping the same rack.

World Champion Nigel Richards at the RFID board, after round 13 against Helen Gipson

Perhaps it is better to simulate a pass, because that is the effective result of a challenged phony. The difference, however, is that a phony reveals some of your rack to your opponent and thus opens you up to tile inference – still, this could perhaps be accounted for by adding an amount to the figure depending on the number of tiles revealed (so bingo phonies would be penalised more heavily than single-tile phonies). Again, it’s definitely possible to program this.
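
Something like the following sketch, in Python – the 2-points-per-tile surcharge is an arbitrary figure of my own, and the pass equity loss would have to come from an actual simulation in your analysis software:

```python
# A sketch of the pass-based penalty. The pass equity loss would come from
# actually simming a pass in your analysis software, and the surcharge of
# 2 points per revealed tile is an arbitrary figure chosen for illustration.

REVEAL_PENALTY_PER_TILE = 2.0

def challenged_phony_loss(pass_equity_loss, tiles_revealed):
    """Cost of the effective pass, plus a surcharge for the information
    handed to the opponent."""
    return pass_equity_loss + REVEAL_PENALTY_PER_TILE * tiles_revealed

# A seven-tile bingo phony is penalised more heavily than a single-tile phony.
print(challenged_phony_loss(18.0, 7))  # 32.0
print(challenged_phony_loss(18.0, 1))  # 20.0
```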

Note the difference between equity loss figures and the effective penalties of making mistakes – if you play a phony when you have multiple bingos that cannot all be blocked at once, the effective penalty for revealing the seven tiles of your rack is not as high as when you phony with a single playable, blockable bingo. However, equity loss is a comparison between perfect play (however it is defined – see below) and your play, so the eventual result of the phony should not matter.

The section on unchallenged phonies raises further questions:

I believe that if you play a phony and your opponent does not challenge it, you should still penalise yourself, using the method outlined above, because, under the free challenge rule at least, a phony is a mistake and it should have been punished. Many players agree with this, but not all. David Webb, for example, says: “An unchallenged phony is a potential not an actual source of equity loss. I like to maximise the correlation between equity loss and points loss.” David prefers to treat unchallenged phonies as a valid word simmed in the normal way.

On the rare occasions when I actually calculate my equity loss, I normally use Dweeb’s approach, but it seems that definitions of ‘equity loss’ certainly differ from person to person. If one takes it as comparing perfect play to imperfect play, it would be right to penalise yourself even when the phony is not challenged. However, it’s not certain that ‘perfect play’ as defined by the simulator is the ideal way to play, so why use that as a benchmark? (This is not a strong point, but there are cases where deviating from the simulation works – especially where tile inference is involved.) If, on the other hand, one accepts unchallenged phonies as valid words and accounts for equity as such, the ‘ideal play’ becomes difficult to ascertain – should one have played a more outrageous phony in that case? Would it have been challenged?

My inclination is towards the former, because of the separation between equity and effective penalties. A desperation phony attempt, for instance, should be counted as equity loss, though it may have effectively been the best chance at winning the game.

Rack Leave Values

Finally, the article leaves us with tile valuations for SOWPODS. I’m not sure how they were generated, but they seem rather different from how I value my leaves – I would rate a G as superior to a W, a C as superior to many of the positive-value tiles, and so on. Perhaps I’m not valuing them right. Kenji’s website provides more precise valuations and goes into a lot of detail (which I have yet to digest fully).

For those who have toyed around with Scrabble software before, this article provides a wide range of insights into the programs and analysis of the past, and is certainly worth a read. RIP, Pete.
