UK NATIONAL RANKINGS (The Glicko System)
Introduction
At the request of JD (06/09/02) I have investigated the Glicko system for determining national rankings with a view to providing a report to the BHGS committee for its Oct '02 meeting. This document provides the results of that investigation. The investigation itself covers
· Understanding and background of Glicko
· How Glicko could be implemented in the UK
· Advantages/disadvantages of Glicko compared to the current UK system of determining national rankings
Background to Glicko
Professor Glickman developed this system for chess competition and it has been adopted for internet chess. It is a statistical treatment resulting in formulae for calculating both the ranking points of each player and a factor indicating the statistical reliability of those ranking points. I have made no attempt to understand the derivation of the formulae.
In the Glicko system ranking points are awarded/deducted according to how well/poorly a player performs in each game, taking into account the ranking of his/her opponent. It is possible to determine the expected score between two opponents, based on their rankings. The better a player performs against the expected score the more ranking points are awarded, and the worse the performance the more points are deducted. The final placing achieved in a competition has no bearing on the ranking (cf. current UK method of calculation).
The reliability factor is used to identify maverick performances that could unbalance the rankings. As an example an 'average' player could compete in a single competition of four games getting three average results and one 'lucky' win against a top player; his/her ranking points would be good but the reliability factor would be low for being based on only four games. This should be compared with an established 'average' player whose big wins would be balanced by bad losses, but the reliability attached to their ranking would be high.
The formulae take the ranking and reliability of both opposing players, together with the game score, to determine a change to each player's ranking and reliability. Reliability reduces over time if no new games are played. Organisers can set the rate of this decay as required.
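As an illustration of the expected score, the following is a minimal sketch in Python using the formulae published by Professor Glickman for the original Glicko system. It assumes that the 'reliability factor' in this report corresponds to Glicko's ratings deviation (RD), for which a larger number means a less reliable ranking. The names Q, g and expected_score are my own, and I have not confirmed that the Australian/Irish software uses exactly this form.

```python
import math

Q = math.log(10) / 400  # constant from the published Glicko formulae


def g(rd):
    """Damping factor: the less reliable the opponent (large RD), the smaller g."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_opp, rd_opp):
    """Expected score (0..1) for a player ranked r against an opponent
    ranked r_opp whose ratings deviation is rd_opp."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


if __name__ == "__main__":
    # Two players with equal ranking points: each expects half the score.
    print(expected_score(2000, 2000, 400))  # 0.5
    # A player 200 points ahead of a well-established opponent.
    print(expected_score(2200, 2000, 50))
```

A player who scores above this expected value gains ranking points and one who scores below it loses them; the opponent's reliability only moderates how far the expectation departs from an even split.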
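The update and decay described above can be sketched as follows. This is an assumption-laden illustration rather than the Australian/Irish implementation: it uses the published Glicko formulae, treats the 'reliability factor' as Glicko's ratings deviation (RD, where larger means less reliable), and the names update, decay, the decay constant c and the cap rd_max are hypothetical choices of mine.

```python
import math

Q = math.log(10) / 400


def g(rd):
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_opp, rd_opp):
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def update(r, rd, games):
    """Return the new (ranking, RD) for one player.

    games is a list of (opp_ranking, opp_rd, score) with score in 0..1.
    All games are evaluated against the same starting r and rd.
    """
    if not games:
        return r, rd
    info = 0.0   # accumulates 1/d^2 in the published notation
    delta = 0.0  # accumulates g * (actual score - expected score)
    for opp_r, opp_rd, score in games:
        e = expected_score(r, opp_r, opp_rd)
        info += Q**2 * g(opp_rd) ** 2 * e * (1 - e)
        delta += g(opp_rd) * (score - e)
    precision = 1 / rd**2 + info
    return r + (Q / precision) * delta, math.sqrt(1 / precision)


def decay(rd, periods_idle, c, rd_max=400.0):
    """Reliability worsens (RD grows) while a player is inactive.

    c is the organiser-chosen decay rate; rd_max caps RD at the value
    given to a new player (400 is the figure quoted for Australia/Ireland).
    """
    return min(math.sqrt(rd**2 + c**2 * periods_idle), rd_max)


if __name__ == "__main__":
    # Worked example from Glickman's own description of the system:
    # a 1500/200 player wins one game and loses two.
    print(update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
    # approximately (1464, 151.4)
```

Note that playing games always reduces RD (improves reliability), whatever the results, while ranking points move up or down according to performance against expectation.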
Glicko represents a lifetime achievement as compared to the current UK system of achievement over the previous year only.
Both Australia and Ireland use the Glicko system for their rankings. I'm not aware of any other country using it. I believe Ireland copied the Australian implementation. Both David Young (Australia) and Rob Brennan (Ireland) have been very helpful during my investigation.
Main Issues
Player History
If the Glicko system were commenced with all players rated equally it would take several games before the rankings were sufficiently developed and reliable to be an accurate reflection of the UK players. At least one year's competition history should be fed into the initial system, and preferably more.
New Players
Players not already ranked must be given an arbitrary ranking and reliability factor. These can be any number as they simply indicate a mean for all players on the rankings list. Australia and Ireland use an initial ranking of 2000 and reliability of 400: these were chosen as a rough conformance with the values used in internet chess and therefore had a proven track record.
Weighting of Competitions
Weighting of competitions (eg. 'Grand slams') is not currently used by either Australia or Ireland. Because ranking points are earned/lost according to opponent and not final competition position, weightings by competition are not applicable. Under Glicko a competition is in effect weighted by the quality of its competitors.
Doubles Competitions
The UK plays far more doubles competitions than any other country. For this reason neither Australia nor Ireland has attempted to include doubles results in their rankings. Glicko formulae exist for doubles tournaments and are very similar in form to the formulae for singles. Unlike the current UK system, the quality of each player in a team is considered, and different ranking/reliability points are awarded according to the contribution each player can be expected to make to the team.
Team Competitions
I assume that a team competition is actually several singles games with scores cumulated for the players in a team (eg. Derby). In this case Glicko is easily applied to the individual games.
If the team competition is actually several people playing the same game then the doubles mechanism above could be applied (ie. doubles is treated as a team competition with teams of two players).
Timing of Calculation
The software used by Australia and Ireland applies the Glicko formulae after each game of a competition, using the resultant ranking/reliability for the calculation of the next game. Whilst this is simple in computer terms it means that a great deal of accurate information is required from competition organisers, ie. opponents and scores in the sequence they played. It may be easier, and not much less accurate, to apply the ranking/reliability at the start of a competition to every game played during the competition, and only update a player's ranking/reliability at the end of the competition.
It is also vital that results from competitions are fed into the formulae in the correct sequence. This is because ranking and reliability points will change as a result of each competition, and the amended totals have an impact on the calculations for the next competition. We must expect some competition results to be provided late. In such a case it may be necessary to publish provisional scores, excluding a missing competition, and subsequently re-calculate when the results are available. It should be possible to do this using suitable software.
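The per-competition approach and the re-calculation for late results can be sketched together. The following is a minimal Python illustration, not the existing software: it assumes the published Glicko formulae, treats reliability as Glicko's ratings deviation (RD), starts unknown players at the 2000/400 values quoted above, and uses hypothetical names (replay, INITIAL) and a hypothetical data layout of (ISO date, list of games).

```python
import math

Q = math.log(10) / 400
INITIAL = (2000.0, 400.0)  # starting ranking and RD for an unranked player


def g(rd):
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def update(r, rd, games):
    """games: list of (opp_ranking, opp_rd, score in 0..1)."""
    info = delta = 0.0
    for opp_r, opp_rd, score in games:
        e = 1 / (1 + 10 ** (-g(opp_rd) * (r - opp_r) / 400))
        info += Q**2 * g(opp_rd) ** 2 * e * (1 - e)
        delta += g(opp_rd) * (score - e)
    precision = 1 / rd**2 + info
    return r + (Q / precision) * delta, math.sqrt(1 / precision)


def replay(competitions):
    """Recalculate all rankings from scratch.

    competitions: list of (iso_date, games), where games is a list of
    (player_a, player_b, score_a) and score_a is a fraction 0..1.
    Competitions are processed in date order however they were supplied,
    so a late result is handled by adding it and calling replay again.
    """
    ratings = {}
    for _, games in sorted(competitions, key=lambda c: c[0]):
        snapshot = dict(ratings)  # values at the start of the competition
        per_player = {}
        for a, b, score_a in games:
            ra = snapshot.get(a, INITIAL)
            rb = snapshot.get(b, INITIAL)
            per_player.setdefault(a, []).append((rb[0], rb[1], score_a))
            per_player.setdefault(b, []).append((ra[0], ra[1], 1 - score_a))
        # Update each player once, at the end of the competition.
        for name, results in per_player.items():
            r, rd = snapshot.get(name, INITIAL)
            ratings[name] = update(r, rd, results)
    return ratings


if __name__ == "__main__":
    comps = [
        ("2002-03-01", [("A", "B", 1.0)]),
        ("2002-01-15", [("A", "C", 0.5), ("B", "C", 0.0)]),  # supplied late
    ]
    for name, (r, rd) in sorted(replay(comps).items()):
        print(name, round(r), round(rd))
```

With this layout organisers need only report who played whom and the scores for each competition, not the order of games within it.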
Scoring Systems
The formulae currently in use assume the 10-0 scoring system. I have successfully converted them to be able to use both 10-0 and 32-0 systems within the same set of rankings. It would not be possible to use the 3-2-1-0 or 3-1-0 scoring systems within Glicko. It would also not be possible to use any scoring system that uses two measures of victory, eg. 3-1-0 + %age elements killed.
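One plausible way of letting 10-0 and 32-0 results share a set of rankings is to express each result as a fraction of the maximum available score before it enters the formulae. The sketch below is an assumption on that basis and is not necessarily the conversion I actually made; the function name fractional_score is hypothetical.

```python
def fractional_score(points, max_points):
    """Convert a game score to a fraction between 0 and 1.

    points     -- points scored by the player in the game
    max_points -- maximum available in the scoring system (10 or 32)
    """
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    if not 0 <= points <= max_points:
        raise ValueError("points must lie between 0 and max_points")
    return points / max_points


if __name__ == "__main__":
    print(fractional_score(7, 10))   # 0.7
    print(fractional_score(16, 32))  # 0.5
```

This relies on the two players' scores in a game adding up to the maximum, so that their fractions add up to 1.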
Ranking Tables
Australia and Ireland publish rankings only for the top n players (was 50 but currently being increased) that have a reliability rating above a pre-set figure. Players with low reliability ratings are excluded from the ranking table although they will be listed in an alphabetic list of players' ranking points. It is very subjective as to how high the reliability factor has to be for a player to be included in the rankings. The experience of Australia and Ireland seems to indicate that about 20 games are needed. Given this is a lifetime achievement, assuming no lengthy breaks, most UK players would have the required reliability rating.
As a further check both countries also exclude anyone who hasn't played in the last 18 months regardless of their reliability. This ensures that someone with very high reliability when they 'retire' drops out of the rankings after a suitable period.
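The two filters described in this section can be sketched as below. This is an illustration only: reliability is again assumed to be held as Glicko's ratings deviation (RD, smaller meaning more reliable), the threshold max_rd=150 is an invented figure, 18 months is approximated as 548 days, and the Player fields and the ranking_table name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Player:
    name: str
    rating: float
    rd: float          # ratings deviation: smaller means more reliable
    last_played: date


def ranking_table(players, as_of, top_n=50, max_rd=150.0, max_idle_days=548):
    """Return the published table: reliable, recently active players,
    highest ranking points first, limited to the top_n entries."""
    cutoff = as_of - timedelta(days=max_idle_days)
    eligible = [
        p for p in players
        if p.rd <= max_rd and p.last_played >= cutoff
    ]
    eligible.sort(key=lambda p: p.rating, reverse=True)
    return eligible[:top_n]


if __name__ == "__main__":
    players = [
        Player("Established", 2150, 80, date(2002, 8, 1)),
        Player("Newcomer", 2300, 320, date(2002, 9, 1)),   # too unreliable
        Player("Retired", 2400, 90, date(2000, 5, 1)),     # idle too long
        Player("Regular", 2050, 100, date(2002, 7, 15)),
    ]
    for p in ranking_table(players, as_of=date(2002, 10, 1)):
        print(p.name, p.rating)
```

Players filtered out here would still appear in the alphabetic list of ranking points.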
Opting Out
Australia allows a player to play a proportion of their competitions without them counting towards the rankings. The intent is to allow players to take 'fun' armies and not be penalised on the rankings. Players must give prior written notification that they wish their games to be discounted for rankings purposes. Currently Australian players are not allowed to opt out of the ranking system for Cancon. This is the sole attempt by Australia at a form of competition weighting.
Points Attenuation and Selection of UK Champion
Sorry for the odd heading - I couldn't think of anything better. This refers to how long the results of a single game still have an impact on a player's rating, and the reduced effect of that score over time. Under the current UK system scores have a 100% impact on a player's rating for a rolling 12 month period, and then get dropped completely. Glicko reduces the impact of previous scores by a small amount until they have a negligible effect, but this takes years. It should be remembered that Glicko represents lifetime achievement rather than a snapshot.
Multiple Circuits
The ranking points from one competition circuit are only comparable with those of another circuit if there is a reasonable amount of player crossover. Of course if sufficient numbers of players play in both circuits it could be argued that they aren't separate circuits in the first place! This means that it would not be possible to compare Australian, Irish and UK Glicko rankings directly, although it may be possible to determine some form of conversion factor based on the few players who compete in any two of the circuits. I believe David Young is already looking at this.
The other problem is the possible existence of separate circuits within the UK. Existence of separate BHGS, SW and Scottish circuits without sufficient player crossover could mean that players' ranking points are not directly comparable within the UK. Both David Young and Rob Brennan have assured me that there is sufficient crossover between the UK circuits. However no solid evidence has been produced to support this. It is probable that we could only 'prove' this during a trial period.
Advantages
· More reliable statistical basis to calculations
· No reliance on artificial weightings of competition
· No 'free-rides' in doubles and team competitions
· No need to artificially 'balance' organisers'/umpires' rankings when they can't play a competition (but see below on reliability)
Disadvantages
· Not a transparent system, ie. not easy for individuals to calculate ranking points for themselves
· May not be capable of providing comparable rankings for separate circuits
· More information required from competitions than with current system
· More limited in allowed scoring systems than current system (although actual 10-0 type scores from each game could still be accepted)
· Not obviously suitable for determining a Grand Prix style UK champion
· Reliability of organisers/umpires may be reduced due to playing fewer games (note that most organisers/umpires will nevertheless play enough games to maintain a good reliability)
Ranking System Objectives
Any system for determining rankings must be judged against the objectives of a ranking system. In the UK the current rankings are used for
· Determining a UK champion
· Seeding for competitions
· Selecting players for UK representative teams (eg. Grandson)
· General interest
I'll take each of these in turn.
UK Champion
As noted above Glicko represents a lifetime achievement and is therefore not suitable for selection of an annual national champion. Australia and Ireland get around this difficulty by running a parallel 'Player of the Year' (or equivalent). Such a mechanism could be calculated in a number of ways including our current ranking system.
I suggest one of two solutions:
1. Continue to run the current UK ranking system in parallel with Glicko, but solely as a 'Grand Prix' system to select the annual champion.
2. If the arguments about the better statistical basis for Glicko are accepted it might be better to calculate a second set of Glicko rankings at monthly intervals, based only on a rolling 12 month history, with players' ratings initialised at the start of that 12 months. This would be a very simple calculation to perform assuming the data is made available for the ranking in the first place. We would therefore be able to see players' rankings on recent form (where unexpected competition results can have an acceptable major impact) as well as long term rankings (where unexpected results will be ironed out).
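For the second option, the only extra step beyond the normal Glicko calculation is selecting which competitions fall inside the rolling 12 months. A minimal sketch of that selection follows, assuming each competition is held as a (date, name) pair; the function names and the competition names in the example are hypothetical.

```python
from datetime import date


def twelve_months_before(d):
    """Same day one year earlier (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def rolling_window(competitions, as_of):
    """Competitions held in the 12 months up to and including as_of,
    in date order ready to be fed into the Glicko formulae."""
    start = twelve_months_before(as_of)
    selected = [c for c in competitions if start < c[0] <= as_of]
    return sorted(selected, key=lambda c: c[0])


if __name__ == "__main__":
    comps = [
        (date(2003, 6, 30), "Competition D"),
        (date(2002, 6, 30), "Competition A"),  # just outside the window
        (date(2002, 7, 1), "Competition B"),
        (date(2003, 7, 1), "Competition E"),   # after the as_of date
    ]
    for when, name in rolling_window(comps, as_of=date(2003, 6, 30)):
        print(when, name)
```

Every player would then start from the initial ranking and reliability values, with only the selected competitions applied, and the exercise repeated each month.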
Seeding for Competitions
Because the current system uses only the last 12 months' data it's possible (and happens) that strong players who have been off the circuit for a short while can lose their seeding and become dangerous unseeded players, although their abilities have not diminished. Glicko does not suffer from this as it represents lifetime achievement. If the Australian/Irish practice of excluding from the rankings anyone who hasn't played for 18 months is adopted then it will also not suffer the problem of someone being seeded who has dropped out for 'too long'.
Selecting Players for Representative Teams
A decision has to be made on whether the selection process should be based on lifetime achievement or recent form - I prefer the latter. If recent form is the chosen criterion then the mechanism chosen for 'UK champion' should be used.
General Interest
As long as a set of rankings is available the vast majority of players will be happy regardless of the mechanism used. The major differences between Glicko and the current system are
1. The current system is reasonably transparent and Glicko is not. By this I mean that an individual can create a spreadsheet to determine their own ranking points from the competition results published on the BHGS website. It would be very difficult to achieve this with Glicko as other players' results affect their own, so they would have to duplicate the software used to create the rankings in the first place.
2. The current system is very subjective in the weightings given to various competitions, leading to players losing interest if they have one bad 'major' competition. Glicko does not suffer from this.
3. Under the current system doubles results are reflected equally on each player in a team regardless of their comparative abilities. This leads to some players receiving a higher ranking than others perceive they have earned, ie. some discrediting of the table. Glicko apportions doubles results in a more reasonable manner.
My guess is that the majority of players won't care too much what system is used. Most don't do the calculations themselves or give too much credence to the details. It does matter to perhaps the top 50 players.
Personal Views
I have attempted to avoid any personal preference or bias towards any system, and you have my apologies if any bias has crept in. Nevertheless I should state that I started this investigation with a belief that Glicko would be a better mechanism than the current UK system. This view was based on articles on the DBM discussion site and a belief that there were a number of flaws in the current system.
During this investigation I have confirmed that Glicko has its own flaws and it's a matter of choosing a system where the flaws are outweighed by the advantages where possible. I also realised that it's vital that the purposes of a ranking system are clearly determined, otherwise that choice is impossible to make.
Over the last few weeks I swung one way, then the other, as to whether Glicko was suitable for the UK. At the end I am of the view that a simple application of Glicko will not provide the UK with a ranking system that would fulfil the objectives as I see them. However I am now convinced that a lifetime ranking table alongside a Glicko based 12-month rolling ranking table for current form will provide a more statistically accurate, reliable and trustworthy system than the current Grand Prix system. I have also been convinced that the software currently available will support such a solution with ease.
Recommendation
Without giving the Glicko system a trial it is difficult to determine the impact of the disadvantages listed, or whether they are outweighed by the advantages. It is also almost impossible to determine the extent (or otherwise) of problems caused by having different UK circuits until a trial is completed.
For these reasons I recommend that we run a Glicko system of rankings during 2003, alongside the existing system (which will remain the official system). As much history as possible should be fed into the Glicko formulae to provide a start point for 2003. I also recommend that we run a 'rolling 12 month' version of Glicko to see the merits of this for determining the UK champion. I would expect that by managing a Glicko system we will be able to see the extent of any problems, tweak the system where necessary, and be able to form a better view of whether Glicko is preferable to the current mechanism. Despite my better judgement I am prepared to help manage a Glicko system, but I recommend that the committee finds at least one other person to join me so that a range of views is available (apart from sharing out the work). Towards the end of 2003 it should be possible to take a better view of Glicko based on practical experience of its application in the UK.