Black Saturn Software
Overview

'Rhoids Sports Analysis ranks baseball players and football teams using exotic methods. 'Rhoids Baseball analysis lists all-league teams based on sabermetric and game state methods. Sabermetric methods recognize that traditional baseball measures such as batting average and RBIs do not properly reflect a player's contribution to helping his team score runs and win games. The problem with traditional measures is that they depend heavily on the player's teammates and the park he plays in. Sabermetric methods seek to eliminate that bias. Game state methods measure how much each player's at-bat increases the probability of his team scoring a run or winning a game. Game state methods are still biased, but they are fairer indicators of run production than RBIs or pitcher's wins.

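As a rough illustration of the game state idea, the sketch below credits an at-bat with the change in the probability that the team scores in the inning. It is only one possible reading of the description above, not the site's actual algorithm. The `RUN_PROB` table, its state encoding, and the `at_bat_value` function are hypothetical, and the probabilities are placeholder numbers rather than measured values.

```python
# Hypothetical inning-state table: (outs, runners on base) -> probability that
# the batting team scores at least one more run this inning.
# The numbers are placeholders for illustration, not measured values.
RUN_PROB = {
    (0, "---"): 0.30,
    (0, "1--"): 0.45,
    (1, "---"): 0.20,
    (1, "1--"): 0.30,
}


def at_bat_value(before, after, runs_scored=0):
    """Credit for one at-bat: the change in the team's scoring probability.

    If a run scored on the play, the scoring probability is treated as 1.0.
    """
    p_before = RUN_PROB[before]
    p_after = 1.0 if runs_scored > 0 else RUN_PROB[after]
    return p_after - p_before


# A leadoff single raises the scoring probability; a leadoff strikeout lowers it.
single = at_bat_value((0, "---"), (0, "1--"))
strikeout = at_bat_value((0, "---"), (1, "---"))
```

Unlike an RBI count, a measure of this shape gives the batter credit or blame relative to the situation he came up in, which is why it depends less on what his teammates did before him.
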
Technical Highlights

Important technical activities that support 'Rhoids sports analysis are:
Basic data is collected from a variety of publicly available sports websites using screen scraping programs. This is not very difficult, as most of this data is complete and is presented in structured web pages whose layout is very stable over time.

Some data on the 'Rhoids web site is relatively bulky and only semi-stable; player and team images are prime examples. Rather than collecting and storing this bulky data, 'Rhoids maintains a database of pointers to these images on the web, a technique referred to as data indexing.

'Rhoids applies a number of value added calculations to the basic collected data. The algorithms for these numerical calculations are very simple; the interesting part is actually the process used to come up with the equations.
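The screen scraping and data indexing steps described above can be sketched as follows. This is a minimal example using Python's standard `html.parser`, not the programs 'Rhoids actually runs. The `StatsPageParser` class, the sample page, and the example.com image URL are all made up for illustration. The parser pulls table rows out of a structured page and records image URLs as pointers instead of downloading the images.

```python
from html.parser import HTMLParser


class StatsPageParser(HTMLParser):
    """Collect table rows as lists of cell text, plus image URLs (not the images)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self.image_urls = []    # pointers to images, for the "data index"
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_data(self, data):
        # Only text inside a table cell is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment standing in for a real stats page.
SAMPLE_PAGE = """
<table>
  <tr><th>Player</th><th>AB</th><th>H</th></tr>
  <tr><td><img src="http://example.com/img/doe.jpg">J. Doe</td><td>4</td><td>2</td></tr>
</table>
"""

parser = StatsPageParser()
parser.feed(SAMPLE_PAGE)
```

Because the scraper depends on the page layout, it works well only while that layout stays stable, which is the property of the source pages noted above.
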
More interesting than the numerical calculations is the game state parsing. Amazingly, a textual account of every at-bat in every baseball game is publicly available on the web. Even more incredibly, these game logs are written with a very strict syntax, which makes it possible, if not easy, to write a program that "reads" each log and converts the outcome of every at-bat to a change in inning and game state. A vocabulary of about 50 verbs seems to handle 99.99% of game log situations. The Inning State Run and Game State Win pages explain the theory behind these algorithms in more depth.

After all the data is collected and the value added calculations and analyses are done, a set of HTML and XML pages is generated. Static (for the day) pages for the all-league and best value teams are formatted in HTML. XML files are produced to support the data dump pages. These pages provide basic and added value data for all the players in the league, with features for screening players by position and team and for sorting players by a variety of basic and sabermetric measures.

There are two data dumps. Both deliver the same information, but by different methods.

The old data dump uses the browser only to display an XML file. The combination of Internet Explorer's XML binding support and JavaScript screens out the appropriate XML elements, sorts them as requested, and formats the display. This approach requires no Java applet plugin and is fast after the initial data load, because the browser needs no additional data. The drawbacks are that it only works on IE5 and that the initial download of XML data can be relatively slow.

The second data dump uses a Java servlet to deliver the information. The browser page sends an HTTP GET request to the server, which screens and sorts the data according to the request and delivers back a vanilla HTML page.
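The screen-and-sort step that the server performs for each GET request can be sketched as below. The site uses a Java servlet; this is a language-neutral illustration written in Python, not the servlet's code. The `handle_query` function, the query parameter names (`pos`, `team`, `sort`), and the player records are all assumptions made for the example.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical player records; names, teams, and values are illustrative only.
PLAYERS = [
    {"name": "A. Able", "team": "RED", "pos": "1B", "avg": 0.301},
    {"name": "B. Baker", "team": "BLU", "pos": "1B", "avg": 0.322},
    {"name": "C. Cole", "team": "RED", "pos": "SS", "avg": 0.287},
]

SORTABLE = ("name", "team", "pos", "avg")


def handle_query(query_string, players=PLAYERS):
    """Screen by optional pos/team, sort by the requested measure, return plain HTML."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}

    # Screening: a missing parameter means "do not filter on this field".
    selected = [
        p for p in players
        if params.get("pos") in (None, p["pos"])
        and params.get("team") in (None, p["team"])
    ]

    # Sorting: fall back to name for unknown keys; numeric measures sort high to low.
    sort_key = params.get("sort", "name")
    if sort_key not in SORTABLE:
        sort_key = "name"
    selected.sort(key=lambda p: p[sort_key], reverse=(sort_key == "avg"))

    rows = "".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td><td>{:.3f}</td></tr>".format(
            escape(p["name"]), escape(p["team"]), escape(p["pos"]), p["avg"]
        )
        for p in selected
    )
    return "<table>" + rows + "</table>"


first_basemen = handle_query("pos=1B&sort=avg")
```

Since all screening and sorting happens on the server, the browser only has to render an ordinary HTML table.
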
The servlet approach works on all known browsers, is remarkably fast (ignoring bandwidth considerations), and requires no initial large data download.

Check out 'Rhoids Sports Analysis

The site was inspired by the Bill James Baseball Abstract series. I developed it because I did not want to wait until the end of the year for the sabermetric stats, and because I wanted to know who really was a good ball player. I did the game state matrix mostly because it could be done, and I think the results are really interesting. The site's audience on the web has been mainly sabermetric junkies, as most baseball fans really aren't interested in math.