During her senior year, my daughter was approaching a milestone in one of her high school sports. To calculate how close she was to attaining it, I would go to the STLtoday High School Sport Stats page (https://www.stltoday.com/sports/high-school/stats/#tracking-source=main-nav), copy and paste data from it into a Google Sheet, and create some queries to sum up the stats. She had classmates who played other sports and were coming up on milestones, so I did likewise for those sports. When we software developers find ourselves doing something manually over and over, we seek ways to automate it. I once had a contract gig that involved writing software to scrape data from various sources, like spreadsheets and PDFs, and store it in a database; by comparison, scraping web pages can be relatively easy. If I can do it for one school, I can do it for all schools, seasons and sports that STLtoday has decent stats for. Now that I have this data, it would be nice to share it with others, so this little web page thing was created.
Some other details about querying the statistics:
Queries have a limit of 250 rows returned unless an explicit limit is included in the query.
TODO
The only way to identify a player as being the same player is by name and school. This causes problems if the name is spelled differently between seasons. For example, Lafayette baseball had "David Freeze" and "David Freese". At least the correct spelling was easy to find for that one. I was able to write a bit of software that helped compare names using a similarity algorithm. Sometimes there are discrepancies that look like typos or spelling errors, but many times it is a matter of "Lindsay" or "Lindsey". I just pick one. Right now, it is more important to have consistent names for players than to get the correct name of someone who played 10, 15 or 20 years ago. Two players with identical names at the same school and in the same sport could erroneously be treated as one player. Such a case can be detected if the combined player shows more than four seasons or a gap of more than three years between seasons.
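The name comparison mentioned above could look something like the sketch below. The page does not say which similarity algorithm was used, so this one swaps in `SequenceMatcher` from Python's standard `difflib` module as a stand-in. The function name `similar_names`, the 0.85 threshold, and the surnames in the sample roster are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_names(names, threshold=0.85):
    """Return (name_a, name_b, ratio) for distinct names that look alike.

    The threshold is an assumed value; tune it against real rosters.
    """
    pairs = []
    # Compare every pair of distinct names once, in a stable order.
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 3)))
    return pairs


# Hypothetical roster for one school and sport across several seasons.
roster = [
    "David Freese",
    "David Freeze",
    "Lindsay Smith",
    "Lindsey Smith",
    "John Doe",
]

for a, b, ratio in similar_names(roster):
    print(f"{a!r} vs {b!r}: similarity {ratio}")
```

A check like this only produces candidates. Someone still has to decide which spelling to keep, and "Lindsay" versus "Lindsey" could just as well be two different people.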
Players who transfer schools will appear as two different players. These problems are more of an issue with "Career" stats that span seasons.
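The check for two same-named players being merged into one (more than four seasons, or more than a three-year gap) can be sketched as follows. This is a minimal illustration, not the site's actual code: the function name is hypothetical, seasons are represented by their starting year (2016 for 2016-17), and a "gap" is assumed to mean the difference between the starting years of consecutive seasons.

```python
def looks_like_merged_players(season_start_years, max_seasons=4, max_gap_years=3):
    """Flag a name/school/sport record that may combine two different players.

    season_start_years: starting years of the seasons with stats,
    e.g. 2016 for the 2016-17 season (an assumed representation).
    """
    years = sorted(set(season_start_years))
    # More seasons than a normal high school career.
    if len(years) > max_seasons:
        return True
    # A long break between consecutive seasons with stats.
    return any(
        later - earlier > max_gap_years
        for earlier, later in zip(years, years[1:])
    )


print(looks_like_merged_players([2015, 2016, 2017, 2018]))        # four seasons, no gap
print(looks_like_merged_players([2010, 2011, 2012, 2013, 2014]))  # five seasons
print(looks_like_merged_players([2008, 2015]))                    # seven-year gap
```

Two same-named players whose careers overlap or sit back to back within four seasons would still slip through this check.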
For basketball, I tried to feature a query that calculated the three-point shot percentage. However, many times the number of attempts was omitted, or somehow a player had more shots made than attempts. I'm thinking sometimes it is entered as "made-attempts" and other times as "made-missed". I ended up removing those queries, which is unfortunate since that would be an interesting stat.
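To show the kind of sanity check this problem calls for, here is a small sketch that computes a three-point percentage only when the stat line is plausible. The function name and the choice to return `None` for unusable lines are assumptions for the example. It does not try to guess whether a line was entered as "made-attempts" or "made-missed".

```python
from typing import Optional


def three_point_pct(made: Optional[int], attempts: Optional[int]) -> Optional[float]:
    """Return the three-point percentage, or None when the inputs are unusable."""
    if made is None or attempts is None:
        return None  # makes or attempts were omitted
    if attempts <= 0 or made < 0 or made > attempts:
        return None  # impossible line, e.g. more makes than attempts
    return round(100.0 * made / attempts, 1)


print(three_point_pct(3, 8))     # plausible line
print(three_point_pct(5, 3))     # more makes than attempts
print(three_point_pct(2, None))  # attempts omitted
```

Filtering this way hides bad rows rather than repairing them, so a percentage query built on it would cover only the players whose lines passed the check.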
Finding variances in names, as already mentioned.
Adding new queries that could be of interest to users.
Making this prettier and mobile-friendly.
More details and better verbiage on this 'About' page.
I first noticed that STLtoday has no stats for Alton Marquette girls soccer in 2016-17.[1] There are many missing seasons for various schools and sports. Sometimes it is likely that the school was small and perhaps didn't field a team that year, or it was the pandemic. Some are definitely conspicuous, like private Catholic schools missing soccer seasons. For some sports, MaxPreps may have some stats that can be "massaged" and fed into the database.[2] I'll be going through the "gaps" (missing seasons interspersed with seasons with stats), first determining whether the season was played and then trying to find an alternative source of stats.
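Finding "gaps" in the sense used above (missing seasons that sit between seasons with stats) is easy to automate. The sketch below is only an illustration: the function names are hypothetical, seasons are represented by their starting year, and the sample years are made up rather than taken from the database.

```python
def find_gap_seasons(season_start_years):
    """Return missing starting years between the first and last season with stats."""
    present = set(season_start_years)
    if not present:
        return []
    return [
        year
        for year in range(min(present), max(present) + 1)
        if year not in present
    ]


def season_label(start_year):
    """Format a starting year the way the tables on this page do, e.g. 2016 -> '2016-17'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


# Hypothetical seasons with stats for one school and sport.
seasons_with_stats = [2014, 2015, 2017, 2018]
for year in find_gap_seasons(seasons_with_stats):
    print(season_label(year))
```

This only finds the candidates. Whether a team actually played that season still has to be confirmed by hand.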
Known gaps that can be or have been addressed:
| Sport | School | Season | Status |
|---|---|---|---|
| Baseball | Alton Marquette | 2014-15 | TODO - https://www.maxpreps.com/il/alton/marquette-catholic-explorers/baseball/13-14/stats/ |
| Girls Basketball | Vashon | 2016-17 | TODO - https://www.maxpreps.com/mo/st-louis/vashon-wolverines/basketball/girls/16-17/stats/ |
| Girls Volleyball | Belleville East | 2021-22 | TODO - https://www.maxpreps.com/il/belleville/belleville-east-lancers/volleyball/21-22/stats/ |
| Girls Volleyball | Mehlville | 2016-17 | TODO - https://www.maxpreps.com/mo/st-louis/mehlville-panthers/volleyball/16-17/stats/ |
| Girls Volleyball | MICDS | 2017-18 | TODO - https://www.maxpreps.com/mo/st-louis/micds-rams/volleyball/17-18/stats/ |
| Girls Volleyball | Pacific | 2022-23 | DONE - Populated from MaxPreps (Only 19 sets there) |
| Girls Volleyball | Parkway Central | 2020-21 | DONE - Populated from MaxPreps |
| Girls Volleyball | Windsor (Imperial) | 2019-20 | DONE - Populated from MaxPreps |
| Girls Volleyball | Windsor (Imperial) | 2020-21 | DONE - Populated from MaxPreps |
| Girls Volleyball | Visitation | 2023-24 | DONE - Populated from MaxPreps |
Known gaps without known resolution:
| Sport | School | Season | Status |
|---|---|---|---|
| Baseball | DuBourg | 2018-19 | MaxPreps has no stats |
| Baseball | Herculaneum | 2013-14 | MaxPreps has no stats |
| Baseball | Hillsboro | 2015-16 | MaxPreps has no stats |
| Baseball | St. Mary's | 2014-15 | MaxPreps has no stats |
| Baseball | Whitfield | 2013-14 | MaxPreps has no stats |
| Boys Basketball | Normandy | 2015-16 | MaxPreps has no stats |
| Girls Basketball | Festus | 2012-13 | MaxPreps has no stats |
| Football | Edwardsville | 2003-04 | One year too far back for MaxPreps |
| Football | Sumner | 2010-11 | MaxPreps stats empty |
| Football | Vashon | 2018-19 | MaxPreps has no stats |
| Hockey | Summit | 2009-10 | Not on MaxPreps; MidStates doesn't go back that far |
| Girls Soccer | Alton Marquette | 2016-17 | MaxPreps has no stats |
| Girls Soccer | Borgia | 2017-18 | MaxPreps has no stats |
| Girls Soccer | Ursuline | 2017-18 | MaxPreps has no stats |
| Girls Volleyball | DuBourg | 2006-07 | MaxPreps does not go back that far |
| Girls Volleyball | DuBourg | 2007-08 | MaxPreps has no stats |
| Girls Volleyball | DuBourg | 2020-21 | MaxPreps stats empty |
| Girls Volleyball | Fort Zumwalt North | 2017-18 | MaxPreps has no stats |
| Girls Volleyball | Fox | 2013-14 | MaxPreps has no stats |
| Girls Volleyball | Highland | 2021-22 | MaxPreps has no stats |
| Girls Volleyball | Mehlville | 2021-22 | MaxPreps has no stats |
| Girls Volleyball | MICDS | 2018-19 | MaxPreps has no stats |
| Girls Volleyball | Notre Dame | 2012-13 | MaxPreps has no stats |
| Girls Volleyball | Notre Dame | 2013-14 | MaxPreps has no stats |
| Girls Volleyball | Orchard Farm | 2012-13 | MaxPreps has no stats |
| Girls Volleyball | Pacific | 2007-08 | MaxPreps stats empty |
| Girls Volleyball | Parkway Central | 2003-04 | MaxPreps does not go back that far |
| Girls Volleyball | Roxana | 2021-22, 2022-23 | STLtoday doesn't have all set stats. MaxPreps has no stats |
| Girls Volleyball | St. Joseph's | 2000-01 | MaxPreps does not go back that far |
| Girls Volleyball | Visitation | 2007-08 | MaxPreps stats empty |
| Girls Volleyball | Visitation | 2008-09 | MaxPreps stats empty |
| Girls Volleyball | Webster Groves | 2020-21 | MaxPreps has no stats |
I know of one missing game in hockey.[3] There are likely many missing games across sports, just as there are missing seasons.
For those who know SQL.
TODO