An essential beauty of sports, especially football, is the unpredictability of the game: no one expected the winless Jets to beat a hot 9-4 Rams team in Week 15 of the 2020 season, yet it happened. But in an era of statistics and analysis, how unpredictable is the game? In this paper, I will analyze the predictive power of different statistics from NFL games on game outcomes. In particular, I will explore whether any statistics (or combinations of statistics) are especially predictive of the winner of a given NFL game. Some statistics predict the winner trivially. For example, point differential (the difference in points scored) always identifies the winner, because that is how the winner is determined. But how well do other statistics predict outcomes, and how well do several statistics do when used together as predictor variables?

The predictor variables I will analyze are yards gained by a team's offense, yards allowed by a team's defense, and turnover differential. The response variable (winning or losing the game) is binary, so I will use logistic regression to analyze the association between these variables and the outcome of the game. First, I will fit a logistic regression model for each statistic as a single predictor variable. Then, I will fit models with pairs of predictor variables. Finally, I will fit a logistic regression with many predictor variables to see whether the accuracy of the models improves with more variables. I am using data from Pro Football Reference, which has many different statistics from games in current and past seasons (Pro Football Reference, 2020 Weekly League Schedule). I will use the data from 2020 to build the models. Since there were minimal rule changes from 2020 to 2021, I will then use games from the 2021 season to test the accuracy of the models (NFL Football Operations, 2021).

I will test the accuracy of these models in two ways. First, I will compare their pseudo $R^2$ values, which I decided are more applicable in this case than ordinary $R^2$ values. The $R^2$ value measures how well a model predicts the fraction of games with certain statistics that end in a win or a loss. The pseudo $R^2$ value measures the model's ability to predict the outcome of specific games. While both are important, the pseudo $R^2$ values will be more useful here, since our ultimate goal is to predict the outcome of a specific game from its statistics.

Second, I will test the models on data that was not used to build them (that is, statistics from games not in the original sample) and see how well each model predicts the outcomes of those games. To do this, I will take statistics from games in the 2021 season and plug them into my most successful model, as judged by its pseudo $R^2$ value. After doing this for many games, I will calculate the percentage of game outcomes the model predicts correctly. An output over 50% counts as a predicted win and an output under 50% counts as a predicted loss. A prediction is correct when it matches the actual result.

The first model used to fit the data has one predictor variable: yards gained. Figure 1 in the Appendix shows the fitted regression and the plotted points. The yards gained by the winning team are points with $y = 1$, and the yards gained by the losing team are points with $y = 0$. The model has the form

$$P(\text{win}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}},$$

where $x$ is yards gained and $\beta_1$ is the coefficient associated with yards gained. By the divide-by-4 rule, each additional yard gained increases the probability of winning by at most $\beta_1 / 4 = 0.0077 / 4 \approx 0.0019$, or about 0.19 percentage points. This maximum occurs where the predicted probability is 50%. We can also determine when there is a 50% chance of winning. Since $P(\text{win}) = 1/(1+1) = 1/2$ exactly when $\beta_0 + \beta_1 x = 0$, the 50% point is $x = -\beta_0 / \beta_1$. So there is a 50% chance of winning a game when the offense gains $2.7659 / 0.0077 \approx 359$ yards.
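The arithmetic for the one-predictor model can be checked with a short Python sketch. It assumes a slope of $\beta_1 = 0.0077$ and an intercept of $\beta_0 = -2.7659$. The intercept is inferred from the reported 50% point of 2.7659 / 0.0077 yards; the sign is not printed in the text. The helper names (`p_win`, `fifty_percent_yards`, `max_slope_per_yard`) are hypothetical and not part of the original analysis. The script also checks numerically that the slope of the logistic curve at the 50% point equals $\beta_1/4$.

```python
import math

# Coefficients for the one-predictor (yards gained) model. ASSUMPTION: the intercept's
# sign is inferred, since the 50% point x = -beta0 / beta1 = 2.7659 / 0.0077 implies beta0 < 0.
BETA0 = -2.7659
BETA1 = 0.0077


def p_win(yards: float) -> float:
    """Logistic model P(win) = 1 / (1 + exp(-(beta0 + beta1 * yards)))."""
    return 1.0 / (1.0 + math.exp(-(BETA0 + BETA1 * yards)))


def fifty_percent_yards() -> float:
    """Yards at which beta0 + beta1 * x = 0, so P(win) = 1 / (1 + 1) = 0.5."""
    return -BETA0 / BETA1


def max_slope_per_yard() -> float:
    """Divide-by-4 rule: dP/dx = beta1 * p * (1 - p) peaks at p = 0.5, giving beta1 / 4."""
    return BETA1 / 4.0


if __name__ == "__main__":
    x50 = fifty_percent_yards()
    # Central-difference slope across one yard centered on the 50% point.
    numeric_slope = p_win(x50 + 0.5) - p_win(x50 - 0.5)
    print(f"50% point: {x50:.1f} yards")
    print(f"max change in P(win) per yard: {max_slope_per_yard():.6f}")
    print(f"numerical slope at 50% point:  {numeric_slope:.6f}")
```

The term $p(1-p)$ is largest at $p = 1/2$, so the $\beta_1/4$ bound is reached only near 359 yards. Far from that point, each extra yard moves the predicted probability by less.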
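The paper does not say which software or which pseudo $R^2$ variant it uses. As one possibility, the following sketch fits a one-predictor logistic regression by Newton-Raphson, which for the logit link is equivalent to iteratively reweighted least squares. It then reports McFadden's pseudo $R^2$, $1 - \ln L_{\text{model}} / \ln L_{\text{null}}$, one common choice. The data in the demo are synthetic: they are simulated from the coefficients reported above purely for illustration, not taken from the 2020 games. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, max_iter=50, tol=1e-10):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    x is centered first for better numerical conditioning; the intercept
    is converted back to the uncentered scale before returning.
    """
    xbar = sum(xs) / len(xs)
    xc = [x - xbar for x in xs]
    a0, b1 = 0.0, 0.0  # intercept on the centered scale, and slope
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xc, ys):
            p = sigmoid(a0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # observed information (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det  # inverse of the 2x2 information times gradient
        step1 = (h00 * g1 - h01 * g0) / det
        a0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return a0 - b1 * xbar, b1


def log_likelihood(xs, ys, b0, b1):
    """Bernoulli log-likelihood of 0/1 outcomes under the logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def mcfadden_r2(xs, ys, b0, b1):
    """McFadden's pseudo R^2: 1 - lnL(fitted model) / lnL(intercept-only model)."""
    ybar = sum(ys) / len(ys)  # the intercept-only MLE predicts the overall win rate
    ll_null = sum(math.log(ybar) if y == 1 else math.log(1.0 - ybar) for y in ys)
    return 1.0 - log_likelihood(xs, ys, b0, b1) / ll_null


if __name__ == "__main__":
    # Synthetic illustration only: outcomes simulated from the reported coefficients.
    rng = random.Random(2020)
    xs = [rng.uniform(200, 520) for _ in range(4000)]
    ys = [1 if rng.random() < sigmoid(-2.7659 + 0.0077 * x) else 0 for x in xs]
    b0, b1 = fit_logistic_1d(xs, ys)
    print(f"b0 = {b0:.4f}, b1 = {b1:.5f}, 50% point = {-b0 / b1:.1f} yards")
    print(f"McFadden pseudo R^2 = {mcfadden_r2(xs, ys, b0, b1):.3f}")
```

Centering $x$ is only a conditioning aid; it does not change the fitted model. In practice, a statistics package would normally do this fit.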
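The second test described above, scoring held-out 2021 games with a 50% cutoff, can be sketched as follows. This assumes the fitted model is given as a list of coefficients (intercept first) and that each held-out row lists one team's statistics in the same order, for example yards gained, yards allowed, and turnover differential. The function names and this data layout are hypothetical. The same code handles one predictor, pairs of predictors, or many.

```python
import math
from typing import Sequence


def win_probability(coefs: Sequence[float], features: Sequence[float]) -> float:
    """P(win) from a fitted logistic model.

    coefs[0] is the intercept; coefs[1:] are slopes in the same order as features.
    """
    if len(coefs) != len(features) + 1:
        raise ValueError("need one coefficient per feature plus an intercept")
    z = coefs[0] + sum(b * x for b, x in zip(coefs[1:], features))
    # Numerically stable form of 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def holdout_accuracy(coefs, rows, outcomes, cutoff=0.5):
    """Share of held-out team-games whose predicted result matches the actual result.

    A probability above the cutoff is a predicted win; anything else is a predicted loss.
    """
    if not rows or len(rows) != len(outcomes):
        raise ValueError("rows and outcomes must be non-empty and the same length")
    correct = sum(
        (win_probability(coefs, features) > cutoff) == bool(won)
        for features, won in zip(rows, outcomes)
    )
    return correct / len(rows)
```

For the one-predictor model, the call would be `holdout_accuracy([-2.7659, 0.0077], rows, outcomes)`, with each row being `[yards_gained]` for one team in one 2021 game. A probability of exactly 50% is counted as a predicted loss here. The paper does not specify how to break that tie.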