Intro

Oklahoma politicos have long assumed the existence of a divide between urban and rural legislators in state politics. The failures of many bills — most recently SB1647 — have been attributed to this divide, and its alleged influence informs much of the punditry around politics in this state. Despite its influence, no one to my knowledge has empirically evaluated its effects.

Through rigorous statistical analysis, I have concluded that the population density of a legislator’s district has no significant relationship with that legislator’s partisanship, and that population density has little to no explanatory value when predicting vote outcomes.

Here I will address the most common argument for why an urban-rural divide exists in Oklahoma, the one-party factionalism theory, and will present my own framework for better understanding political competition in our state’s politics. The urban-rural divide does not hold up to scientific scrutiny, and I believe it must be rejected to improve the political coverage of our state government.

One-party states and factionalism

Oklahoma is functionally a one-party state. The Republican Party dominates statewide elections, holds all seven of the state’s seats in the U.S. House and Senate, and has a stranglehold on the state legislature. Realistically, as it stands today, the Democratic Party has very little chance of winning a majority at any level of government in Oklahoma.

Political scientists attribute the competition that still exists in one-party states like Oklahoma to the formation of “factions,” or coalitions within the dominant party. It has long been assumed that the main factional divide within the Oklahoma state legislature is between the urban and suburban Republicans and the rural Republicans.

This factional explanation is repeated without much scrutiny, especially with regard to bills likely to face stiff opposition within the Republican Party. When I interned at the State Capitol in the spring of 2022, I heard countless politicians, lobbyists, and journalists alike attribute the failure of SB1647 in particular to this urban-rural divide. One of my best friends even wrote an op-ed about the divide in The Oklahoman. I cannot overstate how prevalent this myth is among Oklahoma’s political class, or how little evidence actually exists to support it.

Demographics of Oklahoma

Before evaluating the effects of population density on partisanship and vote choice, it is important to establish the demographic makeup of Oklahoma. Having an understanding of these variables is essential to understanding politics in this state.

The partisan scores used in this analysis were generated through the W-NOMINATE method (Poole & Rosenthal). W-NOMINATE uses the spatial model of voting to estimate ideal points (numerical measures of relative partisanship) from roll call voting data. The roll call votes used for this analysis came from the 2022 Regular Session and were sourced from LegiScan. The W-NOMINATE partisan scores for the House and Senate are displayed on interactive maps below:

After generating these scores, I sought to create a measure of population density for each district. These scores were calculated by dividing the number of people per district by district area, and are expressed in people per square mile. These scores are plotted on distribution plots below:
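The density calculation itself is simple arithmetic. As a minimal sketch in Python (not the tooling used for this analysis), with three invented districts whose populations and areas are placeholders rather than Census figures, it looks like this. The sketch also prints the natural log of each density, since raw densities can differ by orders of magnitude.

```python
import math

# Hypothetical districts: (label, population, area in square miles).
# These numbers are invented for illustration and are NOT Census data.
districts = [
    ("Urban example", 38_000, 12.0),
    ("Suburban example", 39_000, 150.0),
    ("Rural example", 37_000, 4_200.0),
]


def population_density(population, area_sq_mi):
    """Return people per square mile."""
    return population / area_sq_mi


for label, population, area in districts:
    density = population_density(population, area)
    # The natural log compresses the range so that dense and sparse
    # districts can be compared on one axis.
    print(f"{label}: {density:,.1f} people/sq mi, ln(density) = {math.log(density):.2f}")
```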

After generating the legislator partisan scores and district-level population density values, I collected various demographic variables from the United States Census. The variables selected for this analysis were % white by district, % male by district, % with college degrees by district, and % living below the poverty line by district. The distributions of these variables in the House by party are shown below:

The visualizations above show that Oklahoma is very conservative and mostly rural, and that demographics are distributed relatively evenly between the parties.

Visualizing demographic correlations

Before getting to the raw statistics, I believe that it would be beneficial to visualize the relationships among these variables. First, I want to show that demographic factors correlate with both partisan scores and population density. Weak relationships exist between the education, gender, and wealth of a district and legislator partisanship. These relationships are visualized below:

My analysis also shows a much stronger relationship between these demographic factors and population density:

The fact that both partisan scores and population density are correlated with the selected demographic factors is very important for this analysis. Because the demographic factors are related to both, they are potential confounders, and they can be included as “controls” in my linear regression models. Put simply, statistical controls allow my models to measure and account for the confounding influence of the demographic variables and give a clearer picture of the relationship (or lack thereof) between population density and partisan scores and vote choice.
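To make the idea of a control concrete, here is a minimal sketch in Python that uses synthetic data, not the data from this analysis. Everything in it is an assumption made for illustration: the variable names, the sample size, and the effect sizes. In the fake data, "education" drives both "density" and "score", while density has no effect of its own on score. The small `ols` helper is a plain normal-equations solver written for this sketch, not a library function.

```python
import random


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Build the augmented system (X'X | X'y).
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic data (hypothetical): education is the confounder.
rng = random.Random(0)
n = 2000
education = [rng.gauss(0, 1) for _ in range(n)]
density = [2 * e + rng.gauss(0, 1) for e in education]      # driven by education
score = [-1.0 * e + rng.gauss(0, 0.5) for e in education]   # driven by education only

slope_alone = ols([density], score)[1]                  # density without controls
slope_controlled = ols([density, education], score)[1]  # density, controlling for education

print(f"density slope, no controls:       {slope_alone:.3f}")
print(f"density slope, education control: {slope_controlled:.3f}")
```

Without the control, density appears to predict score (a slope near -0.4 is expected from how the data were built) only because both variables track education. Once education enters the model, the density slope should shrink toward zero, which is the behavior a control is meant to produce.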

Population density and partisan scores

My first piece of evidence to disprove the urban-rural divide is the lack of direct correlation between population density and partisan scores. Even when population density is put on a natural log scale (a method of eliminating potential scaling issues when comparing values with different orders of magnitude), it has no visible correlation with partisan scores:

When party is controlled for, still no relationship appears to exist:

These graphs show the lack of a linear relationship, but linear relationships can be hard to judge visually without accompanying numbers to put things in perspective. The tables below report ordinary least squares (OLS) regression models, predictive statistical models designed to measure the relationship between a numerical outcome and one or more explanatory variables. Each table has two models: one modeling the entire chamber, and one modeling just the Republican Party, in which the divide is alleged to exist. In addition to clarity, the other advantage predictive modeling has over simple correlation visualization is the ability to “control” for confounding variables. I explained this idea earlier, but essentially a control allows the influence of a potentially confounding variable to be accounted for in a predictive model. Another important note is that, in these models, population density is measured in people per 0.001 square miles. This has been done to limit the noise caused by modeling variables of vastly different scales. My OLS regression models can be seen below:

## 
## House Demographics Linear Model
## ============================================================================
##                                               Dependent variable:           
##                                    -----------------------------------------
##                                                 Partisan Score              
##                                             (1)                  (2)        
## ----------------------------------------------------------------------------
## % With College Degree                    -0.071***              -0.005      
##                                           (0.010)              (0.004)      
##                                                                             
## % Male                                    -0.074**              0.0003      
##                                           (0.036)              (0.014)      
##                                                                             
## % Below Poverty Line                     -0.103***              -0.001      
##                                           (0.017)              (0.007)      
##                                                                             
## % White                                    -0.004               -0.002      
##                                           (0.007)              (0.003)      
##                                                                             
## Population Density (per 0.001 sqm)         0.033                0.021*      
##                                           (0.035)              (0.012)      
##                                                                             
## Constant                                  7.087***              0.853       
##                                           (1.860)              (0.776)      
##                                                                             
## ----------------------------------------------------------------------------
## Observations                                101                   82        
## R2                                         0.429                0.078       
## Adjusted R2                                0.399                0.017       
## Residual Std. Error                   0.493 (df = 95)      0.154 (df = 76)  
## F Statistic                        14.281*** (df = 5; 95) 1.279 (df = 5; 76)
## ============================================================================
## Note:                                            *p<0.1; **p<0.05; ***p<0.01
## 
## Senate Demographics Linear Model
## ===========================================================================
##                                              Dependent variable:           
##                                    ----------------------------------------
##                                                 Partisan Score             
##                                             (1)                 (2)        
## ---------------------------------------------------------------------------
## % With College Degree                    -0.056***             0.004       
##                                           (0.015)             (0.012)      
##                                                                            
## % Male                                   -0.139**              -0.033      
##                                           (0.060)             (0.043)      
##                                                                            
## % Below Poverty Line                     -0.074***             0.002       
##                                           (0.027)             (0.020)      
##                                                                            
## % White                                    0.003               0.001       
##                                           (0.010)             (0.007)      
##                                                                            
## Population Density (per 0.001 sqm)        -0.024               -0.009      
##                                           (0.060)             (0.039)      
##                                                                            
## Constant                                 8.819***              1.848       
##                                           (3.049)             (2.269)      
##                                                                            
## ---------------------------------------------------------------------------
## Observations                                46                   38        
## R2                                         0.348               0.051       
## Adjusted R2                                0.266               -0.097      
## Residual Std. Error                   0.407 (df = 40)     0.231 (df = 32)  
## F Statistic                        4.268*** (df = 5; 40) 0.346 (df = 5; 32)
## ===========================================================================
## Note:                                           *p<0.1; **p<0.05; ***p<0.01

OLS regression tables can be hard to read, but only three takeaways from these two tables are important. The first is the “R2” value, the share of variance in partisan scores that the variables in a model jointly explain. In the full-chamber models (column 1), that share is about 43% in the House and 35% in the Senate; in the Republican-only models (column 2), it falls to about 8% and 5%. The second is the “p-value” of each regression coefficient, indicated by the asterisks. A coefficient is treated as statistically significant here if its p-value is less than 0.1, meaning that if no true relationship existed, an estimate at least this far from zero would be expected less than 10% of the time. Note that population density is not statistically significant in three of the four models. The third is the size of the coefficients themselves. The “constant” is the predicted partisan score if every variable in the model were set to zero. The coefficient next to each explanatory variable is the predicted change in partisan score for every 1-unit change in that explanatory variable, holding the other variables fixed. Partisan score, the dependent variable, is on a scale of -1 to 1, so it is reasonable for the statistically significant coefficients to be small, but the tiny coefficients for population density, even in the one model where the coefficient is statistically significant, indicate the weakness of the relationship and point to population density having little to no impact on partisan scores. From these models, I feel confident rejecting the hypothesis that population density affects partisanship.
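The significance markers in the tables come from comparing each coefficient to its standard error (the number in parentheses beneath it). As a rough check, the sketch below recomputes approximate p-values for the population density terms using the coefficients and standard errors printed in the tables above. It assumes a normal reference distribution for simplicity; the regression output itself would use a t distribution with the listed degrees of freedom, so these values are approximations and will not match the underlying software exactly.

```python
import math


def two_sided_p_normal(coef, std_err):
    """Approximate two-sided p-value for H0: coefficient = 0 (normal approximation)."""
    z = abs(coef / std_err)
    return math.erfc(z / math.sqrt(2))


# (coefficient, standard error) for population density, copied from the tables above.
density_terms = {
    "House, all members": (0.033, 0.035),
    "House, Republicans only": (0.021, 0.012),
    "Senate, all members": (-0.024, 0.060),
    "Senate, Republicans only": (-0.009, 0.039),
}

for label, (coef, se) in density_terms.items():
    p = two_sided_p_normal(coef, se)
    verdict = "significant at 0.1" if p < 0.1 else "not significant at 0.1"
    print(f"{label}: ratio = {coef / se:.2f}, approx. p = {p:.3f} ({verdict})")
```

Only the House Republican-only term has a coefficient large enough relative to its standard error to fall under the 0.1 threshold, which agrees with the single asterisk in that table.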

The urban-rural divide and SB1647

As mentioned in the introduction, the urban-rural divide is often cited to explain the failure of bills within the Republican Party. The most recent example of this was the failure of SB1647, the 2022 school voucher bill. As it turns out, this explanation does not hold up to scrutiny.

The distributions of population density and partisan score by vote choice on this bill are displayed below:

Box plots can look challenging to read, but they are rather simple. Each rectangle spans the middle 50% of a distribution (the interquartile range, or IQR): its bottom edge marks the 25th percentile, its top edge marks the 75th percentile, and the line inside it marks the median. The vertical lines extending out from either end of the box (the whiskers) reach to the most extreme values that fall within 1.5 times the IQR of the box, that is, no lower than the 25th percentile minus 1.5 * IQR and no higher than the 75th percentile plus 1.5 * IQR. Finally, the dots past the whiskers represent outliers. The specific math is not as important as the interpretation, and these box plots tell us that the distribution of district population density for both vote choices is much wider than the distribution of partisan scores. These plots indicate a strong relationship between partisan score and vote choice and a weak or non-existent relationship between population density and SB1647 vote choice.
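For readers who want to see the box plot arithmetic, here is a minimal sketch in Python. The input values are invented for illustration, and the quartile method is an assumption: different software uses slightly different quartile conventions, so the exact numbers on a given plot may differ a little.

```python
import statistics


def box_plot_summary(values):
    """Compute the quantities a standard (Tukey-style) box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,                      # bottom edge of the box
        "median": median,              # line inside the box
        "q3": q3,                      # top edge of the box
        "iqr": iqr,                    # height of the box
        "whisker_low": min(inside),    # lowest value within the lower fence
        "whisker_high": max(inside),   # highest value within the upper fence
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


# Hypothetical values, not data from this analysis.
example = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_plot_summary(example)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this example the value 100 lies beyond the upper fence, so it would be drawn as a dot rather than stretching the whisker.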

My predictive statistical models bolster the explanation inferred from the above box plots. These models are logistic regressions, which estimate how one or more explanatory variables relate to the probability of a categorical outcome, here a legislator’s vote on SB1647. In this table, the first model includes all legislators, and the second includes only Republicans. These models use roll call voting data for SB1647 only. See below:

## 
## Logistic Regression for SB1647 Vote
## ===============================================================
##                                        Dependent variable:     
##                                    ----------------------------
##                                            SB1647 Vote         
##                                         (1)            (2)     
## ---------------------------------------------------------------
## Population Density (per 0.001 sqm)     0.249          0.315    
##                                       (0.298)        (0.330)   
##                                                                
## Partisan Score                         1.433*         0.145    
##                                       (0.768)        (1.508)   
##                                                                
## Constant                               -0.488        -0.087    
##                                       (0.420)        (0.642)   
##                                                                
## ---------------------------------------------------------------
## Observations                             46            38      
## Log Likelihood                        -29.439        -25.646   
## Akaike Inf. Crit.                      64.878        57.293    
## ===============================================================
## Note:                               *p<0.1; **p<0.05; ***p<0.01

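These models are interpreted much like the previous OLS regression models, except that the coefficients are on the log-odds scale. Population density again fails to achieve statistical significance in either model. Partisan score is the only predictor that reaches significance, and only in the model of all legislators (1.433, p < 0.1); among Republicans alone, its coefficient (0.145) is small relative to its standard error (1.508). The models therefore give more support to partisan score than to population density as a predictor of the SB1647 vote.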
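Because logistic regression coefficients are expressed in log-odds, they are easier to read once converted to odds ratios or predicted probabilities. The sketch below does that conversion for model (1), using the coefficients printed in the table above. Two things are assumptions on my part: the table does not say which vote choice is coded as 1, so "probability" here means the probability of whichever choice that is, and the input values passed to the function are arbitrary examples.

```python
import math

# Coefficients from model (1) of the SB1647 table above (log-odds scale).
INTERCEPT = -0.488
B_DENSITY = 0.249    # per unit of population density, in the table's units
B_PARTISAN = 1.433   # per 1-point change in partisan score


def predicted_probability(density, partisan_score):
    """Probability of the outcome coded as 1 (which vote that is, is not stated in the table)."""
    log_odds = INTERCEPT + B_DENSITY * density + B_PARTISAN * partisan_score
    return 1.0 / (1.0 + math.exp(-log_odds))


# An odds ratio is the multiplicative change in the odds per 1-unit increase.
odds_ratio_density = math.exp(B_DENSITY)
odds_ratio_partisan = math.exp(B_PARTISAN)

print(f"odds ratio, population density: {odds_ratio_density:.2f}")
print(f"odds ratio, partisan score:     {odds_ratio_partisan:.2f}")
# Arbitrary example inputs: density 0, partisan scores of 0.0 and 0.5.
print(f"P(outcome = 1) at score 0.0: {predicted_probability(0.0, 0.0):.3f}")
print(f"P(outcome = 1) at score 0.5: {predicted_probability(0.0, 0.5):.3f}")
```

These point estimates say nothing about uncertainty on their own; the standard errors in the table still determine whether a coefficient is distinguishable from zero.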

Conclusion: my framework for understanding partisanship in Oklahoma

In place of the urban-rural divide hypothesis, I present a demographics-based hypothesis for understanding Republican Party partisanship in Oklahoma. My findings indicate that the gender composition, level of education, and poverty rate of a legislative district explain much of the variance in partisan behavior and vote choice. This hypothesis is supported by the broader political science literature and, as I believe I have shown, does a much better job of explaining politics in Oklahoma.

Further evidence of this hypothesis can be seen in the graphs below. The first set shows level of education and partisan scores:

Poverty rate and partisanship:

Gender distribution and partisanship:

All of these relationships appear to exist within the Republican Party in these graphs, and as can be seen in my modeling, they matter more in determining partisanship than population density does.

I believe I have provided sufficient statistical evidence to reject the urban-rural hypothesis. Instead, I believe a demographic-divide hypothesis is a better framework for understanding political divides in Oklahoma.

Oklahoma is divided along the same demographic lines as the rest of the country: gender, college education, and wealth. Political thinkers in Oklahoma must update their understanding of what divides truly exist in our state in order to better interpret political events and make more informed political decisions.