A primary challenge in doing research on racial discrimination is that you need to answer the "what if" questions. For example, it's not enough for research to show that blacks are pulled over by police for traffic stops more often than whites. What if more blacks were driving in a way that caused them to be pulled over more often? A researcher can't just dismiss that possibility. Instead, you need to find a way to think about the available data in a way that addresses these kinds of "what if" questions.
When it comes to traffic stops, one approach is to look at such stops in the shifting time window between daytime and darkness. For example, compare the rate at which blacks and whites are pulled over for traffic stops in a certain city during a time of year when it's light outside at 7 pm and at a time of year when it's dark outside at 7 pm. One key difference here is that when it's light outside, it's a lot easier for the police to see the race of the driver. If the black-white difference in traffic stops around 7 in the evening is a lot larger when it's light at that hour than when it's dark at that hour, then racial discrimination is a plausible answer. Taking this idea a step further, a researcher can look at the time period just before and after the Daylight Saving Time shifts.
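To make the comparison concrete, here is a minimal sketch in Python of the basic daylight-versus-darkness calculation. It is a simplification for illustration, not the statistical model used in the study discussed below. The function names are hypothetical and the stop counts are invented.

```python
def black_share(black_stops, total_stops):
    """Share of stopped drivers who are black within one time window."""
    if total_stops <= 0:
        raise ValueError("total_stops must be positive")
    return black_stops / total_stops


def daylight_darkness_gap(light, dark):
    """Black share of stops in daylight minus black share after dark.

    Each argument is a (black_stops, total_stops) pair for the same
    clock-time window, e.g. 7:00-7:15 pm.
    """
    return black_share(*light) - black_share(*dark)


# Hypothetical counts, invented purely for illustration.
light_counts = (250, 1000)  # dates when it is still light at 7 pm
dark_counts = (200, 1000)   # dates when it is already dark at 7 pm

gap = daylight_darkness_gap(light_counts, dark_counts)
print(f"Daylight minus darkness gap in black share of stops: {gap:.3f}")
```

A clearly positive gap at the same clock time would be consistent with race playing a role when it is visible, although a real analysis would also need to account for sampling noise and other differences between the two periods.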
A team of authors uses this approach and others in "A large-scale analysis of racial disparities in police stops across the United States," published in Nature Human Behaviour (July 2020, pp. 736-745, authors are Emma Pierson, Camelia Simoiu, Jan Overgoor, Sam Corbett-Davies, Daniel Jenson, Amy Shoemaker, Vignesh Ramachandran, Phoebe Barghouty, Cheryl Phillips, Ravi Shroff and Sharad Goel). The authors made public records requests in all 50 states, but (so far) have ended up with "a dataset detailing nearly 100 million traffic stops carried out by 21 state patrol agencies and 35 municipal police departments over almost a decade." Their analysis sounds like this:
In particular, among state patrol stops, the annual per-capita stop rate for black drivers was 0.10 compared to 0.07 for white drivers; and among municipal police stops, the annual per-capita stop rate for black drivers was 0.20 compared to 0.14 for white drivers. For Hispanic drivers, however, we found that stop rates were lower than for white drivers: 0.05 for stops conducted by state patrol (compared to 0.07 for white drivers) and 0.09 for those conducted by municipal police departments (compared to 0.14 for white drivers). ...
These numbers are a starting point for understanding racial disparities in traffic stops, but they do not, per se, provide strong evidence of racially disparate treatment. In particular, per-capita stop rates do not account for possible race-specific differences in driving behaviour, including amount of time spent on the road and adherence to traffic laws. For example, if black drivers, hypothetically, spend more time on the road than white drivers, that could explain the higher stop rates we see for the former, even in the absence of discrimination. Moreover, drivers may not live in the jurisdictions where they were stopped, further complicating the interpretation of population benchmarks.
But here's some data from the Texas State Patrol on the share of blacks stopped in different evening time windows: 7:00-7:15, 7:15-7:30, and 7:30-7:45. A vertical line shows "dusk," considered the time when it is dark. The researchers ignore the 30 minutes before dusk, when the light is fading, and focus on the periods before and after that window. You can see that the share of black drivers stopped is higher in the daylight, and then lower after dark.
Another test for racial discrimination looks at the rate at which cars are searched, and then looks at the success rate of those searches. Interpreting the result of this kind of test can be mildly complex, and it's useful to go through two steps to understand the analysis. The authors explain the first step in this way:
In these jurisdictions, stopped black and Hispanic drivers were searched about twice as often as stopped white drivers. To assess whether this gap resulted from biased decision-making, we apply the outcome test, originally proposed by Becker, to circumvent omitted variable bias in traditional tests of discrimination. The outcome test is based not on the search rate but on the ‘hit rate’: the proportion of searches that successfully turn up contraband. Becker argued that even if minority drivers are more likely to carry contraband, in the absence of discrimination, searched minorities should still be found to have contraband at the same rate as searched whites. If searches of minorities are successful less often than searches of whites, it suggests that officers are applying a double standard, searching minorities on the basis of less evidence. ...
Across jurisdictions, we consistently found that searches of Hispanic drivers were less successful than those of white drivers. However, searches of white and black drivers had more comparable hit rates. The outcome test thus indicates that search decisions may be biased against Hispanic drivers, but the evidence is more ambiguous for black drivers.
This approach sounds plausible, but if you think about it a little more deeply, it's straightforward to come up with examples where it might not work so well. Here's an example:
[S]uppose that there are two, easily distinguishable, types of white driver: those who have a 5% chance of carrying contraband and those who have a 75% chance of carrying contraband. Likewise assume that black drivers have either a 5 or 50% chance of carrying contraband. If officers search drivers who are at least 10% likely to be carrying contraband, then searches of white drivers will be successful 75% of the time whereas searches of black drivers will be successful only 50% of the time. Thus, although the search criterion is applied in a race-neutral manner, the hit rate for black drivers is lower than that for white drivers and the outcome test would (incorrectly) conclude that searches are biased against black drivers. The outcome test can similarly fail to detect discrimination when it is present.
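The arithmetic in this example can be checked with a short Python sketch. The probabilities and the 10% search threshold come from the quoted passage. The function name and the equal split of drivers between the two types are assumptions added for illustration. The split does not affect the result here, because only one type in each group clears the threshold.

```python
def hit_rate(driver_types, threshold):
    """Hit rate when officers search every driver at or above the threshold.

    driver_types is a list of (share_of_drivers, prob_contraband) pairs.
    Returns None if no driver type meets the threshold.
    """
    searched = [(share, p) for share, p in driver_types if p >= threshold]
    total_share = sum(share for share, _ in searched)
    if total_share == 0:
        return None
    # Average contraband probability among searched drivers, weighted by share.
    return sum(share * p for share, p in searched) / total_share


# Probabilities from the quoted example; the 50/50 shares are an assumption.
white_types = [(0.5, 0.05), (0.5, 0.75)]
black_types = [(0.5, 0.05), (0.5, 0.50)]
threshold = 0.10  # the same race-neutral search rule for both groups

white_hit = hit_rate(white_types, threshold)
black_hit = hit_rate(black_types, threshold)
print(f"White hit rate: {white_hit:.2f}")
print(f"Black hit rate: {black_hit:.2f}")
```

The same threshold yields different hit rates, which is why a gap in hit rates alone cannot settle whether the search standard differs by race.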
To put it another way, the decision to search a vehicle is binary: you do it or you don't do it. Thus, the key issue is the threshold that a police officer applies in deciding to search. As in this example, you can think of the threshold in this way: if the percentage chance of finding something is above the threshold level, a search happens; if it's below that level, a search doesn't happen. The next step is to estimate these threshold probabilities:
In aggregate across cities, the inferred threshold for white drivers is 10.0% compared to 5.0 and 4.6% for black and Hispanic drivers, respectively. ... Compared to by-location hit rates, the threshold test more strongly suggests discrimination against black drivers, particularly for municipal stops. Consistent with past work, this difference appears to be driven by a small but disproportionate number of black drivers who have a high inferred likelihood of carrying contraband. Thus, even though the threshold test finds that the bar for searching black drivers is lower than that for white drivers, these groups have more similar hit rates.
A short takeaway from this research is that when blacks complain about being stopped more often by police, there is solid research evidence backing up this claim. The evidence on blacks being searched more often in a traffic stop is real, but probably best viewed as a little weaker, because it doesn't show up in the basic "success rate of searches" data and instead requires the more complex threshold analysis.
For other discussions of how social scientists try to pin down the extent to which racial discrimination underlies racial disparities, see: