There were a lot of complaints during Bills-Cardinals in Week 1
At long last, Buffalo Bills football is back. You know what that means! The real joy of the season. Talkin’ rules and proprietary penalty statistics. For those of you new to Buffalo Rumblings, this is your spot to talk about the officiating each week. Here we dive into the rulebook, talk about fairness, and of course apply the Penalty Harm stat.
What’s that? Beyond the yards, Harm uses a weighted scale to evaluate the damage a penalty caused. Did it negate a first down? Wipe out points? Give up free downs? All that and more goes into the formula, which got quite the workout in Week 1. Let’s dive in.
Standard and Advanced Metrics
Penalty Counts
This chart is mostly straightforward. For the newcomers, I track the assessed flags (left side of bars) as well as the total flags, including offset and declined (right side). This was a somewhat unusual game with zero declined or offset penalties, so the two sides are nearly identical. The sole difference is the gray bar in the middle of each, which shows the league average. Before we discuss the averages, the game-specific takeaway is simple: Buffalo had nearly twice as many flags as Arizona.
When it comes to averages, the two teams fall pretty squarely on opposite sides of that ledger. League averages tend to trickle down over the course of the year, but for the sake of reference, teams averaged 5.67 assessed flags and 6.72 total flags per game last season. Those numbers are more realistic targets later in the year. For now, just be prepared to deal with a couple of extra flags per game while everyone settles in.
Penalty Yards
This is similar to above with similar results. One note here for any new readers is that the right-hand side counts assessed yards plus those negated or negatively impacted by penalty. This is not tracked league-wide, so the right side doesn’t have a gray bar for the average.
If you’re curious, the 2023 average for assessed yards was 46.73 per team each game.
Penalty Harm
Arizona Cardinals
Here’s where any new reader will be especially lost. I’ll discuss the formula a bit, but the basic idea is that the greater the Harm rating, the greater the likelihood a specific flag had a major impact on the game. Anyone still curious about the formula can ask in the comments below. It’s not a secret, I just don’t want to type it out every week.
The Arizona Cardinals had a grand total of 6.0 Harm. Our dividing line between a good day and a bad day is 10.0 Harm, meaning Arizona’s flags weren’t likely to have been a major factor in the outcome.
The too-many-men-on-the-field flag cost them five yards and gave up a free first down from second. That’s 0.5 Harm for the yards, and 1.0 Harm for the one free down given.
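For anyone who likes to see the arithmetic spelled out, here is a minimal Python sketch of that calculation. It uses only the two weights quoted above (0.1 Harm per yard and 1.0 Harm per free down). The function name `harm` is made up for illustration, and the full formula has more pieces than this.

```python
def harm(yards: int = 0, free_downs: int = 0) -> float:
    """Partial Harm score (illustrative): 0.1 per yard, 1.0 per free down."""
    # Work in tenths of a Harm point so the result is not subject to float drift.
    tenths = yards + 10 * free_downs
    return tenths / 10


# Too many men on the field: 5 yards, plus one free down (second down became first).
print(harm(yards=5, free_downs=1))  # 1.5
```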
Cornerback Sean Murphy-Bunting was a little grabby during the game, getting a yards-only defensive holding and a six-yard defensive pass interference flag that also was yardage only. Wide receiver Keon Coleman’s stat line might have looked even nicer if not for Murphy-Bunting.
Kicker Matt Prater’s flag came when a kick fell short of the landing zone. That spots the ball at the 40-yard line. With the new kickoff rules, I compared this to the touchback starting point of the 30-yard line for a difference of ten yards. This is a new wrinkle for penalties in the 2024 season.
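The comparison is simple enough to sketch in a few lines of Python. The constant and function names are hypothetical, and the last line assumes the same 0.1-Harm-per-yard weight used for other assessed yardage applies here.

```python
SHORT_KICK_SPOT = 40  # yard line where the ball is spotted after the short kick
TOUCHBACK_SPOT = 30   # baseline starting point used for comparison


def short_kick_yards(spot: int = SHORT_KICK_SPOT, baseline: int = TOUCHBACK_SPOT) -> int:
    """Yards of field position given up relative to the touchback baseline."""
    return spot - baseline


yards = short_kick_yards()
print(yards)       # 10
print(yards / 10)  # 1.0 (ASSUMPTION: 0.1 Harm per yard applies to this flag)
```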
The most-impactful flag of the day was a roughing the passer called on linebacker Zaven Collins. Live, I thought this was a clean hit and a terrible call by the officials (even though it helped the Bills). I promised on social media to look at the replay and see how I felt afterward. Here’s the replay.
After review, I’ve mostly changed my mind. There are two points where I see a reason to throw the flag, and both are paused for your convenience. First, Collins pulls himself into Josh Allen and makes helmet-to-helmet contact. Note that there is no blanket rule on blows to the head, but Allen is protected as a passer, so this is prohibited. Second, defenders are forbidden from finishing a sack with any sort of punishing end, and the rotation to the ground could be seen as an attempt to do so. Based on the timing of the flag, I think it’s the blow to the head that was called, but either would be a reasonable cause. While I’ve changed my tune on this and understand the flag, I will also go on record and say this isn’t on the egregious side of things in my opinion. If this hadn’t been called, I don’t think I’d be griping.
For the data lovers, this penalty cost 15 assessed yards and wiped out a nine-yard sack. That’s 1.5 + 0.9 for 2.4 Harm.
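As a cross-check on Arizona’s 6.0 total, here is a rough Python tally of the five flags described above. The tuple layout and the helper `harm_tenths` are made up for illustration. The defensive holding is entered as the standard five yards, which is an assumption since the article only calls it yards-only. The Prater flag is entered as the ten-yard difference described above.

```python
# (label, assessed yards, negated yards, free downs)
arizona_flags = [
    ("Too many men on the field", 5, 0, 1),
    ("Defensive holding", 5, 0, 0),  # ASSUMPTION: standard 5-yard enforcement
    ("Defensive pass interference", 6, 0, 0),
    ("Kickoff short of landing zone", 10, 0, 0),
    ("Roughing the passer", 15, 9, 0),  # 15 assessed plus the negated 9-yard sack
]


def harm_tenths(assessed: int, negated: int, free_downs: int) -> int:
    """Harm in tenths: 1 per yard (assessed or negated), 10 per free down."""
    return assessed + negated + 10 * free_downs


for label, assessed, negated, downs in arizona_flags:
    print(f"{label}: {harm_tenths(assessed, negated, downs) / 10}")

total = sum(harm_tenths(a, n, d) for _, a, n, d in arizona_flags) / 10
print("Total:", total)  # Total: 6.0
```

Under those assumptions the five flags add up to the same 6.0 reported for Arizona.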
Buffalo Bills
There was a general perception that the refs had a poor game, including that opinion from yours truly. A lot of this was due to an early flag on cornerback Ja’Marcus Ingram for unnecessary roughness. The replay on this made it very clear that Ingram didn’t make forcible contact with quarterback Kyler Murray, was attempting to pull up, and had already committed to his tackle as Murray began his slide. In other words, the call was really bad.
I don’t have a GIF for you on this one as, overall, the common perception is one I agree with. My only disclaimer is that unlike, say, offensive holding (see below), unnecessary roughness calls are supposed to be made even if they’re ambiguous. Depending on the angle of the ref throwing the flag, I could see where perhaps their view wasn’t perfect and they were a bit oversensitive. In other words, this might be a forgivable error.
That flag was assessed for 10 yards (half the distance to the goal) and gave up two free downs for a Harm rating of 3.0. So it was a significant bad call.
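Here is a small Python sketch of how the half-the-distance enforcement feeds into that rating. The function name is hypothetical. The 20 yards to the goal line is inferred from the ten yards assessed rather than stated anywhere, and the weights are the 0.1 per yard and 1.0 per free down used elsewhere in this article.

```python
def assessed_yards(penalty_yards: float, yards_to_goal: float) -> float:
    """Cap enforcement at half the distance to the goal line."""
    return min(penalty_yards, yards_to_goal / 2)


# A 15-yard foul with an inferred 20 yards to the goal line is cut to 10 yards.
yards = assessed_yards(15, 20)

# Two free downs at 1.0 each; computed in tenths, then converted to Harm.
free_downs = 2
harm = (yards + 10 * free_downs) / 10

print(yards)  # 10.0
print(harm)   # 3.0
```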
Let’s take a look at a few other flags to see what the deal was with the refs. We’ll start with a false start called on left tackle Dion Dawkins. If a flag isn’t shown, that means I felt the penalty was valid.
There’s a very slight shift in Dawkins’ stance and, technically speaking, this is a false start. The rule is that the player has to move “in such a way as to simulate the start of a play.” I’m not convinced a lot of defenders would see that and believe the play is starting. I don’t like this call.
This is a bit more representative of a move that simulates the start of play. It’s slight for sure, but right tackle Spencer Brown’s flinch to me looks like he’s about to get going. I don’t mind this flag and it’s a good contrast to the Dawkins one.
Here’s Dawkins again, but this time being called for holding. I did an entire piece on just offensive holding, but the short version is that there should be a “material restriction” created by the player on offense. I can see where a ref may assume Dawkins gave a bit of a twist on this play but, unlike with a roughness flag, the refs are supposed to hold off unless they clearly see all elements of the infraction. I don’t like this call on Dawkins for that reason.
To contrast, here’s left guard O’Cyrus Torrence creating a fairly obvious material restriction. I don’t mind this flag.
Last but not least, there’s no GIF for the facemask call on Torrence as it was clearly the right call. It rated so high on Harm because it negated a touchdown. All scores negated by penalty are rated as 1.0 Harm per point negated (seven for TDs). Add 15 yards assessed and four yards impacted to the mix for 7.0 + 1.5 + 0.4 = 8.9 Harm.
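For the curious, the facemask math can be sketched in Python like this. The `Flag` class and its field names are invented for illustration. The weights are the ones quoted in this article (0.1 per yard, 1.0 per free down, 1.0 per point negated), and the real formula may include more than what is shown.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    assessed_yards: int = 0
    impacted_yards: int = 0
    free_downs: int = 0
    points_negated: int = 0

    def harm(self) -> float:
        # Sum in tenths (1 per yard, 10 per down or point), then convert to Harm.
        tenths = (
            self.assessed_yards
            + self.impacted_yards
            + 10 * self.free_downs
            + 10 * self.points_negated
        )
        return tenths / 10


# Facemask on Torrence: negated touchdown (7), 15 yards assessed, 4 yards impacted.
facemask = Flag(assessed_yards=15, impacted_yards=4, points_negated=7)
print(facemask.harm())  # 8.9
```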
Remember that Harm tries to pinpoint which flags are the most likely to have impacted the game. The facemask penalty directly took four points off the board for Buffalo so Harm correctly calls this one out as a flag to discuss. To pile onto that point, Buffalo had 19.3 Harm total, which is way over the 10.0 threshold. It’s safe to say Buffalo would have fared even better had it not been for flags.
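To put both teams against the 10.0 line in one place, here is a tiny Python sketch. The names are hypothetical, and treating a total of exactly 10.0 as a good day is my assumption for the sketch, since the dividing line itself isn’t defined that precisely.

```python
HARM_THRESHOLD = 10.0  # dividing line between a good day and a bad day


def is_bad_day(team_harm: float) -> bool:
    """True when total Harm is over the threshold (exactly 10.0 counts as good here)."""
    return team_harm > HARM_THRESHOLD


totals = {"Arizona": 6.0, "Buffalo": 19.3}
for team, team_harm in totals.items():
    verdict = "bad day" if is_bad_day(team_harm) else "good day"
    print(f"{team}: {team_harm} Harm, {verdict}")
```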