Similarity Scores and Uniqueness Index

I was going to add this stat to the FAQ, but ended up with enough material that I felt it warranted a brief post. I’ve been working with a new stat: Uniqueness Index (UQI).

Similarity scores are becoming increasingly common in the draftnik world. They were originally used by Bill James to track baseball career arcs, and the same concepts can be applied to compare athletic profiles. We can use this kind of tool to compare a player from the current class against the historical NFL data set, producing a list of similar players to aid in analysis.

Uniqueness Index is a simple extension of this idea. In my simScore formulation, I define “S80” as the list of all players with a similarity score of 80 or above. This is the list of all significant comparable players, with 80 representing one-half of a standard deviation from a perfect match. This is a pretty common idea with similarity scores.

I also calculate “S60,” the list of players who fall, on average, within a full standard deviation of the given athletic profile. A score of 60 represents a very weak comparison, and I don’t typically use any comps that fall below 80. Still, the players above 60 are, in a general sense, from the same category as the target profile.
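As a rough illustration of how S80 and S60 lists could be built, here is a minimal Python sketch. The post does not spell out the simScore formula, so the linear scale below (100 minus 40 points per standard deviation of average gap) is an assumption, chosen only because it lands on 80 at half a standard deviation and 60 at a full one. The function names, the z-score inputs, and the three players are hypothetical.

```python
from statistics import mean

# ASSUMPTION: a linear scale that loses 40 points per standard deviation of
# average gap, chosen only because it hits 80 at 0.5 SD and 60 at 1.0 SD.
# The actual simScore formula is not given in the post.
POINTS_PER_SD = 40.0


def sim_score(target_z, other_z):
    """Hypothetical similarity score between two profiles of z-scores."""
    if len(target_z) != len(other_z):
        raise ValueError("profiles must cover the same tests")
    avg_gap = mean(abs(t - o) for t, o in zip(target_z, other_z))
    return 100.0 - POINTS_PER_SD * avg_gap


def comps(target_z, pool, threshold):
    """Names in `pool` (name -> z-score profile) scoring at or above threshold."""
    return [name for name, z in pool.items()
            if sim_score(target_z, z) >= threshold]


# Made-up profiles (z-scores on three hypothetical tests), not real players.
target = [1.0, 0.0, -1.0]
pool = {
    "Player A": [1.0, 0.5, -1.0],   # average gap of 1/6 SD
    "Player B": [0.0, 1.0, -1.0],   # average gap of 2/3 SD
    "Player C": [-1.0, 2.0, 1.0],   # average gap of 2 SD
}
s80 = comps(target, pool, 80)  # strong comps only
s60 = comps(target, pool, 60)  # same general category
```

With these made-up numbers, only Player A clears the 80 threshold, Player B joins at 60, and Player C misses both lists.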

UQI is simply the percentage of players in the given position group who fail to meet the S60 requirements. The basic formula is as shown below:

UQI = 100 × (players in the position group who fall below 60) / (all players in the position group)

Players with a unique profile will have a high UQI because a smaller portion of the available data set meets the S60 similarity requirements. Standard athletic profiles will be within reach of a large portion of the available data set, and will thus have a low UQI.
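The percentage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `uqi` is hypothetical, the input is assumed to be the target's similarity scores against every other player in the position group, and the scores in the example are made up.

```python
def uqi(scores, threshold=60.0):
    """Percentage of the position group scoring below the S60 threshold.

    ASSUMPTION: `scores` holds the target's similarity score against each
    other player in the position group (the target is not in the list).
    """
    if not scores:
        raise ValueError("need at least one player in the position group")
    misses = sum(1 for s in scores if s < threshold)
    return 100.0 * misses / len(scores)


# Made-up similarity scores for an eight-player position group.
example_scores = [95, 72, 61, 59, 40, 30, 20, 10]
example_uqi = uqi(example_scores)  # 5 of 8 fall below 60 -> 62.5
```

A profile that almost nobody in the group can match within a standard deviation pushes the result toward 100, while a profile that most of the group matches pulls it down.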

While it’s theoretically possible to achieve a 100 UQI, the highest I have on record is 99, a mark reached by 3 of the 4 current members of the 3sigma club — Calvin Johnson, Evan Mathis, and Lane Johnson. J.J. Watt, the Fourth Musketeer, rings in at a 97.

The lower end of UQI is typically about 25-30, which represents the most dead-average profile possible. LSU OT prospect La’el Collins has one of the lower marks in the current draft class with a 27 UQI. It’s not a judgment of his ability, but does show that his build and athleticism are entirely fungible.

The correlation between UQI and SPARQ exists but is not perfect. Players who test much better or much worse than average will typically have fewer peers than those who test in the average range, but there are examples of average testers with unique profiles. For example, Wes Saxton and Jesse James are side-by-side in the TE SPARQ chart, but their UQIs are very different.

[Image: Jesse James and Wes Saxton comparison]

The two have a nearly identical pSPARQ but fall in very different worlds of UQI. Jesse James is a rarer athlete than Wes Saxton.

That’s not necessarily good or bad. Part of the idea behind UQI is that we gain an appreciation of what we have and haven’t seen before. We can try to project Bud Dupree, but with a UQI of 95, we haven’t really seen many athletes like Bud Dupree before. To give an idea of how it works, here are a few other UQIs from the current class:

[Chart: UQIs for selected players from the current class]

I’ll be expanding on some further metadata related to similarity scores, but UQI is a good enough start for now.

-ZW

11 thoughts on “Similarity Scores and Uniqueness Index”

    1. zachwhitman Post author

      I haven’t looked at this. Currently, I’m using UQI more as a descriptive tool than anything else. How unique is a given player, have we seen them before — I feel like this kind of context can aid in film study and general evaluation.

  1. Aaron Macfadden

    LOVE this kind of data! If you could take each position group and organize the UQI/SPARQ scores in a graph, like you have done for the random players selected above, that would make my breakdown of this draft a LOT easier.

  2. Brian Anderson

    If you’re ever looking for a funny, light topic to write about, I think something on quarterback simscores compared to other positions would be pretty fun. E.g., Tom Brady is such a bad athlete, his nearest simscore is X offensive tackle. Or, RGIII is such a good athlete, his nearest simscore is Y safety. Comparing across positions could make for an entertaining article.
