The previous post in this series was the “tool” post: the actual coding sheet. This final part is about whether that tool is worth using. The first big question is reliability: what happens if two people watch the same villain and still end up with completely different Big Five scores? If this framework is supposed to support systematic comparison and statistical analysis, it can’t rely entirely on one person’s subjective intuition. It needs to be stable across observers.
Making it trustworthy: coder agreement explained step-by-step
The solution is inter-rater reliability testing. That’s standard scientific practice: you get two different people to code the exact same character independently, and then you measure how similar their results are. The simplest way to understand it:
Imagine Coder A and Coder B both rate the Joker’s Neuroticism using the four Neuroticism items from the sheet.
Coder A gives scores of 7, 6, 7, 7 (average 6.75).
Coder B gives scores of 6, 6, 7, 6 (average 6.25).
Their final trait scores are only 0.5 points apart on a 1–7 scale. That means both people basically saw the same emotional profile: high instability, visible distress, volatility, and conflict. Small differences like this are normal.
The “scientific way” (takes 30 seconds in Excel): put the five averaged trait scores (E, A, C, N, O) from Coder A into cells A1 to A5 and the five from Coder B into cells B1 to B5, side by side. Then use the formula:
=CORREL(A1:A5,B1:B5)
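This returns the Pearson correlation, a number between -1.0 and 1.0: 1.0 means the two profiles rise and fall in perfect step, 0.0 means there is no linear relationship, and negative values mean the coders ranked the traits in opposite directions. As a rule of thumb, values above 0.70 are usually considered acceptable (Gosling et al., 2003). If you clear that threshold consistently, your measurement is stable enough to trust. Keep in mind that a correlation compares the shape of the two profiles, not their level: a coder who rates every trait one point higher than the other would still get 1.0, so it is worth glancing at the raw differences as well.
What this looks like in the analysis database
Once you start building a database, you can add coder agreement as a quality-control column: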
Character | Media | E | A | C | N | O | Coder Agreement
Joker | The Dark Knight | 6.5 | 1.5 | 2.0 | 7.0 | 7.0 | 0.82
Thanos | MCU | 5.0 | 2.5 | 7.0 | 3.0 | 6.0 | 0.91
Thomas Shelby | Peaky Blinders | 6.0 | 3.5 | 7.0 | 6.25 | 6.5 | test needed
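For anyone who prefers a script over a spreadsheet, the agreement column could be filled in roughly like this. This is a minimal Python sketch, not part of the coding sheet itself. The `pearson` helper computes the same quantity as Excel's CORREL. The Neuroticism item scores are the Joker example from above. The other four trait scores per coder, the `agreement_status` helper, and its "ok" / "recode" labels are made up purely for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long score lists (what CORREL computes).

    Raises ZeroDivisionError if one coder gave the same score to every trait.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Step 1: average the four Neuroticism items per coder (values from the post).
n_a = mean([7, 6, 7, 7])  # Coder A
n_b = mean([6, 6, 7, 6])  # Coder B

# Step 2: one averaged score per trait, in the order E, A, C, N, O.
# HYPOTHETICAL numbers: only the N values come from the example above.
coder_a = [6.5, 1.5, 2.0, n_a, 7.0]
coder_b = [6.0, 2.0, 2.5, n_b, 6.5]
agreement = pearson(coder_a, coder_b)

THRESHOLD = 0.70  # reliability target named in the post


def agreement_status(value):
    """Turn an agreement value into a quality-control label (labels are assumed)."""
    if value is None:
        return "test needed"
    return "ok" if value > THRESHOLD else "recode"


# Agreement column from the table above; None marks a character
# that has not been double-coded yet.
database = [
    {"character": "Joker", "agreement": 0.82},
    {"character": "Thanos", "agreement": 0.91},
    {"character": "Thomas Shelby", "agreement": None},
]

for row in database:
    print(row["character"], "->", agreement_status(row["agreement"]))
```

With only five data points per character, the correlation is a coarse check, so a value just under the threshold is a reason to compare notes rather than a verdict.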
Why this works across every type of antagonist and antihero
The entire point of this coding sheet is universality. I designed the 20 items to work across different genres, archetypes, and narrative roles. They don’t require special rules for “crime boss vs. fantasy villain vs. horror monster.” They simply capture observable patterns.
Mastermind types (Thanos, Hannibal Lecter, Littlefinger)
These tend to cluster around extremely high Conscientiousness, high Openness, and rock-bottom Agreeableness.
Chaos tricksters (Joker, Loki)
Usually high Openness, high Neuroticism, low Conscientiousness.
Silent brutes (Michael Myers, Grundy)
Low Extraversion, low Openness, low Neuroticism.
Tragic antiheroes (Walter White, Anakin Skywalker)
Often show changing profiles over time – especially rising Conscientiousness and surging Neuroticism as their arc progresses.
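These archetype descriptions can be turned into a rough lookup. The Python sketch below rests on assumptions that are not part of the coding sheet: the cut-offs `HIGH` and `LOW`, the `SIGNATURES` table, and the function names are all hypothetical. Tragic antiheroes are left out because they are defined by change over time rather than by one static profile. The three profiles are the rows from the database table above.

```python
# HYPOTHETICAL cut-offs on the 1-7 scale; the post does not define them.
HIGH = 5.5
LOW = 3.0

# Trait patterns per archetype, as described in the list above.
SIGNATURES = {
    "mastermind": {"C": "high", "O": "high", "A": "low"},
    "chaos trickster": {"O": "high", "N": "high", "C": "low"},
    "silent brute": {"E": "low", "O": "low", "N": "low"},
}


def matches(profile, signature):
    """Return True if every trait in the signature is on the expected side."""
    for trait, level in signature.items():
        score = profile[trait]
        if level == "high" and score < HIGH:
            return False
        if level == "low" and score > LOW:
            return False
    return True


def matching_archetypes(profile):
    """List every archetype whose signature fits the profile (may be empty)."""
    return [name for name, sig in SIGNATURES.items() if matches(profile, sig)]


joker = {"E": 6.5, "A": 1.5, "C": 2.0, "N": 7.0, "O": 7.0}
thanos = {"E": 5.0, "A": 2.5, "C": 7.0, "N": 3.0, "O": 6.0}
shelby = {"E": 6.0, "A": 3.5, "C": 7.0, "N": 6.25, "O": 6.5}

for name, profile in [("Joker", joker), ("Thanos", thanos), ("Thomas Shelby", shelby)]:
    print(name, "->", matching_archetypes(profile) or "no clear archetype")
```

An empty result is not a failure: a character who fits none of the signatures is exactly the kind of case worth a closer look, and fixed cut-offs like these would need to be revisited once the sample grows.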
Same sheet, same math, different profiles. Perfect for pattern-finding.
Thesis-ready and scalable
From a project-planning perspective, the method is built for scale:
Time investment:
- 10–15 minutes per main character/antagonist
- 3–5 minutes for minor villains
Minimum viable sample: ~20 characters for early pattern detection and basic clustering
Reliability target: coder agreement above 0.70
Evidence confidence rating: I also plan to track evidence confidence based on screen time (e.g., 1 star = limited screen time, 3 stars = full narrative visibility).
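The time figures above translate into a quick budget estimate. The small Python sketch below only multiplies out the per-character ranges given in the list. The function name `coding_time_range` and the `coders` parameter (2 when every character is double-coded for the agreement check) are my own additions for illustration.

```python
# Minutes per character, taken from the list above as (low, high) ranges.
MAIN_MINUTES = (10, 15)
MINOR_MINUTES = (3, 5)


def coding_time_range(n_main, n_minor=0, coders=1):
    """Return the (lowest, highest) total coding time in minutes."""
    low = (n_main * MAIN_MINUTES[0] + n_minor * MINOR_MINUTES[0]) * coders
    high = (n_main * MAIN_MINUTES[1] + n_minor * MINOR_MINUTES[1]) * coders
    return low, high


# Minimum viable sample: 20 main characters, coded once.
low, high = coding_time_range(20)
print(f"Single coder: {low}-{high} minutes")

# Same sample plus 10 minor villains, double-coded for the reliability check.
low, high = coding_time_range(20, n_minor=10, coders=2)
print(f"Two coders: {low}-{high} minutes")
```

This counts coding time only. Watching or rewatching the source material is not included.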
That’s it for the Big Five part of the framework for now.
With the Personality Profile layer operationalized, I now have a concrete way to compare villains by personality structure without turning every analysis into a full essay. Next up, I’ll return to the other layers of the framework – especially Observable Traits (ACIS/visual coding) and Symbolism & Motivation – and then eventually start applying the full combined method to actual villains.
Until then — see ya!
Literature:
- Costa, Paul T., and Robert R. McCrae. Revised NEO Personality Inventory (NEO PI-R) and NEO Five-factor Inventory (NEO-FFI). Psychological Assessment Resources (PAR), 1985.
- Gosling, Samuel D., Peter J. Rentfrow, and William B. Swann Jr. “A very brief measure of the Big-Five personality domains.” Journal of Research in Personality 37.6 (2003): 504-528.
- John, Oliver P., Laura P. Naumann, and Christopher J. Soto. “Paradigm shift to the integrative Big Five trait taxonomy.” Handbook of Personality: Theory and Research 3.2 (2008): 114-158.
- Soto, Christopher J., and Oliver P. John. “The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power.” Journal of Personality and Social Psychology 113.1 (2017): 117.