The science behind Pythia

What personality science tells us about who thrives at work.

For more than three decades, personality research has been quietly building one of the most robust bodies of evidence in the behavioural sciences. Pythia is built on that evidence. This page walks through the work that informs our product — and the boundaries we hold ourselves to.

The Big Five: the model we rely on

Pythia uses the Five-Factor Model of personality, commonly known as the Big Five, which organises individual differences along five dimensions: openness, conscientiousness, extraversion, agreeableness, and neuroticism (the inverse of emotional stability). The model emerged from decades of lexical and factor-analytic work and is now the dominant framework in personality psychology, with strong cross-cultural replication (McCrae & Costa, 2008; Goldberg, 1993).

Unlike type-based instruments such as the MBTI, the Big Five measures traits on continuous spectra and has accumulated substantial evidence of reliability and predictive validity across samples, methods, and decades of follow-up (Roberts, Kuncel, Shiner, Caspi & Goldberg, 2007).

Personality predicts work outcomes

The foundational meta-analysis of Barrick and Mount (1991) established that Big Five traits, particularly conscientiousness, predict job performance across occupations. Hurtz and Donovan (2000) confirmed and refined these findings a decade later. Subsequent meta-analytic work extended the picture to leadership (Judge, Bono, Ilies & Gerhardt, 2002) and reaffirmed the role of conscientiousness at work (Wilmot & Ones, 2019).

The current state of the evidence, synthesised in a 2025 review covering Big Five, HEXACO, and related models, is that personality traits are among the strongest non-cognitive predictors of job performance, with conscientiousness the most consistent predictor across task performance, organisational citizenship behaviour, and counterproductive work behaviour (van Aarde, Meiring & Wiernik, 2025).

The picture is not one-size-fits-all. Different roles activate different traits: extraversion and openness matter more in interpersonal and leadership roles (Judge et al., 2002), while conscientiousness dominates in structured, execution-heavy work. Facet-level analysis, which looks within each trait, has been shown to improve predictive precision for specific outcomes such as promotions and leadership emergence (Soto & John, 2017; Wihler, Meurs, Wiesmann, Troll & Blickle, 2017).

How strong are these effects?

Personality-to-outcome correlations typically sit in the .10 to .30 range. That might sound modest, but as Roberts and colleagues (2007) have argued, these effect sizes are comparable to those found for socioeconomic status or cognitive ability in predicting important life outcomes, and to the effects of many widely accepted medical interventions. The replicability of these associations is also strong: Soto (2019) found that 87% of previously published Big Five–outcome links replicated in a large preregistered study, though replication effect sizes were on average 77% as strong as the original estimates. That is a useful reminder to be conservative about individual predictions.
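To make these magnitudes concrete, here is a minimal Python sketch that translates a correlation into two common interpretive aids and applies the 77% figure from Soto (2019) as a simple multiplier. The function names are hypothetical, the binomial effect size display assumes a 50% base rate and a median split, and treating the replication ratio as a flat discount is an illustrative assumption rather than a description of how Pythia scores anything.

```python
def variance_explained(r: float) -> float:
    """Share of outcome variance statistically accounted for by r."""
    return r ** 2


def besd_rates(r: float) -> tuple[float, float]:
    """Binomial effect size display: outcome rates for the higher- and
    lower-scoring halves, assuming a 50% base rate and a median split."""
    return 0.5 + r / 2, 0.5 - r / 2


def shrunk(r: float, retention: float = 0.77) -> float:
    """Discount a published correlation by a replication ratio
    (0.77 is the average reported by Soto, 2019)."""
    return r * retention


# Walk through the typical .10 to .30 range mentioned in the text.
for r in (0.10, 0.20, 0.30):
    high, low = besd_rates(r)
    print(
        f"r={r:.2f}  r^2={variance_explained(r):.3f}  "
        f"BESD={high:.0%} vs {low:.0%}  conservative r={shrunk(r):.3f}"
    )
```

Read this way, a correlation of .30 explains about 9% of variance yet corresponds to a 65% versus 35% split in the display, which is one reason small-looking correlations can still carry practical weight.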

Where AI fits in

A recent paper in Nature Human Behaviour is especially relevant to what we build. Wright and colleagues (2025) showed that widely available large language models can, out of the box, score Big Five traits from brief open-ended narratives — streams of thought and video diaries — with accuracy that matches or exceeds established benchmarks such as self–other agreement and bespoke machine learning models.

Featured finding

Wright et al. (2025), Nature Human Behaviour — LLM-derived personality scores from short open-ended text converged with self-report measures and predicted daily behaviour and mental health, outperforming traditional closed-vocabulary text analysis. Averaging scores across multiple LLMs produced the strongest agreement with self-report.
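The averaging idea in that finding can be sketched in a few lines of Python. Everything here is illustrative: the rater names, the helper functions, and the toy scores are made up for the example and are not data or code from Wright et al. (2025) or from Pythia. The sketch only shows the mechanics of averaging per-person scores across several raters and correlating the result with self-report.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_raters(scores_by_rater):
    """Average each person's score across raters.

    scores_by_rater maps a rater name to a list of scores,
    one per person, in the same person order for every rater.
    """
    return [mean(person) for person in zip(*scores_by_rater.values())]


# Made-up trait scores for five people (1-5 scale), for illustration only.
self_report = [2.0, 3.5, 4.0, 1.5, 3.0]
llm_scores = {
    "model_a": [2.5, 3.0, 4.5, 2.0, 2.5],  # hypothetical rater
    "model_b": [1.5, 4.0, 3.5, 2.0, 3.5],  # hypothetical rater
}

for name, scores in llm_scores.items():
    print(name, round(pearson(scores, self_report), 3))
print("average", round(pearson(average_raters(llm_scores), self_report), 3))
```

With real data, the comparison of interest is whether the averaged scores agree with self-report more closely than any single rater does; the toy numbers above are not evidence either way.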

This line of work builds on a longer tradition of extracting personality signals from language and behaviour. Park and colleagues (2015) demonstrated automatic personality assessment through social media language; Youyou, Kosinski and Stillwell (2015) showed that computer-based personality judgements from digital footprints were more accurate than those made by close acquaintances; Stachl and colleagues (2020) predicted Big Five traits from smartphone behaviour patterns. Pythia draws on these developments while anchoring its primary measurement in validated self-report instruments.

How we use this evidence responsibly

Predictive validity is not destiny. The same literature that supports personality assessment in hiring also cautions against using it as a sole selection tool. Best practice — reflected in guidance from the American Psychological Association and the Society for Industrial and Organizational Psychology — is to combine personality assessment with structured interviews, work samples, and cognitive measures.

We take three concrete positions. First, Pythia reports traits descriptively, not as pass/fail screens. Second, we interpret scores in the context of the role, not in isolation — because the trait–performance link depends on what the job actually demands. Third, we surface confidence, not certainty: the science gives us meaningful signal, not deterministic prediction.

References

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48(1), 26–34.
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of Applied Psychology, 85(6), 869–879.
Judge, T. A., Bono, J. E., Ilies, R., & Gerhardt, M. W. (2002). Personality and leadership: A qualitative and quantitative review. Journal of Applied Psychology, 87(4), 765–780.
McCrae, R. R., & Costa, P. T. (2008). The Five-Factor Theory of personality. In O. P. John, R. W. Robins, & L. A. Pervin (Eds.), Handbook of personality: Theory and research (3rd ed., pp. 159–181). Guilford Press.
Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., Ungar, L. H., & Seligman, M. E. P. (2015). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108(6), 934–952.
Roberts, B. W., Kuncel, N. R., Shiner, R., Caspi, A., & Goldberg, L. R. (2007). The power of personality: The comparative validity of personality traits, socioeconomic status, and cognitive ability for predicting important life outcomes. Perspectives on Psychological Science, 2(4), 313–345.
Soto, C. J. (2019). How replicable are links between personality traits and consequential life outcomes? The Life Outcomes of Personality Replication Project. Psychological Science, 30(5), 711–727.
Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143.
Stachl, C., Au, Q., Schoedel, R., Gosling, S. D., Harari, G. M., Buschek, D., Völkel, S. T., Schuwerk, T., Oldemeier, M., Ullmann, T., Hussmann, H., Bischl, B., & Bühner, M. (2020). Predicting personality from patterns of behavior collected with smartphones. Proceedings of the National Academy of Sciences, 117(30), 17680–17687.
van Aarde, N., Meiring, D., & Wiernik, B. M. (2025). Personality and job performance: A review of trait models and recent trends. Current Opinion in Psychology.
Wihler, A., Meurs, J. A., Wiesmann, D., Troll, L., & Blickle, G. (2017). Extraversion and adaptive performance: Integrating trait theory and trait activation theory. Personality and Individual Differences, 116, 133–138.
Wilmot, M. P., & Ones, D. S. (2019). A century of research on conscientiousness at work. Proceedings of the National Academy of Sciences, 116(46), 23004–23010.
Wright, A. G. C., et al. (2025). Assessing personality using zero-shot generative AI scoring of brief open-ended text. Nature Human Behaviour.
Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4), 1036–1040.

Pythia administers a Big Five assessment based on the NEO PI-R tradition and reports trait and facet scores alongside contextual benchmarks. We continually review new research and update our measurement approach accordingly. Questions about our methodology? Get in touch.