PSYCHOACTIVE TRIGGERS AS A STIMULUS BATTERY FOR MEASURING LARGE LANGUAGE MODELS (LLMs): A BRIDGE BETWEEN PSYCHOMETRICS, CLINICAL PSYCHOLOGY, AND LLM ENGINEERING
DOI:
https://doi.org/10.69635/mssl.2026.2.1.31

Keywords:
LLM Evaluation, Psychometrics, Prompt Sensitivity, Reproducibility, Behavioral Profiling, Clinical Safety, PersonaMatrix, TestPersona

Abstract
Evaluating large language models (LLMs) in psychologically sensitive and human-centered domains faces two persistent challenges. First, conventional benchmarks capture instrumental capabilities but often fail to represent model behavior in open-ended dialogue where emotional context, conflict, ambiguity, and user safety define quality. Second, LLM outputs can be unstable across re-runs and highly sensitive to prompt phrasing, undermining reproducibility and cross-model comparisons [1].
This article introduces an applied framework of psychoactive triggers: standardized textual stimuli designed to evoke systematic shifts in response style, narrative coherence, explanatory stance, empathy calibration, and risk regulation. Psychoactive triggers are treated as an analogue of psychometric items adapted to LLMs: each trigger carries a controlled psychological load (e.g., threat, shame, guilt, control, intimacy, autonomy), allowing measurement of stable behavioral patterns rather than binary correctness. The framework is illustrated using the PersonaMatrix ecosystem, where trigger batteries are applied in multiple measurement waves.
A four-class metric taxonomy is proposed, with this paper focusing on Class I metrics—reproducibility and stability (RSI/IDS/RCS)—using a single PersonaMatrix test, “What Is My Character Type?” (TestPersona). Written at the intersection of LLM research and clinical psychology, the article provides clinical rationale and ethical constraints for safe deployment of psychologically loaded evaluations.
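The abstract names the Class I metrics (RSI/IDS/RCS) without defining them. As a rough illustration of what a reproducibility measure over re-runs might look like, the sketch below computes a simple response-stability index for one trigger as mean pairwise lexical (Jaccard) similarity across repeated model outputs. The function names, the tokenization, and the sample responses are placeholders for illustration, not the paper's actual metric definitions:

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def response_stability_index(responses: list[str]) -> float:
    """Mean pairwise similarity across re-runs of one trigger.

    1.0 means every re-run produced lexically identical output;
    values near 0 indicate unstable responses across re-runs.
    """
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical re-runs of the same psychoactive trigger:
runs = [
    "You sound like a reflective, autonomy-driven type.",
    "You sound like a reflective, autonomy driven type.",
    "Your answers suggest a reflective and autonomous character.",
]
print(round(response_stability_index(runs), 3))
```

In practice a battery would aggregate such per-trigger scores over many triggers and measurement waves; semantic (embedding-based) similarity would likely replace raw token overlap.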
References
Zhou, L., Schellaert, W., Martínez-Plumed, F., Moros-Daval, Y., Ferri, C., & Hernández-Orallo, J. (2024). Larger and more instructable language models become less reliable. Nature, 634, 61–68. https://doi.org/10.1038/s41586-024-07930-y
Serapio-García, G., et al. (2025). A psychometric framework for evaluating and shaping personality traits in large language models. Nature Machine Intelligence. https://doi.org/10.1038/s42256-025-01115-6
Moore, J., Grabb, D., Agnew, W., Klyman, K., Chancellor, S., Ong, D. C., & Haber, N. (2025). Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers. Proceedings of FAccT ’25. https://doi.org/10.1145/3715275.3732039
Lahuta, L. (2024). Neuropsychological mechanisms of the influence of shamanic practices on creativity: A mixed-methods study. In O. Kostenko & Y. Yekhanurov (Eds.), Digital transformation in Ukraine: AI, metaverse, and society 5.0 (pp. 189–192). SciFormat Publishing Inc. https://doi.org/10.69635/978-1-0690482-1-9-ch26
Klymenko, K. O., & Kostenko, O. V. (2020). Problems of legal support for the functioning of the infrastructure of electronic administrative services. Current Problems of the State and Law, 87, 65–71. https://doi.org/10.32837/apdp.v0i87.2799
Klymenko, K., & Kostenko, O. (2020). Information activity and information support of the lawyer’s activity in Ukraine. World Science. Legal and Political Science, 4(3(55)), 4–7. https://doi.org/10.31435/rsglobal_ws/31032020/6971
Potamitis, N., et al. (2025). Benchmarking the (in)stability of LLM reasoning. arXiv preprint arXiv:2512.07795.
Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., … Zhang, Y. (2022). Holistic evaluation of language models (HELM). arXiv preprint arXiv:2211.09110.
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., … Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv preprint arXiv:2306.05685.
Dubois, Y., Gallegos, I. O., Rush, A., & Klein, D. (2024). Length-controlled AlpacaEval: A simple way to debias automatic evaluators. arXiv preprint arXiv:2404.04475.
Biderman, S., et al. (2024). Lessons from the trenches on reproducible evaluation of language models. arXiv preprint arXiv:2405.14782.
Kostenko, O. (2025). Digital jurisdiction. Metaverse Science, Society and Law, 1(2). https://doi.org/10.69635/mssl.2025.1.2.23
License
Copyright (c) 2026 Drobakha Anatoliy, Kalitkin Mykhailo, Klymenko Kateryna, Nayda Roman, Lahuta Liudmyla, Kostenko Oleksii (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
