PSYCHOACTIVE TRIGGERS AS A STIMULUS BATTERY FOR MEASURING LARGE LANGUAGE MODELS (LLMs): A BRIDGE BETWEEN PSYCHOMETRICS, CLINICAL PSYCHOLOGY, AND LLM ENGINEERING

Authors

  • Drobakha, Anatoliy, M.Sc. in Psychology, Independent Researcher, PersonaMatrix Project, United States. ORCID: https://orcid.org/0009-0003-0283-878X
  • Kalitkin, Mykhailo, M.Sc., Independent EdTech Researcher, UNOWA-PersonaMatrix Project, Portugal
  • Klymenko, Kateryna, Researcher, Research Institute of Informatics and Law, National Academy of Legal Sciences of Ukraine, Kyiv, Ukraine. ORCID: https://orcid.org/0000-0002-5227-2329
  • Nayda, Roman, M.Sc., Independent Researcher (Systems & Data Analysis), PersonaMatrix Project, Ukraine
  • Lahuta, Liudmyla, CEO, Institute of Psychological Maturity, United States
  • Kostenko, Oleksii, Ph.D., Associate Professor, State Scientific Institution «Institute of Information, Security and Law of the National Academy of Legal Sciences of Ukraine», Ukraine. ORCID: https://orcid.org/0000-0002-2131-0281

DOI:

https://doi.org/10.69635/mssl.2026.2.1.31

Keywords:

LLM Evaluation, Psychometrics, Prompt Sensitivity, Reproducibility, Behavioral Profiling, Clinical Safety, PersonaMatrix, TestPersona

Abstract

Evaluating large language models (LLMs) in psychologically sensitive and human-centered domains faces two persistent challenges. First, conventional benchmarks capture instrumental capabilities but often fail to represent model behavior in open-ended dialogue where emotional context, conflict, ambiguity, and user safety define quality. Second, LLM outputs can be unstable across re-runs and highly sensitive to prompt phrasing, undermining reproducibility and cross-model comparisons [1].

This article introduces an applied framework of psychoactive triggers: standardized textual stimuli designed to evoke systematic shifts in response style, narrative coherence, explanatory stance, empathy calibration, and risk regulation. Psychoactive triggers are treated as an analogue of psychometric items adapted to LLMs: each trigger carries a controlled psychological load (e.g., threat, shame, guilt, control, intimacy, autonomy), allowing measurement of stable behavioral patterns rather than binary correctness. The framework is illustrated using the PersonaMatrix ecosystem, where trigger batteries are applied in multiple measurement waves.

A four-class metric taxonomy is proposed; this paper focuses on Class I metrics, reproducibility and stability (RSI/IDS/RCS), using a single PersonaMatrix test, “What Is My Character Type?” (TestPersona). Written at the intersection of LLM research and clinical psychology, the article provides the clinical rationale and ethical constraints for the safe deployment of psychologically loaded evaluations.
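The paper's exact RSI/IDS/RCS formulas are not reproduced on this page, so the following is only an illustrative sketch of the general idea behind a Class I (reproducibility/stability) measure: rerun the same trigger several times and score how consistent the outputs are. The function name `stability_index` and the mean-pairwise-similarity scoring rule are assumptions for illustration, not the framework's actual definitions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def stability_index(responses: list[str]) -> float:
    """Mean pairwise string similarity across repeated runs of the same
    prompt: 1.0 means all reruns are identical; values near 0 mean the
    reruns share almost no common subsequences."""
    if len(responses) < 2:
        return 1.0  # a single run is trivially self-consistent
    pairs = list(combinations(responses, 2))
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)

# Identical reruns score 1.0; divergent reruns score lower.
stable = stability_index(["Type A: reflective"] * 3)
mixed = stability_index(["Type A: reflective",
                         "Type C: assertive",
                         "An entirely different reading"])
```

Any real instantiation would swap the surface-string comparison for the framework's own response encodings (style, coherence, empathy calibration, risk regulation), but the aggregation pattern, many reruns of one stimulus reduced to a single stability score per model, is the same.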

References

Zhou, L., Schellaert, W., Martínez-Plumed, F., Moros-Daval, Y., Ferri, C., & Hernández-Orallo, J. (2024). Larger and more instructable language models become less reliable. Nature, 634, 61–68. https://doi.org/10.1038/s41586-024-07930-y

Serapio-García, G., et al. (2025). A psychometric framework for evaluating and shaping personality traits in large language models. Nature Machine Intelligence. https://doi.org/10.1038/s42256-025-01115-6

Moore, J., Grabb, D., Agnew, W., Klyman, K., Chancellor, S., Ong, D. C., & Haber, N. (2025). Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers. Proceedings of FAccT ’25. https://doi.org/10.1145/3715275.3732039

Lahuta, L. (2024). Neuropsychological mechanisms of the influence of shamanic practices on creativity: A mixed-methods study. In O. Kostenko & Y. Yekhanurov (Eds.), Digital transformation in Ukraine: AI, metaverse, and society 5.0 (pp. 189–192). SciFormat Publishing Inc. https://doi.org/10.69635/978-1-0690482-1-9-ch26

Klymenko, K. O., & Kostenko, O. V. (2020). Problems of legal support for the functioning of the infrastructure of electronic administrative services. Current Problems of the State and Law, 87, 65–71. https://doi.org/10.32837/apdp.v0i87.2799

Klymenko, K., & Kostenko, O. (2020). Information activity and information support of the lawyer’s activity in Ukraine. World Science. Legal and Political Science, 4(3(55)), 4–7. https://doi.org/10.31435/rsglobal_ws/31032020/6971

Potamitis, N., et al. (2025). Benchmarking the (in)stability of LLM reasoning. arXiv preprint arXiv:2512.07795.

Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., … Zhang, Y. (2022). Holistic evaluation of language models (HELM). arXiv preprint arXiv:2211.09110.

Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., … Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv preprint arXiv:2306.05685.

Dubois, Y., Gallegos, I. O., Rush, A., & Klein, D. (2024). Length-controlled AlpacaEval: A simple way to debias automatic evaluators. arXiv preprint arXiv:2404.04475.

Biderman, S., et al. (2024). Lessons from the trenches on reproducible evaluation of language models. arXiv preprint arXiv:2405.14782.

Kostenko, O. (2025). Digital jurisdiction. Metaverse Science, Society and Law, 1(2). https://doi.org/10.69635/mssl.2025.1.2.23

Downloads

Views: 173 | Downloads: 48

Published

2026-02-10

Issue

Section

Extended Reality (XR) and Human-Computer Interaction

How to Cite

Drobakha, A., Kalitkin, M., Klymenko, K., Nayda, R., Lahuta, L., & Kostenko, O. (2026). Psychoactive triggers as a stimulus battery for measuring large language models (LLMs): A bridge between psychometrics, clinical psychology, and LLM engineering. Metaverse Science, Society and Law, 2(1). https://doi.org/10.69635/mssl.2026.2.1.31