J Psychiatry Brain Sci. 2026;11(1):e260002. https://doi.org/10.20900/jpbs.20260002
1 Department of Computer Science, Kent State University, Kent, OH 44224, USA
2 Department of Computer Science, Information, and Engineering Technology, Youngstown State University, Youngstown, OH 44555, USA
3 Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
4 Department of Psychological Sciences, Kent State University, Kent, OH 44224, USA
5 College of Public Health, Kent State University, Kent, OH 44224, USA
* Correspondence: Jianfeng Zhu
Background: Early substance use during adolescence increases the risk for later substance use disorders and mental health problems, yet the real-time emotional and contextual factors underlying these behaviors remain poorly understood.
Methods: We analyzed approximately 23,000 substance use–related posts and an equal number of non–substance use posts from Reddit’s r/teenagers subreddit (2018–2022). Posts were annotated for discrete emotions (sadness, anger, joy, guilt, fear, shame, and disgust) and contextual influences (family, peers, and school) using large language models (LLMs). Group differences were assessed statistically, emotion–context associations were visualized using heatmaps, and an interpretable XGBoost model identified key predictors of substance-use discussions. In addition, LLM-based thematic analysis extracted latent psychosocial themes linking emotions with contexts, providing qualitative insight into adolescents’ lived experiences.
Results: Negative emotions, particularly sadness, guilt, fear, and disgust, were more prevalent in substance use posts, while joy predominated in non-substance use discussions. Guilt and shame showed distinct patterns: guilt often signaled regret and reparative intent, whereas shame reinforced risky dynamics through peer identity performance. Peer influence emerged as the strongest contextual driver, closely associated with sadness, fear, and guilt. In contrast, family and school environments displayed dual roles, acting as both risk and protective contexts depending on relational quality and stressors.
Conclusions: Adolescent substance use reflects the interplay of sadness, guilt, and peer influence as central risk factors, while family and school contexts can operate as sources of both risk and protection.
Adolescent substance use is a pressing public health concern. According to nationally representative surveys conducted in 2023, at least one in eight teenagers reported using an illicit substance in the past year; substance use among 8th graders increased by 61% between 2016 and 2020, and more than 62% of 12th graders reported alcohol misuse [1]. Adolescence is a critical developmental window during which initial experimentation often occurs, with tobacco and alcohol often preceding illicit drug use [2]. A range of factors contribute to substance use in teenagers, including peer influence, family dynamics, school performance, and emotional states, all of which shape adolescents’ risk profiles [3–5]. While these risk factors are well recognized, the real-time interplay between emotions and contextual experiences surrounding substance use remains poorly understood. Deepening our understanding of these moment-to-moment relationships is essential for informing educational strategies and designing timely, targeted interventions that resonate with adolescents’ daily realities.
Conventional surveys and longitudinal assessments provide valuable population-level estimates but are time-lagged, burdensome, and limited in their ability to capture rapidly shifting feelings and social contexts. Emotional states can fluctuate within minutes or hours, and exposure to peer interactions, school stressors, or family conflict is often episodic; these dynamics are difficult to capture with periodic questionnaires or clinic-based assessments. Ambulatory assessment methods, such as ecological momentary assessment and passive digital data collection, have been used to capture “real-world” microprocesses in youth [6], offering more granular insight into affective and behavioral fluctuations in daily life.
Social media platforms offer a complementary lens for examining these processes, capturing the dynamic, bidirectional interplay between adolescents’ behavior and contextual events. Adolescents use these platforms to share immediate mood states and narrate experiences embedded within peer, school, and family settings [7–9]. Recent studies have analyzed substance-related discussions among adolescents on Reddit (e.g., r/teenagers), revealing distinct emotional frames and narrative patterns associated with substance use experiences [10]. In addition, prior studies have leveraged social media data to track opioid-related content and other substance use topics at scale, enabling near-real-time surveillance of trends and risk factors [11,12]. Such passively generated or self-disclosed content provides a unique opportunity to observe adolescent thoughts, feelings, and behaviors as they occur in real time, in naturalistic settings [13].
Recent advances in artificial intelligence, particularly large language models (LLMs), such as those in the GPT family and Gemini, enable scalable, domain-adaptable annotation of text, including emotion detection and extraction of contextual themes [14]. Building on these advances, the present study analyzes adolescent discussions on Reddit to: (i) apply LLMs to annotate each post with specific emotions and contextual factors (family, peer, and school); (ii) quantify emotion-context co-occurrence patterns via heatmaps; (iii) identify predictors of substance use-related posts using interpretable machine learning; and (iv) extract latent themes linking emotions with contexts through LLM-based thematic analysis. By connecting dynamic emotions with situated social contexts, this study seeks to identify actionable signals that can inform the development of prevention strategies and timely interventions for adolescent substance use.
Social media big data refers to large-scale, digitized records of naturally occurring communication on platforms such as Reddit, Twitter, and Facebook. A recent systematic scoping review of Reddit-based substance use research highlights the platform’s unique affordances for studying substance-related behaviors, motivations, and social support dynamics, while also identifying methodological challenges and best practices [15]. Individuals often share personal experiences, questions, and reflections on substance use online, while those struggling with addiction often seek social support through these communities [16,17]. By analyzing user-generated content, researchers can extract insights that are valuable for both academic research and applied settings [18].
Compared with traditional surveys, social media data provide scalable, near-real-time access to youth perspectives, including those from hard-to-reach populations such as homeless adolescents. This approach supports the identification of substance use patterns and informs prevention and intervention strategies [19]. Sentiment analysis has been widely used to assess public attitudes toward substances, including emotional responses to synthetic opioids such as fentanyl [20]. Moreover, the predominance of positive portrayals of substance use in online spaces may shape adolescent attitudes and norms, underscoring the need for balanced and developmentally appropriate prevention messaging [21].
Emotional and Contextual Factors in Adolescent Substance Use
The relationship between emotions (e.g., sadness, shame, anger, joy, guilt) and adolescent substance use is complex and multifaceted. Sadness and shame, in particular, have been shown to strongly predict substance involvement [22]. Emotional distress is recognized as a key mechanism in both the onset and maintenance of substance use disorders [23], whereas interventions targeting emotion regulation and psychological resilience can reduce reliance on substances as maladaptive coping strategies [24,25].
Contextual influences also play a central role in adolescent substance use. Peer behaviors and perceived peer norms strongly shape adolescents’ choices, with social pressure and fear of exclusion often prompting experimentation [26]. Peer pressure, together with concerns about social rejection, can exacerbate emotional and behavioral difficulties, leading adolescents to use substances as a coping mechanism or as a means of gaining acceptance within peer groups [27,28]. Family dynamics—such as neglectful parenting practices and parental alcohol use—have been linked to higher rates of substance use initiation [29]. School environments, including teacher-student relationships, school satisfaction, and academic performance, further influence substance use trajectories, functioning as either risk or protective factors depending on contextual conditions [30,31].
LLMs and Machine Learning Approaches
Advances in natural language processing and LLMs have opened new avenues for analyzing substance use patterns in digital text data. Prior studies have shown that zero-shot and few-shot LLMs can achieve performance comparable to human annotators on tasks such as stance detection, open-text coding, and survey labeling [9,32–34]. However, model performance is sensitive to prompt design choices, including input length, output structure, and definition framing [35].
Beyond LLM-based approaches, traditional machine learning methods have been applied in mental health research for prediction and feature extraction. For instance, regression-based models combined with SHAP visualization have been used to predict adolescent anxiety onset by identifying key predictors [36]. Multi-view learning approaches that integrate heterogeneous social media signals (e.g., Facebook likes, status updates) have also achieved high accuracy in predicting substance use behaviors [37].
Despite these advances, prior research on adolescent substance use has often examined emotions, family, peer, or school-related influences in isolation, leaving important gaps in understanding their interplay in real-world digital contexts. Traditional survey-based methods further fail to capture the immediacy of youth expression, and relatively few studies have leveraged LLMs to annotate emotional and contextual factors at scale. Moreover, existing machine learning models frequently overlook these nuanced signals, limiting both interpretability and the development of targeted, context-sensitive interventions.
Research Questions
Building on evidence that emotions (e.g., sadness and shame), family dynamics, peer influence, and school environments are central to adolescent substance use, the present study examines how these factors manifest in digital social spaces, specifically within Reddit’s r/teenagers subreddit. Prior research has often treated these dimensions independently, leaving gaps in understanding their interplay as expressed through real-time youth narratives. Furthermore, few studies have applied LLMs to jointly annotate emotional and contextual factors in adolescent populations.
Accordingly, this study addresses the following research questions:
RQ1: What emotions are most prevalent in adolescent substance-use posts, and how do they differ from non-substance-use posts?
RQ2: How do family, peer, and school contexts appear in substance-use posts, and how do these patterns differ from non-substance-use posts?
RQ3: How are emotions and contextual factors interrelated within substance-use and non-substance-use discussions?
RQ4: To what extent do emotional and contextual features predict substance-use content in machine learning models?
RQ5: How can LLMs extract latent subthemes from emotion-context pairings, and what additional insights do these provide beyond quantitative analyses?
At a high level, the analytic pipeline for this study is illustrated in Figure 1. The study is guided by five research questions (RQ1–RQ5), which span emotion prevalence, contextual influences, and LLM-assisted thematic analysis. By integrating Reddit data, LLM-based annotation, and machine learning methods, this work aims to clarify the interplay between emotion and context in adolescent substance use.
We analyzed user-generated posts from Reddit’s r/teenagers subreddit (2018–2022), a forum with over 3.3 million members. Substance use-related posts were identified using a curated set of keywords spanning multiple substances (e.g., alcohol, nicotine, cannabis, prescription drugs) [38]. For each post, the title and selftext fields were concatenated into a single text string. Duplicate posts were removed, and posts containing fewer than 10 whitespace-delimited tokens were excluded, as very short posts (e.g., “Yes,” “That’s cool”) typically lack sufficient content for reliable annotation.
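The preprocessing steps described above (concatenating title and selftext, removing duplicates, and dropping posts with fewer than 10 whitespace-delimited tokens) can be illustrated with a minimal Python sketch. The function name `build_corpus`, the dictionary layout of a post, and the use of exact string matching to detect duplicates are assumptions made for illustration; the source does not specify how duplicates were identified.

```python
def build_corpus(posts, min_tokens=10):
    """Concatenate title and selftext, drop short posts and exact duplicates.

    `posts` is assumed to be a list of dicts with "title" and "selftext" keys
    (a hypothetical layout; the original data format is not described).
    """
    seen = set()
    corpus = []
    for post in posts:
        title = (post.get("title") or "").strip()
        body = (post.get("selftext") or "").strip()
        text = f"{title} {body}".strip()
        # Exclude posts with fewer than `min_tokens` whitespace-delimited tokens.
        if len(text.split()) < min_tokens:
            continue
        # Assumption: a duplicate is a post whose concatenated text is identical.
        if text in seen:
            continue
        seen.add(text)
        corpus.append(text)
    return corpus


example_posts = [
    {"title": "Yes", "selftext": "That's cool"},
    {"title": "My friends keep asking me to vape",
     "selftext": "and I do not know how to say no to them"},
    {"title": "My friends keep asking me to vape",
     "selftext": "and I do not know how to say no to them"},
]
cleaned = build_corpus(example_posts)
print(len(cleaned))
```

In this toy input, the first post is dropped for length and the third as a duplicate, so a single post remains.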
To quantify the precision of the keyword-based filtering step, we manually reviewed a random sample of 300 keyword-matched posts. Of these, 270 posts were confirmed to contain explicit references to substance use, yielding a precision of 270/300 (0.90). A 95% confidence interval was estimated using the Wilson binomial method (95% CI: 0.861–0.929).
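For readers who want to check the interval, the Wilson score formula can be written out directly. The sketch below is an independent illustration rather than the authors' code; the function name `wilson_interval` is hypothetical, and the standard normal quantile for a 95% interval is hard-coded.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z defaults to 95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# 270 of 300 manually reviewed keyword-matched posts were confirmed.
low, high = wilson_interval(270, 300)
print(f"precision = {270 / 300:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With 270 confirmed posts out of 300, the bounds round to 0.861 and 0.929, matching the reported interval.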
Following keyword-based filtering, we identified 23,275 substance-use posts and sampled an equal number of non-substance-use posts from the same time period to construct a balanced dataset for subsequent analyses. The complete keyword list used for filtering is provided in Supplementary Table S1. Representative categories of ambiguous or non-substance keyword matches identified during manual review are summarized in Supplementary Table S2.
After further data cleaning and preprocessing, the final analytic sample comprised 21,169 substance-use posts and 17,781 non–substance-use posts, for a total of 38,950 posts. Representative examples are shown in Table 1.
LLM-Based Annotation of Emotions and Contextual Factors
We used OpenAI’s GPT-4o-mini model [39] as a prompt-based annotator to assign one primary emotion label and up to three contextual labels to each post. All prompts were executed with a temperature setting of 0.5, while other decoding parameters were left at their default API settings. Each post was processed once per prompt without retries or ensembling. This approach allows scalable yet nuanced content analysis, and prior studies show that LLM-generated annotations align well with human coding [40–42].
Emotions (single label). We adopted seven discrete emotions defined by the International Survey on Emotion Antecedents and Reactions: joy, guilt, anger, disgust, fear, sadness, and shame [43]. Posts that did not express a clear emotion were labeled neutral. An example prompt was as follows: “You are an emotionally intelligent and empathetic agent. Analyze the following text and identify which one of the following emotions it best represents—joy, guilt, anger, disgust, fear, sadness, shame—or ‘neutral’ if no specific emotion is expressed.” The emotion annotation prompt is provided in Supplementary Box S1.
Neutral-labeled posts were retained in descriptive analyses but excluded from emotion-specific thematic analyses, which focused exclusively on the seven discrete emotions of interest.
Context annotation (multi-label). Each post was independently coded for the presence of family, peer, and school influences. Separate prompts were designed for each contextual domain [44]. For example, the peer-context prompt asked: “Determine whether the post shows any influence of peers (e.g., peer pressure, wanting to fit in, friends’ behaviors). If yes, output ‘Peer Influence’ and briefly describe the peer-related context; if not, output ‘None.’” Representative examples of the prompts used for family, peer, and school context annotation are provided in Supplementary Box S2.
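Because the prompts return free text, some post-processing is needed to turn each reply into a label. The following Python sketch shows one way such replies could be mapped to the single emotion label and the peer-context flag described above. It makes no API calls, and the function names and parsing rules (for example, treating a reply that names more than one emotion as unparseable) are assumptions, not the procedure reported by the authors.

```python
import re

# The seven discrete emotions plus the neutral fallback named in the text.
LABELS = ("joy", "guilt", "anger", "disgust", "fear", "sadness", "shame", "neutral")
LABEL_PATTERN = re.compile(r"\b(" + "|".join(LABELS) + r")\b", re.IGNORECASE)


def parse_emotion(reply):
    """Return the single label named in a model reply, or None if ambiguous."""
    found = {match.lower() for match in LABEL_PATTERN.findall(reply)}
    # Assumption: replies naming zero or several labels are flagged for review.
    return found.pop() if len(found) == 1 else None


def parse_peer_context(reply):
    """Return (has_peer_influence, description) from a peer-context reply."""
    text = reply.strip()
    prefix = "peer influence"
    if text.lower().startswith(prefix):
        description = text[len(prefix):].lstrip(" :-.").strip()
        return True, description
    return False, ""


print(parse_emotion("Sadness"))
print(parse_peer_context("Peer Influence: friends pressured the author to drink"))
print(parse_peer_context("None."))
```

Returning `None` for ambiguous emotion replies, rather than forcing a label, keeps questionable outputs visible for manual checking.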
Human validation. We conducted a human-coded validation of LLM-generated emotion and contextual annotations on a sample of 300 posts. Human reviewers judged the correctness of each annotation. The annotation schema (Table S4), aggregate validation results (Table S5), and schematic examples (Table S6) are provided in the Supplementary Materials.
Statistical & Machine Learning Analyses
We conducted chi-square tests and independent-samples t tests to compare the distributions and mean endorsement rates of emotional and contextual features between substance-use and non-substance-use posts. To visualize associations between emotions and contextual factors, we generated a correlation heatmap based on pairwise correlations among emotional and contextual features.
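As an illustration of the group comparison, a chi-square test on a 2 × 2 table (feature present or absent, by group) and the accompanying difference in proportions can be computed as follows. This is a sketch under stated assumptions: the function names are hypothetical, no continuity correction is applied, and the counts are approximations derived from the rounded percentages and group sizes reported in this article, not the authors' actual cell counts.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value); the p-value uses the df = 1 identity
    P(X > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def delta_p(a, b, c, d):
    """Difference in proportions: group 1 rate minus group 2 rate."""
    return a / (a + b) - c / (c + d)


# Illustrative counts only: about 72% of 21,169 substance-use posts and about
# 41% of 17,781 non-substance-use posts mentioning peer influence.
peer_su, n_su = round(0.72 * 21169), 21169
peer_non, n_non = round(0.41 * 17781), 17781
table = (peer_su, n_su - peer_su, peer_non, n_non - peer_non)

statistic, p_value = chi_square_2x2(*table)
print(f"chi2 = {statistic:.1f}, p = {p_value:.3g}, delta_p = {delta_p(*table):+.3f}")
```

With samples of this size, even small differences in proportions reach statistical significance, which is one reason to report the difference in proportions alongside the p-value.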
To assess predictive performance, we trained a binary XGBoost classifier (gradient-boosted decision trees) [45] to distinguish substance-use posts from non-substance-use controls, using the emotion and context labels as input features. Model performance was evaluated on a held-out test set comprising 30% of the data following a random train-test split, using accuracy, precision, recall (sensitivity), and F1 score as evaluation metrics.
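The evaluation protocol (a random 70/30 split followed by accuracy, precision, recall, and F1) can be sketched without reproducing the classifier itself. In the Python example below, the function names and the fixed random seed are assumptions, and the labels are toy values; the XGBoost model would be fit on the training indices and scored on the held-out indices.

```python
import random


def train_test_split_indices(n, test_fraction=0.30, seed=0):
    """Randomly partition range(n) into (train_indices, test_indices)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = substance use)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# The final analytic sample contains 38,950 posts.
train_idx, test_idx = train_test_split_indices(38950)
print(len(train_idx), len(test_idx))

# Toy labels and predictions, for illustration only.
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```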
To interpret model predictions, we examined both feature importance scores from XGBoost and Shapley Additive Explanations (SHAP) [46]. SHAP values quantify each feature’s marginal contribution to the model’s prediction, enabling identification of the most influential emotional and contextual factors associated with substance-use related text posts.
LLM-Based Thematic Extraction of Emotion-Context Subtopics
Beyond quantitative patterns, we sought to understand why adolescents express particular emotions within specific contexts surrounding substance use. To this end, we conducted a thematic analysis using GPT-4o-mini to inductively extract latent subtopics for salient emotion × context pairings. Prior work suggests that LLMs can efficiently summarize and interpret large text corpora to reveal themes judged to be reasonable by domain experts [47], although careful validation is required to ensure consistency. Our procedure was as follows:
Sampling: For each emotion-context combination of interest (e.g., sadness × peer influence, guilt × family), we retrieved up to 80 posts that were relatively rich in content (at least 100 words) to ensure sufficient contextual information. This strategy focused the analysis on substantive discussions rather than brief or trivial posts.
Prompting: Posts were processed in batches using GPT-4o-mini with instructions to identify recurring subthemes or storylines, focusing on (i) reasons behind the expressed emotion, (ii) patterns in the contextual domain, and (iii) any stated triggers for substance use. For example, for sadness × peer influence, the prompt was:
“Here are several posts from teens expressing sadness in situations involving peer influence (e.g., friends or classmates) related to substance use. Summarize the common themes or issues they talk about. What are recurring reasons they feel sad? How do peers figure into their stories? Are there common triggers for substance use mentioned?”
Synthesis: GPT-4o-mini outputs were manually reviewed by the first author to ensure coherence and interpretive accuracy. Representative subthemes and commonly reported triggers were retained for each emotion-context pairing.
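The sampling and batching steps of this procedure can be sketched in Python as follows. The post layout, the function names, the use of random selection when more than 80 posts qualify, and the batch size of 20 are all assumptions for illustration; the source states only the 80-post cap, the 100-word minimum, and that posts were processed in batches.

```python
import random


def sample_rich_posts(posts, emotion, context, max_posts=80, min_words=100, seed=0):
    """Select up to `max_posts` posts for one emotion x context pairing."""
    eligible = [
        post for post in posts
        if post["emotion"] == emotion
        and context in post["contexts"]
        and len(post["text"].split()) >= min_words
    ]
    # Assumption: when more posts qualify than needed, draw a random subset.
    if len(eligible) > max_posts:
        eligible = random.Random(seed).sample(eligible, max_posts)
    return eligible


def batches(items, size):
    """Yield consecutive chunks of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Toy corpus: 100 long posts and 5 short posts tagged sadness x peer.
toy_posts = [
    {"emotion": "sadness", "contexts": ["peer"], "text": "word " * 120}
    for _ in range(100)
] + [
    {"emotion": "sadness", "contexts": ["peer"], "text": "too short"}
    for _ in range(5)
]
selected = sample_rich_posts(toy_posts, "sadness", "peer")
groups = list(batches(selected, 20))  # hypothetical batch size
print(len(selected), len(groups))
```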
Figure 2 shows the distribution of substance use-related posts in the r/teenagers subreddit from 2018 to 2022. Monthly peaks were detected using the find_peaks function from SciPy library (scipy.signal), implemented in Python (version 3.10.18) [48]. Peaks were operationally defined as local maxima that exceeded both preceding and following months and exhibited a minimum prominence of 0.2 percentage points, thereby filtering out minor month-to-month variability.
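The peak definition above (a strict local maximum with a prominence of at least 0.2 percentage points) can be made concrete with a small dependency-free Python sketch. It is a simplified stand-in for `scipy.signal.find_peaks` called with a `prominence` threshold, not the library implementation: it ignores flat-topped peaks, and the function name and monthly values are hypothetical.

```python
def find_prominent_peaks(values, min_prominence=0.2):
    """Return indices of strict local maxima whose prominence meets the threshold.

    Prominence is the peak height minus the higher of the two lowest points
    reached on each side before a strictly higher value (or the series end).
    """
    peaks = []
    for i in range(1, len(values) - 1):
        if not (values[i] > values[i - 1] and values[i] > values[i + 1]):
            continue
        # Lowest value to the left before encountering a strictly higher point.
        left_min = values[i]
        j = i - 1
        while j >= 0 and values[j] <= values[i]:
            left_min = min(left_min, values[j])
            j -= 1
        # Lowest value to the right before encountering a strictly higher point.
        right_min = values[i]
        k = i + 1
        while k < len(values) and values[k] <= values[i]:
            right_min = min(right_min, values[k])
            k += 1
        if values[i] - max(left_min, right_min) >= min_prominence:
            peaks.append(i)
    return peaks


# Hypothetical monthly shares of substance use posts, in percent.
monthly_share = [1.0, 1.5, 1.2, 1.25, 1.22, 3.0, 2.0]
print(find_prominent_peaks(monthly_share))
```

In this toy series, the small bump at index 3 is a local maximum but has a prominence of only 0.03, so it is filtered out, while the maxima at indices 1 and 5 are retained.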
Posting frequency increased steadily beginning in March 2018, reached a pronounced peak in July 2019, and declined sharply in August 2019. This decline was followed by a more gradual decrease with minor fluctuations through the end of 2022. A chi-square test indicated seasonal variation in posting frequency, χ²(3) = 178.91, p < 0.0001, with higher activity observed during summer and holiday periods.
Emotion and Context Distributions by Substance Use Group
To assess the reliability of LLM-generated annotations, we conducted a human-coded validation on a subsample of 300 posts. For emotion annotations, human reviewers evaluated the correctness of the LLM-assigned primary emotion label, yielding an overall validation accuracy of 0.910. For contextual annotations, human reviewers assessed whether the LLM correctly identified the presence or absence of family-, peer-, and school-related contexts. Validation accuracy was high across all context dimensions (family: 0.953; peer: 0.957; school: 0.977), with substantial agreement reflected by F1 scores (0.885–0.971) and Cohen’s κ values (0.871–0.889; Supplementary Table S3). Together, these results indicate strong alignment between LLM-generated annotations and human judgments on the evaluated subsample.
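For reference, the agreement statistics used in this validation can be computed from paired human and LLM labels as in the Python sketch below. The function names are hypothetical and the label vectors are toy values, not the validation data.

```python
def accuracy(human, model):
    """Share of items on which the two label sequences agree."""
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohen_kappa(human, model):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    observed = accuracy(human, model)
    labels = set(human) | set(model)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum((human.count(l) / n) * (model.count(l) / n) for l in labels)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy presence (1) / absence (0) codes for one context dimension.
human_codes = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
llm_codes = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
print(accuracy(human_codes, llm_codes), round(cohen_kappa(human_codes, llm_codes), 3))
```

Kappa falls below raw accuracy because it discounts the agreement that two raters would reach by chance given how often each assigns a label.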
Figure 3 displays the distribution of emotions in substance use-related versus non-substance-use posts. Across both groups, sadness, joy, and anger were the most prevalent emotions. Notably, joy was substantially more common in non-substance use posts (22%) than in substance use posts (15%), representing the largest absolute difference between groups. Sadness and anger occurred at comparable rates across the two groups. In contrast, disgust, fear, and guilt were more frequently expressed in substance use posts, with guilt showing a particularly pronounced difference (8% vs. 3%). Shame was expressed at a similar level in both groups (4%).
Figure 4 displays the distribution of contextual influences across substance use-related and non-substance-use posts. Peer influence was the most frequently mentioned context in both groups, with a markedly higher proportion in substance use-related posts (72%) than in non-substance-use posts (41%). Family influence was also more common in substance-use posts (21% vs. 9%). In contrast, school-related contexts were discussed at similarly low frequencies in both groups (approximately 10%).
Statistical Comparisons and Associations
Group comparisons revealed significant differences in both emotional and contextual features between substance use-related and non-substance-use posts. As shown in Table 2, substance use-related posts exhibited higher levels of guilt, disgust, fear, sadness, and shame, whereas joy was more prevalent in non-substance use posts. Anger was slightly more common in non-substance use posts, although the effect size was small. Most emotional features showed highly significant group differences (p < 0.001), with anger (p = 0.001) and sadness (p = 0.008) also reaching significance at the p < 0.01 level. The largest proportional differences were observed for joy (Δp = −0.076), guilt (Δp = +0.045), and disgust (Δp = +0.035), whereas differences for anger, sadness, and shame were smaller in magnitude.
As shown in Table 3, substance use posts referenced peer and family influence significantly more often than non-substance use posts. Peer influence showed the largest proportional difference between groups (Δp = +0.310, p < 0.001), followed by family influence (Δp = +0.122). In contrast, references to the school environment did not differ significantly between substance-use and non-substance-use posts (p = 0.185), with a negligible proportional difference (Δp = +0.004).
As shown in Figure 5, emotion-context correlations differed across groups. In substance use-related posts, family influence was most strongly associated with fear and sadness, with stronger associations than those observed in non-substance-use posts. Peer influence showed the strongest positive association with guilt, whereas school environment was most strongly linked to sadness. By contrast, in non-substance use posts, the school environment exhibited the strongest correlation with fear. Across both groups, family influence showed a consistent positive association with anger.
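One way to obtain the values behind such a heatmap is to correlate binary indicators for each emotion with binary indicators for each context. The Python sketch below assumes Pearson correlation on 0/1 indicators (the phi coefficient); the source does not state which correlation measure was used, and the function names and toy posts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when an indicator never varies
    return cov / math.sqrt(var_x * var_y)


def emotion_context_matrix(posts, emotions, contexts):
    """Nested dict: matrix[emotion][context] = correlation of 0/1 indicators."""
    matrix = {}
    for emotion in emotions:
        emotion_flag = [1 if post["emotion"] == emotion else 0 for post in posts]
        matrix[emotion] = {
            context: pearson(
                emotion_flag,
                [1 if context in post["contexts"] else 0 for post in posts],
            )
            for context in contexts
        }
    return matrix


# Toy annotated posts, for illustration only.
toy = [
    {"emotion": "guilt", "contexts": ["peer"]},
    {"emotion": "guilt", "contexts": ["peer"]},
    {"emotion": "joy", "contexts": []},
    {"emotion": "joy", "contexts": ["family"]},
]
matrix = emotion_context_matrix(toy, ["guilt", "joy"], ["peer", "family"])
print(matrix["guilt"])
```

Computing one such matrix per group (substance use and non-substance use) would yield the two panels being compared.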
Figure 6 shows the SHAP values for contextual and emotional features. The XGBoost model achieved an accuracy of 0.67 on the test set, with a recall (sensitivity) of 0.75, precision of 0.68, and an F1 score of 0.71. Higher SHAP values for joy and school environment were associated with non-substance-use posts, whereas lower values indicated stronger associations with substance use. In contrast, peer influence, family influence, disgust and guilt showed higher importance for predicting substance use-related posts, underscoring their role as key emotional and contextual drivers in these discussions.
Thematic Analysis of Emotion-Context Interactions
Peer-related posts revealed that fear, shame, and sadness were particularly salient in the context of substance use. Adolescents frequently described fears of social exclusion and pressure to conform to peer norms, while shame and embarrassment often emerged in relation to public or socially visible incidents (e.g., photos shared online). Joy was also prominent, typically linked to peer bonding during celebrations, but was often intertwined with risky behaviors. These findings highlight peers as both a primary risk pathway (pressure, normalization) and a source of positive reinforcement in adolescent substance use narratives (see Tables 4–6 for illustrative themes).
Family-related posts were characterized primarily by fear, guilt, and sadness, reflecting the central role of parents and siblings in shaping adolescents’ attitudes toward substance use. Posts described fear of discovery, guilt associated with disappointing parents, and sadness linked to family conflict, divorce, or parental alcohol use. Joy was reported less frequently and typically emerged in contexts where families adopted permissive or celebratory attitudes toward drinking. Overall, family contexts represented a dual landscape, functioning both as a source of protection through expectations and monitoring and as a pathway to risk through modeling, conflict, or neglect.
School-related posts emphasized fear, sadness, and anger as the dominant emotional themes. Students expressed fear of disciplinary consequences (e.g., suspension or expulsion) and sadness stemming from academic stress or social exclusion. Anger often arose in response to perceived unfairness, peer coercion, or frustration with dismissive school staff. Positive emotions, particularly joy, were reported in connection with school milestones and social events; however, these moments often co-occurred with substance use.
This study examined emotional and contextual correlates of adolescent substance use in social media posts using LLM-based annotation, statistical tests, interpretable machine learning, and LLM-assisted thematic analysis. Three broad conclusions emerge. First, substance-related discussions exhibited clear seasonal variation, with peaks during school breaks and holiday periods. This pattern is consistent with prior research documenting seasonal cycles in youth mental and behavioral health [49–51]. Such peaks may reflect increased social activity and heightened stressors (e.g., loneliness, family tension) that elevate adolescents’ vulnerability to substance use. Second, sadness emerged as the most prevalent emotion in substance use-related posts, aligning with extensive evidence that negative affect is a key predictor of adolescent substance involvement [52,53]. This finding underscores the role of emotional distress as a central mechanism linking psychosocial stressors to substance-related behaviors during adolescence. Third, peer influence emerged as the dominant contextual driver, followed by family and school contexts. Correlation patterns indicated that guilt, sadness, and fear clustered most strongly with peer-related contexts, consistent with developmental theories of adolescent socio-emotional reactivity and heightened peer susceptibility during puberty [54,55].
Negative emotions played distinct roles in adolescents’ substance use narratives. Guilt, reflecting concern about the impact of one’s behavior on others, often co-occurred with remorse and regret [56]. In our data, guilt was linked to corrective intentions following substance use, suggesting a potentially protective function. By contrast, shame was more closely tied to threats to self-worth and was associated with risky dynamics such as peer validation and identity performance [57]. This distinction aligns with prior research indicating that guilt may promote reparative action, whereas shame may amplify vulnerability through social identity pressures [58,59]. Disgust frequently emerged in the form of internal conflict or peer critique, sometimes functioning as a moral boundary; however, its salience diminished in contexts where substance use was normalized.
Contextual patterns further underscored the central role of peers, consistent with extensive evidence that peer norms strongly shape adolescent substance involvement [60]. Prior work emphasizes the need to distinguish peer selection from peer socialization when designing effective prevention strategies [61]. In our data, peer influence was most strongly associated with guilt in substance-use posts, suggesting that adolescents often frame their substance use through interpersonal responsibility and regret. Family influence was uniquely tied to fear and sadness, highlighting concerns about parental discovery and strained family relationships. School contexts, in contrast, were more closely associated with sadness in substance use-related posts but with fear in non-substance-use posts, pointing to different stress pathways across environments. These findings extend earlier work showing that adolescents’ heightened emotional reactivity and reduced regulatory capacity amplify risks within peer-dominated contexts [62].
LLM-assisted thematic coding provided additional nuance. Sadness was tied to loneliness and social disconnection, even “in a crowd,” underscoring belonging as a core motivational driver. Fear functioned ambivalently: fear of exclusion promoted conformity, whereas fear of parental sanction or long-term harm acted as a deterrent to substance use. Disgust served as a moral boundary (e.g., disdain for vaping peers), but its protective effect weakened in cultures where use was normalized. Guilt reflected the intention-behavior gap, in which reflective goals were overridden by situational impulses, prompting self-reproach. Shame often followed use but also fueled a feedback loop of continued risk-taking through identity performance (“being cool”) and later regret [63].
Family and school environments were similarly pivotal. Family-related risks included parental conflict, divorce, substance use, and low support, while strong family bonds and positive adult models emerged as protective factors [64]. In schools, triggers included academic stress, parties, and safety concerns, while policies and supportive programs provided buffers. Addressing academic pressures through targeted stress-management strategies may therefore help reduce substance use risk.
Together, these findings advance understanding of adolescent substance use by identifying peers as both primary risk pathways and potential sources of support, families as contexts of both conflict and protection, and schools as key environments shaping stress and coping. At the individual level, the contrast between disgust and guilt underscores how discrete emotions channel risk versus resilience pathways [58]. These insights extend theory on the socio-emotional mechanisms of adolescent substance use while pointing to multilevel targets for prevention.
This study has several limitations. First, the anonymous nature of Reddit precludes control over key demographic variables such as age, gender, ethnicity, or location, which constrains the generalizability of the findings. Moreover, it is not possible to verify that all posts were authored by adolescents; contributions from adults or automated accounts may have introduced bias. Second, the use of GPT-4o-mini for emotional and contextual annotations entails reliance on large language model behavior, which is subject to inherent limitations. Although manual validation was performed, annotation accuracy ultimately depends on prompt design and model capabilities. Third, the comparative design, which involved randomly sampled non-substance-use posts, may not fully capture broader discourse patterns, and the absence of longitudinal data restricts insight into the developmental course of substance use behaviors. Finally, the self-reported nature of the data may involve exaggeration or omission of substance use, further limiting interpretability. Future research should incorporate longitudinal designs and larger, more diverse datasets to strengthen representativeness and analytic robustness.
In conclusion, this study demonstrates that negative emotions play differentiated roles in adolescent substance use discourse. Disgust and guilt emerged as salient emotional correlates, with guilt often reflecting remorse and reparative intentions, whereas shame was more closely tied to identity threats and risky peer dynamics. Fear exhibited an ambivalent function, operating both as a risk factor through conformity pressures and as a protective signal through concerns about consequences, while sadness emerged as a pervasive emotional backdrop to substance-related discussions. Peers emerged as the dominant contextual influence, while family and school environments contributed distinct and sometimes opposing risk and protective dynamics. Together, these findings suggest that adolescent substance use is best understood as the product of interacting emotional and contextual mechanisms rather than isolated risk factors. Future research should continue to refine and validate LLM-based annotation methods for emotion-context mechanisms and extend analyses to other youth-oriented platforms. Collectively, our findings underscore the need for multilevel prevention strategies that address individual emotions alongside peer, family, and school contexts, advancing theory on the socio-emotional mechanisms of adolescent substance use while identifying actionable leverage points for prevention.
This study used publicly available, de-identified Reddit data. All personal identifiers (e.g., usernames) were removed prior to analysis. Because the data were publicly accessible and no direct interaction with human subjects occurred, institutional review board (IRB) approval and informed consent were not required. All findings are reported in aggregate form to protect user privacy.
This study was conducted in accordance with the principles of the Declaration of Helsinki. The study design, data collection, analysis, and reporting followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.
The following supplementary materials are available online: Supplementary Analyses and Materials for Emotions, Context, and Substance Use in Adolescents: A Large Language Model Analysis of Reddit Posts.
The data used in this study are publicly available from an established academic repository hosting Reddit datasets (Academic Torrents).
The analysis code supporting the findings of this study is publicly available in an author-maintained GitHub repository.
All authors meet the authorship criteria established by the International Committee of Medical Journal Editors (ICMJE). JZ and RJ conceptualized the study. JZ, HJ and YW developed the methodology. JZ implemented the software, conducted the formal analysis, curated the data, and performed the visualization. Validation was performed by JZ, KGC and DRK. JZ prepared the original draft of the manuscript. Writing—review and editing were contributed by JZ, KGC, RJ and DRK. JZ managed project administration.
None declared.
This research received no external funding.
The authors thank all individuals who contributed to this study through technical support, data preparation, or preliminary discussions but who do not meet the criteria for authorship. We also acknowledge the participants whose publicly available data made this research possible.
Zhu J, Jiang H, Wang Y, Coifman KG, Jin R, Kenne DR. Emotions, context, and substance use in adolescents: A large language model analysis of reddit posts. J Psychiatry Brain Sci. 2026;11(1):e260002. https://doi.org/10.20900/jpbs.20260002.

Copyright © Hapres Co., Ltd.