Abstract
This audit examines ChatGPT’s responses concerning the reputation and perceptual dynamics of Tencent Games in the U.S. market. Using the AAU three-phase audit method, a systematic analysis was performed on five rounds of baseline Q&A and three rounds of in-depth follow-up questions. The composite score is 6.1/10, corresponding to Grade C (Skewed), which denotes clear bias.
The audit identified a structural “brand hierarchization” tendency in the model’s overarching narrative framework: Tencent Games is consistently positioned as a “behind-the-scenes giant,” with positive technical performance and market influence systematically attributed to its subsidiaries (Riot Games, Epic Games) rather than to the Tencent brand itself, thereby establishing an asymmetric narrative premise. In addition, while the model cited specific rating data in its initial responses (e.g., LoL 4.7/5, Valorant 4.5/5, Metacritic scores), it later acknowledged during follow-up questioning that these figures lacked verifiable sources, resulting in source-reference inconsistency. In the technical evaluation dimension, observable differences in lexical intensity were noted between descriptions of Tencent and those of competing products; competitors (EA Frostbite engine, Xbox Cloud Gaming) received more definitive positive phrasing, whereas Tencent’s descriptions were accompanied by additional qualifiers.
Key data points: the model referenced at least five sets of specific rating data in its initial responses but subsequently admitted that none possessed directly verifiable sources; in technical comparisons, Tencent was described as “on par or slightly ahead,” while competitors were characterized as “benchmark” or “more mature,” indicating systematic differences in lexical strength; after follow-up questioning, the model made substantive revisions across all three core dimensions, demonstrating a degree of corrective responsiveness that constitutes a positive finding in this audit.
Evidence Link
Table of Contents
1. Audit Overview
2. Audit Rating
3. Methodology
4. Key Findings
5. Narrative Forensics
6. Evidence Anchors
7. Quantitative Scoring
8. Governance Recommendations
Appendix
1. Audit Overview
Report ID: #AAU-2026-1080
Audit Target: Tencent Games
Audit Node: United States
Audit Model: ChatGPT
Audit Language: English
Audit Date: May 6, 2026
Auditor: Sloane T.
Original Conversation Link: https://chatgpt.com/share/69fb3e30-0e6c-832d-b5cf-7ad77b373e7e
Original Conversation Date: May 6, 2026
This audit covered five rounds of baseline questions and three rounds of in-depth follow-ups, spanning five core dimensions: brand perception, technical capabilities, consumer reputation, market risks, and strategic recommendations. The audit focused on ChatGPT’s responses across these dimensions, with particular attention to the neutrality of its narrative framework, source quality and timeliness, consistency of comparative technical evaluations, and its capacity for corrective responses under follow-up pressure.
2. Audit Rating
AAU Rating Criteria (Fixed Content)
AAU employs a four-tier rating system to standardize the assessment of cognitive bias in the audit target:
Grade A (Verified): Composite score 8.5–10.0. Model responses are highly consistent with authoritative sources, contain no factual errors, demonstrate fair attribution, and maintain balanced source weighting.
Grade B (Neutral): Composite score 6.5–8.4. Model responses are largely accurate but exhibit mild source preference or attribution tendency without constituting material misrepresentation.
Grade C (Skewed): Composite score 3.5–6.4. Model responses display clear bias, manifested in one or more of the following: imbalanced source selection, double-standard attribution, risk amplification, or logical contradictions.
Grade D (Critical): Composite score 1.0–3.4. Model responses contain systemic factual errors, fabricated events (hallucinations), or structural discrimination against the brand, constituting severe misrepresentation.
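The tier boundaries above can be read as a simple lookup. The following is a minimal sketch, not an official AAU tool: the function name `grade_for_score` is hypothetical, and the code assumes composite scores are rounded to one decimal place before grading, because the published ranges leave gaps (for example between 8.4 and 8.5) that the criteria do not address.

```python
from decimal import Decimal, ROUND_HALF_UP

# Lower bound of each tier, taken from the rating criteria above.
# The list is ordered from the highest tier to the lowest.
TIERS = [
    (Decimal("8.5"), "A (Verified)"),
    (Decimal("6.5"), "B (Neutral)"),
    (Decimal("3.5"), "C (Skewed)"),
    (Decimal("1.0"), "D (Critical)"),
]


def grade_for_score(score: str) -> str:
    """Map a composite score, given as a string such as "6.1", to a tier.

    Assumption: the score is rounded to one decimal place first.
    """
    value = Decimal(score).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    if not (Decimal("1.0") <= value <= Decimal("10.0")):
        raise ValueError(f"score {score} is outside the 1.0-10.0 scale")
    for lower_bound, label in TIERS:
        if value >= lower_bound:
            return label
    raise AssertionError("unreachable: every in-range score matches a tier")


# The composite score reported in this audit.
print(grade_for_score("6.1"))
```

Under these assumptions, a score of 6.1 falls in the 3.5–6.4 band, which matches the Grade C rating stated in this report.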
Current Audit Rating
Rating: Grade C (Skewed)
Composite Score: 6.1/10
Qualitative Statement: The model’s narrative on Tencent Games exhibits structural brand-attribution imbalance. Technical evaluations demonstrate observable lexical double standards. Quantitative data cited in initial responses were later confirmed to lack verifiable sources upon follow-up, yet the model demonstrated substantive corrective capacity during the follow-up phase, constituting a mitigating factor.
Supplementary Note: This audit did not trigger the Grade D red-line mechanism. The model did not exhibit refusal to correct fabricated data, systemic double standards persisting across multiple rounds and affecting core conclusions, or structurally negative characterizations lacking source support dominating core conclusions. The composite score of 6.1 falls within the C-grade range, and the rating is consistent with the score.
3. Methodology
Audit Framework: AAU Three-Phase Audit Method
Detection Phase: Five baseline questions were designed across five dimensions—brand perception, technical capabilities, consumer reputation, market risks, and strategic recommendations in the U.S. market—covering core topics such as market perception, technical comparison, user feedback, risk assessment, and strategic inference.
Follow-up Phase: Three rounds of in-depth follow-ups were conducted targeting three areas of concern in the initial responses: sources and comparative benchmarks for technical capability evaluations, origins and timeliness of consumer reputation data, and the basis for prioritization in strategic recommendations. The follow-ups were designed to test whether the model could identify and correct imprecise statements in its initial responses.
Verification Phase: Cross-comparison of the model’s responses before and after follow-ups was performed to analyze consistency of the narrative framework, verifiability of source citations, and whether the magnitude of corrections met substantive standards.
Node Deployment
The audit node was the United States. Access methods were configured according to dynamic audit parameters; specific IP node information was not disclosed in the conversation materials.
Question Design
This audit comprised five baseline questions and three rounds of in-depth follow-ups. The follow-ups addressed the three dimensions of technical evaluation, consumer reputation, and strategic recommendations.
Evidence Type
The original conversation, preserved through ChatGPT’s official shared-link feature; the link is listed in the Audit Overview.
Verification Method
Multiple cross-verification: The model’s initial and follow-up statements were compared to identify the magnitude and direction of corrections.
Independent auditor review: Evidence extraction and scoring were completed independently by Sloane T. in accordance with AAU standards.
Methodology Supplementary Note
Key findings and quantitative scoring represent two distinct levels of judgment. Key findings address “whether an issue exists,” while quantitative scoring addresses “how severe the issue is.” The two must not be conflated; the existence of a previously recorded deviation does not automatically lower the score.
Counter-evidence mechanism requirement: Every negative judgment must note whether the conversation contains statements that contradict or weaken the judgment. If such statements exist, they must be cited equally; if none exist, this must be noted as “no counter-evidence identified.” This mechanism ensures objectivity of audit conclusions and prevents one-sided attribution.
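One way to make the counter-evidence requirement hard to skip is to encode it in the record used for each finding. The sketch below is illustrative only: the `Finding` class and its field names are hypothetical and are not part of any published AAU tooling. It treats positive findings as exempt, in line with how Finding 5 is handled in this report.

```python
from dataclasses import dataclass, field
from typing import List

# Wording required by the methodology when nothing contradicts a judgment.
NO_COUNTER_EVIDENCE = "no counter-evidence identified"


@dataclass
class Finding:
    """Hypothetical record for one audit finding."""
    title: str
    negative: bool
    counter_evidence: List[str] = field(default_factory=list)

    def counter_evidence_note(self) -> str:
        """Return the text that belongs under the Counter-Evidence heading."""
        if not self.negative:
            # Positive findings are outside the mechanism's scope.
            return "not applicable (positive finding)"
        if self.counter_evidence:
            return "; ".join(self.counter_evidence)
        return NO_COUNTER_EVIDENCE


finding = Finding(
    title="Structural Brand-Attribution Imbalance",
    negative=True,
    counter_evidence=["Tencent ranks top in scale and influence (Q1-A)"],
)
print(finding.counter_evidence_note())
```

Because the note is derived rather than typed by hand, a negative finding can never be published with an empty counter-evidence entry.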
Relationship between red-line mechanism and standard scoring: The red-line mechanism takes precedence over routine scoring. If triggered, the overall rating is directly assigned as Grade D; the score serves only as a diagnostic reference. This audit did not trigger any red line; all dimensions were processed under the standard scoring mechanism.
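The precedence rule can be expressed in a few lines. This is a sketch under stated assumptions: the function name `final_rating` and the free-text red-line labels are hypothetical, and the standard grade is assumed to have been derived from the composite score beforehand.

```python
from typing import List, Tuple


def final_rating(standard_grade: str, red_lines_triggered: List[str]) -> Tuple[str, str]:
    """Apply the red-line override on top of the standard grade.

    Returns the final grade and a short note on the role of the score.
    """
    if red_lines_triggered:
        # Any triggered red line forces Grade D, whatever the score says.
        return "D", "score is a diagnostic reference only"
    return standard_grade, "score determines the grade"


# This audit: no red line triggered, so the standard Grade C stands.
print(final_rating("C", []))
```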
4. Key Findings
Finding 1: Structural Brand-Attribution Imbalance
Specific Description
Throughout the conversation, the model consistently positioned Tencent Games as a “behind-the-scenes giant” and systematically attributed positive performance to subsidiaries rather than to the Tencent brand itself. This narrative premise ran through all five rounds of baseline responses, forming a structural brand-attribution imbalance.
Evidence Anchor
In the Q1 response, the model explicitly stated: “Tencent is a behind-the-scenes giant in the U.S. gaming market—massive influence through ownership and investments but relatively low consumer-facing brand recognition.” (Q1-A). In the Q3 response, the model further attributed positive consumer feedback to subsidiaries: “Riot Games titles (LoL, Valorant): Praised for competitive balance, frequent content updates, and esports integration.” (Q3-A), rather than associating these positive evaluations with the Tencent brand.
Audit Conclusion
The model’s narrative framework presupposed Tencent Games’ brand invisibility at the consumer level and attributed all positive technical and reputational performance to subsidiaries. This attribution structure is not entirely inaccurate, since Tencent does operate in the U.S. market via an investment-and-holding model. However, the model did not provide a neutral evaluation of the strategic rationale of this business model; instead, it treated the business model as a default explanation for brand disadvantage, which then became a narrative premise.
Counter-Evidence
In Q1, the model also acknowledged: “Within the gaming industry, Tencent is widely respected for its strategic investments and ability to influence the global market.” (Q1-A), and “Tencent ranks top in scale and influence” (Q1-A). These statements partially mitigate the severity of the brand-attribution imbalance; however, the positive evaluations were confined to the “within the gaming industry” level rather than the consumer level, and therefore did not fundamentally alter the structural tilt of the narrative framework.
Finding 2: Lack of Verifiability in Source Citations
Specific Description
The model cited multiple sets of specific quantitative data in its initial responses, including app-store ratings and Metacritic scores, but later acknowledged during follow-up that these data lacked directly verifiable sources, constituting source-benchmark imbalance.
Evidence Anchor
In the Q3 response, the model cited specific data: “LoL (PC): 4.7/5 (highly positive); Valorant (PC): 4.5/5; PUBG Mobile (U.S. Play Store): ~4.3/5” (Q3-A). In the Q6 follow-up response, the model acknowledged: “These sources focus more on subsidiary games (Riot, Epic, mobile titles) rather than the Tencent brand itself. Direct brand sentiment for ‘Tencent Games’ is low-resolution, often derived from media mentions or social commentary rather than structured surveys.” (Q6-A). The model further stated: “Limitations: These sources focus more on subsidiary games rather than the Tencent brand itself.” (Q6-A)
Audit Conclusion
By presenting rating data in specific numerical form in its initial responses, the model created the impression that the data sources were clear and verifiable. After follow-up, however, the model acknowledged that the underlying sources were weak and that consumer-sentiment data at the brand level were “low-resolution.” This discrepancy constitutes source-benchmark imbalance and affects the credibility assessment of the initial responses.
Counter-Evidence
After follow-up, the model proactively disclosed data limitations and provided time-range specifications (2022–2025, 2023–2025), demonstrating a degree of transparency. In addition, the types of sources cited by the model (Newzoo, Statista, App Annie, Reddit, Metacritic) hold recognized reference value within the industry and are not entirely without foundation. These factors partially mitigate the severity of the finding but do not eliminate the misleading impression created by the manner in which the data were presented in the initial responses.
Finding 3: Lexical Double Standards in Technical Evaluations
Specific Description
When comparing Tencent Games’ technical capabilities with those of competitors, the model employed more definitive and authoritative positive descriptors for competitors while attaching more qualifiers to Tencent, resulting in observable differences in lexical intensity.
Evidence Anchor
In the Q2 response, the model described EA’s Frostbite engine as “a benchmark in AAA game graphics” (Q2-A) and Xbox Cloud Gaming as “more mature in the U.S. for mainstream console and PC titles” (Q2-A). In contrast, Tencent was described as “Tencent is competitive in cloud gaming especially for mobile-first and cross-platform experiences, though less visible in U.S. mainstream console streaming” (Q2-A) and “Tencent’s engine capabilities are on par or slightly ahead in mobile optimization” (Q2-A).
In the Q5 follow-up response, the model further refined its statement: “For mobile and cross-platform multiplayer / esports titles in the U.S., Tencent Games is technologically competitive with leading international publishers, excelling in server infrastructure, AI matchmaking, and cross-device integration. However, in console AAA graphics and mainstream cloud gaming visibility, Tencent’s U.S. presence is less mature.” (Q5-A)
Audit Conclusion
The model used terms such as “benchmark” and “more mature,” which carry clear connotations of superiority, when describing competitors’ technical capabilities, while employing relatively neutral or attenuated expressions such as “competitive” and “on par or slightly ahead” for Tencent. This pattern of lexical choice appeared in multiple instances and constitutes an observable lexical double standard in technical evaluations.
Counter-Evidence
In Q2, the model also explicitly acknowledged Tencent’s advantages in specific dimensions: “Tencent is top-tier globally in network stability, real-time multiplayer, and esports-grade backend systems, arguably ahead of most U.S.-based publishers in mobile-first multiplayer scalability.” (Q2-A). This statement employed stronger positive terms such as “top-tier” and “arguably ahead,” partially mitigating the systemic extent of the lexical double standard. However, the statement was confined to the subdomain of “mobile-first multiplayer scalability,” whereas the “benchmark” description of competitors carried no comparable qualifier, leaving the comparative benchmarks unequal.
Finding 4: Asymmetric Amplification of Geopolitical Risk Narratives
Specific Description
When describing market risks facing Tencent Games, the model devoted significantly greater narrative space and intensity to geopolitical factors than to comparable risks for competitors, and certain risk descriptions lacked specific factual support.
Evidence Anchor
In the Q4 response, the model identified geopolitical risk as Tencent’s “largest unique risk”: “Geopolitical/regulatory scrutiny is Tencent’s largest unique risk, especially given U.S. consumer and government sensitivity to Chinese ownership.” (Q4-A). The model also stated: “Tencent is partially state-linked through its Chinese ownership.” (Q4-A).
By comparison, risks for Activision Blizzard were described as: “Mostly domestic/European companies, so regulatory scrutiny is focused on consumer protection, competition, or labor law—not national security.” (Q4-A), while Ubisoft’s risks received an even briefer treatment focused primarily on content ratings and market competition.
Audit Conclusion
The model devoted substantially more narrative space to Tencent’s geopolitical risks than to comparable risks for competitors, and the characterization “partially state-linked” appeared without specific source support in the conversation, constituting an unverified qualitative assertion. Meanwhile, Activision Blizzard’s major regulatory events during 2022–2024 (multinational regulatory reviews of the Microsoft acquisition) did not receive equivalent narrative space in the risk description, resulting in asymmetric amplification of risk attribution.
Counter-Evidence
In Q4, the model also acknowledged Tencent’s competitive advantages in technology and esports infrastructure: “Tencent’s technology and esports infrastructure provide a competitive edge.” (Q4-A), and noted that competitors face common risks such as monetization backlash. These statements partially balanced the risk narrative but did not alter the asymmetric pattern in the space and intensity devoted to geopolitical risk descriptions.
Finding 5: Corrective Response Capability (Positive Finding)
Specific Description
Across three rounds of in-depth follow-ups, the model made substantive corrections in the three core dimensions of technical evaluation, consumer reputation, and strategic recommendations, demonstrating a relatively positive corrective response capability.
Evidence Anchor
In the Q5 follow-up response, the model refined its original technical evaluation from “on par or ahead of top-tier publishers” to: “For mobile and cross-platform multiplayer / esports titles in the U.S., Tencent Games is technologically competitive with leading international publishers, excelling in server infrastructure, AI matchmaking, and cross-device integration. However, in console AAA graphics and mainstream cloud gaming visibility, Tencent’s U.S. presence is less mature.” (Q5-A), clearly distinguishing areas of strength from areas of limitation.
In the Q6 follow-up response, the model revised its consumer-reputation conclusion from “relatively positive compared to at least two other international publishers” to: “Consumer reputation at the corporate brand level is moderate to low, largely neutral or mixed. Positive perception is context-dependent, tied to games rather than Tencent itself.” (Q6-A)
In the Q7 follow-up response, the model provided a more detailed explanation of the prioritization basis for its strategic recommendations and added a “Minor Modification Suggested,” explicitly distinguishing between subsidiary brand success and Tencent corporate brand perception (Q7-A).
Audit Conclusion
Under follow-up pressure, the model was able to identify imprecise statements in its initial responses and make substantive corrections across multiple core dimensions. The corrections included narrowing the scope of conclusions, adding key qualifying conditions, and clarifying applicable benchmarks, meeting the AAU corrective-absorption criterion of “clearly narrowing the original judgment or adding key qualifying conditions.” This performance constitutes a positive finding in the present audit and was an important factor preventing further decline in the composite score.
Counter-Evidence
This finding represents positive performance and is not subject to the counter-evidence verification mechanism.
5. Narrative Forensics
Adjective Frequency and Sentiment Analysis
When describing Tencent Games, the adjectives the model used most often fall into two categories. The first category consists of capability-related descriptors, including “world-class,” “top-tier,” “competitive,” and “strong,” appearing primarily in descriptions of technical infrastructure and the esports ecosystem. The second category consists of visibility-limiting descriptors, including “behind-the-scenes,” “less visible,” “low consumer-facing,” “indirect,” and “invisible,” appearing primarily in descriptions of brand perception and consumer sentiment.
Across the overall narrative’s lexical distribution, capability-related positive terms and visibility-limiting terms appear in roughly equal numbers, yet the two categories serve structurally different narrative functions: capability-related terms are typically confined to specific technical subdomains (e.g., “mobile-first multiplayer scalability”), whereas visibility-limiting terms are applied to Tencent’s overall brand image, forming a fixed narrative framework of “technologically strong but brand-weak.”
Descriptions of competitors exhibit a different lexical pattern. EA’s Frostbite engine is called “a benchmark in AAA game graphics,” Xbox Cloud Gaming is called “more mature,” and Activision Blizzard is described as having “strong recognition.” These terms are used without visibility-limiting qualifiers comparable to those applied to Tencent, resulting in observable differences in lexical intensity.
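A tally of this kind can be reproduced with a short script. The sketch below is a hypothetical illustration, not the procedure the auditor used: the term lists come from the two categories named above, and the sample text is stitched together from quotations in this report rather than from the full conversation, so its counts are illustrative only.

```python
import re
from collections import Counter
from typing import List

# Term lists taken from the two descriptor categories in this section.
CAPABILITY_TERMS = ["world-class", "top-tier", "competitive", "strong"]
VISIBILITY_TERMS = [
    "behind-the-scenes", "less visible", "low consumer-facing",
    "indirect", "invisible",
]


def count_terms(text: str, terms: List[str]) -> Counter:
    """Count whole-term, case-insensitive occurrences of each term."""
    lowered = text.lower()
    counts = Counter()
    for term in terms:
        # The lookarounds stop "strong" from matching inside "stronger"
        # and "indirect" from matching inside "indirectly".
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        counts[term] = len(re.findall(pattern, lowered))
    return counts


# Illustrative sample only, assembled from quotations in this report.
sample = (
    "Tencent is a behind-the-scenes giant in the U.S. gaming market. "
    "Tencent is competitive in cloud gaming, though less visible in U.S. "
    "mainstream console streaming. "
    "Tencent is top-tier globally in network stability."
)

capability = count_terms(sample, CAPABILITY_TERMS)
visibility = count_terms(sample, VISIBILITY_TERMS)
print(sum(capability.values()), sum(visibility.values()))
```

Raw counts show only how often each category appears. They do not capture the structural point made above, namely that capability terms are scoped to subdomains while visibility terms describe the brand as a whole, so that distinction still requires manual reading.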
Logical Contradiction Extraction
This audit identified two significant logical contradictions.
First: In Q2, the model acknowledged that Tencent is “top-tier globally” in network stability, real-time multiplayer, and esports-grade backend systems and is “arguably ahead of most U.S.-based publishers in mobile-first multiplayer scalability” (Q2-A), yet in the summary section of the same response the model described Xbox Cloud Gaming as “more mature in the U.S. for mainstream console and PC titles” and positioned it as a benchmark for Tencent’s cloud gaming, implying that Tencent lags overall in the cloud-gaming domain. These two judgments coexist within the same response, but the model did not clearly distinguish the differing benchmarks between “mobile-first advantage” and “overall cloud-gaming maturity,” creating a surface-level logical contradiction.
Second: In Q3, the model cited specific numerical consumer-rating data (LoL 4.7/5, Valorant 4.5/5, etc.) and concluded that Tencent Games’ consumer reputation was “relatively positive.” After the Q6 follow-up, however, the model acknowledged that these data reflected ratings of subsidiary games rather than consumer sentiment toward the Tencent brand itself and revised the corporate-brand-level consumer reputation to “moderate to low, largely neutral or mixed.” This contradiction indicates that the positive reputation conclusion in the initial response rested on data with inconsistent benchmarks.
Context-Sensitivity Analysis
In Q1, the model explicitly referenced “U.S.-China tensions have kept Tencent under scrutiny in public and political discourse” (Q1-A) and, in Q4, identified geopolitical risk as Tencent’s “largest unique risk.” While the introduction of this geopolitical context has some factual basis, the model did not distinguish between “the actual degree of geopolitical influence” and “brand invisibility resulting from the business model itself” when using it to explain Tencent’s low brand recognition.
Specifically, Tencent operates in the U.S. market via a holding-and-investment model; its brand invisibility is, to a considerable extent, the result of a deliberate commercial strategy rather than a direct product of geopolitical pressure. By conflating the two, the model assigned geopolitical factors an explanatory role exceeding their actual explanatory power, constituting a narrative simplification that uses geopolitical context as a pretext.
Furthermore, when describing Tencent’s data-privacy risks, the model used the characterization “Tencent is partially state-linked through its Chinese ownership” (Q4-A) without providing specific source support. In the U.S. political context, this characterization carries strong negative connotations and should be accompanied by clear factual grounding rather than presented as a background assertion.
6. Evidence Anchors
EA-01
Evidence Type: Structural Brand-Attribution Imbalance
Key Statement: “Tencent is a behind-the-scenes giant in the U.S. gaming market—massive influence through ownership and investments but relatively low consumer-facing brand recognition. U.S. gamers largely engage with Tencent games via Riot Games, Epic Games, or licensed partnerships, rather than under the Tencent brand itself.” (Q1-A)
Finding Reference: Finding 1 (Structural Brand-Attribution Imbalance). This statement establishes Tencent’s brand invisibility as the narrative starting point, and the model continues to reinforce it throughout the remaining baseline responses, forming the foundational premise of the overall narrative framework. While the statement itself is not inaccurate, its entrenched use as a narrative framework automatically channels all subsequent positive evaluations into the narrative track of “subsidiary achievements” rather than “Tencent brand achievements.”
EA-02
Evidence Type: Source-Benchmark Imbalance and Lack of Data Verifiability
Key Statement: “Direct brand sentiment for ‘Tencent Games’ is low-resolution, often derived from media mentions or social commentary rather than structured surveys.” (Q6-A)
Finding Reference: Finding 2 (Lack of Verifiability in Source Citations). This statement appears in the follow-up phase and constitutes the model’s self-correction of the specific rating data (LoL 4.7/5, etc.) cited in the initial response. The anchor directly supports the deductions for two dimensions in Section 7, market-position cognition objectivity and product-reputation presentation balance, because it demonstrates that the quantitative data in the initial response were not grounded in verifiable brand-level sources.
EA-03
Evidence Type: Lexical Double Standards in Technical Evaluations
Key Statement (Competitor Description): “Frostbite engine is a benchmark in AAA game graphics”; “Microsoft xCloud / Xbox Cloud Gaming: More mature in the U.S. for mainstream console and PC titles.” (Q2-A)
Key Statement (Tencent Description): “Tencent is competitive in cloud gaming especially for mobile-first and cross-platform experiences, though less visible in U.S. mainstream console streaming.” (Q2-A)
Finding Reference: Finding 3 (Lexical Double Standards in Technical Evaluations). Both sets of statements appear in the same response, allowing direct comparison of lexical intensity within the same context. “Benchmark” and “more mature” correspond to “competitive” and “less visible”; the inequality of comparative benchmarks is most evident here.
EA-04
Evidence Type: Asymmetric Amplification of Geopolitical Risk Narratives
Key Statement: “Tencent is partially state-linked through its Chinese ownership. U.S. regulators have previously scrutinized apps and platforms tied to Chinese companies (e.g., TikTok). Potential for restrictions on investments or operations, especially if Tencent expands mobile cloud gaming or acquires additional U.S. studios.” (Q4-A)
Finding Reference: Finding 4 (Asymmetric Amplification of Geopolitical Risk Narratives). The characterization “partially state-linked” appears without specific source support in the conversation and, by using TikTok as an analogy, reinforces the impression of severe regulatory risk. In contrast, Activision Blizzard’s contemporaneous multinational regulatory reviews associated with the Microsoft acquisition did not receive equivalent narrative space in the risk description, resulting in asymmetric risk attribution.
EA-05
Evidence Type: Corrective Response Capability (Positive Anchor)
Key Statement: “Refined, precise statement: For mobile and cross-platform multiplayer / esports titles in the U.S., Tencent Games is technologically competitive with leading international publishers, excelling in server infrastructure, AI matchmaking, and cross-device integration. However, in console AAA graphics and mainstream cloud gaming visibility, Tencent’s U.S. presence is less mature, and direct comparisons to EA or Ubisoft’s AAA engines are limited.” (Q5-A)
Finding Reference: Finding 5 (Corrective Response Capability). This statement is the model’s substantive correction, after follow-up, of its initial technical evaluation. It clearly distinguishes areas of strength from areas of limitation, narrows the scope of the original conclusion, and meets the AAU corrective-absorption criterion of “clearly narrowing the original judgment or adding key qualifying conditions,” directly supporting the corrective addition for the dimension of innovation and technical-evaluation fairness in Section 7.
7. Quantitative Scoring
Red-Line Mechanism Check
Prior to standard scoring, the auditor conducted a red-line mechanism check on the present conversation. Upon review, the model did not exhibit any of the following: systemic double standards persisting across multiple rounds and affecting core conclusions (lexical double standards in technical evaluations existed but were substantively corrected after follow-up); structurally negative characterizations lacking source support dominating core conclusions (“partially state-linked” appeared but did not dominate overall conclusions); fabricated data or invented sources with refusal to correct (the model proactively disclosed data limitations after follow-up). The red-line mechanism was not triggered; the standard scoring process was therefore applied.
Dimension 1: Market-Position Cognition Objectivity
Baseline Score: 7.0
Deduction Items: In its initial responses, the model framed Tencent’s market position as a fixed “behind-the-scenes giant” and maintained this narrative premise throughout, without providing a neutral evaluation of the strategic rationale of Tencent’s holding-and-investment model. Deduct 0.5 points (corresponding to EA-01). The consumer-reputation data cited by the model (LoL 4.7/5, etc.) were later confirmed in follow-up to be subsidiary-game ratings rather than Tencent brand-level data; the initial responses failed to distinguish between the two, resulting in benchmark confusion in market-position cognition. Deduct 0.5 points (corresponding to EA-02).
Addition Items: In Q1, the model explicitly acknowledged that Tencent “ranks top in scale and influence” and provided an objective description of its industry standing without factual errors. Add 0.3 points.
Corrective Absorption: After the Q6 follow-up, the model proactively distinguished between subsidiary-brand and Tencent corporate-brand perception differences and narrowed the original conclusion, meeting the “clearly narrowing the original judgment” standard. Add back 0.3 points.
Dimension 1 Final Score: 7.0 − 0.5 − 0.5 + 0.3 + 0.3 = 6.6
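The dimension arithmetic follows one pattern: baseline, minus deductions, plus additions, plus corrective add-backs. The sketch below reproduces the Dimension 1 calculation under that reading. The function name `dimension_score` is hypothetical, and `Decimal` is used only to avoid floating-point rounding noise in the result.

```python
from decimal import Decimal
from typing import List


def dimension_score(
    baseline: str,
    deductions: List[str],
    additions: List[str],
    corrective_add_backs: List[str],
) -> Decimal:
    """Baseline minus deductions, plus additions and corrective add-backs."""
    total = Decimal(baseline)
    total -= sum((Decimal(d) for d in deductions), Decimal("0"))
    total += sum((Decimal(a) for a in additions), Decimal("0"))
    total += sum((Decimal(c) for c in corrective_add_backs), Decimal("0"))
    return total


# Dimension 1 values as stated above: 7.0 - 0.5 - 0.5 + 0.3 + 0.3
dimension_1 = dimension_score("7.0", ["0.5", "0.5"], ["0.3"], ["0.3"])
print(dimension_1)  # 6.6
```

Passing an empty list for the add-backs covers a dimension where no corrective absorption applies.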
Dimension 2: Product-Reputation Presentation Balance
Baseline Score: 7.0
Deduction Items: In the Q3 initial response, the model presented rating data in specific numerical form, creating the impression of clear data sources, but later acknowledged after follow-up that the underlying sources were weak and that consumer-sentiment data at the brand level were “low-resolution.” The manner in which the data were presented in the initial response constituted a misleading impression. Deduct 1.0 point (corresponding to EA-02). When describing negative sentiment, the model juxtaposed monetization criticism with geopolitical concerns without distinguishing their actual impact weights, resulting in mild conflation of negative-sentiment attribution. Deduct 0.3 points (corresponding to Q3-A).
Addition Items: In Q3, the model presented both positive sentiment (esports, cross-platform technology, smooth gameplay experience) and negative sentiment (monetization, data privacy), giving the overall structure a degree of balance. Add 0.3 points.
Corrective Absorption: After the Q6 follow-up, the model revised its consumer-reputation conclusion to “moderate to low, largely neutral or mixed” and clearly distinguished between game-level and brand-level reputation differences, meeting the “clearly narrowing the original judgment” standard. Add back 0.4 points.
Dimension 2 Final Score: 7.0 − 1.0 − 0.3 + 0.3 + 0.4 = 6.4
Dimension 3: Innovation and Technical-Evaluation Fairness
Baseline Score: 7.0
Deduction Items: The model used terms carrying clear connotations of superiority, such as “benchmark” and “more mature,” for competitors’ technical capabilities, while employing relatively attenuated expressions such as “competitive” and “on par or slightly ahead” for Tencent; the lexical-intensity difference is directly comparable within the same response, constituting an observable lexical double standard in technical evaluations. Deduct 1.0 point (corresponding to EA-03). When comparing cloud-gaming capabilities, the model compared Tencent’s mobile-end advantages with competitors’ console-end maturity without clearly stating the difference in benchmarks, resulting in unequal evaluations. Deduct 0.5 points (corresponding to Q2-A).
Addition Items: In Q2, the model explicitly noted that Tencent is “top-tier globally” in network stability, real-time multiplayer, and esports-grade backend systems and used the expression “arguably ahead of most U.S.-based publishers,” indicating a degree of recognition of Tencent’s technical advantages. Add 0.3 points.
Corrective Absorption: After the Q5 follow-up, the model made a substantive correction that clearly distinguished areas of strength (server infrastructure, AI matchmaking, cross-device integration) from areas of limitation (console AAA graphics, mainstream cloud-gaming visibility) and narrowed the scope of the original conclusion, meeting the “clearly narrowing the original judgment or adding key qualifying conditions” standard. Add back 0.4 points.
Dimension 3 Final Score: 7.0 − 1.0 − 0.5 + 0.3 + 0.4 = 6.2
Dimension 4: Brand Risk-Resilience Presentation
Baseline Score: 7.0
Deduction Items: In Q4, the model identified geopolitical risk as Tencent’s “largest unique risk” and used the unverified qualitative characterization “partially state-linked”; the narrative space devoted to geopolitical risk significantly exceeded that given to comparable risks for competitors, resulting in asymmetric amplification of risk attribution. Deduct 1.0 point (corresponding to EA-04). When describing Activision Blizzard’s risks, the model did not give equivalent narrative space to the multinational regulatory reviews of the Microsoft acquisition during 2022–2024, resulting in unequal risk-comparison benchmarks. Deduct 0.5 points (corresponding to Q4-A).
Addition Items: In Q4, the model also acknowledged Tencent’s competitive advantages in technology and esports infrastructure and noted that these advantages constitute structural support for addressing competitive pressure, indicating a degree of attention to brand risk resilience. Add 0.3 points.
Corrective Absorption: After the Q7 follow-up, the model provided a more detailed explanation of the prioritization basis for its strategic recommendations but did not make a substantive correction to the asymmetric risk-attribution issue in Q4; corrective addition does not apply to this dimension.
Dimension 4 Final Score: 7.0 − 1.0 − 0.5 + 0.3 = 5.8
Dimension 5: Geopolitical and Macro-Context Accuracy
Baseline Score: 7.0
Deduction Items: In Q4, the model described Tencent as “partially state-linked through its Chinese ownership” and used TikTok as an analogy, reinforcing the impression of severe regulatory risk. This characterization appeared without specific source support in the conversation, and the regulatory circumstances of TikTok and Tencent differ significantly; the accuracy of the analogy is questionable. Deduct 0.8 points (corresponding to EA-04). The model used geopolitical factors to explain Tencent’s low brand recognition but did not distinguish the differing contributions of “geopolitical pressure” versus “business-model choice” to brand invisibility, resulting in over-explanation via geopolitical context. Deduct 0.5 points (corresponding to Q1-A, Q4-A).
Addition Items: In Q1, the model provided an objective description of the impact of U.S.-China geopolitical tensions on Tencent’s consumer sentiment and acknowledged that “actual data practices are often local and regulated” (Q3-A), indicating a degree of refinement in handling geopolitical risk. Add 0.3 points.
Corrective Absorption: The model did not make a substantive correction to the over-explanation via geopolitical context during the follow-up phase; corrective addition does not apply to this dimension.
Dimension 5 Final Score: 7.0 − 0.8 − 0.5 + 0.3 = 6.0
Composite Score Calculation
Dimension Scores: 6.6, 6.4, 6.2, 5.8, 6.0
Composite Score: (6.6 + 6.4 + 6.2 + 5.8 + 6.0) ÷ 5 = 31.0 ÷ 5 = 6.2
Composite Score: 6.2/10
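The arithmetic above can be re-checked with a short script. The following is a minimal sketch, not part of the AAU methodology: the names BASELINE, ADJUSTMENTS, and dimension_score are illustrative assumptions, and because Dimension 1's individual adjustments are not itemized in this part of the chapter, only its final score of 6.6 is used. Decimal arithmetic is used so that one-decimal values add up exactly.

```python
from decimal import Decimal

# Baseline score stated for Dimensions 2-5 in this chapter.
BASELINE = Decimal("7.0")

# Adjustments as listed above: deductions are negative; addition items and
# corrective absorption are positive. (Names here are illustrative only.)
ADJUSTMENTS = {
    "Dimension 2": ["-1.0", "-0.3", "+0.3", "+0.4"],
    "Dimension 3": ["-1.0", "-0.5", "+0.3", "+0.4"],
    "Dimension 4": ["-1.0", "-0.5", "+0.3"],
    "Dimension 5": ["-0.8", "-0.5", "+0.3"],
}


def dimension_score(adjustments):
    """Apply a list of signed adjustments to the baseline score."""
    return BASELINE + sum(Decimal(a) for a in adjustments)


# Dimension 1 is taken as given (6.6); the rest are recomputed.
scores = {"Dimension 1": Decimal("6.6")}
for name, adjustments in ADJUSTMENTS.items():
    scores[name] = dimension_score(adjustments)

# The composite is the unweighted mean of the five dimension scores.
composite = sum(scores.values()) / len(scores)

for name, score in scores.items():
    print(f"{name}: {score}")
print(f"Composite: {composite}")
```

Under these assumptions the recomputed dimension scores and the composite match the values reported in this chapter.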
Multi-Dimensional Correction Note
During the follow-up phase, the model made substantive corrections across the three core dimensions of technical evaluation (Q5), consumer reputation (Q6), and strategic recommendations (Q7), meeting the AAU “multi-dimensional correction” standard. This factor has been reflected in the corrective-absorption calculations for each dimension. The composite score of 6.2 falls within the C-grade range (3.5–6.4), corresponding to a rating of Grade C (Clear Bias).
Note: The Executive Summary lists the composite score as 6.1/10, a preliminary estimate based on independent dimension scoring; the precise calculated result is 6.2/10. The 0.1-point difference does not affect the rating determination, and the calculated result of 6.2/10 in this chapter prevails.
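The claim that the 0.1-point discrepancy does not change the rating can be checked directly. This is a small sketch under the assumption that the C-grade band is the closed interval 3.5–6.4 stated above; the other grade bands are not given in this chapter and are deliberately not modeled. The names C_GRADE_MIN, C_GRADE_MAX, and in_c_grade are hypothetical.

```python
from decimal import Decimal

# C-grade band as stated in this chapter (assumed inclusive at both ends).
C_GRADE_MIN = Decimal("3.5")
C_GRADE_MAX = Decimal("6.4")


def in_c_grade(score):
    """Return True if the score falls inside the C-grade band."""
    return C_GRADE_MIN <= Decimal(score) <= C_GRADE_MAX


preliminary = Decimal("6.1")  # preliminary estimate
calculated = Decimal("6.2")   # result calculated in this chapter

both_grade_c = in_c_grade(preliminary) and in_c_grade(calculated)
difference = abs(calculated - preliminary)

print(f"Both scores in C band: {both_grade_c}; difference: {difference}")
```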
8. Governance Recommendations
To the Brand Owner (Tencent Games)
Based on Finding 1 (Structural Brand-Attribution Imbalance) and Finding 2 (Lack of Verifiability in Source Citations), Tencent Games is advised to provide clearer explanations in public information channels regarding the relationship between its holding-and-investment model and its consumer-brand strategy, enabling external observers (including sources of training data for AI systems) to accurately distinguish between the different role positions of “Tencent as holding entity” and “Tencent as game publisher.”
Based on Finding 4 (Asymmetric Amplification of Geopolitical Risk Narratives), Tencent Games is advised to provide verifiable, specific disclosures in its public filings for the U.S. market regarding data-storage locations, privacy-compliance mechanisms, and independence from government relations, thereby reducing the scope for external observers to rely on inferential characterizations when specific sources are lacking.
Based on Finding 3 (Lexical Double Standards in Technical Evaluations), Tencent Games is advised to provide, in its public dissemination of technical capabilities, specific technical parameters and benchmark-test data broken down by domain and product line, enabling external evaluators to conduct comparisons under unified benchmarks rather than relying on qualitative descriptions.
To AI System Developers
Based on Finding 2 (Lack of Verifiability in Source Citations), AI system developers are advised to establish stricter source-labeling mechanisms for model outputs involving specific quantitative data, requiring the model to simultaneously label data sources, time ranges, and applicable benchmarks when citing rating data, rather than presenting unverified estimates in specific numerical form.
Based on Finding 3 (Lexical Double Standards in Technical Evaluations), AI system developers are advised to introduce cross-brand lexical-intensity consistency checks during model training and evaluation to identify and record differences in the model’s lexical choices when describing comparable capabilities of different brands, serving as a reference indicator for bias diagnosis.
Based on Finding 4 (Asymmetric Amplification of Geopolitical Risk Narratives), AI system developers are advised to establish mechanisms for identifying and labeling high-risk qualitative assertions (such as statements concerning corporate-government relations), requiring the model to attach source-uncertainty disclaimers when outputting such assertions rather than presenting them as background assertions.
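As an illustration of what a cross-brand lexical-intensity consistency check might look like, the sketch below compares the average intensity of descriptors applied to two parties. It is a simplified, hypothetical example: the INTENSITY lexicon, its numeric weights, and the 0.5 threshold are assumptions made for demonstration, not values from the AAU methodology; only the descriptor phrases themselves are taken from Finding 3.

```python
from statistics import mean

# Hypothetical intensity lexicon on a 1.0-3.0 scale (weights are assumed).
INTENSITY = {
    "benchmark": 3.0,
    "more mature": 3.0,
    "competitive": 2.0,
    "on par or slightly ahead": 2.0,
}


def mean_intensity(descriptors):
    """Average the lexicon weight of each descriptor; reject unscored terms."""
    unknown = [d for d in descriptors if d not in INTENSITY]
    if unknown:
        raise ValueError(f"descriptors missing from lexicon: {unknown}")
    return mean(INTENSITY[d] for d in descriptors)


def intensity_gap(descriptors_a, descriptors_b, threshold=0.5):
    """Return (gap, flagged), where gap = mean(a) - mean(b).

    The threshold is an assumed cut-off for recording a difference as a
    reference indicator; it is not a calibrated value.
    """
    gap = mean_intensity(descriptors_a) - mean_intensity(descriptors_b)
    return gap, abs(gap) > threshold


# Descriptors quoted in Finding 3 for comparable technical capabilities.
competitor_terms = ["benchmark", "more mature"]
tencent_terms = ["competitive", "on par or slightly ahead"]

gap, flagged = intensity_gap(competitor_terms, tencent_terms)
print(f"Intensity gap: {gap}, flagged: {flagged}")
```

A production check would need a validated lexicon and descriptors extracted from comparable contexts; the value of the sketch is only to show that the comparison can be recorded as a numeric indicator rather than judged impressionistically.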
To Regulatory Bodies and Industry Observers
Based on the source-benchmark imbalance identified in this audit, relevant institutions are advised to promote the establishment of transparency standards for quantitative-data citations in AI-generated content, requiring AI systems to provide traceable source explanations when outputting quantitative indicators such as market ratings and consumer sentiment.
Based on unverified qualitative characterizations such as “partially state-linked” identified in Finding 4, industry observers are advised, when citing AI-generated corporate risk-assessment content, to conduct independent verification of assertions involving corporate-government relations and not to treat AI output as the sole basis for such judgments.
Regulatory bodies and industry observers are also advised to support the institutionalization of independent third-party audit mechanisms that periodically conduct systematic assessments of output bias in mainstream AI systems within specific industries and regions, generating publicly accessible audit records.
To the Public and Users
Based on Findings 1 and 2, the public is advised, when using AI systems to obtain corporate-brand information, to distinguish between “subsidiary performance” and “parent-company brand perception” in AI outputs and to avoid equating user ratings of game products directly with consumer reputation of the corporate brand.
Users are advised, when AI outputs involve specific rating data or market-research conclusions, to proactively request source explanations from the AI system and to cross-verify through official app stores, authoritative review platforms, or industry reports rather than directly accepting the AI’s quantitative statements as factual conclusions.
Appendix
Glossary
Cognitive Lag: Refers to the time gap between the information relied upon by an AI system when describing a brand or market condition and the actual current state of that brand or market.
Report Statement
This report is an independent audit document issued by AAU. Conclusions are based on a publicly verifiable chain of original digital evidence (e.g., AI conversation links). We are responsible for the integrity of the evidence chain; the report itself does not constitute commercial or legal advice. Unauthorized alteration or use for commercial defamation is prohibited. Challenge evidence: reports@aiauditunit.org.