Adversarial AI Experiment Series · Field Report No. 01

The Penthouse Heist

Findings from an Authorized Adversarial Security Experiment on AI Agent Settlement Infrastructure

Andrew Smith · Efficient Frontier Labs / HLOS.ai

Published · Field Report · Public Findings

This white paper is a public findings report for press, academic, industry, and investor audiences. Technical exploit details, exact endpoint paths, and exploit-enabling payloads have been intentionally omitted. Findings were disclosed to affected vendors and infrastructure providers prior to publication, and the critical issues described here were remediated before release.

Correspondence and feedback may be directed to context@hlos.ai.

Version: v1.1 · April 22, 2026
Originally published: v1.0 · March 12, 2026
Experiment: March 12, 2026 · Oakland, California
Status: Published. All critical vulnerabilities remediated prior to publication.

00 Executive Summary

On March 12, 2026, HLOS conducted an authorized adversarial security experiment against a live AI agent settlement platform. During the exercise, participants and participant-directed autonomous systems discovered and exercised 39 canonical vulnerability families across authentication, billing, access control, information exposure, and settlement-adjacent surfaces.

The most consequential operational failure was an environment isolation breakdown: the event sandbox retained production-connected Firebase and Stripe credentials instead of isolated staging credentials. As a result, the exercise generated real production-connected effects, including 424 account creations and 109 live Stripe checkout sessions representing approximately $60,000 in attempted checkout value.

The central finding of this report is not the presence of any single vulnerability. It is the convergence behavior of materially different AI attack architectures. Teams using a hierarchical agent swarm, a single autonomous agent, AI-parallelized testing, AI combined with existing pentesting tooling, and an open-weight model on a remote GPU independently converged on the same economically meaningful weaknesses within roughly two hours.

The platform did not fail uniformly. Multiple outer-layer controls failed, but no adversarial transaction completed, no funds were captured, and no settlement artifact was forged. Core settlement finality remained intact even as authentication, environment binding, ownership validation, and credential governance proved materially more fragile.

0.1 Key Findings

Materially different AI attack architectures independently converged on the same high-value weaknesses within hours.
The most consequential operational failure was environment isolation breakdown, not cryptographic settlement compromise.
Broken access control, ownership validation failures, and permissive boundary behavior dominated the vulnerability set.
Participants and participant-directed systems created 424 accounts and initiated 109 live Stripe checkout sessions representing approximately $60,000 in attempted checkout value.
No adversarial transaction completed, no funds were captured, and no settlement artifact was forged.
All critical vulnerabilities described in this report were remediated prior to publication.

01 Experiment Context and Authorization

The Penthouse Heist was an authorized adversarial security experiment designed to test whether autonomous or semi-autonomous AI systems could discover meaningful weaknesses in a modern agent-financial stack faster than traditional manual testing would be expected to. Participants operated within a defined scope and under monitored conditions. Cash prizes were offered across three challenge paths: compromising the settlement pipeline, breaking receipt-chain integrity, and defeating agent attribution.

This document is a public findings report rather than a step-by-step technical disclosure. Exploit reproduction steps, exact endpoint paths, and exploit-enabling payloads have been omitted or generalized. Findings were disclosed to affected vendors and infrastructure providers prior to publication, and the critical issues described here were remediated before release.

02 System Under Test

The target was a settlement infrastructure platform for AI agents. Its architecture included Firebase Authentication, Stripe payment processing, a Cloud Run application layer, API proxy services, wallet management, and a cryptographic settlement kernel responsible for producing and binding settlement artifacts. The event focused on application, authentication, billing, and settlement-adjacent surfaces exposed to participants during the exercise window.

A core analytical distinction in this report is that application-layer and environment controls failed in multiple places, while the deepest settlement integrity guarantees did not. That distinction shapes both the forensic interpretation of the event and the post-remediation implications for agent-financial infrastructure more broadly.

03 Experiment Design and Scope

Participants were invited to explore the accessible attack surface using their own methods and tooling, including autonomous systems operating with limited human direction. For the purposes of reporting, overlapping submissions were consolidated into canonical vulnerability families when multiple reports described the same underlying control failure or exploit pattern. Similarly, attack architectures were classified by the dominant operating pattern demonstrated by the relevant team or participant.
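The consolidation rule described above can be pictured as a grouping step. The following is an illustrative sketch only, not the tooling used for this report: the field names `id`, `control`, and `failure`, and the choice to key a family on a normalized (affected control, failure pattern) pair, are assumptions introduced here, and the sample data is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FamilyKey = Tuple[str, str]


def normalize(label: str) -> str:
    """Collapse case and whitespace so near-identical labels compare equal."""
    return " ".join(label.lower().split())


def consolidate(submissions: Iterable[dict]) -> Dict[FamilyKey, List[str]]:
    """Group submission IDs by (affected control, failure pattern)."""
    families: Dict[FamilyKey, List[str]] = defaultdict(list)
    for sub in submissions:
        key = (normalize(sub["control"]), normalize(sub["failure"]))
        families[key].append(sub["id"])
    return dict(families)


# Hypothetical example data, not taken from the event.
reports = [
    {"id": "S-01", "control": "Auth middleware", "failure": "fail-open"},
    {"id": "S-02", "control": "auth  middleware", "failure": "Fail-Open"},
    {"id": "S-03", "control": "Billing", "failure": "missing ownership check"},
]
families = consolidate(reports)
```

In this sketch the grouping key stands in for a reviewer's judgment about whether two reports share a root cause; deciding that is the hard part, and it is not something string normalization can settle on its own.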

The event was authorized. The production-connected credential crossover was not. The most significant operational error was that the sandbox deployment shared production-connected Firebase and Stripe credentials instead of isolated staging credentials. The resulting exposure materially changed the significance of otherwise expected event traffic and converted the exercise into a production-connected control failure with real but contained operational consequences.

04 Forensic Timeline

Telemetry analysis reviewed 228,681 total spans across the 14-day period surrounding the event and identified a clear reconnaissance ramp into the live exercise window. The highest-confidence attack-associated pattern began approximately five hours before event start as participants mapped public pages, signup flows, billing surfaces, authentication states, project scopes, and settlement-relevant endpoints.

By the time the event formally began, a meaningful portion of the accessible attack surface had already been enumerated. The most concentrated attack hour was dominated by settlement-, receipt-, challenge-, token-, and MCP-related traffic, indicating that teams quickly oriented toward the platform's most economically meaningful surfaces rather than peripheral edge cases.

During the event window, the environment isolation breakdown produced four externally visible consequences: 424 accounts were created in the production Firebase instance; 109 live Stripe checkout sessions were initiated against the production Stripe account; those sessions represented approximately $60,000 in attempted checkout value; and zero adversarial transactions completed because sessions expired and adversarial wallets contained no funds. This was a serious production-connected control failure, but not a completed financial-loss event.

05 Findings

5.1 Convergence Across AI Attack Architectures

The most notable result of the exercise was that materially different AI-assisted attack architectures independently converged on the same high-value weaknesses. The first-place participant used a three-tier hierarchical swarm built around an unmodified commercial coding assistant. The second-place participant used a single autonomous agent seeded primarily with the event PDF, which autonomously generated a reconnaissance report, reverse-engineered a substantial portion of the x402 settlement flow, identified a provider credential injection issue, and produced submission-quality writeups.

Other top-performing teams used AI to parallelize static analysis and live API testing, combined AI systems with existing pentesting frameworks, or ran coordinator-worker setups around open-weight models on remote GPUs. The significance is not that one attack architecture dominated. It is that several distinct architectures reached similar conclusions quickly, suggesting that broad AI-enabled exploration is now a practical offensive capability even without bespoke tooling or frontier-only models.

5.2 Vulnerability Taxonomy

Ninety submissions were consolidated into 39 canonical vulnerability families. These are clustered across broken access control, authentication bypass, business-logic failure, information disclosure, security misconfiguration, and injection-related weaknesses. The dominant pattern was not exotic zero-day behavior. It was ordinary authorization and ownership assumptions that were shallow enough to be discovered in parallel at machine speed.

The most severe families included unauthenticated reachability of settlement-relevant surfaces, global settlement denial-of-service conditions, receipt-chain suppression, auth middleware fail-open behavior, billing misattribution through arbitrary wallet identifiers, cross-tenant read exposure, unrestricted Firebase account creation, provider credential injection, billing callback SSRF, and proxy SSRF. Explored systematically by autonomous systems, these issues combined into a high-severity operating condition.

5.3 Adversarial Account Behavior

The 424 adversarial Firebase accounts were classifiable by testing intent. Observed naming and activity patterns included query-shaped accounts, settlement-themed aliases, object-reference probes, injection-oriented variants, flood accounts, path-traversal experiments, SSRF-oriented probes, authentication tests, and privileged-identity impersonation aliases. A subset impersonated internal system identities under the platform domain, which Firebase accepted without effective domain validation or rate controls.
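The missing domain validation noted above can be illustrated with a small pre-registration check. This is a sketch under stated assumptions, not the platform's remediation and not a Firebase API: the reserved domain `platform.example` and the function name `is_signup_email_allowed` are hypothetical, and a real deployment would pair a check like this with email verification and rate controls.

```python
# Hypothetical reserved domain; self-service signup should never mint
# identities under the platform's own domain.
RESERVED_DOMAINS = frozenset({"platform.example"})


def is_signup_email_allowed(email: str) -> bool:
    """Reject malformed addresses and any address under a reserved domain."""
    local, sep, domain = email.strip().lower().rpartition("@")
    domain = domain.rstrip(".")  # treat "platform.example." like "platform.example"
    if not sep or not local or not domain:
        return False
    for reserved in RESERVED_DOMAINS:
        # Block the reserved domain itself and every subdomain of it.
        if domain == reserved or domain.endswith("." + reserved):
            return False
    return True
```

A caller would run this check before creating the account and reject the request outright when it returns `False`, rather than creating the account and flagging it afterwards.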

5.4 Security Properties That Held

The platform did not fail uniformly. Core settlement finality remained intact. No settlement artifact was forged, and cryptographic hash bindings were not compromised. However, receipt completeness guarantees did fail, and the platform's outer layers proved substantially more fragile than its deepest integrity boundary. For external readers, this is a critical distinction: the event exposed serious operational and access-control weaknesses, but not a compromise of the core settlement artifact model.

06 Root Cause Analysis

Three root-cause themes emerged. First, environment isolation failed: production-connected credentials were present in the sandbox deployment, introduced during AI-assisted configuration work. When AI participates in infrastructure setup, environment binding and secret provenance must be treated as first-class controls rather than implicit assumptions.
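One way to treat environment binding as an explicit control is a startup check that refuses to boot when the declared environment and the supplied credentials disagree. The sketch below is illustrative and is not the platform's actual fix: the configuration keys `APP_ENV`, `STRIPE_SECRET_KEY`, and `FIREBASE_PROJECT_ID`, and the project IDs in the allowlist, are hypothetical. The only external convention it relies on is Stripe's key prefixes (`sk_live_` / `sk_test_` for secret keys, `rk_live_` / `rk_test_` for restricted keys).

```python
class EnvironmentBindingError(RuntimeError):
    """Raised when credentials do not match the declared environment."""


# Hypothetical mapping: environment name -> Firebase project IDs allowed there.
ALLOWED_FIREBASE_PROJECTS = {
    "production": {"example-prod"},
    "sandbox": {"example-sandbox"},
}


def stripe_key_mode(key: str) -> str:
    """Classify a Stripe API key as 'live', 'test', or 'unknown' by prefix."""
    if key.startswith(("sk_live_", "rk_live_")):
        return "live"
    if key.startswith(("sk_test_", "rk_test_")):
        return "test"
    return "unknown"


def check_environment_binding(config: dict) -> None:
    """Fail closed: raise unless every credential matches the environment."""
    env = config.get("APP_ENV")
    if env not in ALLOWED_FIREBASE_PROJECTS:
        raise EnvironmentBindingError(f"unknown environment: {env!r}")

    expected_mode = "live" if env == "production" else "test"
    actual_mode = stripe_key_mode(config.get("STRIPE_SECRET_KEY", ""))
    if actual_mode != expected_mode:
        raise EnvironmentBindingError(
            f"{env} requires a {expected_mode}-mode Stripe key, got {actual_mode}"
        )

    project = config.get("FIREBASE_PROJECT_ID")
    if project not in ALLOWED_FIREBASE_PROJECTS[env]:
        raise EnvironmentBindingError(
            f"Firebase project {project!r} is not allowed in {env}"
        )
```

Called once at process start, before any client is constructed, a check of this kind turns a silent credential crossover into a deployment that fails to start. An unrecognized key format is rejected rather than assumed safe.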

Second, permissive boundary behavior recurred across the stack. Several controls prioritized request completion over hard rejection at the boundary, including fail-open auth middleware and insufficient ownership validation. The lesson is not that AI-generated code is universally insecure; it is that AI-assisted development in security-sensitive systems requires explicit fail-closed review, especially around auth, billing, and credential mutation paths.
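The difference between fail-open and fail-closed handling, together with an ownership check, can be shown in a few lines. This is a minimal sketch rather than the platform's middleware: `Request`, `Response`, `authorize`, and the injected `verify_token` and `wallet_owner` callables are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Request:
    token: Optional[str]
    wallet_id: str


@dataclass(frozen=True)
class Response:
    status: int
    body: str


def authorize(
    request: Request,
    verify_token: Callable[[Optional[str]], Optional[str]],
    wallet_owner: Callable[[str], Optional[str]],
) -> Response:
    """Return 200 only if the caller is authenticated and owns the wallet."""
    try:
        user_id = verify_token(request.token)
    except Exception:
        # Fail closed: a verifier error is a rejection, never a pass-through.
        return Response(401, "authentication failed")
    if not user_id:
        return Response(401, "authentication required")

    try:
        owner = wallet_owner(request.wallet_id)
    except Exception:
        # Fail closed on lookup errors as well.
        return Response(403, "ownership could not be verified")
    if owner is None or owner != user_id:
        # A caller-supplied wallet identifier is never trusted on its own.
        return Response(403, "wallet not owned by caller")

    return Response(200, "ok")
```

The design choice worth noting is that every error path ends in a rejection. A fail-open variant would differ only in what the `except` branches return, which is why this class of issue is easy to miss in review.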

Third, credential mutation emerged as a distinct threat model. The provider credential injection issue showed that an attacker could replace a stored credential and silently redirect platform traffic through attacker-controlled infrastructure. Credential administration and credential consumption should therefore be treated as separate security domains with different permissions, controls, and monitoring assumptions.
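Separating credential administration from credential consumption can be sketched as two scopes guarding two operations, with every mutation attempt recorded. The class below is an assumption-laden illustration, not the platform's design: `CredentialStore`, the scope strings `credentials:use` and `credentials:admin`, and the audit tuple layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class PermissionDenied(Exception):
    """Raised when a caller lacks the scope an operation requires."""


class CredentialStore:
    # Hypothetical scope names: consumers get USE only; operators get ADMIN.
    USE_SCOPE = "credentials:use"
    ADMIN_SCOPE = "credentials:admin"

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}
        # Each entry is (caller_id, provider, outcome).
        self.audit_log: List[Tuple[str, str, str]] = []

    def read(self, scopes: Set[str], provider: str) -> str:
        """Consumption path: requires the use scope, cannot modify anything."""
        if self.USE_SCOPE not in scopes:
            raise PermissionDenied("missing scope: " + self.USE_SCOPE)
        return self._secrets[provider]

    def update(self, caller_id: str, scopes: Set[str], provider: str, secret: str) -> None:
        """Administration path: requires the admin scope; every attempt is logged."""
        allowed = self.ADMIN_SCOPE in scopes
        outcome = "updated" if allowed else "denied"
        self.audit_log.append((caller_id, provider, outcome))
        if not allowed:
            raise PermissionDenied("missing scope: " + self.ADMIN_SCOPE)
        self._secrets[provider] = secret
```

Under this split, a compromised consumer can still use a credential it was already entitled to use, but it cannot replace one, and a denied replacement attempt leaves a record that monitoring can alert on.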

07 Implications

For authentication providers, account-creation systems that accept arbitrary identities without strong verification are poorly matched to an environment in which AI can automate account generation at machine speed. For payment processors, the lesson is not that payment rails failed internally, but that upstream access-control weaknesses can make payment-relevant flows reachable by adversarial actors. For teams building with AI-assisted development tools, the same systems that accelerate implementation can also accelerate broad offensive discovery against shallow control failures.

This experiment also suggests that static defenses and rules-based anomaly systems alone may be insufficient against automated, parallelized attack exploration by autonomous agents. Continuous adversarial testing is better understood not as an optional maturity layer, but as a core control for platforms that combine identity, billing, and agent execution surfaces.

A broader implication is cognitive as well as technical. Prior research has shown that informed reviewers can miss vulnerability classes that independent reviewers later identify, and that broader review populations improve overall discovery coverage. This experiment reproduced that pattern under AI-mediated conditions. Despite substantial prior hardening work, external participants and participant-directed systems surfaced critical weaknesses within hours. Internal review remains necessary, but it inherits the assumptions of the builder. Adversarial testing adds a different search posture, one that becomes especially valuable when autonomous systems can explore broad attack surfaces without assuming that previously reviewed paths are already secure.

This finding also resonates with David Wagner's work at UC Berkeley on automated security analysis and systematic vulnerability discovery. A central lesson from that body of research is that machine-driven exploration is valuable not because any single probe is especially sophisticated, but because automated systems can test widely, consistently, and without the contextual assumptions that shape human review. The Penthouse Heist extended that dynamic into an agent-mediated setting: materially different AI attack architectures converged on the same high-value weaknesses with limited human direction, reinforcing the case for continuous externalized adversarial testing as a core control for modern agent infrastructure.

08 Remediation Completed Prior to Publication

All critical vulnerabilities described in this report were remediated prior to publication. Completed remediation work included immediate containment, architecture hardening, access-control enforcement, and operational changes to reduce the likelihood of similar failures recurring.

Remediation Track	Completed Actions Prior to Publication
Containment	Key rotation, sandbox decommissioning, and suspension of affected test paths.
Authentication	Auth middleware hardening, account suspension mechanisms, and per-user rate limiting.
Billing	Ownership validation enforcement and tighter controls around payment-relevant flow access.
Credentials	Restrictions on credential update paths and clearer separation between credential administration and credential consumption.
Operations	Stronger environment-binding controls and explicit review of AI-assisted configuration work involving secrets and deployment settings.
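The per-user rate limiting listed under the Authentication track can be illustrated with a token bucket, one common way to implement such a limit. This is a sketch, not the deployed control: the class name `PerUserRateLimiter`, its parameters, and the in-memory bucket table are assumptions, and a production limiter would typically live in shared storage rather than process memory.

```python
import time
from typing import Callable, Dict, Tuple


class PerUserRateLimiter:
    """Token bucket per user: bursts up to `capacity`, refilled over time."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self._buckets: Dict[str, Tuple[float, float]] = {}  # user -> (tokens, last_seen)

    def allow(self, user_id: str) -> bool:
        """Spend one token for `user_id` if available; otherwise reject."""
        now = self.clock()
        tokens, last_seen = self._buckets.get(user_id, (float(self.capacity), now))
        # Credit tokens earned since the last request, capped at capacity.
        tokens = min(float(self.capacity), tokens + (now - last_seen) * self.refill_per_second)
        if tokens < 1.0:
            self._buckets[user_id] = (tokens, now)
            return False
        self._buckets[user_id] = (tokens - 1.0, now)
        return True
```

Because each user has an independent bucket, one flooding account exhausts only its own allowance. That property matters little if account creation itself is unrestricted, which is why the limit is paired with account suspension in the table above.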

09 Headline Metrics and Method Notes

The statistics below are reported for clarity. "Participants" refers to registered event participants. "Total submitted findings" includes overlapping reports later consolidated. "Canonical vulnerability families" refers to grouped root-cause findings after deduplication. "Aggregate attempted checkout value" refers to the face value of initiated checkout sessions, not funds charged, captured, or settled.

Metric	Value
Registered participants	125+
Total submitted findings	90
Canonical vulnerability families	39
Adversarial accounts created	424
Adversarial Stripe sessions initiated	109
Aggregate attempted checkout value	~$60,000
Completed adversarial transactions	0 ($0 captured)
Peak telemetry spans in one hour	89,246
Total spans in the attack window	157,664
Pre-event reconnaissance duration	~5 hours
Distinct AI attack architectures observed	5
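A few derived ratios follow directly from the figures in this report. The short calculation below uses only values stated in the table above and the 14-day span total given in the Forensic Timeline; the variable names are introduced here for illustration, and because the checkout total is approximate, the per-session average is approximate as well.

```python
# Values copied from the report; the checkout total is an approximation.
attempted_checkout_usd = 60_000
stripe_sessions = 109
submissions = 90
families = 39
peak_hour_spans = 89_246
attack_window_spans = 157_664
total_spans_14_days = 228_681

avg_session_usd = attempted_checkout_usd / stripe_sessions
submissions_per_family = submissions / families
peak_hour_share = peak_hour_spans / attack_window_spans
attack_window_share = attack_window_spans / total_spans_14_days

print(f"Average attempted value per session: ${avg_session_usd:,.0f}")
print(f"Submissions per canonical family: {submissions_per_family:.1f}")
print(f"Peak hour share of attack-window spans: {peak_hour_share:.0%}")
print(f"Attack window share of 14-day spans: {attack_window_share:.0%}")
```

The ratios work out to roughly $550 per attempted session, about 2.3 submissions per canonical family, about 57% of attack-window spans falling in the single peak hour, and about 69% of the 14-day span total falling inside the attack window.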

10 Responsible Disclosure

Findings were disclosed to affected vendors, including Firebase / Google Cloud and Stripe, prior to publication. This report excludes exploit reproduction steps and exact endpoint paths by design. The experiment was conducted under authorized rules of engagement, and the most serious findings were remediated before release.

11 References

Tversky, A. and Kahneman, D. (1974). "Judgment under Uncertainty: Heuristics and Biases." Science, 185(4157), 1124–1131.
Camerer, C., Loewenstein, G., and Weber, M. (1989). "The Curse of Knowledge in Economic Settings." Journal of Political Economy, 97(5), 1232–1254.
Raymond, E. S. (1999). The Cathedral and the Bazaar. O'Reilly Media.
Schneier, B. (2000). Secrets and Lies: Digital Security in a Networked World. John Wiley & Sons.
Edmundson, A., Holtkamp, B., Rivera, E., Finifter, M., Mettler, A., and Wagner, D. (2013). "An Empirical Study on the Effectiveness of Security Code Review." Engineering Secure Software and Systems (ESSoS), Lecture Notes in Computer Science, vol. 7781. Springer.
Atefi, S. et al. (2023). "Bug Hunters' Perspectives on the Challenges and Benefits of the Bug Bounty Ecosystem." Proceedings of the ACM Web Conference (WWW '23).

Appendix A · Acknowledgments

This paper draws on the work and good faith of many people. The author is grateful to all of them.

A.1 Event contributors

The March 12 event was made possible by the collaboration of Adeniji Asabi, Brian Sparkes, Colin Behring, Ryan George, and Ted Dessert, alongside the author. This is the same core collaborator group that has supported the broader research program and subsequent field reports.

A.2 Judges and mentors

Special thanks to the judges and mentors who contributed to the event in their personal capacities. Their involvement helped make the exercise both rigorous and operationally informative. Judges and mentors included Adwait Sathe, Ari Gore, Avia Haimovich, Bhargava Kakrannaya, Chetan Jannu, Dhruv Diddi, Elizabeth "Lizzie" Siegle, Goutham Nekkalapu, Karthik Rao, Peter Van Voorhis, Rayyan Zahid, Ross Gates, Safiya Adatia, Sandeep Ayloo, Shri Lakshmi Rajagopal, Sumanth Shiva Prakash, Thiyagarajan Palaniyappan, Tosh Rayadhurgam, Usha Ratnam Jammula, and Usman Masood.

A.3 Winning and participating teams

Winning teams included VCN, openClaw, JMD/DRMJ, NextPhase.ai, Socials, Canniffe, and Accelerando. Participating teams also included Scrooge, Eurasia, :), Innovators, BAOBAB, fogsignal, and ChefMate. The empirical foundation of this paper is their collective work.

All remaining errors, omissions, and interpretive choices are the author's alone.

Appendix B · Revision History

This paper is maintained as a versioned document. Material corrections, newly incorporated evidence, and substantive clarifications are logged here with dates and brief descriptions. Minor editorial changes — typographical fixes, formatting adjustments, and non-substantive rewording — are not logged.

Version	Date	Summary
v1.1	April 22, 2026	Reformatted into the Adversarial AI Experiment Series template used for Field Report No. 02 (The Infrastructure Strikes Back). Content unchanged from v1.0. Added this revision history appendix and restructured the Acknowledgments into the series-standard A.1–A.3 format; the event contributors line previously appearing on the cover was moved into §A.1, preserving the same visibility without altering the cover layout.
v1.0	March 12, 2026	Initial publication.

Future revisions — including incorporation of additional event materials, participant writeups (with consent), methodology expansion, or corrections requested by acknowledged collaborators — will be logged in this table with the date of publication and a one-line description of the change.