Exploring Agent-Based Cloud Review Capabilities

1. AI That Keeps Thinking

As AI capabilities continue to expand, more security teams are adopting AI-driven tools for reviews and automation. Unlike traditional scanners, which operate without context, AI-based approaches can produce more accurate, better-reasoned results.

This post examines how cloud-focused plugins can be combined with existing tooling to perform semi-autonomous cloud security reviews.

Why Use AI?

Traditional scanning tools generally follow this pattern:

Scan → Write Up → Report → Display

While this enables broad coverage, it has well-known drawbacks. The most significant is the volume of false positives. Security reviewers don’t have unlimited time to triage every finding, and cloud environments differ substantially in their configuration and intent. Pure rule-based or signature-based scanners (such as Prowler) lack the contextual awareness needed to distinguish a genuine issue from an expected configuration.

Consider this example: a scanner flags an S3 bucket as publicly accessible. In reality, the bucket is intentionally configured to host a static website and contains no sensitive data.

This is where AI provides value. The objective is no longer simply identifying potential issues, but understanding the environment well enough to determine whether those issues actually matter.

2. What I Built

I developed a plugin for Claude Code, Anthropic’s command-line coding tool, which runs in the user’s own terminal. Claude Code can execute commands on the user’s behalf, making it well suited to running tooling such as nmap or curl.

Rather than following the conventional single-scan-and-report pattern, the plugin continuously investigates the environment using a loop-based reasoning structure. The loop applies only to configuration and policy questions. Direct requests, such as probing an endpoint or listing resources, bypass the loop. This routing decision is made at runtime via a classification table.
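The runtime routing decision can be pictured as a small lookup table. The sketch below is a minimal, hypothetical illustration. The keywords, the `Route` enum and the `route` function are assumptions made for this example, not the plugin’s actual classification table. Plain keyword matching stands in for whatever classification Claude performs.

```python
from enum import Enum


class Route(Enum):
    DIRECT = "direct"          # execute the request and return the result
    VALIDATION_LOOP = "loop"   # gather evidence, score it, decide what to inspect next


# Hypothetical classification table (keyword -> route). Direct requests such as
# probing or listing bypass the loop; configuration and policy questions enter it.
CLASSIFICATION_TABLE = {
    "probe": Route.DIRECT,
    "list": Route.DIRECT,
    "curl": Route.DIRECT,
    "policy": Route.VALIDATION_LOOP,
    "public": Route.VALIDATION_LOOP,
    "misconfigur": Route.VALIDATION_LOOP,
    "escalat": Route.VALIDATION_LOOP,
}


def route(request: str) -> Route:
    """Pick a route for a request using simple keyword matching."""
    text = request.lower()
    matched = {r for keyword, r in CLASSIFICATION_TABLE.items() if keyword in text}
    if Route.VALIDATION_LOOP in matched:
        # A request that touches configuration or policy always gets the loop.
        return Route.VALIDATION_LOOP
    if Route.DIRECT in matched:
        return Route.DIRECT
    # Unclassified requests fall back to the more cautious path.
    return Route.VALIDATION_LOOP
```

Defaulting unclassified requests to the loop trades some speed for caution. Misrouting a policy question to direct execution is the more expensive mistake.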

Validation Loop

At a high level, the plugin provides:

A loop-based validation structure that determines what to inspect next and assigns a confidence score to the supporting evidence.
A skills system that supplies instructions, a lookup table, and documentation for AWS services and related tooling.
An external validation layer that probes endpoints and external services.
A reporting pipeline that outputs only findings backed by deterministic evidence (the output target is configurable, with the CLI as the default).
A proof-of-concept generation skill that produces manual reproduction steps for penetration testers who want to verify findings.
A blast radius generation skill that produces a markdown summary of potential impact for reviewer assessment.
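The validation loop and the reporting rule can be sketched together as follows. This is a minimal, hypothetical model:

- In the plugin, Claude decides what to inspect next. Here that choice is reduced to an ordered list of check functions.
- The confidence score is simply the fraction of evidence items that support the claim.
- The `Evidence`, `Finding`, `validate` and `reportable` names, the 0.75 threshold and the two-item minimum are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Evidence:
    source: str          # e.g. "aws-cli", "curl", "pmapper"
    supports: bool       # does this evidence support the claim?
    deterministic: bool  # real command output, as opposed to model inference


@dataclass
class Finding:
    claim: str
    evidence: list[Evidence] = field(default_factory=list)

    def confidence(self) -> float:
        """Fraction of evidence items that support the claim (0.0 with no evidence)."""
        if not self.evidence:
            return 0.0
        return sum(e.supports for e in self.evidence) / len(self.evidence)


def validate(claim: str, checks: list[Callable[[], Evidence]],
             threshold: float = 0.75, min_evidence: int = 2) -> Finding:
    """Run checks one at a time until confidence is high enough or checks run out."""
    finding = Finding(claim)
    for check in checks:
        finding.evidence.append(check())
        if len(finding.evidence) >= min_evidence and finding.confidence() >= threshold:
            break  # enough corroborating evidence; stop investigating
    return finding


def reportable(finding: Finding, threshold: float = 0.75) -> bool:
    """Report only findings backed by at least one deterministic, supporting item."""
    has_hard_evidence = any(e.supports and e.deterministic for e in finding.evidence)
    return has_hard_evidence and finding.confidence() >= threshold
```

Requiring at least one deterministic item means that high confidence built purely on model inference never reaches the report.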

Tool Integrations

The plugin performs real-world validation by integrating established security tools:

PMapper – detection of IAM privilege escalation paths
nmap – external network exposure and port validation
curl – endpoint probing
testssl.sh – TLS configuration analysis

How This Differs From Existing Scanners

AI-based scanners are not bound to hardcoded cases. Traditional scanners are constrained to strict, predefined criteria and lack the ability to adapt or apply context. Building a rule-based scanner that covers every cloud scenario is impractical, given the effectively unbounded space of valid configurations.

This is the gap AI-driven analysis can address. Rather than matching only known signatures, AI can interpret novel configurations and adapt to context in real time.

Many popular cloud scanning tools rely on read-only API access to check whether a setting is in place, but they do not interact with the environment externally, so they cannot validate reachability. This is a significant gap, because a resource may appear private to a configuration scan while remaining reachable via:

A CloudFront distribution or other CDN
A Lambda function or API Gateway that proxies access without authentication
A misconfigured VPC

A configuration scanner can tell you whether something looks misconfigured, but not whether it represents an actual security exposure. Configuration reflects intent; probing reveals reality.

The plugin combines both approaches: it reads the configuration and then validates by probing. When the two disagree, the finding is flagged for review or deeper investigation. If a resource appears to be public, the plugin asks why. It cross-references multiple sources of context, including the AWS CLI, external probes, and PMapper, before finalising a conclusion.
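The decision logic in this approach reduces to a two-by-two table. In this hypothetical sketch:

- `config_public` would come from read-only API calls.
- `probe_reachable` would come from an unauthenticated external request.
- Treating only 2xx responses as reachable is a simplifying assumption.

Names and verdict labels are illustrative, not the plugin’s own.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED_EXPOSED = "configuration says public and the probe got through"
    CONFIRMED_PRIVATE = "configuration says private and the probe was refused"
    HIDDEN_EXPOSURE = "configuration says private but the probe got through"
    UNREACHABLE_PUBLIC = "configuration says public but the probe was refused"


# Disagreements between configuration and reality go to review or deeper investigation.
NEEDS_REVIEW = {Verdict.HIDDEN_EXPOSURE, Verdict.UNREACHABLE_PUBLIC}


def reachable_from_status(http_status: int) -> bool:
    """Treat any 2xx response to an unauthenticated probe as 'got through'."""
    return 200 <= http_status < 300


def reconcile(config_public: bool, probe_reachable: bool) -> Verdict:
    if config_public and probe_reachable:
        return Verdict.CONFIRMED_EXPOSED
    if config_public:
        return Verdict.UNREACHABLE_PUBLIC
    if probe_reachable:
        # e.g. a "private" bucket served through CloudFront or an unauthenticated API Gateway
        return Verdict.HIDDEN_EXPOSURE
    return Verdict.CONFIRMED_PRIVATE
```

The hidden-exposure cell is the one a read-only configuration scanner cannot reach on its own.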

For background, I recommend Daniel Grzelak’s article on the topic: Testing access to AWS resources. Much of the methodology behind this tool is inspired by that work.

3. Lab Setup

To evaluate the plugin’s capabilities, I built a test environment in Terraform that provisions customised AWS infrastructure. The environment is modular, allowing individual scenarios to be enabled or disabled. It simulates a production-like setup, using a VPC and a bastion host for authentication.

A key design principle is that some modules are deliberately configured to appear dangerous while being safe in context. This tests the plugin’s ability to reason about intent rather than react to surface-level signals.

The lab environment includes:

IAM privilege escalation: PassRole abuse, permission boundaries, policy shadowing, service-linked roles, and role-hopping chains.
Data exposure: S3 public access, EBS and RDS snapshot sharing, and KMS wildcard grants.
Network and compute: overly permissive egress rules and IMDSv1 metadata access.
Logging and detection gaps: CloudTrail misconfigurations.
Cross-service attack chains: Lambda functions with hardcoded secrets, unauthenticated Cognito access, and unauthenticated API Gateway endpoints that proxy to S3.

Three secure baseline configurations are also included to verify that the plugin correctly identifies safe setups rather than flagging them as issues.
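Some modules look dangerous but are safe, and three are secure baselines, so false positives carry a measurable cost in this lab. A hypothetical scoring harness might compare the scenarios the plugin flags against the ground truth from the module design. The `Scenario` type and `score` function are assumptions for illustration. Any scenario names passed in are invented examples, not results from this evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    truly_vulnerable: bool  # ground truth, known from how the Terraform module was designed


def score(scenarios: list[Scenario], flagged: set[str]) -> dict[str, float]:
    """Compare the set of scenario names the plugin flagged against ground truth."""
    tp = sum(s.truly_vulnerable and s.name in flagged for s in scenarios)
    fp = sum(not s.truly_vulnerable and s.name in flagged for s in scenarios)
    fn = sum(s.truly_vulnerable and s.name not in flagged for s in scenarios)
    # By convention here, precision/recall are 1.0 when their denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Tracking precision separately from recall stops a plugin that flags everything from looking good. The safe-in-context modules and secure baselines exist to pull precision down when that happens.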

The Skills System

Claude Code supports plugins that steer model reasoning through custom tooling and pipeline automation. One of the plugin system’s features is the ability to provide markdown skill files that define specific capabilities.

Rather than authoring skills for every system command myself, I built on a GitHub repository of AWS service skills that a colleague identified, which gives Claude a ready reference when querying or executing commands.

On top of this foundation, I added custom skills specific to the security workflow:

Validation Rules – maps claims to the exact AWS CLI commands required to verify them.
Output – defines the report format, producing structured and consistent output.
PoC Generator – produces step-by-step instructions for reproducing findings, enabling manual verification.
Identity Blast Radius – maps the affected resources and the maximum potential damage from a compromised identity.
Prowler – integrates Prowler functionality, parsing its output and cross-validating findings before incorporating them into the final report.
PMapper – integrates PMapper for IAM privilege escalation analysis, instructing Claude how to build an IAM graph, run escalation queries, and interpret results.

Each skill file documents the relevant commands, references, and examples, which makes the plugin straightforward to customise.
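A Validation Rules skill of this kind is essentially a lookup from claim to command. In the plugin, that mapping is written in markdown for Claude to read, and the Python dictionary below is only an illustration. The claim identifiers and the `commands_for` helper are hypothetical. The AWS CLI subcommands themselves are real:

- `s3api get-public-access-block`
- `s3api get-bucket-policy-status`
- `ec2 describe-snapshot-attribute`
- `rds describe-db-snapshot-attributes`
- `ec2 describe-instances`
- `cloudtrail get-trail-status`

```python
# Hypothetical claim identifiers mapped to AWS CLI commands that can confirm or refute them.
VALIDATION_RULES: dict[str, list[str]] = {
    "s3_bucket_public": [
        "aws s3api get-public-access-block --bucket {bucket}",
        "aws s3api get-bucket-policy-status --bucket {bucket}",
    ],
    "ebs_snapshot_shared": [
        "aws ec2 describe-snapshot-attribute --snapshot-id {snapshot_id} --attribute createVolumePermission",
    ],
    "rds_snapshot_shared": [
        "aws rds describe-db-snapshot-attributes --db-snapshot-identifier {snapshot_id}",
    ],
    "imdsv1_enabled": [
        "aws ec2 describe-instances --instance-ids {instance_id} "
        "--query 'Reservations[].Instances[].MetadataOptions'",
    ],
    "cloudtrail_not_logging": [
        "aws cloudtrail get-trail-status --name {trail}",
    ],
}


def commands_for(claim: str, **params: str) -> list[str]:
    """Return the verification commands for a claim, with placeholders filled in."""
    if claim not in VALIDATION_RULES:
        raise KeyError(f"no validation rule for claim {claim!r}")
    return [template.format(**params) for template in VALIDATION_RULES[claim]]
```

An unknown claim raises rather than returning nothing. This keeps the pipeline from silently treating an unverifiable claim as verified.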

5. From Detection to Exploitation

A common limitation of existing security scanners is that they do not provide reproduction steps. If a finding is claimed to be exploitable, that claim should be demonstrable. To address this, I added a proof-of-concept skill. When an attack chain is identified within the environment, the user can ask Claude to invoke the poc-generator skill.

Claude then produces a report containing:

Steps to reproduce
Detailed commands
The supporting evidence chain
Clean-up steps
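These four report sections map naturally onto a small data structure. This is a hypothetical sketch of how such a report could be rendered to markdown, not the poc-generator skill’s actual template. The `PocReport` name is an assumption, and commands are emitted as indented code lines to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class PocReport:
    title: str
    steps: list[str]
    commands: list[str]
    evidence_chain: list[str]
    cleanup: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = [f"# {self.title}", "", "## Steps to Reproduce"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += ["", "## Commands"]
        lines += [f"    $ {cmd}" for cmd in self.commands]  # indented markdown code lines
        lines += ["", "## Evidence Chain"]
        lines += [f"- {item}" for item in self.evidence_chain]
        lines += ["", "## Clean-up"]
        # An explicit placeholder makes a missing clean-up section visible to the reviewer.
        lines += [f"- {item}" for item in self.cleanup] or ["- None required"]
        return "\n".join(lines)
```

Keeping the evidence chain in the same artefact as the commands lets a tester check each step against the finding it is meant to prove.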

One of the attack chains in the lab environment is a permissions boundary bypass via PassRole. In this scenario, an attacker operating under the platform-restricted-admin-production boundary escapes it entirely via PassRole, ultimately gaining full CRUD access to DynamoDB and account-wide S3 read access.

The plugin successfully generated a working proof of concept for this chain. Manual verification of the “Steps to Reproduce” section confirmed that the exploit was reproducible.
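A permissions boundary does not stop this kind of chain because it only limits the principal it is attached to. A Lambda function runs with the passed role’s permissions, which are evaluated on their own. Unless the boundary itself restricts `iam:PassRole`, the attacker can therefore reach actions the boundary was meant to deny.

The sketch below models that with action-name sets. It makes several simplifying assumptions: it ignores resource ARNs on `iam:PassRole`, the role’s trust policy (which must allow `lambda.amazonaws.com`), explicit denies and conditions. The policies in it are invented, not the lab’s actual ones.

```python
from fnmatch import fnmatchcase
from typing import Optional


def allows(patterns: list[str], action: str) -> bool:
    """IAM-style wildcard match ('*' and '?'); action names compared case-insensitively."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effective(identity: list[str], boundary: Optional[list[str]], action: str) -> bool:
    """An action is usable only if the identity policy AND the boundary (if any) allow it."""
    return allows(identity, action) and (boundary is None or allows(boundary, action))


# Actions needed to create a Lambda function under another role and run it.
LAMBDA_PASSROLE_CHAIN = ["iam:PassRole", "lambda:CreateFunction", "lambda:InvokeFunction"]


def escalated_actions(identity: list[str], boundary: Optional[list[str]],
                      role_policy: list[str], role_boundary: Optional[list[str]],
                      wanted: list[str]) -> list[str]:
    """Actions the attacker cannot use directly but can reach by running code as the passed role."""
    if not all(effective(identity, boundary, a) for a in LAMBDA_PASSROLE_CHAIN):
        return []
    return [a for a in wanted
            if not effective(identity, boundary, a) and effective(role_policy, role_boundary, a)]
```

For example, an attacker whose boundary allows `lambda:*` and `iam:PassRole` but nothing in DynamoDB can still reach `dynamodb:PutItem` if the passed role grants `dynamodb:*`.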

Sample outputs from the plugin:

Findings: Permissions Boundary Bypass Analysis
PoC: Permissions Boundary Escape via Lambda PassRole
Blast Radius: Combined Attack Chains

6. Strengths and Limitations

Strengths

The plugin performed strongly on scenarios 3, 6, and 9. It correctly interpreted IAM evaluation order, S3 layered access, and inline deny statements, rather than relying on pattern matching against policy text alone.
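Interpreting IAM evaluation order correctly mostly comes down to precedence:

1. An explicit deny anywhere wins.
2. A permissions boundary caps what identity policies can grant.
3. Anything not explicitly allowed is implicitly denied.

The sketch below encodes that order for a single account under heavy simplifying assumptions. It ignores resource-based policies, SCPs, session policies, conditions, resource ARNs and `NotAction`, so it is a teaching model rather than a faithful reimplementation of AWS’s evaluation logic. The `Statement` and `evaluate` names are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Statement:
    effect: str               # "Allow" or "Deny"
    actions: tuple[str, ...]  # IAM-style action patterns, e.g. "s3:Get*"

    def matches(self, action: str) -> bool:
        return any(fnmatchcase(action.lower(), a.lower()) for a in self.actions)


def evaluate(action: str, identity: list[Statement],
             boundary: Optional[list[Statement]] = None) -> str:
    """Simplified order: explicit deny > boundary cap > identity allow > implicit deny."""
    policies = identity + (boundary or [])
    if any(s.effect == "Deny" and s.matches(action) for s in policies):
        return "ExplicitDeny"
    if boundary is not None and not any(s.effect == "Allow" and s.matches(action) for s in boundary):
        return "ImplicitDeny"  # the boundary does not permit it, whatever the identity policy says
    if any(s.effect == "Allow" and s.matches(action) for s in identity):
        return "Allow"
    return "ImplicitDeny"
```

A broad `s3:*` allow paired with an inline `Deny` on `s3:DeleteBucket` is the case where matching on policy text alone goes wrong. The allow looks decisive until the deny is applied first.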

Attack chain mapping was a particular strength: the plugin successfully chained scenarios 1 and 4, demonstrating the kind of cross-cutting reasoning that signature-based scanners typically miss.
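Chaining findings is, in effect, path-finding over a graph. The nodes are principals, and the edges are the techniques that let one principal act as another. PMapper models IAM in a similar node-and-edge way.

The sketch below runs a breadth-first search over a hypothetical graph. It recovers the shortest chain, plus the set of reachable identities as a simple proxy for blast radius. It is not PMapper’s implementation, and the node and technique labels are invented.

```python
from collections import deque
from typing import Optional

# Hypothetical privilege graph: principal -> list of (technique, next principal).
Edge = tuple[str, str]


def find_chain(graph: dict[str, list[Edge]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of techniques from start to goal."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for technique, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [f"--{technique}-->", nxt]))
    return None  # no chain exists


def reachable(graph: dict[str, list[Edge]], start: str) -> set[str]:
    """Every principal an attacker starting at `start` could eventually act as."""
    seen = {start}
    queue = deque([start])
    while queue:
        for _, nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}
```

Two findings that each look moderate in isolation can combine into a single edge path like this. Per-finding severity ratings miss that combination.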

The reports were accurate, and the proof-of-concept output was detailed and worked when executed manually. This illustrates the potential of AI agents in cloud review and other areas of security work.

Limitations

The first and most obvious limitation is coverage. The test environment cannot exercise every edge case, and real customers will always present configurations the system has not seen before. While context can be supplied to the model, findings will not be 100% deterministic. The goal is to narrow the gap between assumption and verification.

This raises a related concern: is the plugin tuned specifically to my test environment? In machine learning terms, this is overfitting: the model performs well on familiar data but degrades on unfamiliar inputs. During development, I reviewed outputs, corrected errors, and iteratively refined both the plugin and the prompts. That tuning may have shaped the plugin to suit this specific environment. Evaluating the plugin against substantially different environments is needed to determine whether overfitting is a meaningful issue.

The plugin is also large. Its instructions consume tokens each time Claude initialises with claude.md, and the roles.md file alone is over 400 lines. A useful next step would be optimising the prompt set and removing redundant instructions to reduce token usage.
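One cheap first pass at that optimisation is to find instructions repeated across the prompt files. The hypothetical helper below normalises each line (case, list and heading markers, whitespace) and reports lines that appear more than once. It only catches verbatim repetition, not instructions that say the same thing in different words. The `duplicate_instructions` name and the four-word minimum are assumptions.

```python
from collections import defaultdict


def normalise(line: str) -> str:
    """Lower-case, drop leading/trailing list and heading markers, collapse whitespace."""
    return " ".join(line.lower().strip(" -*#>\t").split())


def duplicate_instructions(files: dict[str, str], min_words: int = 4) -> dict[str, list[str]]:
    """Map each instruction line that appears more than once to its file:line locations."""
    seen: dict[str, list[str]] = defaultdict(list)
    for name, text in files.items():
        for number, raw in enumerate(text.splitlines(), start=1):
            line = normalise(raw)
            if len(line.split()) >= min_words:  # skip headings and short fragments
                seen[line].append(f"{name}:{number}")
    return {line: where for line, where in seen.items() if len(where) > 1}
```

To run it over real files, read each markdown file into the `files` dict, for example with `pathlib.Path.rglob("*.md")` and `read_text()`.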

7. What I Learned

Coming into this project with limited cloud experience, I gained a significantly deeper understanding of how AWS environments operate in practice.

I studied the 21 categories of privilege escalation techniques used to move through cloud infrastructure, explored services and behaviours I had not previously encountered, and tested my knowledge against AWSGoat, a deliberately vulnerable cloud application used for training. The project demonstrated how AI, combined with human review, can effectively automate long, repetitive scanning tasks.

It also introduced me to a range of existing security tools and showed how an AI wrapper can amplify their effectiveness in cloud security assessments.

The offensive security aspect of the work was particularly valuable. Applying these techniques from an attacker’s perspective made clear why organisations need to invest in locking down their cloud environments.

8. Conclusion

The future of cybersecurity will increasingly involve AI systems that can reason about information in context. Claude has already been used to identify zero-day vulnerabilities in long-standing open source codebases, and AI tooling can substantially improve the productivity of security consultants working under time pressure. Adaptability to novel information is particularly valuable, given that threat actors continue to evolve their techniques.

It is reasonable to expect that most security tooling will incorporate AI in some form going forward. This reflects the broader direction of the industry, and organisations such as Aura Information Security stand to benefit from integrating AI into their penetration testing workflows early.

Keeping pace with the evolving threat landscape will increasingly depend on tools like Claude to help protect organisations and the data they hold.

Disclaimer

The information in this article is provided for research and educational purposes only. Aura Information Security does not accept any liability in any form for any direct or indirect damages resulting from the use of or reliance on the information contained in this article.
