Why cloud compromise is an identity story

When CloudShark proves a route through an AWS or Azure environment, the route almost never runs through a software vulnerability. It runs through configuration — and specifically through identity. In the cloud, every action is an API call authorised by a credential, so the question that decides an engagement is not "what can be exploited?" but "who can do what, and from where?" A leaked key, an over-broad role, and a trusting account are three unremarkable findings on three different dashboards. Chained together, they are a walk from the public internet to your production data, executed entirely with legitimate API calls.

That is also why compliance checks keep missing these paths. A configuration scanner evaluates each resource against a rule, and each finding gets a severity in isolation. Attack paths do not respect that model. Below are the six misconfiguration families we keep proving in production cloud environments, why each one matters on a real route, and how to verify the fix actually closed it.

The six families, in the order we find them

1. Over-privileged identities and roles

The most common finding in any cloud environment we test: identities — human and workload alike — that can do far more than their job requires. Broad managed policies attached "to make it work", Contributor granted at subscription scope instead of resource scope, and roles that can quietly escalate themselves: permission to pass a more privileged role to a new compute instance, to attach policies to their own identity, or to run commands on machines that hold better credentials. In the cloud, the ability to grant privilege is privilege.

Fix: derive least privilege from what each identity has actually used, not from what someone guessed it might need. Both major platforms will tell you which permissions have been exercised; on AWS, for example, IAM last-accessed information and IAM Access Analyzer policy generation report this directly. Separate human identities from workload identities, and treat any permission that can modify identity or pass roles as an administrative right, reviewed with the same rigour as an admin account.
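As a rough illustration of treating identity-modifying permissions as administrative rights, the sketch below scans an AWS-style policy document for actions that can grant privilege. The action list is illustrative rather than exhaustive, the function name `escalation_actions` is hypothetical, and the check deliberately ignores `NotAction`, `Resource` scoping, conditions, and explicit denies elsewhere, so treat it as a first-pass filter and not an authorisation engine.

```python
from fnmatch import fnmatchcase

# Illustrative, not exhaustive: AWS actions that let an identity grant more
# privilege to itself or to something it controls.
ESCALATION_ACTIONS = [
    "iam:PassRole",
    "iam:AttachUserPolicy",
    "iam:AttachRolePolicy",
    "iam:PutUserPolicy",
    "iam:PutRolePolicy",
    "iam:CreatePolicyVersion",
    "iam:UpdateAssumeRolePolicy",
    "iam:CreateAccessKey",
    "ssm:SendCommand",
]


def _as_list(value):
    """Policy fields may be a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def escalation_actions(policy):
    """Return the escalation-capable actions granted by Allow statements."""
    found = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        for pattern in _as_list(stmt.get("Action", [])):
            for action in ESCALATION_ACTIONS:
                # Action names are matched case-insensitively; '*' is a wildcard.
                if fnmatchcase(action.lower(), pattern.lower()):
                    found.add(action)
    return sorted(found)


if __name__ == "__main__":
    example = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject", "iam:Pass*"], "Resource": "*"}
        ],
    }
    print(escalation_actions(example))
```

Any identity for which this returns a non-empty list is a candidate for the same review an admin account would get.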

2. Public storage and snapshot exposure

The famous case is the object storage bucket opened to the world "temporarily" for a launch or a vendor handoff. The quieter cases hurt more: machine images and disk snapshots shared publicly or account-wide, database exports parked in a bucket with a permissive policy, and pre-signed or shared-access URLs that never expire. What matters is not the storage itself but what teams put in it — configuration files, connection strings, and credentials that become the first hop of a longer route.

Fix: turn on account-level and subscription-level public access blocks so a single mistaken bucket policy cannot override the default. Inventory every snapshot and image shared outside the account, and put expiry dates on every shared link. Treat anything that was ever public as disclosed: rotate the credentials it contained rather than hoping nobody looked.
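A minimal sketch of the kind of check that finds the "temporarily" public bucket: it flags Allow statements in an AWS-style bucket policy whose principal is a wildcard and which carry no condition at all. The function name `public_statements` is hypothetical, and the test is deliberately conservative: a statement with a condition is skipped even though some conditions restrict nothing, so a clean result here is not proof that a bucket is private.

```python
def _as_list(value):
    """Policy fields may be a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def _is_wildcard_principal(principal):
    """True for "Principal": "*" and for forms such as {"AWS": "*"}."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def public_statements(bucket_policy):
    """Return Allow statements open to any principal with no Condition."""
    return [
        stmt
        for stmt in _as_list(bucket_policy.get("Statement", []))
        if stmt.get("Effect") == "Allow"
        and _is_wildcard_principal(stmt.get("Principal"))
        and not stmt.get("Condition")
    ]


if __name__ == "__main__":
    policy = {
        "Statement": [
            {"Sid": "LaunchDay", "Effect": "Allow", "Principal": "*",
             "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"}
        ]
    }
    for stmt in public_statements(policy):
        print("public:", stmt.get("Sid"))
```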

3. Permissive security groups and forgotten public endpoints

Rules that allow the whole internet to reach management ports, databases exposed by a rule written during an incident and never removed, and endpoints nobody remembers creating: the proof-of-concept virtual machine from last year, the orphaned load balancer still forwarding to an old service, the DNS record pointing at an address the team released months ago. Your cloud perimeter is not what the architecture diagram says — it is the sum of every rule and record anyone ever created.

Fix: build your inventory from the internet inward, not from the console outward — enumerate what actually answers on your ranges and domains, then reconcile it against what should. Default-deny on network rules, no management protocols exposed publicly, and a bias toward deletion: an endpoint with no owner should be removed, not merely restricted.
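The "no management protocols exposed publicly" rule can be sketched as a check over security group data. The dictionary shape below mirrors the fields returned by the EC2 `DescribeSecurityGroups` API (`IpPermissions`, `IpRanges`, `Ipv6Ranges`); the function name and the short port list are assumptions for illustration, and the sketch covers only TCP and all-protocol rules.

```python
MANAGEMENT_PORTS = {22: "SSH", 3389: "RDP"}  # illustrative short list
ANYWHERE = {"0.0.0.0/0", "::/0"}


def world_open_management_rules(security_group):
    """Return (group_id, port, name) for management ports open to the internet."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
        if not cidrs & ANYWHERE:
            continue
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            low, high = 0, 65535
        elif protocol == "tcp":
            low, high = perm.get("FromPort", 0), perm.get("ToPort", 65535)
        else:
            continue
        for port, name in sorted(MANAGEMENT_PORTS.items()):
            if low <= port <= high:
                findings.append((security_group.get("GroupId"), port, name))
    return findings


if __name__ == "__main__":
    group = {
        "GroupId": "sg-example",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        ],
    }
    print(world_open_management_rules(group))
```

A check like this reads the console's view of the rules, so it complements the internet-inward inventory and does not replace it.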

4. Secrets in code and CI

Long-lived access keys committed to repositories, echoed into pipeline logs, baked into container images, and written into infrastructure-as-code state files. Once a static credential exists, it spreads to every laptop that clones the repo and every system that stores a build artefact — and it stays valid until someone rotates it. The pipeline itself is the other half of the problem: CI runners typically hold deployment credentials, which makes the build system one of the most privileged identities in the estate and one of the least protected.

Fix: replace static keys with short-lived credentials (IAM roles assumed through OIDC federation on AWS, managed identities or workload identity federation on Azure) so there is nothing durable to leak. Run secret scanning across repositories, images, and pipeline logs, and treat any credential that ever appeared in history as compromised: deleting the commit does not delete the exposure. Rotate first, then clean up.
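To make the scanning step concrete, here is a minimal sketch that looks for long-lived AWS access key IDs, which start with `AKIA` followed by 16 upper-case letters or digits. It covers one credential format only; a real scanner needs many more patterns and has to walk commit history, image layers, and logs, none of which is shown here. The function name is hypothetical.

```python
import re

# Long-lived AWS access key IDs: "AKIA" plus 16 upper-case letters or digits.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return (line_number, key_id) for each access key ID found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group()))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = "region = eu-west-1\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    for lineno, key_id in find_access_key_ids(sample):
        print(f"line {lineno}: {key_id} (rotate first, then clean up)")
```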

5. Cross-account trust sprawl

Cloud estates grow by trusting: role-assumption granted to a vendor for an integration, a dev account allowed to deploy into prod, guest identities invited into a tenant, service principals consented to years ago by someone who has since left. Each trust was reasonable when created. Collectively they mean the security of your strongest account is bounded by the security of the weakest account that can reach it — and in most environments, nobody can produce the full list of who that is.

Fix: map every trust relationship as a graph — assume-role policies, tenant guests, application consents, cross-account resource shares — and delete the ones with no current owner. Add conditions to the rest: external IDs on vendor trusts, source restrictions where the platform supports them. Then treat each remaining trusted account as part of your own perimeter, because on an attack path, it is.
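One way to start on the conditions is to list every trust that reaches outside the account without an external ID. The sketch below reads an AWS-style role trust policy and returns the external AWS principals whose statement has no `sts:ExternalId` condition. The function names are hypothetical, principals it cannot parse are reported and not skipped, and it checks only that a condition key is present, not that its value is sound.

```python
import re

ACCOUNT_IN_ARN = re.compile(r"^arn:aws[a-z-]*:(?:iam|sts)::(\d{12}):")


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _account_of(principal):
    """Account ID from a bare 12-digit principal or an IAM/STS ARN, else None."""
    if re.fullmatch(r"\d{12}", principal):
        return principal
    match = ACCOUNT_IN_ARN.match(principal)
    return match.group(1) if match else None


def unconditioned_external_trusts(trust_policy, own_account):
    """Return external AWS principals trusted without an sts:ExternalId condition."""
    findings = []
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if isinstance(principal, dict):
            aws_principals = _as_list(principal.get("AWS", []))
        else:
            aws_principals = [principal]  # e.g. "Principal": "*"
        has_external_id = any(
            key.lower() == "sts:externalid"
            for block in stmt.get("Condition", {}).values()
            for key in block
        )
        for p in aws_principals:
            if _account_of(p) != own_account and not has_external_id:
                findings.append(p)
    return findings


if __name__ == "__main__":
    trust = {"Statement": [
        {"Effect": "Allow", "Action": "sts:AssumeRole",
         "Principal": {"AWS": "arn:aws:iam::111111111111:root"}}
    ]}
    print(unconditioned_external_trusts(trust, own_account="333333333333"))
```

Run across every role in every account, the output doubles as the edge list for the trust graph.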

6. Disabled or unread logging

The finding that turns every other family from an incident into a quiet success for the intruder. We routinely find audit trails enabled in some regions and absent in others, logs faithfully written to a bucket that no system reads, and no alerting at all on the identity plane — the place where cloud attacks actually happen. A new access key on a dormant account, a policy attached to a role, a trust relationship modified: these are the cloud equivalent of someone changing the locks on your building, and in most environments they go entirely unnoticed.

Fix: enforce logging at the organisation level so a member account cannot switch it off, route logs somewhere a human or a detection system will actually see them, and alert on a short list of control-plane events — credential creation, policy and role changes, trust modifications, logging changes themselves. Then test it the way we do: make one of those changes and see whether anyone notices.
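The short list of control-plane events can be expressed as a simple filter. The sketch below assumes audit records shaped like AWS CloudTrail events (`eventName`, `userIdentity.arn`); the event names are real CloudTrail names, but the selection and the function name `control_plane_alerts` are illustrative, and a production detection would run inside whatever system actually receives the logs.

```python
# Illustrative watch list: event name -> why it matters on the identity plane.
WATCHED_EVENTS = {
    "CreateAccessKey": "credential creation",
    "AttachRolePolicy": "policy or role change",
    "PutRolePolicy": "policy or role change",
    "UpdateAssumeRolePolicy": "trust modification",
    "StopLogging": "logging change",
    "DeleteTrail": "logging change",
}


def control_plane_alerts(events):
    """Return one alert line per watched control-plane event."""
    alerts = []
    for event in events:
        name = event.get("eventName")
        reason = WATCHED_EVENTS.get(name)
        if reason:
            actor = event.get("userIdentity", {}).get("arn", "unknown")
            alerts.append(f"{reason}: {name} by {actor}")
    return alerts


if __name__ == "__main__":
    sample = [
        {"eventName": "GetObject", "userIdentity": {"arn": "arn:aws:iam::111111111111:user/app"}},
        {"eventName": "StopLogging", "userIdentity": {"arn": "arn:aws:iam::111111111111:user/dormant"}},
    ]
    for line in control_plane_alerts(sample):
        print(line)
```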

How attackers chain them across the identity plane

None of these families is news to an experienced cloud engineer. What proof-driven testing keeps showing is how they compose. A typical route CloudShark confirms looks like this: a static key found in a repository or pipeline artefact (family 4) authenticates as a workload identity that was granted more than it needed (family 1). That identity can enumerate the account and assume a role in a second account through a trust nobody remembered (family 5). The second account holds snapshots shared account-wide (family 2), and inside one of them is a configuration file with the credentials that open the production database — which was never reachable from the internet, and never needed to be, because the whole route ran across the identity plane. Every hop is an authorised API call. Nothing was exploited, so nothing looks like an attack, and because the audit trail was unread (family 6), nothing was seen.
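The route above can be modelled as a small directed graph in which each edge is one authorised hop. This is a sketch of the idea only, not CloudShark's implementation: the node names, edge labels, and the `find_route` helper are all hypothetical, and the search is a plain breadth-first search.

```python
from collections import deque

# Each edge: (from, to, finding that makes the hop possible). Names are illustrative.
EDGES = [
    ("internet", "workload-identity", "static key in a repository (family 4)"),
    ("workload-identity", "second-account-role", "over-broad identity, forgotten trust (families 1, 5)"),
    ("second-account-role", "shared-snapshot", "snapshot shared account-wide (family 2)"),
    ("shared-snapshot", "prod-database", "credentials in a configuration file"),
]


def find_route(edges, start, target):
    """Breadth-first search; returns the list of hops, or None if no route exists."""
    graph = {}
    for src, dst, label in edges:
        graph.setdefault(src, []).append((dst, label))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for dst, label in graph.get(node, []):
            if dst not in seen:
                seen.add(dst)
                queue.append((dst, hops + [(node, dst, label)]))
    return None


if __name__ == "__main__":
    for src, dst, label in find_route(EDGES, "internet", "prod-database") or []:
        print(f"{src} -> {dst}: {label}")
```

Removing any single edge from `EDGES` makes `find_route` return `None`, which is the graph-level version of closing one hop.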

Rated one by one, the findings on that route are the kind that sit in a backlog for months: a key to rotate, a policy to tighten, a snapshot to review. Rated as a path, they are a single critical exposure with a clear chokepoint. That difference, between five medium findings and one proven route, is what separates path-level testing from configuration scoring, and it is why fixing the right hop matters more than fixing the most hops.

Verifying the fix

In the cloud, the gap between "changed" and "closed" is wide. A rotated key is not closed if the old one still authenticates. A bucket made private is not closed if next week's deployment reapplies the public policy from infrastructure-as-code. A tightened role is not closed if a second, forgotten policy still grants the same permission by another route. The only honest test is the attacker's test: retry the original path, from the original position, with the original technique, and confirm it now dies at the fixed hop.

Three habits make that verification stick. First, fix in the template, not the console — a change made only in the live environment will be reverted by the next deploy. Second, after any credential exposure, verify the old credential is dead by using it, not by reading the rotation ticket. Third, retest the whole path, not the single finding: routes reroute, and an environment that closed one hop sometimes offers another. This is how CloudShark treats every remediation — the finding stays open until the re-run proves the route is gone.
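The retest habit can be sketched as a small harness that replays the original hops in order and reports where the route now dies. Everything here is hypothetical scaffolding: `attempt` stands in for whatever actually performs a hop (for example, authenticating with the old key), and the hop names are placeholders.

```python
def verify_fix(hops, fixed_hop, attempt):
    """Replay hops in order and classify the result.

    attempt(hop) must return True if the hop still succeeds.
    """
    for hop in hops:
        if attempt(hop):
            continue
        if hop == fixed_hop:
            return "closed"
        # The route died somewhere else, so the fix itself was never exercised.
        return f"blocked at {hop}, fix unverified"
    return "still open"


if __name__ == "__main__":
    route = ["use-leaked-key", "assume-role", "read-snapshot"]
    dead_hops = {"use-leaked-key"}  # pretend the old key no longer authenticates
    print(verify_fix(route, "use-leaked-key", lambda hop: hop not in dead_hops))
```

The three outcomes map onto the text: "closed" only when the replay stops at the hop that was fixed, "still open" when every hop succeeds, and an explicit warning when something else interrupted the route before the fix was tested.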

Where to start

Six families is still a list, and lists invite deferral. The order that collapses the most routes in our engagements: credentials first (families 4 and 1 — kill static keys, cut identity permissions to what is used), exposure second (families 2 and 3 — public access blocks, internet-inward inventory), trust and telemetry third (families 5 and 6). Most confirmed routes share a small number of chokepoint hops, and a handful of identity-plane changes removes the majority of them. Start where the chains cross, not where the dashboard is reddest.
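Finding where the chains cross can be as simple as counting how many confirmed routes pass through each hop. The sketch below assumes each route is recorded as a list of hop names; the names and the `rank_chokepoints` helper are hypothetical, and a plain count is only a starting heuristic, not a full minimum-cut analysis.

```python
from collections import Counter


def rank_chokepoints(routes):
    """Return (hop, number_of_routes) pairs, most shared hop first."""
    counts = Counter()
    for route in routes:
        counts.update(set(route))  # count each hop once per route
    # Sort by descending count, then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    confirmed_routes = [
        ["static-key", "ci-identity", "vendor-trust"],
        ["public-snapshot", "ci-identity", "prod-db-creds"],
        ["static-key", "ci-identity"],
    ]
    for hop, count in rank_chokepoints(confirmed_routes):
        print(f"{hop}: on {count} route(s)")
```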