Original interpretation: Why do OpenClaw security incidents always happen after 'the risk is already known'?
This article does not blame the model for being out of control. Instead it asks about a design flaw in execution rights: when a system places the right to execute, the right to audit, and the right to roll back on the same pipeline, how does organizational blindness amplify controllable deviations, step by step, into an accident?
Copyright statement and disclaimer: This article is an original interpretation based on "OpenClaw is a security nightmare dressed up as a daydream". Copyright of the original text belongs to its author and publisher. This article is not an official translation and is intended only for learning, research, and discussion of the ideas.
Attribution of opinions: The original article supplies the key risk observations; the structural reorganization, the incident narrative, the judgment framework, and the extended analysis are the present author's own.
Original reference: OpenClaw is a security nightmare dressed up as a daydream — Composio: https://composio.dev/content/openclaw-security-and-vulnerabilities
Nature of this piece: This is not a paragraph-by-paragraph translation, but a rewriting of the original's risk judgments into an incident review better suited for engineering teams to absorb.
Introduction: At 2:17 a.m., a message in the on-call group snapped everyone awake
The truly frightening thing about an incident is never the obvious error message on the screen. It is the system steadily completing something that should never have happened while everything appears to be "correct".
It was 2:17 in the morning when Zhang Lei was woken by his phone vibrating. Squinting at the lock screen, he saw an @all in the on-call group. The message came from the monitoring and alerting system, and its content snapped him awake: "High-privilege operation detected: production database configuration change. Operator: OpenClaw-Agent-Prod-07."
Zhang Lei is the on-call engineer in the platform department. This was not the first time he had been pulled into an incident group, but this message sent a chill down the back of his neck. The Agent had performed a database configuration change on its own? Since when did it have write access to production?
The group quickly came alive. Xiao Li, the on-call colleague from the business side, was first to ask: "What is going on? We never submitted a change ticket." Sister Wang from the security team followed: "Who approved this Agent's operation permissions? Is there an audit log?" Zhang Lei typed on his phone while pulling on his clothes: "Don't panic yet. I'll be online checking in five minutes."
Five minutes later, Zhang Lei sat at the computer in his study, connected to the intranet over VPN, and began to investigate. The more he checked, the more alarmed he became. The Agent had indeed performed a database configuration update at 2:05 a.m., modifying the connection pool parameters. More frightening still, the system had judged the operation "successfully completed": no blocking, no secondary confirmation, no circuit breaker. It had simply executed quietly and quietly returned a success status.
For the first ten minutes, everyone was still hoping for the best: maybe a false alarm? Maybe a test environment? But when Zhang Lei confirmed that this was indeed production, a real change, and a high-privilege operation, the mood in the group shifted.
"Who gave the Agent write permission in production?" demanded Brother Chen, the technical lead on the business side, as he came online.
"The permission model follows the service account, and the service account bound to the Agent does have write permission," Zhang Lei replied.
"Then why was there no approval process?"
"Approval... the Agent's operations execute automatically by default."
The conversation fell silent. Then came half an hour of emergency rollback. Fortunately, the change had only modified connection pool parameters, the values were within a reasonable range, and there was no data loss or service interruption. But by the time Zhang Lei finished the first draft of the incident report at four in the morning, he had realized a deeper problem:
The most ironic thing about this incident is that it was not an "accident" at all. Of course the team knew the dangers of high-privilege operations and of exposed long-lived credentials. They also knew that "run first, fix governance later" would come due sooner or later. But in the cycle of rushed deliveries, rushed demos, and chased growth, those risks kept being converted into a single sentence: don't block the road now, patch it up later.
The question is: is there really a "later" in which to patch it up?
The problem is not the surface fault, but the wholesale outsourcing of execution rights to an automated pipeline
When people review systems like OpenClaw, they habitually attribute the problem to a technical gap: the prompts are not strict enough, the policy rules are not numerous enough, the output filtering is not fine-grained enough. None of this is wrong, but it sidesteps a more critical layer: whom you allow to hold the power to "turn ideas into actions".
Reflecting on the incident later, Zhang Lei kept returning to one question: what did the Agent pass through between "understanding the task" and "executing the operation"? The answer: almost nothing. The model inferred from context that "the database connection pool needs optimizing" and then directly called the configuration update tool. There was no independent judgment layer, no risk classification, no additional confirmation keyed to the type of operation. Whatever the model "wanted" to do, the system "let" it do.
Once a system compresses "propose an action", "approve the action", "execute the action", and "record the action" into the same nearly automatic pipeline, the risk is no longer a local bug but a structural flaw. You can certainly bolt guardrails onto the pipeline, but if the guardrails themselves rely on the same context, the same execution rhythm, and the same default trust, no true boundary has been established.
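To make the structural flaw concrete, here is a deliberately naive sketch of such a compressed pipeline; every name in it is a hypothetical illustration, not OpenClaw's actual API:

```typescript
// A toy agent step: the model's proposal becomes a tool call with
// nothing in between. Propose, approve, execute, and record are all
// collapsed into one function.
type ToolCall = { tool: string; args: Record<string, unknown> };

async function runAgentStep(
  proposal: ToolCall,
  tools: Map<string, (args: Record<string, unknown>) => Promise<string>>,
): Promise<string> {
  const impl = tools.get(proposal.tool);
  if (!impl) throw new Error(`unknown tool: ${proposal.tool}`);

  // "Approval" is implicit: if the tool is registered, it runs.
  const result = await impl(proposal.args);

  // "Audit" is a side effect of execution, not a precondition of it.
  console.log(`[audit] ${proposal.tool} -> success`);
  return result;
}
```

Notice that the only check is the existence of the tool; there is no point at which anything other than the model's own context can say no.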
This is why many teams keep adding rules and keep stumbling over the same problems. The rules live at the language layer, while execution happens at the permission layer; the rules live in documents, while incidents happen at runtime; the rules are remembered by humans, while execution is permitted by the system by default. On the surface you are shoring up security; in practice you are merely packaging the insecurity into something more "process-shaped".
Zhang Lei remembered a technical review three months earlier. Sister Wang from the security team had raised a concern: "Aren't the Agent's permissions too broad? Shouldn't we add an approval layer?" But business pressure was high, and the product manager made the call: "Use it first, deal with permissions later." Everyone knew what "later" meant. It meant no particular time, which meant perhaps never.
So the most fundamental judgment about OpenClaw's security problem is not that "the model is too powerful" or "the model is unstable", but that the team has stitched back together several kinds of power that should have been kept separate. The model has the right to propose, but it should not have the right to execute automatically; the system has the ability to execute, but it should not have the right to decide on its own; the audit trail can record, but it should also be able to explain why something was allowed. When all of these powers sit on the same pipeline, an incident is only a matter of time.
Why did no one see it coming?
The hardest question to answer in a review is often not "what went wrong" but "why did no one feel this had to stop?"
Zhang Lei later ran a post-mortem with his team, trying to reconstruct everyone's state of mind at the time. The reason is usually not that nobody knew, but that everyone knew only a part. The model team saw the prompt risks, the platform team knew the permissions were too broad, the business team worried about the impact on velocity, and the security team reminded everyone that high-risk actions needed double confirmation. Everyone said half of the right thing, but the system lacked the ability to assemble those half-sentences into a set of default constraints.
Behind this lie three common organizational misjudgments.
The first misjudgment is mistaking "it can run" for "it is under control". Especially in the early stage of an Agent project, as long as it genuinely completes a chain of complex actions, the organization will naturally overestimate its control over it. Zhang Lei still remembered the afternoon of the first successful demo, the meeting room filled with delight. The Agent automatically analyzed the logs, automatically diagnosed the problem, automatically executed the repair script; the whole flow was as smooth as magic. A successful demo creates the illusion that since it can already do so much automatically, granting a little more permission hardly matters. But the most dangerous moment for a production system is precisely the moment everyone starts lowering their guard because of success.
The second misjudgment is that risks are easily re-expressed as "project management items for later". Once something does not blow up immediately, it slides from "must be solved" to "should be planned for". But a problem like execution rights design is not UI polish, nor a performance optimization. If it is not done up front, the cost of doing it later does not grow linearly; it is amplified by the complexity of the pipeline. Zhang Lei remembered the review three months earlier: Sister Wang's concern had been recorded in the "technical debt list" at priority P3. The list was still there, but almost nothing on it had ever been implemented.
The third misjudgment is that many teams never design "not executing automatically" as a low-friction action. In reality, the most effective safety systems rarely rely on everyone staying alert at all times; they rely on the system making "pause" natural, cheap, and default. In many OpenClaw-style systems, the default is exactly the opposite: continuing is smoother than braking midway. The model proposes an action, and all the system does is check whether it matches the rules, not judge whether it should be done. The threshold for rule matching is low; the threshold for judging whether something should be done is high, because the latter requires contextual understanding, risk assessment, and sometimes human intervention.
When "continue" is the default option, "stop" requires extra justification. That design choice itself reveals the system's priorities: speed over safety, convenience over control, demos over robust production.
Root cause breakdown: this was not a vulnerability, but an accident produced by three stacked fractures
Layer one: the surface phenomenon is that an unauthorized call or dangerous action was allowed
Judging from the incident's outward appearance, everything points to one specific dangerous result: the Agent called tools it should not have called, performed operations it should not have performed, and touched resources it should not have touched. At this level it is easiest to fixate on that single action, as if sealing off this one occurrence would end the problem.
When Zhang Lei wrote the incident report, the first version read: "Because the permission configuration was too broad, the Agent obtained production write permission, resulting in an unexpected configuration change." Is that description accurate? It is. But it is so specific that it creates the illusion that the problem is equally specific.
If you stay at this level, the fix looks simple: tighten permissions, adjust configurations, add blacklists. But those actions address "this" incident, not "this class" of incident. Next time, with a different entry point, different tools, and a different hour, the same thing can happen in a different shape.
Surface phenomena mislead precisely because they look too much like the whole story. An unauthorized call is visible, tangible, easy to describe, and easy to fix, so the team's attention is naturally drawn to it. But the real danger is not the call itself; it is why the system allowed the call to happen, and that question is invisible at the surface.
Layer two: what actually failed is that there was no independent adjudication layer between "proposal" and "execution"
What matters most in a mature automation system is not how many things it can do, but whether it can say "no" to what it is about to do. In Zhang Lei's incident, between the Agent's "understanding the task" and "executing the operation" there was only a thin layer of rule checking. The model understood the intent "optimize the connection pool", the rule check confirmed that the "configuration update" tool was on the whitelist, and execution happened directly.
What was missing here? An independent adjudication mechanism. The model proposes actions, but it is not responsible for assessing risk; rules match patterns, but they are not responsible for understanding context. What is really needed is an execution adjudication layer independent of the model's semantic pipeline: it does not care what the model says it is doing, only whether the action should be allowed under the current context, the current permissions, and the current task boundary.
Without this layer, all prompt-level alignment ultimately remains fragile at the tool call boundary. You can write "do not modify the production environment" into the prompt, but if the model interprets it differently under some boundary condition, or the context contains a combination the model has never seen, the prompt's constraint can silently fail. And once the tool call happens, the consequences have already happened.
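What such an adjudication layer might look like, reduced to its skeleton: the sketch below is an assumption-laden illustration (the types, rule logic, and field names are all hypothetical), but it shows the essential property that the gate judges the action, not the prompt:

```typescript
// A minimal adjudication gate decoupled from the model's semantics.
// It never sees the prompt; it sees only the concrete action request.
type RiskTier = "read" | "write" | "high_risk";

interface ActionRequest {
  tool: string;
  target: string;          // e.g. "prod/db/connection_pool"
  taskBoundary: string[];  // resource scopes the current task is limited to
  actor: string;           // the service identity making the call
}

type Verdict =
  | { allow: true }
  | { allow: false; reason: string; escalate: boolean };

function adjudicate(req: ActionRequest, tier: RiskTier): Verdict {
  // Deny anything outside the declared task boundary, regardless of
  // what the model "intended".
  if (!req.taskBoundary.some((scope) => req.target.startsWith(scope))) {
    return { allow: false, reason: "target outside task boundary", escalate: true };
  }
  // High-risk actions never pass silently: they pause and escalate.
  if (tier === "high_risk") {
    return { allow: false, reason: "high-risk action requires human approval", escalate: true };
  }
  return { allow: true };
}
```

The point of the design is not cleverness but placement: the gate sits between proposal and execution, so a prompt-level failure can no longer become a production-level consequence on its own.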
Zhang Lei later reflected that the team had in fact had the chance to build this layer. During an architecture review two months earlier, someone had proposed adding a policy gate between model proposal and execution. But everyone worried it would add latency and hurt the user experience, so the proposal was shelved. Looking back, if that gate had existed, the 2 a.m. incident might not have happened; at the very least, it would have been intercepted, recorded, and escalated instead of quietly completed.
Layer three: the deeper problem is that the organization treats governance as an after-the-fact cost rather than a prerequisite capability
The deepest cause is not in the code but in the priorities. Does the team genuinely see governance as part of the system's capability? Or only as an added burden that slows everything down?
In Zhang Lei's organization the answer was plainly the latter. Every time security hardening, permission tightening, or audit enhancement came up for discussion, similar voices appeared: "Will this affect the delivery rhythm?" "Can it wait until this version ships?" "Use it first, govern it later." These lines sound reasonable because they fit a common logic: features are the business; governance is overhead.
But the peculiarity of systems like OpenClaw is that functionality and governance are not two phases that can be separated. The moment you hand high privileges to an Agent, governance must already exist. Otherwise you are simply releasing a capability that needs constraining and hoping for the best. That hope may get by in a demo environment; in production, it is a breeding ground for incidents.
When an organization treats governance as an after-the-fact cost, the team keeps making the same choice: open up first, integrate first, pilot first, and add constraints once problems appear. This looks efficient in the short term, but in the long term it accumulates a very expensive fragility: every new capability simultaneously enlarges the blast radius. Until one day a seemingly ordinary call triggers a chain of uncontrollable consequences.
Zhang Lei recalled what Brother Chen said during the joint review with the business side after the incident: "Honestly, we all felt the permissions were a problem; we just didn't expect something to happen this soon." That sentence exposes the essence of the problem: the team knew the risk existed, but the system never turned that knowledge into action. Risks were perceived but not handled; concerns were voiced but not answered. The organization had security awareness but no security mechanism. The distance between awareness and mechanism is exactly the space in which incidents occur.
What this incident really teaches is not that "the model is unreliable", but that a system must not put all its trust in one place
Many reviews end with an empty sentence: we need to be more careful. A genuinely useful review has to turn abstract vigilance into concrete judgments.
Zhang Lei's team held three review meetings after the incident, moving from the initial "how do we prevent the next one" to "where is the system weak", until they finally touched a deeper issue: the structure of trust. Systems like OpenClaw inherently ask the team to hand part of its trust to the model, letting it understand, decide, and execute automatically. The problem is that this trust should be distributed, conditional, and verifiable, not centralized, unconditional, and default.
The first core lesson of this incident: do not understand security primarily as "defending against unknown attacks"; understand it as "limiting wrong combinations of known capabilities". Many high-risk paths are not invented by attackers; they are assembled by the system itself. The model can read, can write, and can call across systems. Each capability is fine on its own; combined, they can be dangerous. The job of safety is not to stop the model from ever doing the wrong thing, which is impossible, but to stop dangerous combinations from arising naturally.
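A combination check can be stated in a few lines. The capability names below are invented for illustration; the point is that the check operates on grants, not on any single call:

```typescript
// A toy check that flags dangerous *combinations* of capabilities
// granted to one agent identity, even when each capability is
// individually acceptable.
const DANGEROUS_COMBOS: string[][] = [
  ["read_secrets", "network_egress"],     // a classic exfiltration path
  ["write_prod_config", "auto_execute"],  // the path in this incident
  ["read_user_data", "send_messages"],    // a data leakage path
];

function findDangerousCombos(granted: Set<string>): string[][] {
  return DANGEROUS_COMBOS.filter((combo) =>
    combo.every((cap) => granted.has(cap)),
  );
}

// findDangerousCombos(new Set(["write_prod_config", "auto_execute"]))
// -> [["write_prod_config", "auto_execute"]]
```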
The second lesson: design permissions on the assumption that the model will make the least trustworthy choice at boundary conditions. This is not pessimism; it is engineering reality. You do not need to prove that the model will definitely err; you only need to accept that when it does, the system must have a way to confine the consequences locally. That means high-risk operations need an extra confirmation layer, sensitive resources need finer-grained controls, and cross-system calls need explicitly defined boundaries. When designing these mechanisms, do not assume the model will do what you expect; assume it will make the worst choice you can imagine.
The third lesson: auditing exists not to assign blame after an incident, but to let the system know what it is doing before one. When Zhang Lei's team investigated, they found that the audit log did record "Agent performed a configuration update", but not "why the Agent was allowed to perform this operation". The former is bookkeeping; the latter is governance. A genuinely valuable audit entry should be able to answer: what is this operation's risk rating? Under which policy was it released? Who had the authority to approve this class of operation at that point in time? If the log cannot answer these questions, auditing is a formality, not a capability.
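Concretely, an audit record that can answer those questions might carry fields like the following; the schema is a hypothetical sketch, not an existing OpenClaw format:

```typescript
// An audit record that captures the decision, not just the action.
interface AuditEntry {
  // What happened (bookkeeping).
  timestamp: string;
  actor: string;             // e.g. "OpenClaw-Agent-Prod-07"
  action: string;            // e.g. "config.update"
  target: string;            // e.g. "prod/db/connection_pool"
  outcome: "success" | "denied" | "escalated";

  // Why it was allowed (governance): the part that was missing.
  riskTier: "read" | "write" | "high_risk";
  policyId: string;          // the policy rule that released the action
  approver: string | null;   // human approver, or null if auto-released
  taskId: string;            // the task boundary the action ran under
}
```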
If redesigned, how should the lines of defense be rebuilt?
If this type of system were redesigned, Zhang Lei believes four lines of defense should be added first, and their order matters.
The first line of defense is classification of high-risk actions. Not all tool calls deserve the same treatment. Reading, writing, sending messages, changing configuration, and adjusting permissions belong in different risk tiers. Classification is not there to make the documentation look good; it is there to tell the system when it must slow down. In Zhang Lei's incident, if "modify production configuration" had been explicitly marked as high-risk and had automatically triggered an additional confirmation flow, the incident might never have happened. Classification is the foundation of every subsequent line of defense: without it, all operations are equal, and the system loses the ability to treat them differently.
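A classification can be as plain as a declarative map from tool names to tiers. The tool names below are hypothetical; the important design choice is that anything unclassified fails safe:

```typescript
// A declarative risk-tier map keyed by tool name, so the system
// knows which calls must slow down, independent of the model's intent.
type RiskTier = "read" | "write" | "high_risk";

const TOOL_RISK_TIERS: Record<string, RiskTier> = {
  "logs.query": "read",
  "metrics.read": "read",
  "ticket.comment": "write",
  "config.update": "high_risk",      // the tool class in this incident
  "permissions.grant": "high_risk",
};

// A forgotten entry defaults to the most restrictive tier, so the
// classification fails safe rather than fails open.
function riskOf(tool: string): RiskTier {
  return TOOL_RISK_TIERS[tool] ?? "high_risk";
}
```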
The second line of defense is an independent adjudication layer. The model proposes actions but does not approve them. Approval must be completed by an independent policy layer that judges jointly on context, task boundary, subject identity, and resource sensitivity. This adjudication layer should be decoupled from the model's semantic pipeline, with its own rule engine, its own risk model, and its own power to refuse. Its goal is not to make the model smarter, but to keep the system from failing along with the model when the model fails.
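Building on the gate sketched earlier, the policies it evaluates can themselves be declarative, which makes every release decision auditable. The rule shape below is a hypothetical illustration:

```typescript
// Declarative policy rules for the adjudication layer: each rule
// names the conditions under which an action is released, so the
// decision itself can be logged and explained later.
type RiskTier = "read" | "write" | "high_risk";

interface PolicyRule {
  id: string;
  appliesTo: { riskTier: RiskTier };
  requires: {
    withinTaskBoundary: boolean;     // target must match the task's scope
    humanApproval: boolean;          // a person must confirm before release
    maxCredentialTtlSeconds: number; // cap on credential lifetime
  };
}

const RULES: PolicyRule[] = [
  {
    id: "P-READ-01",
    appliesTo: { riskTier: "read" },
    requires: { withinTaskBoundary: true, humanApproval: false, maxCredentialTtlSeconds: 900 },
  },
  {
    id: "P-HIGH-01",
    appliesTo: { riskTier: "high_risk" },
    requires: { withinTaskBoundary: true, humanApproval: true, maxCredentialTtlSeconds: 120 },
  },
];
```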
The third line of defense is short-lived credentials and least privilege by default. Do not treat a long-lived high-privilege token as a convenience; it is a recipe for incidents. Any task-level execution should obtain the minimum authorization sufficient for the action at hand, and that authorization must expire quickly. Credentials should be dynamically issued, time-boxed, and single-use, rather than statically configured, long-lived, and reusable. Least privilege adds complexity to the system, but it is the key to containing the blast radius of an incident.
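As a sketch of what task-scoped issuance could look like (the issuer interface is an assumption; a real system would back it with a secrets manager or an STS-style token service):

```typescript
import { randomUUID } from "node:crypto";

// A task-scoped, short-lived, single-use credential: the opposite of
// a standing production-wide service account.
interface ScopedCredential {
  token: string;
  scope: string;       // the narrowest resource needed, e.g. one config key
  expiresAt: number;   // epoch millis; enforced on the resource side
  singleUse: boolean;
}

function issueCredential(taskId: string, scope: string, ttlMs: number): ScopedCredential {
  return {
    token: `tmp-${taskId}-${randomUUID()}`, // opaque, minted per task
    scope,
    expiresAt: Date.now() + ttlMs,
    singleUse: true,
  };
}

// Usage: a two-minute, single-use credential scoped to one config key.
const cred = issueCredential("task-4711", "prod/db/connection_pool", 120_000);
```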
The fourth line of defense is pre-drilled rollback and manual takeover. A manual takeover that has never been rehearsed is no manual takeover at all. When something goes wrong, the most expensive part is not the fix itself, but the organization thinking seriously, for the first time, about who takes over now and in what order. Zhang Lei's incident was comparatively lucky, because a configuration change is easy to roll back. But what if the incident is more complicated? What if data corruption is involved? What if cross-team coordination is required? These scenarios must be rehearsed regularly, or everyone will scramble in panic when the real thing happens.
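One way to keep rollback from being an afterthought is to make it a precondition: a high-risk action is refused unless an undo handler is registered first. The pattern below is an illustrative assumption, not a prescribed OpenClaw mechanism:

```typescript
// A high-risk action cannot be released unless a rollback handler
// has been registered for it; drills then just execute the handler
// against a staging copy on a schedule.
const rollbacks = new Map<string, () => Promise<void>>();

function registerRollback(actionId: string, undo: () => Promise<void>): void {
  rollbacks.set(actionId, undo);
}

async function releaseHighRisk(actionId: string, run: () => Promise<void>): Promise<void> {
  if (!rollbacks.has(actionId)) {
    throw new Error(`refusing ${actionId}: no rollback registered`);
  }
  await run();
}
```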
These four lines of defense are not independent of one another; they are progressive. Classification defines what needs special treatment, the adjudication layer provides the mechanism for that treatment, least privilege limits its risk, and drills ensure someone can take over when it fails. Remove any one layer and the defense as a whole collapses.
Conclusion: security incidents recur not because nobody senses the danger, but because the system defaults to believing "it won't be my turn this time"
At four in the morning, Zhang Lei finished the incident report and closed his laptop. The sky outside the window was beginning to lighten, but he could not sleep. He thought of his excitement at the first successful demo, of Sister Wang's worried expression at the technical review, of the countless "we'll fix it later" deferrals over the past three months. The incident had not started at two in the morning; it had started with those earlier decisions.
The real temptation of systems like OpenClaw is not the intelligence itself, but the illusion that "since it can already do so much, a little more permission hardly matters". The trouble is that the most expensive failures in the systems world are often born of exactly this continuous decision to give just a little more.
So Zhang Lei added a paragraph at the end of the incident report: "We do not believe the cause of this incident was 'an Agent out of control', because we have no evidence the Agent did anything wrong. The Agent performed the operations it was designed to believe it should perform. What was really out of control was us: out of control in our contempt for risk, out of control in our procrastination on governance, out of control in our compromises for convenience. The Agent merely reflected our priorities faithfully."
If this is not taken seriously, if you patch prompts today, tools tomorrow, and credentials the day after, the system will repeat the same failure somewhere else. Only when execution rights are separated again, audit rights are made independent, and rollback becomes a default capability does the incident truly change from "bound to come sooner or later" into "even if it comes, it will not spiral out of control".
It was 4:17 in the morning, and Zhang Lei finally felt sleepy. But he knew the real repair had only just begun.
References and Acknowledgments
- Original text: OpenClaw is a security nightmare dressed up as a daydream — Composio: https://composio.dev/content/openclaw-security-and-vulnerabilities