Original interpretation: Treat Notion as the control plane for 18 Agents; the first thing to solve is never 'automation'
This article does not debate whether the console interface looks good. It asks a more fundamental production question: when you connect 18 OpenClaw Agents to a Notion control plane, is the system amplifying team productivity, or amplifying scheduling noise and status chaos?
Copyright statement and disclaimer: This article is an original interpretation based on “I Turned Notion Into a Control Plane for my 18 OpenClaw AI Agents”. Copyright of the original text belongs to the original author and source. This article is not an official translation and is intended only for learning, research, and discussion of the ideas.
Original reference I Turned Notion Into a Control Plane for my 18 OpenClaw AI Agents — AWS Heroes: https://dev.to/aws-heroes/i-turned-notion-into-a-control-plane-for-my-18-openclaw-ai-agents-5624
Opening: most multi-agent systems die unspectacularly; they are slowly brought down by scheduling noise
The first time Wang Fang felt something was wrong was on an ordinary Wednesday afternoon. Her team had just connected the twelfth Agent to the Notion control plane. In theory, this should have been a milestone: moving from single-point automation to multi-role collaboration. But when she looked at the rows of pulsing status tags on Notion's board, she felt only unease.
“Processing”, “Waiting”, “Failed - to be retried”, “Exception - requires manual confirmation”, “Timed out”, “Partially completed”: these statuses were like a group of interns who are polite to each other but don't really understand each other's boundaries. Everyone is busy, but no one knows what the others are doing. A data-cleaning task failed and blocked three downstream analysis Agents; a report-generation task timed out but held on to the database connection pool without releasing it; a task that should have been retried automatically was mistakenly labeled “requires manual confirmation” and had been sitting in the queue for six hours with no one attending to it.
This is not a breakdown; this is chronic suffocation. It is not that the system isn't working, it is that it is working too noisily. Each Agent is “completing tasks”, but the relationships between tasks, responsibility after failure, and the timing of manual intervention are all unclear.
That night, Wang Fang sent a message in the team group: “I think we need to stop and redefine what problems the control plane needs to solve.” After the message was sent, the group was silent for a long time. Finally, the technical leader replied: “You are right, but we can’t stop now, we have to deliver tomorrow.”
This sentence illustrates a common dilemma: teams are trapped in the inertia of “more automation”, but forget to ask a more fundamental question - when things don’t go well, who knows what is happening now, who has the authority to take over, and how can the system stop?
When you turn Notion, or any other tool, into a multi-agent control plane, the first priority is never “make more tasks flow automatically”. It is to answer that more realistic question first.
First determine which stage you are at
If you are building a multi-agent control plane, I suggest you first determine which stage you are at, rather than pursuing a “complete platform” right away. When Wang Fang later reviewed the situation, she realized that her team blindly pursued goals beyond their current capabilities without a clear understanding of the stage.
Stage 1: It runs, but mainly on human memory
The typical characteristic of this stage: each Agent can accomplish something, but the system's operating order still lives in team members' heads. Who to contact when something fails, who reassigns work after a timeout, which status means manual takeover is required: all of this relies on tacit understanding rather than on the system.
Wang Fang's team was in this state when it connected the first five Agents. Everyone knew roughly: if data cleaning fails, contact Xiao Li; if report generation times out, contact Xiao Wang; if an API call throws an exception, contact Xiao Zhang. This kind of “human-brain scheduling” is even efficient at small scale: just shout when there is a problem. But once the number of Agents exceeds ten, the human brain starts to overload. Who remembers the dependencies between twelve Agents? Who can accurately determine, at two o'clock in the morning, who should take over a cross-Agent failure?
The biggest risk for a Stage 1 team is mistaking “it runs” for “it is controllable”. The system does accomplish tasks, but those accomplishments are paid for by overdrawing the organization: members must be online at all times, alert at all times, and keep the implicit rules in their heads at all times. This model is unsustainable and collapses once the scale grows even slightly.
Stage 2: A status flow begins to form, but exception handling is still loose
Wang Fang's team entered this stage when they connected the sixth through tenth Agents. They began creating task lists, status columns, trigger rules, and even automatic write-back in Notion. On the surface, the system looked more “automated”: task status updated automatically, completion was notified automatically, failure was marked automatically.
But exception handling was still loose. The system could tell you “what is wrong”, but not “who must act next”. A task fails and is marked “failed”, but there is no clear escalation rule: retry automatically? Hand over to a human? Terminate the entire chain? Or keep waiting? These decisions still relied on the on-the-spot judgment of whoever was on duty.
More troublingly, Stage 2 systems often create the illusion that “governance is complete”. We have task tracking, status panels, automation rules; it all seems to be there. In fact, the system has merely changed the chaos from “totally invisible” to “visible but with no one responsible”. The failed state is marked, but no one clearly owns its handling; the timed-out task is highlighted, but the system does not know what to do next.
Stage 3: The control plane begins to assume real governance responsibilities
Only at this stage is your control plane not just an observation board, but a scheduling system. It not only records tasks, but also defines responsibilities; not only displays status, but also restricts flow; not only supports collaboration, but also provides back pressure and braking.
Wang Fang later spent a lot of time thinking about what Stage 3 should look like. She believes the core difference is that a Stage 3 system designs “failure” in from the start. Rather than assuming all tasks will succeed, it assumes a certain percentage will fail and designs explicit handling paths for those failures. Who is responsible for which failure type, how long they have to respond, what the escalation chain is, and under what circumstances the circuit breaker must trip: all of this is clearly defined.
If you are still in the first two stages but try to jump straight to the elegant collaboration of eighteen Agents, you will most likely turn the system into a high-resolution chaos amplifier. This is what happened to Wang Fang's team: they forced in more Agents at the end of Stage 2, and merely rendered the original loose chaos in sharper detail.
The most important thing to add at this stage is not more Agents, but a state and responsibility model
When many people look at a control plane, their first reaction is to add more capabilities: an audit Agent, a routing Agent, a remediation Agent. In the field, this is usually wrong.
At her most anxious, Wang Fang also considered this plan: since the system is so messy, why not add a dedicated “scheduling Agent” to coordinate? But she soon realized the idea was flawed. The system was messy not because there were too few roles, but because the state semantics were not hard enough. As long as “Processing”, “Waiting”, “Failed”, and “Taken Over” carry no clear responsibilities, adding new Agents only adds more fuzzy actions.
What really needs to come first are two things: a state machine and a responsibility model.
A state machine means each state has a clear definition: what it means, which states it is allowed to flow from and into, and how long it may sit before it must be escalated. In Wang Fang's initial implementation, the “processing” status could simultaneously mean “executing”, “waiting for dependencies”, “temporarily suspended”, or “possibly deadlocked”. That ambiguity made accurate judgment impossible for the system and manual intervention difficult: seeing a task in “processing”, you could not tell whether it was running normally or stuck.
Building a state machine requires the team to sit down and clearly define the meaning and flow rules of each state. For example:
- “Pending”: the task has been created and is waiting for resource allocation. If it is not allocated within 5 minutes, it is upgraded to “Insufficient resources” and operations is notified.
- “Processing”: the task has been assigned to an Agent and is executing. If it exceeds 150% of the expected time, it is marked “Possible timeout” and a check is triggered.
- “Waiting for dependencies”: the task itself is normal, but its predecessor tasks have not completed. If a predecessor fails, the task is automatically marked “Dependency failed”.
- “Failed”: task execution hit an error and has stopped. The next step is decided by error type (automatic retry / hand over to a human / terminate the chain).
- “Completed”: the task finished successfully and its results have been written to the designated location.
Each state must have clear entry conditions, exit conditions, timeout processing and upgrade paths.
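To make these rules concrete, here is a minimal Python sketch of such a state machine. The state names, timeouts, and transitions are illustrative, drawn loosely from the examples above; the original article does not publish an implementation, so treat this as one possible shape, not the author's code:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Allowed transitions: any move not listed here is rejected outright.
TRANSITIONS = {
    "pending": {"processing", "insufficient_resources"},
    "processing": {"waiting_on_deps", "possible_timeout", "failed", "completed"},
    "waiting_on_deps": {"processing", "dependency_failed"},
    "possible_timeout": {"processing", "failed"},
    "failed": set(),       # terminal until a human or a retry rule reopens it
    "completed": set(),
}

# Escalation rules: how long a task may sit in a state before it is upgraded.
ESCALATIONS = {
    "pending": (timedelta(minutes=5), "insufficient_resources"),
}

@dataclass
class Task:
    name: str
    state: str = "pending"
    entered_at: datetime = field(default_factory=datetime.now)

    def move(self, new_state: str) -> None:
        """Apply a transition, refusing anything the state machine forbids."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.entered_at = datetime.now()

    def check_escalation(self, now: datetime):
        """Return the state this task should be upgraded to, or None."""
        rule = ESCALATIONS.get(self.state)
        if rule and now - self.entered_at > rule[0]:
            return rule[1]
        return None
```

The point is not the dictionary itself but the behavior: every transition either matches a rule or raises, so a task can no longer sit in an ambiguous “processing” state that nobody can interpret.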
The responsibility model means: behind each state, who must act, when they must act, and who takes over if they fail to. In Wang Fang's original system, the “failed” status was merely marked; it was never clear who owned it. The person who created the task? The Agent's maintainer? The on-duty engineer? Or should the system handle it automatically? Without clear responsibility, a failure just hangs there until someone stumbles across it.
Building a responsibility model requires answering:
- Who is the first person responsible for each failure type?
- How long does the responsible person have to respond?
- If the first responsible person does not respond, who is escalated to next?
- How long does the escalated party have to respond?
- If the entire escalation chain is unresponsive, what does the system do by default (retry / terminate / leave as-is)?
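These questions can be answered in data rather than in people's heads. A minimal sketch, assuming made-up responder roles and failure types (none of these names come from the original article):

```python
from datetime import timedelta

# Illustrative escalation chains: (responder, response window) pairs per
# failure type, plus a system default if the whole chain is unresponsive.
ESCALATION_CHAINS = {
    "data_format_error": [("task_owner", timedelta(minutes=30)),
                          ("agent_maintainer", timedelta(minutes=30)),
                          ("on_call_engineer", timedelta(hours=1))],
    "network_timeout":   [("on_call_engineer", timedelta(hours=1))],
}
DEFAULT_ACTION = {"data_format_error": "halt_pipeline",
                  "network_timeout": "retry"}

def next_responder(failure_type: str, elapsed: timedelta) -> str:
    """Return who should be acting now, or the system's default action
    once the entire chain's response budget is exhausted."""
    budget = timedelta(0)
    for responder, window in ESCALATION_CHAINS.get(failure_type, []):
        budget += window
        if elapsed < budget:
            return responder
    return DEFAULT_ACTION.get(failure_type, "leave_as_is")
```

The design choice worth noting is the last line: even total unresponsiveness resolves to an explicit default instead of an undefined hang, which is exactly the gap the fifth question above is probing.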
Without these two layers, the beauty of the control plane means nothing. However much it looks like a control tower, it is essentially just a fancier to-do list.
A more realistic progression sequence
Wang Fang and her team later developed a new progression sequence that was less sexy but more likely to succeed.
Step 1: Define the “failure states” clearly
Don't define the ideal collaboration flow first; define the failure flow first. Which failures allow automatic retry, which must be taken over manually, which terminate the chain outright. Wang Fang's team spent a week listing every failure scenario the system could hit, then defined a handling rule for each.
They found that roughly 40% of failures could be retried safely and automatically (network timeouts, temporary resource shortages); 30% had to be handled manually (business-logic errors, data-format anomalies); 20% should terminate the chain outright (complete failure of upstream dependencies, exhausted resource quotas); and 10% required more complex judgment (such as partial success, partial failure).
Once these rules were written into the system, the pressure on the on-duty engineers dropped immediately. The system could now handle part of the failures automatically, and escalated only when human judgment was genuinely needed.
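That 40/30/20/10 split can be encoded as a simple dispatch table. The category names below are invented for illustration; only the four buckets come from the article:

```python
# Illustrative mapping from failure categories to a handling decision.
RETRYABLE = {"network_timeout", "resource_shortage"}      # ~40%: safe to retry
MANUAL    = {"business_logic_error", "bad_data_format"}   # ~30%: needs a human
TERMINAL  = {"dependency_failed", "quota_exhausted"}      # ~20%: halt the chain

def decide(failure_type: str, attempts: int, max_retries: int = 3) -> str:
    """Map a failure to one explicit action; never leave it undefined."""
    if failure_type in RETRYABLE and attempts < max_retries:
        return "retry"
    if failure_type in TERMINAL:
        return "terminate_chain"
    # Everything else, including retryables that exhausted their budget
    # and the ~10% ambiguous cases, defaults to a human rather than a guess.
    return "escalate_to_human"
```

Note the bias of the fallback: unknown failure types escalate to a human by default, matching the semi-automatic principle discussed later in this article.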
Step 2: Design manual takeover as a system capability, not an emergency posture
What front-line teams fear most is not having to take over manually; it is not knowing how to take over. The control plane must provide a fixed entrance for takeover: who takes over, where the handover context lives, what the current blocker is, and which remedial actions are allowed.
Wang Fang's team designed a “takeover work order” system. When a task requires manual intervention, the system automatically creates a work order containing:
- Basic information of the task (type, creation time, expected completion time)
- Current status and historical trajectory
- Completed steps and results
- Failure reason and error log
- Recommended solutions (automatically recommended based on failure type)
- Related upstream and downstream tasks
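Such a work order is worth modeling as a structured record rather than free text. The field names below are assumptions mirroring the list above, not the team's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class TakeoverTicket:
    """A structured takeover work order, one per manual intervention."""
    task_id: str
    task_type: str
    created_at: datetime
    expected_done_at: datetime
    state_history: list        # e.g. ["pending", "processing", "failed"]
    completed_steps: list      # steps finished before the failure
    failure_reason: str
    error_log: str
    suggested_fixes: list      # recommended based on failure type
    related_tasks: list        # upstream/downstream task ids
    assignee: str = ""
    actions_taken: list = field(default_factory=list)
    closed: bool = False

    def close(self, resolution: str) -> None:
        self.actions_taken.append(resolution)  # keep the audit trail
        self.closed = True
```

Because every takeover flows through one record type, the later review step ("all operations are recorded") falls out for free instead of depending on engineers remembering to write things down.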
The work order is assigned to the corresponding responsible person, who completes the whole take-over, handle, and close cycle inside the system. Every operation is recorded for later review.
This design turns manual takeover from “emergency firefighting” into a standard process. On-duty engineers no longer have to stare at vague status labels at two in the morning and guess what happened; they can see the full context and make an informed judgment.
Step 3: Only then expand and split roles
Only after the first two steps are in place does adding new Agents amplify productivity rather than uncertainty. Wang Fang's team resumed growing the number of Agents only after hardening the state machine and responsibility model. This time, before adding each Agent, they answer four questions first:
- What is the failure mode of this Agent?
- How are these failures detected and handled?
- Who is responsible for this Agent’s failure?
- What are its dependencies on existing Agents? How to deal with dependency failure?
Otherwise, you will find that the more roles there are, the longer the meetings get, and in the end everyone leans even harder on the one person who knows the whole picture to make on-the-spot arrangements.
What should you not rush into right now?
If you haven't hardened your states and responsibilities yet, Wang Fang suggests holding off on a few things.
First, don't rush toward a “fully automatic closed loop”
Many teams die here first, because they want to skip the semi-autonomous stage and jump straight to something that looks like an autonomous system. The result is a system that makes its errors autonomous too. Wang Fang saw one team design complex retry and compensation logic in pursuit of “full automation”; an underlying data error was then retried automatically more than a dozen times, generating a large amount of dirty data, and the final cleanup cost far exceeded what manual handling would have cost.
In the semi-automatic stage, the system's default behavior should be “escalate to a human when uncertain”, not “guess and keep executing”. Add an automatic rule for a failure mode only when you are confident enough in how it will be handled automatically.
Second, don't rush into fine-grained Agent role splits
The finer the roles, the more collaboration interfaces there are and the higher the cost of interpreting status. Before a mature state machine exists, fine-graining only creates scheduling debt. Wang Fang initially split “data cleaning” into four roles: a format-checking Agent, a missing-value Agent, an outlier-detection Agent, and a data-conversion Agent. The hand-offs and coordination between them ended up consuming a large share of system resources, and the real business logic drowned in scheduling noise.
Later she merged them back into a single data-cleaning Agent that handled the different steps internally with ordinary functions, and system complexity dropped immediately.
Third, don't rush to turn the control plane into a management dashboard
A big screen is valuable only on the premise that the system's rules are hard enough; otherwise you are just visualizing noise. Wang Fang's team once put great effort into a beautiful real-time dashboard showing the distribution, success rate, and processing time of all tasks. But while the state machine was imperfect, that data was suspect: how many “processing” tasks were actually deadlocked? How many “completed” tasks were actually only partially complete? The big screen made everything look great, until you dug deeper and found the issues behind the numbers.
First make the rules hard, and then let the visualization shine. The order cannot be reversed.
If resources are limited, what should a minimal executable version look like?
If resources are limited, Wang Fang would suggest starting with a very restrained MVP.
First, track only one category of high-value tasks. Don't try to pull every task into the control plane; start with the most core, most representative category. For example, if you run a data-analysis pipeline, track only “report generation” tasks at first, master that category thoroughly, and then expand to other types.
Second, connect only a small number of key Agents. You may need only two or three at the start, but they should cover the entire value chain; for example, a data-collection Agent, a data-processing Agent, and a report-generation Agent. Three Agents collaborating to deliver one complete piece of business value teach you more than ten isolated Agents each doing scattered tasks.
Third, keep the number of states to a minimum, but give each one clear responsibility implications. Wang Fang recommends defining at most five states at the start: pending, processing, failed, requires manual intervention, and completed, each with clear definitions and handling rules.
Fourth, bind every state to an escalation or takeover rule. Even “pending” must define a timeout and an escalation action. For example: if a task is pending for more than 10 minutes, it is automatically marked “Insufficient resources” and operations is notified.
Finally, require every manual takeover to leave a structured rationale: failure-cause classification, actions taken, and estimated resolution time. These records become valuable data for later tuning of the automatic rules.
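A minimal version of this MVP could be sketched as a single configuration table plus one record helper. The five state names and the 10-minute pending timeout follow the article; everything else (rule shapes, notification targets) is illustrative:

```python
from datetime import timedelta

# The five MVP states, each bound to an escalation or takeover rule.
MVP_STATES = {
    "pending":     {"timeout": timedelta(minutes=10),
                    "on_timeout": ("insufficient_resources", "notify_ops")},
    "processing":  {"timeout": timedelta(hours=1),
                    "on_timeout": ("needs_human", "notify_on_call")},
    "failed":      {"timeout": timedelta(0),       # act immediately
                    "on_timeout": ("needs_human", "notify_owner")},
    "needs_human": {"timeout": timedelta(minutes=30),
                    "on_timeout": ("needs_human", "escalate_up_chain")},
    "completed":   None,  # terminal: no escalation rule needed
}

def takeover_record(cause: str, actions: list, eta_minutes: int) -> dict:
    """The structured rationale every manual takeover must leave behind."""
    return {"failure_cause": cause,
            "actions_taken": actions,
            "estimated_resolution_minutes": eta_minutes}
```

The restraint is deliberate: every state except the terminal one carries a rule, so even this tiny system has no status that can silently hang.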
This kind of minimal system may not look cool, but it has a huge advantage: it exposes your real governance gaps very early. Is the status definition too vague? Is the escalation chain too long? Or does the team simply lack a shared understanding of “whose fault is a failure”? The sooner you see this, the cheaper later expansion becomes.
Conclusion: The value of a control plane is not to make the system look smarter, but to ensure someone can catch the system when it goes wrong
It took Wang Fang six months to take the team's multi-agent system from chaos to controllability. Looking back, her biggest realization is this: what most easily deceives a team about a multi-agent system is that it looks like “the future is already here”. Tasks jump, states flow, roles collaborate; everything looks like a more advanced operating system.
But what really determines whether it is a production system is never these surface dynamics. It is whether, when collaboration breaks down, the system can still clarify responsibility, shut down the noise, and make manual takeover a non-embarrassing act.
So if you are planning to turn Notion or another tool into OpenClaw's control plane, Wang Fang's advice is not sexy: don't think about how beautiful, automatic, or sophisticated it is. First write the failure states down clearly, harden the takeover entrance, and shorten the chain of responsibility.
Because a control plane's maturity is reflected not in how many Agents it can dispatch, but in whether it keeps the organization from depending on the one person who best understands the whole picture to stay up late putting out fires.
Last week, Wang Fang’s system experienced its first real test: an upstream data source suddenly changed its format at three in the morning, causing widespread failure of the entire data processing link. But this time, the system automatically marked the failure type according to the preset rules, created a work order, notified the responsible person, and prevented automatic retries before manual intervention. The engineer on duty completed the repair within twenty minutes, with no data loss or service interruption.
This result was unimaginable six months ago. Back then, the system would have flailed through retries, generated dirty data, and then sat in a completely unpredictable state waiting for a human to notice. What changed is not the number of Agents; it is that the control plane began to genuinely assume governance responsibility.
This is the true meaning of “the future is here” in Wang Fang’s eyes.
References and Acknowledgments
- Original text: I Turned Notion Into a Control Plane for my 18 OpenClaw AI Agents — AWS Heroes: https://dev.to/aws-heroes/i-turned-notion-into-a-control-plane-for-my-18-openclaw-ai-agents-5624
Series context
You are reading: OpenClaw in-depth interpretation
This is article 3 of 10.