Hualin Luan

Best Practices for Collaborating with AI: Task Protocols, Dialogue Control, and Closed-Loop Feedback

The core skill of being a Coding Mentor for AI is not writing longer prompts, but designing task protocols, controlling the rhythm of conversations, identifying error patterns, and distilling the collaboration process into verifiable, reusable feedback signals.

Published: 3/30/2026
Category: interpretation
Reading time: 15 min read

Copyright Statement and Disclaimer: This article draws on human-AI interaction research such as Chain-of-Thought and ReAct, combined with practical prompt-engineering experience, to offer a comprehensive interpretation. Copyright of the original works belongs to their respective authors and research institutions.

Originality: The task protocol, dialogue control, feedback loop, and collaboration anti-patterns proposed here are the author's original synthesis of theoretical research and engineering practice. This article is not a collection of prompt templates, nor a paragraph-by-paragraph translation of an external paper.


Opening: why does the same AI perform so differently for different people?

The same AI programming assistant often produces very different results in the hands of different teams and different engineers. The superficial explanation is "whether you can write prompts", but the real difference usually lies not in wording but in the mode of collaboration: some people treat the AI as a one-shot code generator, while others place it inside a collaboration protocol with goals, boundaries, feedback, and acceptance checks.

Give the AI only a vague task, such as "write me a user authentication system", and the model has to fill in the missing decisions itself: authentication method, token strategy, password policy, error handling, database schema, refresh mechanism, and security boundaries. It may produce a seemingly complete solution, but many of the key judgments were not made by the team; they were guessed by the model.

A more mature way to collaborate is not to write infinitely long requirements, but to establish a task protocol first: what the goals are, what the non-goals are, where the boundaries of the technology stack and interfaces lie, which security and performance constraints cannot be compromised, whether the AI proposes a design first or edits code directly, at which points humans run acceptance checks, and how to correct course after a failure. The clearer the protocol, the less the model has to guess; the better humans control the pace, the easier the output is to verify.

What really deserves attention is not "how to make the AI more obedient" but "how humans manage an AI programming collaboration". The value of a Coding Mentor lies not in universal prompts but in organizing a conversation into a controllable engineering process: clarify first, then generate; verify the idea first, then expand the implementation; record the error first, then enter the next round of improvement.

Figure: the AI programming collaboration protocol

Fix the task protocol first, then start the conversation

Many collaboration failures occur not during the code-generation phase but before the first round of input. Unclear task goals, incomplete context, missing acceptance criteria, and blurred modification boundaries turn every subsequent conversation into damage control.

A qualified AI programming collaboration protocol contains at least six types of information.

| Protocol element | Question to answer | What happens if it's missing |
|---|---|---|
| Mission objectives | What problem is this collaboration solving, and how will success be judged? | The AI may solve an adjacent but wrong problem |
| Non-goals | What must not be done this round, and which areas must not be touched? | Output bloats and changes go out of bounds |
| Context | Where are the relevant code, interfaces, constraints, error logs, and business terms? | The AI guesses the project structure from generic experience |
| Division of roles | Is the AI responsible for analysis, drafting, implementation, testing, or review? | Humans cannot tell when to intervene |
| Quality standards | How are functionality, performance, security, maintainability, and test coverage accepted? | The output appears usable but cannot be evaluated consistently |
| Feedback rules | How are errors pointed out, how are fixes verified, and which conclusions are retained? | Multiple rounds of dialogue never accumulate experience |

These six types of information do not all have to be written into one long prompt. For a small task, a few sentences may be enough; for a cross-module task, the protocol should live in a more stable task card or issue description. The point is not the form, but giving the AI's output boundaries and giving human feedback a basis.

The protocol also distinguishes between "variable conditions" and "immutable constraints". Variable conditions are open to AI suggestions, such as which caching strategy to use, whether to split a function, or whether to add auxiliary tests; immutable constraints must be stated explicitly by humans, such as not modifying public APIs, not introducing new dependencies, not weakening permission checks, and not bypassing existing tests. One way to make the protocol checkable is to keep it as a structured record, as sketched below.
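A minimal sketch, assuming the protocol lives next to the task card as code; the field names and example values are illustrative, not a standard schema:

```python
# Hypothetical structured task protocol -- illustrative field names only.
from dataclasses import dataclass, field

@dataclass
class TaskProtocol:
    goal: str                        # what problem, and how success is judged
    non_goals: list[str]             # things this round must not touch
    context: list[str]               # code paths, interfaces, logs, business terms
    role: str                        # analysis / draft / implement / test / review
    acceptance: list[str]            # tests, rubrics, or manual checks
    feedback_rules: list[str]        # how errors are reported and fixes verified
    immutable_constraints: list[str] = field(default_factory=list)

protocol = TaskProtocol(
    goal="Add rate limiting to the HTTP client; success = QPS cap holds under load tests",
    non_goals=["Do not redesign the retry layer this round"],
    context=["client/http.py", "docs/slo.md", "recent error logs"],
    role="AI drafts the implementation; a human reviews and accepts",
    acceptance=["pytest tests/test_rate_limit.py passes", "no new dependencies"],
    feedback_rules=["every rejection cites a failing test or a violated constraint"],
    immutable_constraints=["public API unchanged", "existing tests not bypassed"],
)
```

Keeping the protocol as a record rather than prose makes the immutable constraints trivially checkable at every later checkpoint.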

Once the protocol is written down, the goal of the first round of conversation should not be getting complete code immediately, but confirming that the model understands the task correctly. For complex tasks, asking the AI to restate the goals, list its assumptions, identify risks, and propose a plan is more reliable than asking it to write code directly. This catches errors at the design stage instead of forcing rework after implementation.

Three dialogue modes: not a taxonomy of chat tricks, but risk-control methods

"Guided, adversarial, and collaborative" are easily read as three chat techniques. More precisely, they are three methods of risk control.

Guided dialogue is used for clarification and learning. It suits scenarios where requirements are incomplete, the problem structure is unclear, and the model is prone to jumping straight to implementation details. Through continuous questioning, humans get the AI to expose assumptions, fill in boundaries, and explain trade-offs. The goal is not for the AI to be right the first time, but for the task to become progressively understandable.

Adversarial dialogue is used for verification and stress testing. It suits scenarios where the model has produced a solution that may ignore boundaries, performance, security, or failure paths. Humans do not refute for the sake of refuting; they probe the solution with extreme inputs, resource limits, concurrency conditions, failure scenarios, and counterexamples. The goal is to make hidden risks explicit.

Collaborative dialogue is used for delivery and integration. It suits scenarios where a task protocol and a basic plan already exist and the work must be divided across implementation, testing, review, and documentation. Humans own architecture and acceptance; the AI handles drafting, partial implementation, test suggestions, and risk checks. The goal is to increase delivery efficiency without surrendering human control.

Figure: risk control in the three dialogue modes

The three modes are not mutually exclusive. A real task usually cycles through them: guided dialogue to clarify requirements, collaborative dialogue to generate the design and partial implementation, adversarial dialogue to stress-test the boundaries, and finally collaborative dialogue again for supplementary tests and documentation.

The key is knowing where the current conversation stands. Enter collaborative implementation before the requirements are clear, and the AI will quickly generate a mass of wrong details; enter adversarial questioning before a plan has formed, and the conversation degenerates into empty argument; keep asking only guided questions once the task has reached acceptance, and it will be slow to converge.

Feedback design: from "feels bad" to actionable signals

The quality of AI programming collaboration depends largely on the quality of human feedback. Fuzzy feedback produces only random corrections; structured feedback tells the model what to change, why to change it, and how to prove the change is correct.

Invalid feedback usually has three characteristics: it expresses satisfaction or dissatisfaction without pointing to evidence; it says "rewrite" without naming the violated constraint; it demands "better" without defining the standard. Such feedback makes the model wander randomly between style, structure, and implementation.

Effective feedback contains four levels.

| Feedback level | Content | Effect |
|---|---|---|
| Factual evidence | Which piece of output, which test, which log, or which requirement was violated? | Avoids subjective argument |
| Error type | Is it an interface error, a missed boundary, a performance issue, a security risk, or an out-of-bounds change? | Makes errors classifiable |
| Correction direction | Add constraints, change the implementation, add tests, reduce scope, or redesign? | Makes the next round executable |
| Acceptance method | Which tests, rubrics, or manual checks prove the correction worked? | Avoids repeated rework |

For example, "this is not robust enough" is not valid feedback. "When the input list is empty, the current implementation accesses the first element, violating the empty-input contract in the task; add an empty-input branch and a unit test covering the empty list" is valid feedback. The former expresses a feeling; the latter provides evidence, an error type, a corrective action, and an acceptance criterion.
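A minimal sketch of the acceptance half of that feedback, assuming a hypothetical `split_batches` function as the code under review:

```python
# Hypothetical: `split_batches` stands in for the reviewed implementation.
# These tests encode the acceptance criterion from the feedback above:
# the empty-input contract plus a regression guard for the normal path.
from mymodule import split_batches  # hypothetical module under review

def test_empty_input_returns_empty_list():
    # Empty-input contract: no element access, no exception, empty result.
    assert split_batches([]) == []

def test_single_element_still_batched():
    # Regression guard: the non-empty path keeps its existing behavior.
    assert split_batches([1]) == [[1]]
```

Once the fix lands, these tests become the durable record that the correction was verified rather than merely asserted.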

Feedback also needs severity grading. Not all issues are equally important. Security vulnerabilities, data corruption, permission bypasses, and broken public APIs are must-fix; naming, comments, and local structure are ordinary quality improvements; style preferences not backed by team conventions should not block the task. Graded feedback reduces conversational noise and keeps both AI and humans focused on high-stakes issues.

Dialogue control: making multi-round collaboration converge

The risk of multi-round dialogue is divergence. As context accumulates, the AI gradually loses early constraints, and humans drift from the original goal amid new suggestions. A Coding Mentor must actively drive the conversation toward convergence.

First, the stages must be explicit. Requirements clarification, solution design, interface confirmation, implementation, testing, review, and documentation each produce different outputs. Do not demand complete code during requirements clarification, and do not relitigate business goals during code review unless the goals themselves turn out to be wrong.

Second, checkpoints must be explicit. A long task needs at least a few synchronization points: no code before the plan is confirmed, no implementation before the interface is confirmed, no documentation before the core logic passes, and no merging of fixes before test failures are explained. The clearer the checkpoints, the better they prevent the model from continuously generating large volumes of unusable content.

Third, context must be compressed. In long conversations, the AI should periodically summarize the current agreements: goals, non-goals, interfaces, risks, what is done, what remains, and open issues. This summary is not cosmetic; it prevents later rounds from overwriting early constraints. One possible shape for such a summary is sketched below.
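A sketch of a "context snapshot" the AI might be asked to re-emit every few rounds; the field names are illustrative, not a fixed schema, and the values anticipate the HTTP client example later in this article:

```python
# Illustrative context snapshot -- re-emitting this every few rounds keeps
# early constraints from being silently overwritten in long conversations.
snapshot = {
    "goal": "HTTP client with QPS limit, concurrency cap, breaker, retry",
    "non_goals": ["no redesign of the logging layer"],
    "interfaces": ["ClientConfig", "ResilientClient.get"],
    "immutable_constraints": ["no heavy dependencies", "public API unchanged"],
    "done": ["token-bucket limiter", "burst-behavior unit tests"],
    "todo": ["circuit breaker", "metrics export"],
    "open_issues": ["retry budget under failure storms undecided"],
}
```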

Fourth, acceptance must come first. Before each round of output, it should already be clear how quality will be judged: designs are checked against constraints, code against tests, tests against failure patterns, documentation against reader tasks. Multi-round dialogue without acceptance criteria is just a pile of content.

From a single collaboration to Mentor signals

This is also the key link from a collaboration method to case studies and the data loop: the collaboration process itself generates Mentor signals. Whether content from an AI conversation is worth retaining depends not on how fluent the language is, but on whether it helps subsequent evaluation, training, or governance.

Signals worth retaining include: the immutable constraints in the task protocol, the AI's erroneous assumptions, the evidence behind human feedback, diffs before and after fixes, records of test failures and passes, final acceptance conclusions, and which data must not enter training or evaluation.

Not worth retaining: throwaway phrasing, one-off emotional evaluations, subjective impressions that cannot be reproduced, code snippets without context, and raw content that carries sensitive information or license risks.

Figure: the collaborative feedback loop

This is also why "prompt templates" should not be the centerpiece of this article. Templates expire; collaboration protocols do not. A model that needs very explicit formatting hints today may no longer need them tomorrow after tool-calling or context-mechanism upgrades; but task goals, immutable constraints, error evidence, rubrics, and acceptance mechanisms remain in effect.

What is truly reusable is human judgment: deciding which requirements must be clarified, which boundaries must be stress-tested, which outputs must be rejected, which feedback belongs in the error-type library, and which session records may enter the downstream data loop. These judgments are the Coding Mentor's capability.

A collaboration path through one engineering task

Take an HTTP client with rate-limiting support as the example. The point is not to show ten rounds of prompts, but to see how a single collaboration is organized.

The first stage is the task protocol. The team makes it explicit: the client must support a QPS limit, a concurrency cap, circuit breaking, and retries; the stack is Python and httpx; no heavy dependencies may be introduced; async calls, exception classification, metrics, and tests must all be covered. The acceptance criterion at this stage is that the AI can restate the goals, outline the boundaries, and propose a plan for meeting them.

The second stage is design. The AI may propose options such as token buckets, leaky buckets, sliding windows, semaphores, and breaker state machines; humans make the choice. A token bucket allows bursts, a leaky bucket is smoother, a sliding window is more accurate but more expensive. The key at this stage is not letting the AI pick whatever sounds most advanced, but making it explain how each option matches the constraints. A minimal sketch of the chosen option follows.
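A minimal asyncio token-bucket sketch, illustrative rather than the final implementation: tokens refill continuously at `rate` per second up to `capacity`, so short bursts pass immediately while the long-run QPS stays bounded.

```python
import asyncio
import time

class TokenBucket:
    """Async rate limiter: allows bursts up to `capacity`, refills at `rate`/s."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # burst ceiling
        self.tokens = capacity
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()   # waiters queue on the lock, roughly FIFO

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Sleep just long enough for one token to accrue.
                await asyncio.sleep((1 - self.tokens) / self.rate)
```

A leaky bucket or sliding window would change only this class, which is exactly why the interface gets confirmed before the implementation.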

The third stage is interface confirmation. Confirm the configuration object, client lifecycle, async context management, error types, metrics output, and test entry points first. The core implementation should not begin before the interface is confirmed; otherwise implementation details solidify early and changing the interface later becomes very expensive. The shapes to pin down might look like the sketch below.
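A sketch of the shapes worth confirming at this stage; the names are illustrative, and the request method deliberately stays unimplemented until the interface is accepted:

```python
from dataclasses import dataclass
import httpx

@dataclass(frozen=True)
class ClientConfig:
    max_qps: float = 50.0
    max_concurrency: int = 10
    failure_threshold: int = 5       # consecutive failures before the breaker opens
    recovery_timeout: float = 30.0   # seconds before a half-open probe
    max_retries: int = 3

class ResilientClient:
    """Async HTTP client with rate limiting, concurrency cap, breaker, retry."""

    def __init__(self, config: ClientConfig):
        self._config = config
        self._client = httpx.AsyncClient()

    async def __aenter__(self) -> "ResilientClient":
        return self

    async def __aexit__(self, *exc) -> None:
        await self._client.aclose()

    async def get(self, url: str, **kwargs) -> httpx.Response:
        raise NotImplementedError  # filled in only after interface sign-off
```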

The fourth stage is partial implementation. The AI can own the rate limiter, the circuit breaker, the retry strategy, or a slice of the test draft. Humans control the scope of each change, to avoid the review burden of the AI generating every module at once. After each module is completed, return to the task protocol and check: are the immutable constraints still satisfied, and have new risks been introduced? One such reviewable module is sketched below.
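As an example of a single reviewable module, here is a minimal circuit-breaker state machine the AI might draft (illustrative): closed opens after N consecutive failures, open moves to half-open after a recovery timeout, and half-open closes on one success or reopens on one failure.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a request may proceed under the current state."""
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "half-open"   # admit a single probe request
                return True
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
```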

The fifth stage is adversarial verification. Stress-test with scenarios such as high concurrency, failure storms, zero configuration, timeouts, task cancellation, service recovery, and repeated exceptions. This stage is not about letting the AI write more code; it is about verifying that the earlier implementation actually honors the collaboration protocol. One such check is sketched below.
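A sketch of one such check, assuming the pytest-asyncio plugin and the TokenBucket sketch from the design stage (assumed saved as `rate_limit.py`): a burst of concurrent callers must not finish faster than the configured rate allows.

```python
import asyncio
import time
import pytest

from rate_limit import TokenBucket  # the earlier sketch, assumed in rate_limit.py

@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_burst_cannot_exceed_configured_rate():
    bucket = TokenBucket(rate=10.0, capacity=10.0)
    start = time.monotonic()
    # 30 concurrent callers against a 10 QPS / burst-10 limiter:
    # 10 pass immediately, the remaining 20 need roughly 2 more seconds.
    await asyncio.gather(*(bucket.acquire() for _ in range(30)))
    elapsed = time.monotonic() - start
    assert elapsed >= 1.8  # the burst allowance alone cannot satisfy all 30
```

Failure storms and cancellation get the same treatment: each scenario becomes a test whose failure is evidence, not an opinion.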

The sixth stage is signal capture. What remains at the end is not just code, but also: which boundaries must be tested, which error types occur most often, which feedback reliably corrected the model, which tasks the AI should draft first, and which parts must remain human-led. These feed the subsequent case studies, evaluation, and data loop.

Common anti-patterns

The first anti-pattern is using the AI as a search box: throw in a task name, wait for a complete answer, then feel satisfied or dissatisfied depending on mood. The problem is that humans neither provide the constraints nor retain the right of acceptance.

The second anti-pattern is using a longer prompt to mask unclear judgment. A long prompt is not a good protocol. If goals, non-goals, constraints, and acceptance remain muddled, the longer the prompt, the easier it is for the model to miss the point.

The third anti-pattern is writing feedback as emotional evaluation. "Not good enough", "too wordy", and "more professional" cannot provide stable corrective guidance. Feedback must be tied to evidence, error types, and acceptance criteria.

The fourth anti-pattern is letting multi-round conversations diverge indefinitely. Each round adds new requirements, changes the design, and expands the scope, ending in a pile of seemingly complete but unverifiable content. Good collaboration needs stages, checkpoints, and stopping conditions.

The fifth anti-pattern is retaining prompts but not judgments. The team saves a pile of templates but no error types, rubrics, test evidence, or acceptance conclusions. Templates go stale quickly after migrating to a new model; judgment assets stay valid for a long time.

Conclusion: collaboration skill is the entry point to Mentor capability

Collaborating with AI is neither metaphysics nor a simple prompting trick. It is a set of engineering controls: task protocols to reduce model guessing, dialogue modes to manage risk, structured feedback to drive corrections, acceptance mechanisms to prevent output drift, and collaboration records to capture Mentor signals.

When this approach stabilizes, the team is no longer just "using the AI"; it is training itself to calibrate the AI. Humans no longer need to write all the code themselves, but they must own goals, boundaries, risks, acceptance, and feedback. That role is the Coding Mentor.

The next article moves into case studies: how to turn these collaboration methods into evaluable, trainable, and reusable engineering assets across four scenarios: model-selection evaluation, feedback protocols, code review, and programming education.


References and Acknowledgments

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — Wei et al., Google Research
  • ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al., Princeton
