Original analysis: the MCP protocol, the "USB-C moment" of the Agent ecosystem
An in-depth analysis of the Model Context Protocol's design and why standardization is the key to a thriving Agent ecosystem
📋 Copyright Statement and Disclaimer
This article is an original analysis based on the author's personal hands-on experience, inspired by the Kaggle white paper "Agent Tools & Interoperability with MCP".
Attribution of views:
- All specific cases, practical data, and pitfall stories in this article come from the author's own project experience
- The core methodology and framework are reconstructed from the author's own thinking
- The white paper is referenced only for the academic phrasing of some concept definitions
Original reference:
- Title: "Agent Tools & Interoperability with MCP"
- Link: Read original text
Nature of this work: an independently written practice summary, not a translation or adaptation. The views expressed represent only the author's personal understanding and may differ from the original author's position.
Introduction: the Monday-morning integration nightmare
It was a Monday morning in the spring of 2024, and I arrived an hour earlier than usual. Not out of diligence, but out of anxiety.
For the past two weeks, our team had been working on a "simple" task: enabling our newly developed Agent to call the company's internal systems: query the user database, send notification emails, and create work-order records. By the initial estimate, this should have taken three days: one to read the documentation, one to write the code, one to test.
But two weeks passed and we were stuck.
The first system, the user database, ran on PostgreSQL. We wrote the SQL queries, then found the permission models didn't match: the Agent needed to query as an "application", while our permission system was designed around "users". After rewriting the permission logic, we discovered a connection-pool misconfiguration that exhausted connections under high concurrency.
The second system, the mail service, used SendGrid. The API call itself was simple, but the email templates had to be generated dynamically, and the HTML the Agent produced was often malformed, rendering into a mess in email clients. Later we discovered that batch sends were rate-limited, so we also had to implement retry and backoff logic.
The third system, the work-order system, exposed an internal REST API. The documentation was incomplete; some parameters could only be understood by reading the source code. Worse, the API was mid-upgrade: the endpoint we integrated against was marked "deprecated", but documentation for the new version wasn't finished.
By the end of the third week we had written thousands of lines of "adaptation code": connection-pool management, email template rendering, API error handling, rate limiting, retry logic... None of it had anything to do with the Agent's "intelligence"; it was just tedious plumbing.
Ironically, another team heard about our "integration experience" and wanted to reuse our code. But they used MySQL, not PostgreSQL; Mailgun, not SendGrid; and although their work-order system shared the same origin as ours, it was a different version. Our "experience" was almost impossible to reuse.
At that moment I realized: **an Agent's capability is determined not by how strong the model is, but by how many external systems it can work with, and integration cost is often the main reason projects fail.**
So when Anthropic released MCP (Model Context Protocol), I felt a long-lost excitement: this might be the solution we had been waiting for.
Chapter 1: Why the Agent World Needs “USB-C”
1.1 The painful reality of tool integration
Before we dive into MCP, let’s take a look at what a world without MCP would look like.
Scenario 1: Database query
Each database has a different connection method: PostgreSQL uses psycopg2, MySQL uses mysql-connector, MongoDB uses pymongo, and Snowflake uses a dedicated SDK. The connection parameters vary: some use URL, some use host + port + database name, and some also require warehouse and schema.
Permission management is even more diverse: some use username and password, some use IAM roles, some use OAuth, and some use certificates.
The agent needs to “know” all these differences, hardcoding the adaptation logic in the code. Every time a database is added, a set of adaptation codes must be added.
Scenario 2: API call
External APIs are called in different ways:
- Authentication method: API Key, OAuth, JWT, HMAC signature
- Parameter passing: JSON body, form data, query string
- Error handling: HTTP status code, JSON error body, custom format
- Rate limiting: requests per second, daily quotas, concurrency limits
The Agent needs bespoke client code for each API. Worse, these APIs get upgraded, deprecated, and change behavior over time, so the adaptation code needs continuous maintenance.
Scenario 3: Documentation and discovery
Suppose you are an Agent and want to know what tools are currently available. Under traditional architecture, you need:
- Look through the code to find definitions for all tools
- Read the comments or documentation to understand the purpose of the tool
- View the function signature to understand parameters and return values
- Guess the behavior of certain edge cases
The process is manual, non-standard, and error-prone.
1.2 MCP Analogy: Unification of USB-C
The value of MCP can be understood by analogy with USB-C.
Before USB-C, electronic devices had all kinds of charging ports: Apple's Lightning, Android's Micro-USB, laptops' barrel connectors, and assorted device-specific plugs. Traveling meant carrying a bag of different cables and chargers.
USB-C provides a unified standard: an interface that simultaneously supports charging, data transmission, and video output. Device manufacturers only need to support USB-C to be compatible with various accessories. Consumers only need a USB-C cable to charge various devices.
MCP plays a similar role in the Agent world:
- Unified interface: Agents and tools communicate using standardized protocols.
- Self-description: The tool automatically declares its capabilities and the Agent automatically discovers them.
- Plug and play: any MCP-compliant tool can be used by any MCP client
1.3 The value of standardized protocols
The value of MCP is not only at the technical level, but also at the economic level.
For Agent Developers:
- No need to write adaptation code for each tool
- Quick access to a large number of ready-made tools
- Low tool switching cost (switching from PostgreSQL to MySQL does not require rewriting code)
For tool developers:
- No need to write adaptation code for each Agent platform
- Implement once, use everywhere
- Can be automatically discovered and increase exposure
For the ecosystem as a whole:
- Lower the integration threshold and spur tool innovation
- Form a network effect: the more Agents support MCP, the more willing tool developers are to implement it; the more tools support MCP, the more willing Agent developers are to adopt it
- Eventually, a standardized marketplace of Agent tools takes shape
Chapter 2: Core design of MCP protocol
2.1 Layered architecture of the protocol
MCP adopts a clear layered design, similar to the layering of network protocols.
Transport layer: defines how messages are transmitted
- stdio: standard input/output, suited to local process communication
- SSE: Server-Sent Events, suited to remote services
- HTTP: plain request/response, broadly compatible
Protocol layer: Define message format
- Based on JSON-RPC 2.0
- Standard message types: Request, Response, Notification
- Error handling and timeout mechanisms
Application layer: Define semantic content
- Tool declaration: tool description, parameters, return value
- Tool calling: how to call and how to pass parameters
- Capability negotiation: Capability exchange between client and server
The advantages of this layering are: the transport layer can be flexibly replaced (switching from a local process to a remote service does not require changing the application layer code), the protocol layer ensures interoperability, and the application layer defines business semantics.
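To make the layering concrete, here is a minimal server sketch using the FastMCP helper from the official Python SDK (the `mcp` package). The tool name and body are illustrative placeholders, not a real production tool; only the decorator pattern and the stdio transport reflect the actual SDK.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# The tool itself is a placeholder; a real tool would do actual work.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")  # server name announced to clients

@mcp.tool()
def query_database(table: str, limit: int = 10) -> str:
    """Query data records from the named table, returning up to `limit` rows."""
    # Placeholder body: a real implementation would query an actual database.
    return f"(pretend result: first {limit} rows of {table})"

if __name__ == "__main__":
    # stdio transport: the client launches this script as a subprocess and
    # exchanges JSON-RPC messages over stdin/stdout.
    mcp.run(transport="stdio")
```

Because the transport is an argument rather than baked into the tool, moving this server from a local process to a remote deployment does not touch the tool code, which is exactly the benefit of the layering described above.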
2.2 Tool life cycle
MCP defines the complete life cycle of the tool: declaration, discovery, invocation, response.
Declaration: the tool provider describes the tool's capabilities through a Schema
- Name: unique tool identifier
- Description: what the tool does (written for the Agent to "read")
- Parameters: schema of the input parameters (types, required fields, constraints)
- Return value: schema of the output result
Discovery: After the Agent connects to the MCP server, it automatically obtains the list of available tools.
Call: Agent selects the appropriate tool based on user intent and passes in parameters.
Response: After the tool is executed, structured results are returned
The key to this process is self-description: the tool's capability description is machine-readable, so the Agent can understand and use it without hand-written documentation.
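The same life cycle seen from the client side, as a hedged sketch built on the official python-sdk: it launches the server from the previous sketch (assumed to be saved as `server.py`), negotiates capabilities, discovers tools, and calls one.

```python
# Client-side sketch of the declare/discover/call/respond cycle.
# Assumes a server like the earlier sketch, saved as server.py.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()            # capability negotiation
            tools = await session.list_tools()    # discovery: machine-readable schemas
            for tool in tools.tools:
                print(tool.name, "-", tool.description)
            result = await session.call_tool(     # invocation with structured args
                "query_database", {"table": "users", "limit": 5}
            )
            print(result)                         # structured response

asyncio.run(main())
```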
2.3 Security model
MCP's design includes a full set of security mechanisms.
Authentication: multiple methods are supported
- API Key: simple key-based authentication
- OAuth 2.0: standard authorization flow
- No authentication: for local development or trusted environments
Authorization: fine-grained permission control
- Which agents can use which tools
- Which users can use which features
- Which operations require additional confirmation
Sandboxing: Environment isolation for tool execution
- Resource limits: CPU, memory, disk usage upper limit
- Network restrictions: whether external networks can be accessed
- Timeout control: prevent long hangs
The design principle of this security model is Security by Default: tool providers define the security policies, and the Agent complies with them automatically at execution time.
Chapter 3: From integration dilemma to ecological prosperity
3.1 Integration mode before MCP
Before MCP, there were several modes of Agent tool integration, each with obvious flaws.
Mode 1: Hard-coded integration
The tool's API is called directly from the Agent code. This is the most common pattern, and the most fragile.
Defects:
- Every new tool requires changes to the Agent code
- Changes in a tool's API break the Agent's functionality
- The tool's error-handling logic is scattered everywhere
Mode 2: Configuration-based integration
Tool invocation is defined in configuration files and loaded dynamically at Agent runtime.
Defects:
- There is no standard configuration format, so Agent platforms are mutually incompatible
- Configuration cannot express complex interaction logic
- A tool's semantic information (such as parameter descriptions) is hard to convey
Mode 3: Plug-in integration
Tools are provided as plug-ins, invoked by the Agent through a plug-in interface.
Defects:
- Every Agent platform has a different plug-in interface
- Plug-in development and maintenance costs are high
- Compatibility between plug-ins is hard to guarantee
3.2 Paradigm changes brought about by MCP
MCP changes the paradigm of Agent tool integration from “adaptation” to “plug and play”.
Adaptation mode: the Agent adapts to each tool
- Agent needs to know how to call each tool
- Agents need to handle the special cases of each tool
- Agent needs to maintain tool-related code
Plug and Play Mode: Tools declare their own standard interfaces
- The tool implements the MCP protocol and declares its capabilities
- Agent communicates with any tool through MCP protocol
- The specific implementation of the tool is transparent to the Agent
The core of this transformation is separation of concerns: Agents focus on “what tools to use to solve what problems”, and tools focus on “how to perform tasks efficiently”.
3.3 The flywheel effect of ecological development
MCP has the potential to set a virtuous ecosystem flywheel in motion.
Phase 1: Infrastructure
- MCP protocol definition and SDK release
- Early adopters (Anthropic Claude, etc.) support MCP
- Basic tools (file system, database, etc.) implement MCP interface
Phase 2: Tool richness
- More tool developers join to implement MCP interface
- Tool marketplaces/registries form to facilitate discovery
- Rapid growth in tool quality and variety
Phase 3: Agent adoption
- Agent developers can easily access a large number of tools
- Rapid expansion of agent capability boundaries
- More scenarios can be solved with Agent
Phase 4: Ecosystem prosperity
- Agents and tools create network effects
- Specialized division of labor emerges: some focus on building Agents, others on building tools
- Mature business models form: tools can charge for usage, Agents can become platforms
Chapter 4: The relationship between MCP and Function Calling
4.1 Differences in positioning between the two
Many people ask: What is the difference between MCP and OpenAI/Claude’s Function Calling?
Function Calling is a capability at the model layer:
- Models can generate structured function call requests
- Defined at the model API level
- It is up to the application developer to implement the specific logic of the function
MCP is an application layer protocol:
- Define communication standards between agents and tools
- Cross-model platform compatibility
- Tools can self-declare and agents can automatically discover
The relationship between the two is not competition, but complementarity.
4.2 Collaboration model
Typical collaboration process:
- User input -> Agent understands intent
- Agent queries the MCP server to obtain a list of available tools
- Agent decides that it needs to call the “query weather” tool
- Agent generates call request through Function Calling
- The MCP client converts the request into the MCP protocol and sends it to the tool server
- The tool server executes and returns the results
- The MCP client returns the result to the Agent
- Agent generates final reply
In this process:
- Function Calling is the ability of the model to generate call requests
- MCP is the interoperability protocol of the tool ecosystem
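As a hedged illustration of this division of labor, the sketch below bridges a model-generated function call into an MCP tool call. The `{name, arguments}` shape follows the common function-calling convention and may differ by vendor; `session` is assumed to be an initialized MCP `ClientSession` like the one in the earlier client sketch.

```python
# Sketch: translate one model-generated function call into an MCP tool call.
# The fc dict shape ({name, arguments}) is a common convention, not a spec.
import json

async def dispatch_function_call(session, fc: dict):
    name = fc["name"]
    args = fc["arguments"]
    if isinstance(args, str):   # some model APIs return arguments as a JSON string
        args = json.loads(args)
    # Function Calling produced the request; MCP executes it.
    return await session.call_tool(name, args)
```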
4.3 Migration and coexistence
Agents that have already implemented Function Calling can be smoothly migrated to MCP.
Migration Strategy:
- Retain the capability layer of Function Calling
- Migrate tool implementation to MCP server
- Add MCP client layer to convert Function Calling request to MCP protocol
Coexistence Strategy:
- Core tools are accessed through MCP
- Special tools retain Function Calling direct connection
- Migrate gradually to reduce risk
Chapter 5: Challenges and Responses in MCP Practice
5.1 The Art of Tool Design
Even with MCP, tool design remains an art. Good tool design multiplies an Agent's effectiveness; poor tool design leaves it at a loss.
Principle 1: Atomicity - the power of single responsibility
Each tool only does one thing. Don’t make “query user orders and send email notification” into one tool, but split it into two tools, “query order” and “send email”, and let the agent decide when to use them together.
Why insist on atomicity?
Composability. Atomic tools are like Lego bricks, which can be combined in different ways to solve different problems. If you make “query orders + send emails” into a tool, then this tool cannot be used when the user only wants to check orders and does not want to send emails. But if you split them into two tools, the Agent can decide whether to use only the first one, only the second one, or both according to the specific situation.
Testability. Atomic tools are easier to test. You can independently test the “query order” function without worrying about interference from email sending; you can also independently test the “send email” function without worrying about database problems. Test coverage is simpler and bug location is faster.
Reusability. Atomic tools can be reused in different scenarios. The “Send Email” tool can not only be used for order notifications, but can also be used in various scenarios such as password reset, marketing push, and system alarms.
Principle 2: Self-descriptive - let the agent truly understand the tool
The tool's name and description are written for the Agent to "read", not for humans. Describe the function in a way the Agent can understand.
Common description mistakes:
Too technical:
- Bad name: "execute_sql", with a description like "Call the database API to get data"
- Good name: "query_database", with a description like "Query data records matching given conditions from the database"
Too vague:
- Bad description: "Handles user requests"
- Good description: "Gets user details, including name, contact information, and account status, by user ID"
Leaking implementation details:
- Bad description: "Queries the orders table using the REST API"
- Good description: "Gets the order list for a specified user, with optional filtering by time range and order status"
A good tool description should answer three questions: What does this tool do? What information does it require? What result does it return?
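Here is what those three questions can look like in a tool declaration, sketched in FastMCP style; the field names and the order domain are made up for illustration.

```python
# A tool whose name, description, and docstring answer all three questions.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")

@mcp.tool()
def get_user_orders(user_id: str, start_date: str | None = None,
                    status: str | None = None) -> list[dict]:
    """Get the order list for a given user.

    Requires: user_id. Optional filters: start_date (ISO 8601) and
    status ("pending", "shipped", "completed").
    Returns: order records with id, amount, status, and created_at fields.
    """
    # Stub: a real implementation would query the order store here.
    return []
```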
Principle 3: Idempotence - Guarantee of safe retry
The same input should produce the same result. This way the Agent can retry safely without worrying about side effects.
Why is idempotence especially important in Agent systems?
Agent systems are non-deterministic: tool calls may fail, or succeed without the Agent seeing a proper result. The Agent must be able to retry failed calls safely. If a tool is not idempotent, retries can cause duplicate operations: double charges, duplicate emails, duplicate records.
Ways to achieve idempotence:
Unique identifier: generate a unique ID for each operation; the system uses the ID to determine whether it has already been processed.
Status check: check the current state before acting; if the target state has already been reached, return success directly.
Optimistic locking: when updating data, check that the data version matches to guard against concurrent modification.
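A minimal sketch of the unique-identifier approach: the caller supplies an idempotency key, and a replayed call returns the recorded result instead of repeating the side effect. The in-memory dict stands in for what would be a shared store (e.g., Redis) in production.

```python
# Idempotency-key sketch: same key in, same result out, side effect runs once.
_processed: dict[str, dict] = {}  # production: a shared, persistent store

def send_payment(idempotency_key: str, account: str, amount: float) -> dict:
    if idempotency_key in _processed:
        return _processed[idempotency_key]     # safe replay: no second charge
    # ... perform the real side effect exactly once here ...
    result = {"status": "ok", "account": account, "amount": amount}
    _processed[idempotency_key] = result
    return result
```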
Principle 4: Defensive design - assume the Agent will make mistakes
When designing a tool, assume the Agent may pass in wrong parameters, and guard against it.
Parameter validation: check that required parameters are present, types are correct, and values fall within reasonable ranges.
Default values: provide sensible defaults for optional parameters to reduce the Agent's decision burden.
Error messages: when a parameter is wrong, return clear, actionable error information that helps the Agent correct itself.
Boundary handling: handle edge cases such as empty results, oversized results, and special characters.
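A small sketch of these four habits combined; the table allow-list and the limit bounds are illustrative choices, not MCP requirements.

```python
# Defensive tool sketch: validate early, default sensibly, return
# actionable errors, and treat an empty result as a normal outcome.
def query_database(table: str, limit: int = 10) -> dict:
    ALLOWED_TABLES = {"users", "orders"}           # illustrative allow-list
    if table not in ALLOWED_TABLES:
        return {"error": f"Unknown table '{table}'. "
                         f"Valid values: {sorted(ALLOWED_TABLES)}"}
    if not 1 <= limit <= 1000:
        return {"error": "limit must be between 1 and 1000"}
    rows: list[dict] = []                          # stub: real query goes here
    return {"rows": rows, "count": len(rows)}      # empty result, not an error
```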
Principle 5: Contextual awareness - let the tool understand the environment
A good tool should be context-aware and adapt its behavior to the environment.
For example, a “send notification” tool should be able to:
- Select notification channel (email, SMS, App push) according to user preference
- Choose the right sending time based on time (avoid late night interruptions)
- Choose the appropriate message format according to the length of the content (email for long content, text message for short content)
This context-awareness can be passed through parameters or implemented through state management within the tool.
5.2 Performance optimization: the cost and balance of an abstraction layer
MCP introduces an additional communication layer, which inevitably brings performance overhead. The key is finding a balance between flexibility and efficiency.
Understand the sources of performance overhead
Serialization cost: MCP uses JSON as the message format. Each call needs to serialize the parameters into JSON and deserialize when returning the result. This adds additional CPU overhead compared to direct function calls.
Network Latency: If the MCP server is remote, network round trip time (RTT) can become a bottleneck. A single tool call can require tens to hundreds of milliseconds of network latency.
Connection establishment: If there is no connection pool, each call needs to establish a new connection, which will become a serious performance problem in high concurrency scenarios.
Protocol processing: MCP's message routing, error handling, and timeout management improve reliability, but they also add processing overhead.
Optimization strategies in detail
Connection Pooling: The Art of Reuse
Implementing an MCP connection pool needs to consider:
- Pool Size: Set an appropriate pool size based on concurrency requirements. Too small will cause waiting, too large will waste resources.
- Health Check: Regularly check whether the connection is available and remove failed connections in a timely manner.
- Timeout Management: Set reasonable connection timeout and idle timeout to prevent resource leakage.
- Load Balancing: If there are multiple MCP servers, load balancing needs to be achieved at the pool level.
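A generic pool sketch along these lines (not tied to any specific MCP client API): bounded size, lazy growth, and a health check at checkout. `make_conn` and `conn.is_alive()` are assumed interfaces you would adapt to your transport.

```python
# Generic async connection-pool sketch: bounded, lazily grown, health-checked.
import asyncio

class ConnectionPool:
    def __init__(self, make_conn, size: int = 8):
        self._make_conn = make_conn                 # async factory (assumed)
        self._idle: asyncio.Queue = asyncio.Queue(maxsize=size)
        self._size = size
        self._created = 0

    async def acquire(self):
        while True:
            if self._idle.empty() and self._created < self._size:
                self._created += 1
                return await self._make_conn()      # grow lazily up to size
            conn = await self._idle.get()           # otherwise wait for an idle one
            if conn.is_alive():                     # health check on checkout (assumed)
                return conn
            self._created -= 1                      # discard dead connection, loop

    async def release(self, conn):
        await self._idle.put(conn)                  # return for reuse
```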
Caching: trading space for time
Caching can significantly reduce duplicate calls:
- Tool metadata caching: Schema declarations of tools usually do not change frequently and can be cached for a long time.
- Result Caching: For idempotent query tools, results can be cached to avoid repeated execution.
- Intelligent caching strategy: Design different caching strategies (TTL, LRU, etc.) based on tool characteristics and parameter characteristics.
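A compact sketch of a TTL result cache keyed on the tool name plus canonicalized arguments. This is only safe for idempotent, read-only tools; eviction policy and size bounds are omitted for brevity.

```python
# TTL result-cache sketch for idempotent query tools.
import json
import time

_cache: dict[str, tuple[float, object]] = {}

def cached_call(tool_name: str, args: dict, invoke, ttl: float = 60.0):
    """invoke(tool_name, args) is the real caller; results live for ttl seconds."""
    key = tool_name + ":" + json.dumps(args, sort_keys=True)   # canonical key
    hit = _cache.get(key)
    now = time.monotonic()
    if hit is not None and now - hit[0] < ttl:
        return hit[1]                      # fresh hit: skip the round trip
    result = invoke(tool_name, args)
    _cache[key] = (now, result)
    return result
```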
Batch Processing: Reduce Round Trips
If the Agent needs to call multiple tools continuously, consider:
- Batch call: One request contains multiple tool calls, reducing network round-trips.
- Preloading: Predict the data that may be needed, query and cache it in advance.
- Parallel Calls: Tool calls without dependencies can be executed in parallel.
Localized deployment: eliminate network delays
For frequently called tools:
- Local MCP Server: Deploy the tool on the same machine or network as the Agent to eliminate network delays.
- Edge deployment: Deploy tools closest to users to reduce transmission delays.
Performance degradation strategy
In extreme cases, performance degradation needs to be considered:
- Direct call mode: In performance-sensitive scenarios, it is allowed to directly call tools bypassing MCP.
- Asynchronous processing: Non-critical operations can be executed asynchronously without blocking the main process.
- Downgraded results: When the tool call times out, return cached data or default values.
Key indicators for performance monitoring
Establish a complete performance monitoring system:
- End-to-end latency: the full time from the Agent initiating a call to receiving the result.
- Tool execution latency: pure execution time, with network transfer excluded.
- Success Rate: The proportion of successful tool calls.
- Retry rate: The proportion of calls that need to be retried.
- Queue Depth: Number of calls waiting to be executed.
5.3 Error handling - the philosophy of graceful failure
Tool calls can fail, this is an unavoidable reality in a production environment. MCP defines a standard error format, but how to handle errors still needs to be carefully designed.
The art of error classification
Not all errors should be treated equally. Proper error classification is key to designing robust systems.
Retryable errors: usually transient problems; a retry may succeed
- Network timeouts
- Service temporarily unavailable
- Rate limit triggered
- Connection interrupted
Handling strategy: retry with exponential backoff, cap the number of retries, and treat the error as non-retryable once the cap is exceeded (see the sketch below).
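A sketch of that strategy; which exception types count as retryable is an assumption that should match your client's actual errors, and the jitter guards against synchronized retries.

```python
# Exponential backoff with a retry cap and small random jitter.
import random
import time

def call_with_backoff(fn, max_retries: int = 4, base: float = 0.5):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except (TimeoutError, ConnectionError):    # assumed retryable classes
            if attempt == max_retries:
                raise                              # cap exceeded: now non-retryable
            delay = base * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)                      # 0.5s, 1s, 2s, 4s (plus jitter)
```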
Non-retryable errors: usually logic problems; retrying will not succeed
- Insufficient permissions
- Invalid parameters
- Resource does not exist
- Business rule violations
Handling strategy: return the error immediately, do not retry, and let the Agent decide how to handle it.
Partial Success: The operation is partially completed and requires special handling
- Batch operation partially successful
- Multi-step operation partially completed
- Data part updated
Processing strategy: Return detailed operation results, allowing the Agent to understand which ones succeeded and which ones failed, and decide whether compensation operations are needed.
Readability of error messages
An error message is not only for the system; it is also for the Agent to "read".
A bad error message:
Error code: 500
Internal server error
A good error message:
Tool call failed: database connection timeout.
Possible cause: database load is too high or the network is unstable.
Suggested action: retry after 30 seconds, or ask the database administrator to check database health.
A good error message should state:
- What error occurred
- Why it happened
- How to resolve or work around it
- Whether manual intervention is required
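One way to make every error answer those four questions is a fixed payload shape. The field names below are a local convention for illustration, not part of the MCP specification.

```python
# Sketch: an error payload that answers what / why / how / escalate.
def make_tool_error(what: str, why: str, suggestion: str,
                    needs_human: bool = False) -> dict:
    return {
        "error": what,              # what error occurred
        "cause": why,               # why it happened
        "suggestion": suggestion,   # how to resolve or work around it
        "needs_human": needs_human, # whether manual intervention is required
    }

err = make_tool_error(
    what="database connection timeout",
    why="database load too high or network unstable",
    suggestion="retry after 30 seconds, or ask a DBA to check database health",
)
```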
Downgrade plan design
When a tool fails, are there alternatives?
Primary/standby failover:
- When the primary database fails, switch to the standby database
- When the primary API fails, switch to the backup API
- Note: the fallback's data may not be fully up to date
Function downgrade:
- When real-time data query fails, cached data is returned
- When complex analysis fails, simplified analysis is returned
- When multi-source data fails, data from available data sources is returned.
Manual intervention:
- When key operations fail, manual processing is performed
- Record the failure context to facilitate manual takeover
- Provide convenient manual intervention interface
User Feedback Strategy
Should the user be told that a tool call failed? It is a matter of trade-offs.
Transparency:
- Tell the user what went wrong
- Describe the remediation being taken
- Offer alternatives or suggestions
Silent handling:
- The system falls back to the degraded path without the user noticing
- Errors are logged and alerted on in the background
- Suitable when surfacing the error would hurt the user experience more than hiding it
Mixed Processing:
- Determine notification strategy based on error type
- Critical errors must be communicated to the user
- Minor errors can be handled silently
Best Practices for Error Handling
Fail fast: if an error is unrecoverable, fail as quickly as possible instead of retrying endlessly.
Graceful Downgrade: Always have a Plan B to ensure that the system can still provide services in the event of partial failure.
Context retention: Preserve complete context information during error propagation to facilitate problem diagnosis.
User First: The primary goal of error handling is to protect the user experience, not mask the problem.
5.4 Security boundaries: the eternal game between convenience and security
MCP provides a security mechanism, but you still need to be careful how to configure it. Security design requires finding a balance between convenience and security.
Practice of the principle of least privilege
Tool Level Permissions:
- Only open necessary tools to the Agent
- Regularly audit tool usage and remove unused tools
- Assign different tool permissions based on the Agent’s role
Operation Level Permissions:
- Distinguish between read-only operations and write operations
- Sensitive operations (deletion, transfer, configuration modification) require additional authorization
- Set upper limits for batch operations to prevent accidental large-scale changes
Data Level Permissions:
- Limit the range of data that the Agent can access
- Mask or redact sensitive data
- Restrict data access based on user identity
Sensitive operation confirmation mechanism
Which operations require additional confirmation?
Financial related:
- Any operation involving funds
- Operations where the amount exceeds the threshold
- Transfer to new payee
Data security related:
- Deletion of data
- Modify key configuration operations
- Batch data export
Compliance related:
- Operations involving personal privacy information
- Data access across data boundaries
- Operations that may violate regulations
Confirmation mechanism design:
- Explicit Confirm: Ask the user to explicitly enter “confirm” or click the confirm button
- Two-step verification: Second-step verification through SMS, email, etc.
- Delayed execution: Delayed execution of sensitive operations, giving the user a time window for cancellation
- Manual review: Key operations are submitted to manual review and executed only after passing
Construction of audit log
Comprehensive audit logs are the basis for post-event tracing and problem diagnosis.
What to record:
- Who (which Agent / which user)
- When
- Which tool was called
- Which parameters were passed in
- What result was returned
- How long execution took
- Whether it succeeded
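A sketch of a structured record covering those fields, written as JSON lines, which is one common storage choice; none of this layout is mandated by MCP.

```python
# Structured audit-record sketch; arguments should be redacted before logging.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class AuditRecord:
    agent_id: str
    user_id: str
    tool: str
    arguments: dict       # desensitize sensitive fields before writing
    result_summary: str
    duration_ms: float
    success: bool
    timestamp: float

def write_audit(record: AuditRecord, path: str = "audit.jsonl"):
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")   # one JSON object per line

write_audit(AuditRecord("agent-7", "u123", "query_database",
                        {"table": "users"}, "5 rows", 42.0, True, time.time()))
```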
Log storage strategy:
- Structured storage for easy query and analysis
- Set reasonable retention periods to balance storage costs and audit needs
- Desensitize sensitive information to prevent log leaks from causing security issues
Log Analysis:
- Real-time monitoring of abnormal calling patterns
- Regularly analyze tool usage trends
- Identify potential security threats
Rate limiting and abuse protection
Preventing tool abuse is a must in production environments.
Multi-dimensional rate limiting:
- Per Agent: a cap on each Agent's call frequency
- Per tool: a cap on each tool's concurrent calls
- Per user: a call quota for each user
- Global: overall capacity protection for the system
Rate-limiting strategies (token bucket sketched below):
- Token bucket: smooths traffic while allowing limited bursts
- Leaky bucket: strictly controls the output rate
- Sliding window: precisely bounds the number of calls within a time window
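The token-bucket sketch referenced above: tokens refill continuously at `rate` per second up to `capacity`, so short bursts pass while sustained throughput stays bounded. A per-Agent limiter is then just a mapping from Agent IDs to buckets.

```python
# Token-bucket sketch: allows bursts up to capacity, bounds the average rate.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True       # admit this call
        return False          # bucket empty: reject or queue the call
```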
Abuse detection:
- Identify abnormal calling patterns (e.g., bursts of calls within a short window)
- Monitor tool-call success rates; a sudden drop may signal an attack
- Maintain a blacklist mechanism to block abusive sources
Circuit breaker mechanism
When a tool keeps failing, a circuit breaker should protect against cascading failures.
Trip conditions:
- Error rate exceeds a threshold (e.g., 50%)
- Consecutive failures exceed a threshold
- Response time exceeds a threshold
Behavior while open:
- Return the error directly and stop calling the tool
- Switch to a backup plan
- Notify the operations team
Recovery:
- Periodically enter a half-open state to probe whether the service has recovered
- Close the breaker automatically once the service is healthy again
- Record breaker events to support root-cause analysis
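A sketch of that trip / open / half-open / close cycle; the thresholds are illustrative defaults, not recommendations from the MCP spec.

```python
# Circuit-breaker sketch: trips after repeated failures, probes after a cooldown.
import time

class CircuitBreaker:
    def __init__(self, fail_threshold: int = 5, reset_after: float = 30.0):
        self.fail_threshold = fail_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: tool temporarily disabled")
            # cooldown elapsed: half-open, let one probe call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.fail_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0                           # success: reset and close
        self.opened_at = None
        return result
```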
Chapter 6: Future Prospects of MCP
6.1 Protocol evolution direction
MCP is still developing rapidly and may add:
- Resource Subscription: Support real-time data push
- Streaming Response: Supports long-running tools returning results incrementally
- Multi-modal: Supports non-text content such as images and audio
6.2 Ecosystem building
Key building blocks of the MCP ecosystem:
- Official tool library: high-quality tools provided by Anthropic and ecosystem partners
- Tool marketplace: a discovery and distribution platform for third-party tools
- Certification system: security and quality certification for tools
6.3 Possibility of industry standards
MCP has the potential to become the de facto standard for agent tools:
- The technical design is reasonable and solves real pain points.
- Have strong promoters (Anthropic)
- Open source, community participation
But whether it becomes a true industry standard also depends on:
- Whether other major vendors follow (OpenAI, Google, etc.)
- Whether the ecosystem forms a network effect
- Validation in real production environments
Chapter 7: Suggestions for Practitioners - Action Guide for MCP Implementation
7.1 Decision-making framework for the initial stage
When you decide to adopt MCP, the following decision framework can help you make an informed choice.
Phase 1: Assessment Phase (1-2 weeks)
**Is it suitable for MCP?**
Ask yourself the following questions:
- How many external tools does your Agent need to call? (Fewer than 3 is probably not worth it)
- Will these tools be reused by multiple Agents? (The more reuse, the greater the value)
- Are the tools' interfaces stable? (Frequent changes make MCP's decoupling more valuable)
- Does the team have the capacity to maintain the protocol layer? (MCP requires extra development and operations investment)
**MCP or other solutions?**
Compare other integration solutions:
- Direct call: Suitable for scenarios with a small number of tools, simple interfaces, and infrequent changes
- Configurable integration: Suitable for scenarios with a medium number of tools and limited team technical capabilities
- MCP: Suitable for scenarios with a large number of tools that need to be shared across teams and maintained for a long time.
Phase 2: Pilot phase (2-4 weeks)
Select pilot tool:
- Choose 1-2 most commonly used tools to pilot
- Give priority to tools with relatively stable interfaces and high frequency of use.
- Avoid selecting business-critical tools as first pilots
Verify value:
- Comparing the development costs of MCP integration versus direct integration
- Test whether the performance of MCP integration meets the requirements
- Collect feedback from developers on MCP development experience
Phase 3: Promotion phase (1-3 months)
Gradual migration:
- Adapt tools and strategies based on pilot experience
- Migrate remaining tools in batches to avoid a one-time overhaul
- Keep the old and new solutions running in parallel for a period of time
Establish specifications:
- Formulate MCP tool development specifications
- Establish tool registration and discovery processes
- Training team members
7.2 In-depth comparison between MCP and Function Calling
Many people are confused about the relationship between MCP and Function Calling. Let’s compare these two concepts in depth.
Positioning and abstraction levels
Function Calling:
- Level: Model capability layer
- Function: Allow the model to generate structured function call requests
- Scope: Defined at the model API level, it is the “language capability” for the model to interact with the external world.
MCP:
- Level: Application layer protocol
- Role: Define the communication standard between Agent and tool
- Scope: Cross-model platform, the “interoperability protocol” of the tool ecosystem
Analogy understanding:
- Function Calling is “the ability to speak”
- MCP is “Content and Format of Speech”
Without Function Calling, the Agent does not know how to "speak" at all; without MCP, the Agent and its tools have no shared language in which to understand each other.
Technical implementation comparison
| Dimension | Function Calling | MCP |
|---|---|---|
| Protocol format | Vendor-specific | Standardized JSON-RPC 2.0 |
| Tool discovery | Hard-coded at the application layer | Automatic discovery (list_tools) |
| Tool description | Defined at the application layer | Self-described by the tool (Schema) |
| Transport | Direct function call | stdio / SSE / HTTP |
| Security model | Implemented by the application | Built-in authentication and authorization |
| Cross-platform | Tied to a specific model | Compatible across model platforms |
Detailed explanation of collaboration mode
Typical collaboration process:
- Intent Understanding: User input → Agent understands the intent
- Tool Selection: The Agent decides which tool to call based on the intent and the list of available tools.
- Call generation: Generate structured call requests through Function Calling
- Protocol conversion: MCP client converts Function Calling request into MCP protocol format
- Service call: MCP server receives the request and calls specific tools
- Result Return: The tool execution result is returned through the MCP protocol
- Response generation: Agent generates final response based on the results
In this process:
- Function Calling is responsible for “generating calls”
- MCP is responsible for "executing the call"
The relationship between the two is upstream and downstream, not a substitution relationship.
Migration Strategy and Coexistence Mode
For Agents that have already implemented Function Calling:
Smooth migration strategy:
- Keep the capability layer of Function Calling unchanged.
- Migrate tool implementation to MCP server
- Add MCP client layer to convert Function Calling request to MCP protocol
- Gradually migrate tools to maintain compatibility
Hybrid Architecture:
- New tools are accessed via MCP
- Legacy tools retain direct connection to Function Calling
- Unified calling through adapter
This hybrid architecture is useful during transition periods and allows for gradual evolution without a one-time overhaul.
Selection guidance
When to use Function Calling alone:
- Few tools (fewer than 5)
- Tools change infrequently
- Rapid prototyping
- Internal use only, with no external sharing
When to use MCP:
- Large number of tools (more than 10)
- Tools need to be reused by multiple agents
- Tools need to be provided externally
- Long-term maintenance system
Mixed Use:
- Core tools are accessed through MCP
- Special tools retain Function Calling direct connection
- Mask differences through unified interface layer
7.3 Team capability building: the skills model of the MCP era
The introduction of MCP is not only a technical choice, but also a challenge to team capabilities.
Three types of key roles
1. MCP Architect
Responsible for the overall design and evolution of the MCP system.
Core Competencies:
- Understand the underlying principles of the MCP protocol
- Ability to design scalable MCP architectures
- Has security design skills
- Knows how to optimize performance
Main Responsibilities:
- Formulate MCP development specifications
- Classification and organization of design tools
- Evaluate and introduce new MCP tools
- Solve complex integration problems
2. Tool Developer
Responsible for packaging existing services into MCP tools.
Core Competencies:
- Familiar with MCP SDK and protocol details
- Have API design and packaging capabilities
- Understand Schema definition and validation
- Have error handling and logging capabilities
Main Responsibilities:
- Implement MCP tool interface
- Write tool documentation and examples
- Maintain tool versions and compatibility
- Handling tool-related bugs
3. MCP Operations Engineer
Responsible for the stable operation of the MCP system.
Core Competencies:
- Familiar with MCP deployment and monitoring
- Ability to diagnose and recover from faults
- Understand performance tuning methods
- Have security audit capabilities
Main Responsibilities:
- Deploy and maintain MCP servers
- Monitor MCP system health status
- Handling MCP related faults
- Conduct regular security audits
Skill Development Suggestions
Theoretical Learning:
- Read the MCP protocol specification in depth
- Study official examples and best practices
- Learn the JSON-RPC 2.0 protocol
Practical Training:
- Start with simple tool packaging
- Participate in the MCP open source project
- Establish internal MCP tools marketplace
Community Engagement:
- Join the MCP Developer Community
- Share experiences and lessons learned from pitfalls
- Contribute tools and tool libraries
7.4 Common pitfalls and avoidance strategies
Trap 1: Over-instrumentation
Symptom: every small function gets wrapped as a tool, and the tool count explodes.
Consequences:
- The Agent struggles to choose among tools
- Tool management costs rise
- Long call chains hurt performance
Avoidance:
- Follow the atomicity principle, but temper it with practicality
- Regularly review whether each tool is needed; merge or remove redundant ones
- Build a tool classification and tagging system
Trap 2: Ignoring backward compatibility
Symptom: a tool upgrade changes the interface directly, breaking every Agent that depends on it.
Consequences:
- Production failures
- Emergency rollbacks
- Damaged trust between teams
Avoidance:
- Follow semantic versioning
- Keep interface changes backward compatible
- Use incremental migration for breaking changes
Trap 3: Missing security design
Symptom: the team focuses only on functionality and ignores security.
Consequences:
- Data breaches
- Unauthorized access
- The system gets attacked
Avoidance:
- Design for security up front
- Run regular security audits
- Establish a security response process
Trap 4: Neglecting performance
Symptom: development focuses only on features, with no performance testing.
Consequences:
- Performance falls short after launch
- User experience degrades
- Massive refactoring is required
Avoidance:
- Make performance testing part of the development process
- Establish performance baselines and monitor against them
- Keep performance optimization in mind during design
Trap 5: Missing monitoring
Symptom: the MCP system runs, but nobody is watching it.
Consequences:
- Problems are discovered late
- Troubleshooting is difficult
- Continuous optimization is impossible
Avoidance:
- Build a comprehensive monitoring system
- Set sensible alert thresholds
- Run regular performance analyses
Appendix: Three real pitfall cases from MCP practice
Case 1: The “standard” but incompatible MCP implementation
Background: We implemented a database query tool strictly according to the MCP protocol specification and confidently published it to the internal tool marketplace. An Agent developer on another team integrated against it per the MCP spec, only to find it unusable.
Symptom: the connection succeeded and the tool was discovered, but every call failed with a "parameter format error".
Troubleshooting:
After two days of investigation, we traced the problem to JSON Schema parsing:
- Our tool used a lenient JSON Schema validator that tolerated certain "approximate" matches
- The consuming Agent used a strict validator that required exact Schema matches
- The MCP protocol defines the standard, but implementations differ in the details
The deeper problem: some fields of the MCP protocol are loosely specified, and different implementations interpret them differently. For example, should the "description" field be plain text or support Markdown? How are "required" fields inherited in nested objects? The spec does not spell out these details.
Solution:
- Conservative Implementation: Implement the protocol according to the strictest interpretation, ensuring compatibility with any compliant client
- Clear Documentation: Clearly state implementation details in tool documentation, especially those related to protocol ambiguities
- Compatibility Test: Compatibility test with mainstream MCP clients
- Version Lock: Clearly declare the supported MCP protocol version to avoid version confusion
After these changes, compatibility issues dropped significantly. But the episode exposed a reality: implementations of a so-called "standard" still diverge in practice.
Lesson: Protocol standards are the starting point, not the end point. Actual interoperability requires more testing and coordination.
Case 2: The performance nightmare MCP call chain
Background: We transformed multiple tools into MCP interfaces, and Agent can call all tools through a unified MCP client. The architecture looks elegant.
Symptom: after launch, the Agent's response time rose from 2 seconds to 8 seconds, and the user experience deteriorated badly.
Troubleshooting:
In-depth analysis located the performance bottlenecks:
- Every tool call established a fresh MCP connection (we had not implemented a connection pool)
- MCP's message serialization/deserialization overhead was three times that of direct API calls
- Passing data between tools required multiple rounds of encoding and decoding
The deeper problem: MCP's abstraction layer buys flexibility, but it also costs overhead. When tools are called frequently, these costs accumulate into a serious problem.
Solution:
- Connection Pool: Implement MCP connection pool and reuse connections instead of creating new ones every time
- Batch call: Batch tool calls as much as possible to reduce the number of round trips
- Local cache: Cache tool metadata to avoid repeated queries
- Performance degradation: In performance-sensitive scenarios, direct calls are allowed to bypass MCP.
After these changes, response time dropped to 3.5 seconds: still slower than direct calls, but within an acceptable range.
Lesson: Abstraction has a cost. In performance-sensitive scenarios, flexibility and efficiency need to be weighed.
Case 3: The abused MCP tool
Background: We open the company’s core database query tool to multiple agents through MCP. The original intention was to improve the standardization of data access.
Symptom: one month after launch, database CPU usage soared, and some queries locked up the database.
Troubleshooting:
The investigation found:
- One Agent called the database query tool in a tight loop without caching results
- Another Agent generated complex SQL without limiting the result size, pulling millions of rows in a single query
- A third Agent put no cap on call frequency under concurrency, exhausting the database connection pool
The deeper problem: MCP makes tools easy to use, and just as easy to abuse. Because an Agent generates calling parameters dynamically, traditional rate-limiting and protection mechanisms struggle to take effect.
Solution:
- Call rate limiting: enforce call-frequency limits at the MCP server layer
- Cost quotas: assign each Agent a query-cost quota; exceeding it requires an application for more
- Query review: statically analyze generated SQL and intercept dangerous queries
- Audit logging: record every tool call and review regularly for abnormal patterns
- Circuit breaking: automatically reject new query requests when database load is too high
After implementation, the database load returned to stability. But this made us realize: MCP’s security model needs a more rigorous design.
Lesson: Convenience and security are often at odds. While lowering the threshold for use, safety control must be strengthened.
Conclusion: Standardization is the prerequisite for scale
Back to the integration nightmare at the start of this article: had MCP been widespread at the time:
- The database would expose an MCP interface, and we would not have written connection-pool management code
- The mail service would expose an MCP interface, and we would not have wrestled with templates and rate limits
- The work-order system would expose an MCP interface, and we would not have chewed through incomplete documentation
Integration work would shrink from "writing thousands of lines of adaptation code" to "configuring a few MCP connections".
**The value of MCP lies not in new capabilities it creates, but in how far it lowers the threshold for integration.**
In the history of software development, standardized protocols are often the starting point for ecological prosperity:
- HTTP let web applications talk to each other
- REST unified API design
- USB made peripherals plug-and-play
MCP is expected to become a similar catalyst for the Agent ecosystem:
- Lower the barriers to tool development and integration
- Promote innovation in Agent applications
- Form a healthy tool market
For Agent developers: Embracing MCP means being able to access a rich tool ecosystem and focus on the intelligence of the Agent itself.
For tool developers: Embracing MCP means developing once and using it everywhere, expanding the influence of tools.
For the entire ecosystem: MCP may be a key step for Agent to move from “proof of concept” to “scale application”.
Standardization is never an end, but a means. The real goal is to enable Agent technology to serve more people, solve more problems, and create greater value.
MCP may not be perfect, but it takes an important step.
Reference resources
MCP official resources:
- MCP official documentation: https://modelcontextprotocol.io
- MCP GitHub: https://github.com/modelcontextprotocol
- Python SDK: https://github.com/modelcontextprotocol/python-sdk
*This article is an original practice summary, written based on personal project experience.*
Last updated: 2026-03-12