This year I built several Model Context Protocol (MCP) servers to support ongoing R&D. Here’s my report.

But first, some context. What is MCP, and why am I using it?

Why Am I Using MCP?

MCP is an open-source standard for connecting AI applications to external systems. Here’s some detail from a recent article:

“Mike Krieger, [Anthropic’s] chief product officer, says the idea came while linking Claude, its chatbot, to services like Gmail, an email platform, and GitHub, a repository of code. Instead of integrating each application with Claude on a case-by-case basis, the firm wanted a shared set of rules to help agents directly access a user’s emails or files.” — The Economist, “Intelligent Protocol,” Dec 13, 2025

The MCP experience goes like this:

Me: “How many ninja did the academy train last year who use Water Style?”
LLM: “I can answer that using my ninja_academy_students tool! May I call it?”
Me: “You may.”
LLM: (some time later) “22 students in the academy last year use Water Style. 7 graduated and are now in the field.”

I built two servers that answered questions like this using Claude Desktop; one called a REST API, the other queried SQL.

What Worked

First, the good stuff. Here are some things I felt worked well.

While sparse, the MCP docs are useful. I was able to build a working server using the guide in half an hour. That’s encouraging to me for something so new.
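To give a sense of scale, here’s roughly what that half-hour server looked like. This is a minimal sketch following the TypeScript SDK’s quickstart shape; the tool name, parameters, and REST endpoint are stand-ins for my actual R&D code, not the real thing.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// A tiny MCP server exposing one tool over stdio.
const server = new McpServer({ name: "ninja-academy", version: "0.1.0" });

server.tool(
  "ninja_academy_students",
  "List academy students, optionally filtered by chakra style",
  { style: z.string().optional().describe("e.g. 'Water'") },
  async ({ style }) => {
    // Hypothetical REST endpoint standing in for the real data source.
    const url = new URL("https://academy.example.com/api/students");
    if (style) url.searchParams.set("style", style);
    const students = await (await fetch(url)).json();
    return { content: [{ type: "text", text: JSON.stringify(students) }] };
  }
);

// Claude Desktop launches this process and talks to it over stdio.
await server.connect(new StdioServerTransport());
```

Register the command in Claude Desktop’s claude_desktop_config.json and the ninja_academy_students tool shows up in the chat, ready for the dialogue above.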

Claude Desktop is good at writing MCP server code. Not surprisingly, the company that specified MCP also has an LLM that knows a lot about MCP, so if you’re using Claude Desktop as your client, it makes sense to have it both write the server code and test it.

TypeScript is an ideal MCP server language. LLMs know a lot about TypeScript, and they write it well.

What Didn’t Work

I am excited by the promise of MCP! But the rest of this post outlines challenges and skepticism.

For problems like mine, MCP seems like the right kind of tool. But today, it feels limited.

Still Requires Real Engineering Work

My first issue with MCP is that questions and tools are too tightly coupled: each new kind of question tends to need its own carefully designed tool. It still requires real engineering work.

Imagine we want to know about our village ninja performance. Consider this query:

Me: “Who is our village’s best ninja?”
Claude Desktop: “You gave me tools to get ninja battles and ninjas. I’m going to get both, combine them, and then apply my concept of ‘best.’ And so… (some time later) it’s Sasuke Uchiha.”

To be fair, you’ll get some context about the answer that I’m leaving out.

So, what did Claude do? It:

  • Relied on me (a human!) to provide a set of verbose tools.
  • Chose a tool based on my query.
  • Ran the tool and applied its intelligence to the result.
  • Told me the answer (with ample emojis).

This feels like magic. But it took more engineering effort than you might think. I was surprised by this. No-code or low-code, this isn’t.
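To make “verbose tools” concrete, here’s roughly what the two tools behind that answer look like, continuing the server sketch from earlier. The db handle, table names, and fields are illustrative stand-ins, not my production code; the point is that someone has to design these names, descriptions, and schemas well enough for Claude to pick them and join their output.

```typescript
// Two of the "verbose" tools behind the "best ninja" answer.
// `db` is assumed to be a SQL client with an `all(sql, ...params)` method.
server.tool(
  "get_ninjas",
  "List all ninja registered to the village, with rank and chakra styles",
  async () => ({
    content: [
      { type: "text", text: JSON.stringify(await db.all("SELECT * FROM TblNinjas")) },
    ],
  })
);

server.tool(
  "get_ninja_battles",
  "List battles, including participants, winner, and whether it was a training match",
  { since: z.string().optional().describe("ISO date, e.g. '2025-01-01'") },
  async ({ since }) => {
    const rows = await db.all(
      "SELECT * FROM TblBattles WHERE BattleDate >= ?",
      since ?? "0000-01-01"
    );
    return { content: [{ type: "text", text: JSON.stringify(rows) }] };
  }
);
```

Claude does the joining and the notion of “best” on its own; the tools just hand it clean data. That division of labor is where the engineering effort hides.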

We’ve been trained to think AI always has an answer. But with a simple MCP server, it’s easy to stump the LLM. The more sophisticated the questions you want to ask, the more sophisticated the MCP server you have to build.

Garbage In, Garbage Out

MCP didn’t invent messy, human-compromised data. But it doesn’t solve that problem either, and it can expose just how limiting the mess is.

Here’s an example. When I asked my MCP server, “Who won the most ninja battles on December 29?”, it returned seven battles on December 29, three of which were won by Sasuke Uchiha.

What does that mean? Did Sasuke defeat three enemies? Or did he win three skirmishes of little consequence? Were some training battles? Or should we be reading the battle report differently? The data by itself is incomplete.

In the real world, no two teams are going to use their data tools in the exact same way. “Winning” could be “status: ‘victory’” or “status: 9”. Understanding this means reading docs and writing software.
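One mitigation that helped my thinking here is normalizing those encodings inside the tool, so the LLM only ever sees one vocabulary. A small sketch, with both status conventions invented for the example:

```typescript
// Different teams encode "winning" differently; map them to one vocabulary
// before the LLM ever sees the data. Both encodings here are hypothetical.
type BattleOutcome = "victory" | "defeat" | "draw" | "unknown";

function normalizeOutcome(raw: unknown): BattleOutcome {
  if (raw === "victory" || raw === 9) return "victory"; // team A: string, team B: magic number
  if (raw === "defeat" || raw === 0) return "defeat";
  if (raw === "draw" || raw === 5) return "draw";
  return "unknown"; // surface gaps instead of letting the LLM guess
}
```

It’s boring code, but writing it is exactly the “reading docs and writing software” step that MCP doesn’t make go away.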

Security

Giving client data to an LLM has some sharp security edges.

This is a problem with everything we do with LLMs, but it escalates when you’re giving the LLM your most valuable data. I don’t think we’re talking about this enough.

Random Thoughts

Solving the Data Constraint

One idea I had for making the data more intelligible is to survey datastore users about how they use their data, then feed those answers to the LLM while prompting the MCP code. If a battle can be won by killing the enemy or by the enemy’s surrender, or if the MCP server knows that training battles aren’t meaningful, it can return better answers.

Also, for SQL-backed MCP servers in particular, giving the LLM the database schema while prompting the code saved me a lot of time. Tables are never named what you expect (“ninjas” is actually called “TblNinjas”, or whatever), so don’t make the LLM guess.
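Concretely, that just meant pasting the schema (and any usage notes from the survey idea above) into the prompt that asked Claude to write the server. Something like this, with every table and column name invented for illustration:

```typescript
// Hypothetical schema string handed to Claude alongside the code-generation request.
const schemaForPrompt = `
CREATE TABLE TblNinjas (
  NinjaID INTEGER PRIMARY KEY,
  FullName TEXT,
  ChakraStyle TEXT
);
CREATE TABLE TblBattles (
  BattleID INTEGER PRIMARY KEY,
  WinnerID INTEGER REFERENCES TblNinjas(NinjaID),
  Status INTEGER,    -- 9 = confirmed victory, per the owning team's docs
  IsTraining INTEGER -- 1 = training match, not meaningful for rankings
);`;

const prompt =
  "Write a TypeScript MCP server exposing read-only tools over this schema. " +
  "Do not invent table or column names.\n" + schemaForPrompt;
```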

Consider Agentic Programming

I think if you’re doing MCP, you should probably be doing agentic programming too. I wasn’t, and I wasted time describing code to Claude that it had originally written, because it couldn’t see the files on my machine.

Who Should Not Use This, Yet

I think about technology adoption using the Thoughtworks Technology Radar. For me, MCP is “Assess”:

Assess. “Things to look at closely, but not necessarily trial yet — unless you think they would be a particularly good fit for you.”

This is R&D. You’re going to have to pay people to learn it. You can’t have a crucial business outcome riding on it. Your team might work with it for a month and have no profitable product at the end. Don’t invest the capital unless you’re willing to accept these outcomes.

Conclusion

For more thoughts on MCP, check out Theo’s recent YouTube videos. He’s a critic of the protocol, and at this stage, I think that’s an important voice.

For now, I’m going to keep learning about and experimenting with many AI solutions, including MCP.