
Building a Smart Home Assistant with MCP: A Practical Guide to Home Automation

Oct 15

4 min read

Fabiano Calado


grabs popcorn and sits on the couch

– Hey Siri: turn off living room lights and open Netflix


In the rapidly evolving landscape of generative AI, the Model Context Protocol (MCP) is emerging as a game-changer. This article will guide you through building your own simple smart home assistant that understands natural language commands and controls your devices seamlessly. Full code at https://github.com/caladoxd/home-control.


Why MCP?

MCP represents a paradigm shift in how we interact with systems. Unlike traditional home automation systems that require specific commands or complex integrations, MCP lets us create a natural, conversational interface to almost anything, with the AI deciding which tools to call based on our requests. Think of it as having a personal butler who understands your needs and can control your entire home ecosystem.


MCP also makes the system easy to extend: existing servers can be plugged straight in (as we do below with BrowserMCP).


What makes MCP particularly powerful for our use case is its ability to:

  • Handle complex, multi-step commands

  • Maintain context across conversations

  • Provide natural, human-like responses

  • Scale to include new servers and capabilities


The Architecture

Our smart home assistant consists of four main components:

  • MCP API client: The brain of our system, powered by Google's Gemini AI

  • Tuya Server: Handles communication with Tuya smart devices (a cost-effective ecosystem)

  • BrowserMCP: Automates interactions with the web browser

  • iPhone: Uses the Shortcuts app to send voice commands to the assistant


This modular architecture allows us to:

  • Process natural language commands

  • Control smart devices

  • Interact with web services

  • Scale the system easily: add more devices and more services


Getting Started

To build your own MCP-powered smart home:

  1. Set up the basic infrastructure:

    • Install required dependencies

    • Configure your environment variables (see the sketch after this list)

    • Set up your Tuya account

  2. Implement the core components:

    • MCP client with Gemini AI

    • Tuya server for device control

    • BrowserMCP for web interactions

  3. Configure your devices:

    • Add device mappings

    • Set up aliases

    • Test basic commands
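
A minimal sketch of the environment setup, with hypothetical variable names (match them to whatever your own client code actually reads):

import os

# Hypothetical variable names; align these with your client code
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]    # Google AI Studio key for Gemini
TUYA_ACCESS_ID = os.environ["TUYA_ACCESS_ID"]    # credentials from the Tuya IoT console
TUYA_ACCESS_KEY = os.environ["TUYA_ACCESS_KEY"]
PORT = os.environ.get("PORT", "8000")            # port the MCP API client listens on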


Building the Assistant

Let's break down the key components and how to set them up:


1. The custom MCP Client / MCP API:

import json
import logging

import uvicorn
from fastapi import FastAPI, HTTPException

# CommandRequest, ask_gemini_for_tool, call_mcp_tool, tools, PORT and the
# lifespan shown below are defined elsewhere (full code in the linked repo)
logger = logging.getLogger(__name__)

app = FastAPI(lifespan=lifespan)

@app.post("/")
async def process_command(request: CommandRequest):
    """Process a natural language command"""
    try:
        conversation_history = []
        logger.info(f"Processing command: {request.command}")

        # Store device mappings if we get them
        device_mappings = None
        results = []
        command_count = 0
        MAX_COMMANDS = 10  # Safety limit

        while command_count < MAX_COMMANDS:
            # Get tool call from Gemini
            gemini_result = ask_gemini_for_tool(
                request.command,
                tools,
                device_mappings,
                command_count,
                conversation_history
            )
            logger.info(f"Command {command_count}: {gemini_result}")

            if not gemini_result or "tool" not in gemini_result or "arguments" not in gemini_result:
                break

            tool_name = gemini_result["tool"]
            arguments = gemini_result["arguments"]
            mcp_result = await call_mcp_tool(tool_name, arguments)

            # Add tool call and its result to conversation history
            conversation_history.append({
                "role": "assistant",
                "content": {"tool": tool_name, "arguments": arguments}
            })
            conversation_history.append({
                "role": "system",
                "content": mcp_result
            })

            results.append(mcp_result)
            command_count += 1

            # Stop once we hit the command limit or the model signals "done"
            if command_count >= MAX_COMMANDS or tool_name == "done":
                break

        # Return the message from the last result
        if results:
            return {
                "status": "success",
                "message": json.loads(results[-1])["arguments"]["message"]
            }
        return {"status": "error", "message": "No results"}

    except Exception as e:
        logger.error(f"Error processing command: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=int(PORT))

To control the system by voice, we build a custom MCP client with an API endpoint that receives commands and uses Google's Gemini AI to orchestrate everything. It can:

  • Interpret complex requests

  • Handle multiple commands in a single request

  • Maintain conversation context

  • Provide natural responses
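
In practice, the iPhone Shortcut simply POSTs the dictated text to this endpoint. Here is the equivalent request in Python (the address is a placeholder for your machine's LAN IP and PORT; the handler above shows that CommandRequest expects a single command field):

import requests

# POST a dictated command to the assistant, as the iPhone Shortcut does
resp = requests.post(
    "http://192.168.0.10:8000/",  # placeholder: your machine's LAN address and PORT
    json={"command": "turn off living room lights and open Netflix"},
    timeout=60,  # multi-step commands can take a while
)
print(resp.json())  # e.g. {"status": "success", "message": "..."}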


The lifespan function reads a JSON file with the MCP server configuration, in the same shape the Cursor console uses, then uses an AsyncExitStack to instantiate multiple servers in the same session and collect their tools into the list that's passed to the agent.
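
The config file might look something like this (a hypothetical servers.json; the commands and args depend on how you run each server):

import json

# Hypothetical servers.json, one entry per MCP server, e.g.:
# {
#   "tuya":    {"command": "python", "args": ["tuya_server.py"]},
#   "browser": {"command": "npx", "args": ["@browsermcp/mcp@latest"]}
# }
with open("servers.json") as f:
    server_params = json.load(f)

With server_params loaded, the lifespan wires everything up: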


import contextlib
from contextlib import asynccontextmanager

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_sessions = {}  # server name -> ClientSession
tools = []            # aggregated tool list passed to Gemini

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Lifespan context manager for FastAPI app"""

    # Startup
    logger.info("Starting application lifespan...")
    async with contextlib.AsyncExitStack() as stack:
        # First create all stdio clients
        stdio_pairs = {}

        for name, params in server_params.items():
            stdio_client_ctx = stdio_client(StdioServerParameters(**params))
            stdio_pairs[name] = await stack.enter_async_context(stdio_client_ctx)

        for name, (read, write) in stdio_pairs.items():
            session_ctx = ClientSession(read, write)
            session = await stack.enter_async_context(session_ctx)
            await session.initialize()
            server_sessions[name] = session
            tools_result = await session.list_tools()

            # Register this server's tools, namespaced as "server/tool"
            for tool in tools_result.tools:
                tools.append({
                    "name": f"{name}/{tool.name}",
                    "description": tool.description,
                    "inputSchema": tool.inputSchema
                })
        yield
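
Since every tool is registered under a namespaced "server/tool" name, the call_mcp_tool helper used by the main loop only has to split the name and route the call to the session that owns it. A minimal sketch (the repo's version, including how the "done" pseudo-tool is handled, may differ):

async def call_mcp_tool(tool_name: str, arguments: dict) -> str:
    """Route a namespaced "server/tool" call to the owning session."""
    if tool_name == "done":
        # Assumed convention: echo "done" back so the caller can pull
        # its "message" argument out as the final response
        return json.dumps({"tool": "done", "arguments": arguments})

    server_name, _, tool = tool_name.partition("/")
    result = await server_sessions[server_name].call_tool(tool, arguments)
    # Concatenate any text content the tool returned
    return "\n".join(c.text for c in result.content if hasattr(c, "text"))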


2. Device Control

DEVICE_MAPPINGS = {
    "living room": {
        "id": "<device_id>",
        "aliases": ["dining room", "living-room"]
    },
    # ... more devices
}


Our setup uses a flexible device mapping system that:

  • Supports multiple device names and aliases (for different languages)

  • Makes it easy to add new devices

  • Provides a natural way to reference devices
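
A minimal lookup helper over this mapping could look like the following sketch (the repo's resolution logic may differ):

def resolve_device_id(name: str) -> str | None:
    """Resolve a spoken device name or alias to its Tuya device id."""
    needle = name.strip().lower()
    for canonical, entry in DEVICE_MAPPINGS.items():
        if needle == canonical or needle in entry["aliases"]:
            return entry["id"]
    return None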


3. The command flow

The flow then looks as follows: the iPhone Shortcut captures your voice command and POSTs it to the MCP API client; Gemini picks the next tool from the aggregated list; the client routes the call to the matching MCP server (Tuya or BrowserMCP) and appends the result to the conversation history; and the loop repeats until Gemini calls "done" or the safety limit is reached.


Real-World Examples

Here are some commands our system can handle:

  • "Turn on the living room lights and set them to 50% brightness"

  • "Turn off all lights in the bedroom"

  • "Open Youtube, search for pop live radio and then play the first result"


The Future of Home Automation

MCP is revolutionizing home automation by:

  • Making it more accessible to non-technical users

  • Providing a more natural interaction model

  • Enabling complex automation scenarios

  • Supporting multiple device ecosystems


Conclusion

The combination of MCP and modern AI technologies is creating a new era of home automation. By building on this framework, we can create smart homes that are not just automated but truly intelligent and responsive to our needs. The code we've explored demonstrates how to create a practical, scalable home automation system that can understand and execute complex commands while maintaining a natural, conversational interface. As MCP continues to evolve, we can expect even more sophisticated capabilities and integrations.
