YottaDynamics Technical Notes

Infrastructure, AI agents, and platform engineering — written for engineers building production systems.

Get Started Hosting Your Own LLMs Using Ollama: Your Private AI Playground

Ollama is one of the fastest ways to move from cloud API dependency to local model serving on your own machine. This guide covers installation, first-run commands, model selection, Open WebUI, Modelfiles, and basic API usage.

Observability for AI Agents: Logs, Traces, and Metrics That Actually Tell You Something

Monitoring an agent is not the same as monitoring a service. The question shifts from whether it is running to whether it is reasoning correctly — and that requires a different observability stack built around structured traces, quality metrics, and cost attribution.

Why Agents Fail in Production (And How to Catch It Before It Reaches Your Users)

Non-deterministic systems require evaluation strategies that traditional QA cannot provide. Closing the gap requires a golden dataset, trajectory analysis, an LLM-as-judge pipeline, and a feedback loop that runs before every deployment.

How Memory Works in AI Agents: Turning Stateless LLMs into Persistent, Learning Systems

LLMs are stateless by design. Memory is the external infrastructure that turns them into agents that learn from experience, remember preferences, and improve over time. Here’s how it works at a technical level.

A Deep Dive into Retrieval-Augmented Generation (RAG)

RAG fixes the core problems with pure LLMs — hallucination, stale knowledge, private data — by making retrieval a first-class citizen. Here’s the full technical picture: vector search, hybrid retrieval, GraphRAG, agentic patterns, and what a production stack actually looks like in 2026.

The Technical Blueprint for AI Speed: Markdown vs. RAG

The storage format you choose for AI knowledge directly shapes your system’s latency, token density, and semantic clarity. A pragmatic breakdown of when to use raw Markdown, when to build a RAG pipeline, and why the best production systems use both.

AI Agent Architecture: Memory, Tools, Orchestration, and Production

Most ‘my agent broke’ investigations don’t end at the model. They end in memory design, tool scope, orchestration logic, or missing observability. This post covers the plumbing that actually determines whether an agent works in production.

Hosting Local LLMs on Kubernetes: A Complete Enterprise Architecture Guide

A deep-dive into every layer of a production-grade, fully open-source stack for self-hosting large language models — from the API gateway to the GPU compute plane.

What Is an AI Agent? A Practical Guide for Builders

The term ‘AI agent’ gets applied to everything from a ChatGPT thread with a button to systems that autonomously manage deployments. That imprecision directly shapes the architectures you choose and the failure modes you inherit.

Claude Code in the Terminal: The Deep-Dive

Most teams use Claude Code like a smart autocomplete. The teams getting real leverage treat it as an engineering platform — with CLAUDE.md, permissions, hooks, sub-agents, skills, and CI/CD integration designed in from the start.
