<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AR-Kube (Mesut Oezdil)]]></title><description><![CDATA[GPU systems, scheduling, and AI infra. Breaking down HAMi and real-world GPU problems.]]></description><link>https://mesutoezdil.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!zABN!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F645d6c03-4d5b-4067-ab03-473c24075b55_500x500.png</url><title>AR-Kube (Mesut Oezdil)</title><link>https://mesutoezdil.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 23 May 2026 02:15:20 GMT</lastBuildDate><atom:link href="https://mesutoezdil.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mesut Oezdil]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[mesutoezdil@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[mesutoezdil@substack.com]]></itunes:email><itunes:name><![CDATA[AR-Kube (Mesut Oezdil)]]></itunes:name></itunes:owner><itunes:author><![CDATA[AR-Kube (Mesut Oezdil)]]></itunes:author><googleplay:owner><![CDATA[mesutoezdil@substack.com]]></googleplay:owner><googleplay:email><![CDATA[mesutoezdil@substack.com]]></googleplay:email><googleplay:author><![CDATA[AR-Kube (Mesut Oezdil)]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[I Ran an MCP Server on the Serverless Endpoints]]></title><description><![CDATA[Real commands, real latency numbers, one unexpected detour]]></description><link>https://mesutoezdil.substack.com/p/i-ran-an-mcp-server-on-the-serverless</link><guid 
isPermaLink="false">https://mesutoezdil.substack.com/p/i-ran-an-mcp-server-on-the-serverless</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Mon, 11 May 2026 11:13:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9mI9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most MCP setups follow the same pattern. The server runs on your laptop, or on a VM that you keep alive somewhere. Both options mean you are managing a process. You are responsible for it. When it crashes, you restart it. When your laptop closes, the tool disappears.</p><p>I wanted to see if I could move the MCP server somewhere else. Somewhere I did not have to think about.</p><p>Nebius recently shipped something called <a href="https://nebius.com/services/serverless">Endpoints</a> as part of their new serverless compute offering. You give it a container image and it runs the container for you. You get back a public URL. No server to set up, no drivers to install, no cluster to configure. I thought about my <code>mcp-gpu-server</code> project, where I built an MCP server that talks directly to NVIDIA hardware using <code>NVML</code>. That one has to run near the hardware. But most MCP servers do not. Most of them are just <code>HTTP</code> services. So why do they need to live on a machine I manage?</p><p>This is the story of what happened when I tried to run one on Nebius instead.</p><h2>How the pieces fit together</h2><p>The setup has two parts. On your Mac, Claude Desktop talks to a small local Python script called <code>bridge.py</code> over the <code>stdio</code> transport that MCP uses. That script takes every tool call Claude makes and forwards it as an <code>HTTP</code> request to a container running on Nebius. 
That container then calls <a href="https://nebius.com/services/token-factory">Nebius Token Factory</a> to produce an embedding and sends the result back. The model itself, <code>Qwen3-Embedding-8B</code>, lives in Token Factory. My container is just a thin layer that receives requests and forwards them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9mI9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9mI9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png 424w, https://substackcdn.com/image/fetch/$s_!9mI9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png 848w, https://substackcdn.com/image/fetch/$s_!9mI9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png 1272w, https://substackcdn.com/image/fetch/$s_!9mI9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9mI9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png" width="1456" height="760" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:149974,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/197141706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9mI9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png 424w, https://substackcdn.com/image/fetch/$s_!9mI9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png 848w, https://substackcdn.com/image/fetch/$s_!9mI9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png 1272w, https://substackcdn.com/image/fetch/$s_!9mI9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9f8be8a-9b2c-4eff-9228-e650a60ae7a3_1846x964.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The machine I used</h2><p>I did this on a Nebius GPU VM called <code>nebius-tarantula</code>, which I already had running for other work. It runs Ubuntu in the <code>eu-north1</code> region. I checked two things before writing a single line of code.</p><pre><code><code>python3 --version
Python 3.12.3

docker --version
Docker version 29.2.1, build a5c7197</code></code></pre><p>Python 3.12 and Docker 29. That was enough to start.</p><h2>Setting up the project</h2><p>I created a separate directory and a virtual env. The reason for the virtual env is simple: I did not want whatever I install here to affect anything else running on the machine. This VM has other projects on it.</p><pre><code><code>mkdir mcp-serverless &amp;&amp; cd mcp-serverless
python3 -m venv venv
source venv/bin/activate
pip install fastapi uvicorn requests</code></code></pre><p><code>FastAPI</code> handles the <code>HTTP</code> routing. <code>Uvicorn</code> is the server that runs it. <code>Requests</code> is how the container calls Token Factory. That is the whole dependency list. There is no ML framework here because this server does not run any models. It receives a request, forwards it to Token Factory, and returns the result.</p><h2>The server itself</h2><p>The server needs three routes. The first one, <code>/tools</code>, is how MCP clients discover what a server can do. Without it, Claude has no idea what tools are available. The second, <code>/call</code>, is what gets hit when Claude actually wants to use a tool. The third, <code>/health</code>, is specifically for Nebius. When the container starts, Nebius pings <code>/health</code> before it routes any real traffic. If that route does not exist, the platform assumes the container is broken.</p><pre><code><code>from fastapi import FastAPI
from pydantic import BaseModel
from typing import Any
import requests
import os
import time

app = FastAPI()

NEBIUS_API_KEY = os.getenv("NEBIUS_API_KEY", "")
NEBIUS_EMBED_URL = "https://api.studio.nebius.ai/v1/embeddings"

@app.get("/tools")
def list_tools():
    return {
        "tools": [
            {
                "name": "embed_text",
                "description": "Generate embeddings via Nebius Token Factory",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "Text to embed"
                        }
                    },
                    "required": ["text"]
                }
            }
        ]
    }

class CallRequest(BaseModel):
    tool: str
    parameters: dict[str, Any]

@app.post("/call")
def call_tool(req: CallRequest):
    if req.tool == "embed_text":
        text = req.parameters.get("text", "")
        start = time.time()
        response = requests.post(
            NEBIUS_EMBED_URL,
            headers={
                "Authorization": f"Bearer {NEBIUS_API_KEY}",
                "Content-Type": "application/json"
            },
            json={
                "model": "Qwen/Qwen3-Embedding-8B",
                "input": text
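            },
            timeout=30,  # fail fast instead of hanging if Token Factory is unreachable
        )
        elapsed = time.time() - start
        response.raise_for_status()  # turn upstream HTTP errors into a clear failure
        data = response.json()
        embedding = data["data"][0]["embedding"]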
        return {
            "tool": "embed_text",
            "result": {
                "embedding_dim": len(embedding),
                "first_5_values": embedding[:5],
                "latency_seconds": round(elapsed, 3)
            }
        }
    return {"error": f"Unknown tool: {req.tool}"}

@app.get("/health")
def health():
    return {"status": "ok"}</code></code></pre><p>Notice that the API key comes from an env variable. It is never written into the code or baked into the container image. The image will end up on Docker Hub, and I do not want credentials there.</p><p>Before touching Docker I tested this directly on the VM. The reason is that debugging a broken container on a remote platform is much harder than debugging a broken script on a machine where you can see what is happening. Test first, containerize second.</p><pre><code><code>export NEBIUS_API_KEY="your_token_factory_key"
nohup uvicorn server:app --host 0.0.0.0 --port 8000 &amp;</code></code></pre><p>I used nohup and the ampersand so the server runs in the background. That way I can keep using the same terminal to run curl.</p><pre><code><code>curl http://localhost:8000/tools</code></code></pre><p>The tool description came back. Then the actual embedding call:</p><pre><code><code>curl -X POST http://localhost:8000/call \
  -H "Content-Type: application/json" \
  -d '{"tool": "embed_text", "parameters": {"text": "Merhaba d&#252;nya"}}'</code></code></pre><p>4096-dimensional vector. 85ms. That 85ms is the floor for everything that follows. It is what a Token Factory call costs when you are already inside the Nebius network in the same region.</p><h2>Packaging as a container</h2><p>The Dockerfile is deliberately small. No GPU base image, no CUDA toolkit, no large ML libraries. This server does not need any of that. A slim Python image is everything it needs.</p><pre><code><code>FROM python:3.12-slim

WORKDIR /app

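# Install dependencies first so this layer stays cached when only server.py changes.
RUN pip install --no-cache-dir fastapi uvicorn requests

COPY server.py .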

EXPOSE 8000

CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8000"]</code></code></pre><p>I built the image and tested it on port <code>8001</code> so it would not clash with the server already running on <code>8000</code>.</p><pre><code><code>docker build -t mcp-server .

docker run -d -p 8001:8000 \
  -e NEBIUS_API_KEY="$NEBIUS_API_KEY" \
  --name mcp-test \
  mcp-server
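
# Optional extra check (an addition, not part of the original run): /health is the
# route the platform probes, so it is worth confirming it answers from inside the
# container as well. -f makes curl exit non-zero on an HTTP error status.
curl -fsS http://localhost:8001/health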

curl http://localhost:8001/tools</code></code></pre><p>The tool list came back from the container. The image works.</p><h2>The Container Registry detour</h2><p>My original plan was to push the image to Nebius Container Registry. I went to Storage in the Nebius console and created a registry called mcp-registry.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o2Yw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o2Yw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png 424w, https://substackcdn.com/image/fetch/$s_!o2Yw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png 848w, https://substackcdn.com/image/fetch/$s_!o2Yw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png 1272w, https://substackcdn.com/image/fetch/$s_!o2Yw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o2Yw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png" width="1456" height="759" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6275d618-f90a-425c-918e-f74d62437685_3002x1564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:759,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:357182,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/197141706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o2Yw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png 424w, https://substackcdn.com/image/fetch/$s_!o2Yw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png 848w, https://substackcdn.com/image/fetch/$s_!o2Yw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png 1272w, https://substackcdn.com/image/fetch/$s_!o2Yw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6275d618-f90a-425c-918e-f74d62437685_3002x1564.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Container Registry page, no registries yet</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s-3w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s-3w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png 424w, 
https://substackcdn.com/image/fetch/$s_!s-3w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png 848w, https://substackcdn.com/image/fetch/$s_!s-3w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!s-3w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s-3w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png" width="1456" height="666" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:666,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:295164,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/197141706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!s-3w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png 424w, https://substackcdn.com/image/fetch/$s_!s-3w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png 848w, https://substackcdn.com/image/fetch/$s_!s-3w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!s-3w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b850d0d-b3ac-4d2f-bbe5-dd2afda307a4_3024x1384.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">mcp-registry created, status Active, ID registry-e00c28eawvhhb0ew1r</figcaption></figure></div><p>Then I tried to authenticate from the VM. The Nebius CLI was installed but had no config. Setting it up requires a service account and a PEM-encoded private key. I created the service account, added it to the editors group, generated a key, and then realized the key I got was an S3-compatible access key, not an IAM token. The registry needed a different kind of authentication.</p><p>At that point I switched to Docker Hub. Nebius Endpoints accepts images from Docker Hub without any extra setup, and I had already spent enough time on authentication.</p><pre><code><code>docker login -u mesutoezdil
docker tag mcp-server mesutoezdil/mcp-server:latest
docker push mesutoezdil/mcp-server:latest</code></code></pre><p>If you want to use Nebius Container Registry properly you can, once you have the Nebius CLI configured with a service account key. For this experiment I moved on.</p><h2>Creating the endpoint</h2><p>In the Nebius console, AI Services in the left sidebar, then Endpoints. The page was empty.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U4dp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U4dp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png 424w, https://substackcdn.com/image/fetch/$s_!U4dp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png 848w, https://substackcdn.com/image/fetch/$s_!U4dp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!U4dp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U4dp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png" width="1456" height="607" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:607,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:289863,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/197141706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U4dp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png 424w, https://substackcdn.com/image/fetch/$s_!U4dp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png 848w, https://substackcdn.com/image/fetch/$s_!U4dp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!U4dp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0f8202f-b62e-423d-b3e5-a788fe642164_3012x1256.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Endpoints page, Deploy your first Serverless endpoint in minutes</figcaption></figure></div><p>I clicked Configure it yourself. The quick start option pre-fills an nginx image and does not show env variable settings until later in the flow. I needed to set the API key from the start, so the manual form made more sense.</p><p>The form has several fields. For name I typed <code>mcp-server-endpoint</code>. For image path I typed <code>docker.io/mesutoezdil/mcp-server:latest</code>. 
For port I changed the default <code>8080</code> to <code>8000</code>, which is the port uvicorn listens on inside the container.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uiu-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uiu-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png 424w, https://substackcdn.com/image/fetch/$s_!uiu-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png 848w, https://substackcdn.com/image/fetch/$s_!uiu-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png 1272w, https://substackcdn.com/image/fetch/$s_!uiu-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uiu-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png" width="1456" height="717" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:449221,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/197141706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uiu-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png 424w, https://substackcdn.com/image/fetch/$s_!uiu-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png 848w, https://substackcdn.com/image/fetch/$s_!uiu-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png 1272w, https://substackcdn.com/image/fetch/$s_!uiu-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24e2f9bb-4546-429a-8a8f-3295f382b412_3014x1484.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Create endpoint form, mcp-server-endpoint, docker.io/mesutoezdil/mcp-server:latest, port 8000, cost estimate 0.14 per hour on the right</figcaption></figure></div><p>Under env variables I added <code>NEBIUS_API_KEY</code> with the Token Factory key as the value. This is what the container reads when it starts up.</p><p>The default compute selection is GPU, which costs around 1.59 per hour. This server does not run any models locally, so a GPU would be a waste of money. I clicked Without GPU, selected Non-GPU AMD Epyc Genoa, and chose 4 vCPUs with 16 GiB of RAM. 
Cost: 0.14 per hour.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oA6N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oA6N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png 424w, https://substackcdn.com/image/fetch/$s_!oA6N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png 848w, https://substackcdn.com/image/fetch/$s_!oA6N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png 1272w, https://substackcdn.com/image/fetch/$s_!oA6N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oA6N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png" width="1456" height="1229" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1229,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:312506,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/197141706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oA6N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png 424w, https://substackcdn.com/image/fetch/$s_!oA6N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png 848w, https://substackcdn.com/image/fetch/$s_!oA6N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png 1272w, https://substackcdn.com/image/fetch/$s_!oA6N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde1d2c95-22df-41bf-b980-f2c33bca9424_1962x1656.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Endpoint settings, NEBIUS_API_KEY filled in, Without GPU selected, Non-GPU AMD Epyc Genoa chosen</figcaption></figure></div><p>I clicked Create. The status showed Provisioning. 
About two minutes later it changed to Running and a public endpoint address appeared in the Network section: <code>89.xxx.yyy.cc:8000</code>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ijUi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ijUi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png 424w, https://substackcdn.com/image/fetch/$s_!ijUi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png 848w, https://substackcdn.com/image/fetch/$s_!ijUi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png 1272w, https://substackcdn.com/image/fetch/$s_!ijUi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ijUi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png" width="1456" height="799" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:799,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:436870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/197141706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ijUi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png 424w, https://substackcdn.com/image/fetch/$s_!ijUi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png 848w, https://substackcdn.com/image/fetch/$s_!ijUi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png 1272w, https://substackcdn.com/image/fetch/$s_!ijUi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8a020cc-2f51-4df9-bccd-809ed2ca3124_3024x1660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">mcp-server-endpoint Running, public endpoint 89.169.111.36:8000, 4 vCPUs, 16 GiB, Non-GPU AMD Epyc Genoa</figcaption></figure></div><p>This was the moment that made the whole thing real. A container I had built on a VM was now running on a platform I did not configure, reachable from the public internet, and I had not touched a single network setting.</p><p>I tested it immediately from the VM:</p><pre><code><code>curl http://89.169.111.36:8000/tools</code></code></pre><p>Tool list. Then from my Mac:</p><pre><code><code>curl -X POST http://89.169.111.36:8000/call \
  -H "Content-Type: application/json" \
  -d '{"tool": "embed_text", "parameters": {"text": "Hello from Nebius Serverless MCP"}}'</code></code></pre><p>4096 dimensions. 139ms from Germany to eu-north1 and back. The endpoint was working.</p><h2>The bridge</h2><p>Claude Desktop speaks MCP over <code>stdio</code>, a transport where the client launches the server as a local process and exchanges messages over its standard input and output. The Nebius Endpoint speaks <code>HTTP</code>. They cannot talk to each other directly, so I wrote a bridge script that runs locally on my Mac. It speaks <code>stdio</code> to Claude and <code>HTTP</code> to the endpoint. One small script that translates between the two.</p><p>This is the same kind of separation I keep seeing in <a href="https://github.com/project-hami/hami">HAMi</a> work. One layer decides, another layer applies. Here one layer translates, another layer executes.</p><p>My Mac had Python 3.9. The MCP Python library needs Python 3.10 or above, so I installed 3.11 first.</p><pre><code><code>brew install python@3.11
python3.11 -m venv ~/mcp-bridge-env
source ~/mcp-bridge-env/bin/activate
pip install mcp requests</code></code></pre><p>Then the bridge script at <code>~/mcp-bridge-env/bridge.py</code>:</p><pre><code><code>import asyncio
import requests
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp import types

NEBIUS_ENDPOINT = "http://89.169.111.36:8000"

app = Server("nebius-mcp-bridge")

@app.list_tools()
async def list_tools() -&gt; list[types.Tool]:
    return [
        types.Tool(
            name="embed_text",
            description="Generate embeddings via Nebius Serverless Endpoint (Token Factory)",
            inputSchema={
                "type": "object",
                "properties": {
                    "text": {"type": "string", "description": "Text to embed"}
                },
                "required": ["text"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -&gt; list[types.TextContent]:
    if name == "embed_text":
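        # requests is blocking, so run it in a worker thread to keep the event loop free
        response = await asyncio.to_thread(
            requests.post,
            f"{NEBIUS_ENDPOINT}/call",
            json={"tool": "embed_text", "parameters": arguments},
            timeout=30,  # fail instead of hanging if the endpoint is unreachable
        )
        response.raise_for_status()  # surface HTTP errors rather than parsing an error body
        result = response.json()
        return [types.TextContent(type="text", text=str(result))]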
    raise ValueError(f"Unknown tool: {name}")
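
# --- Optional sketch, not part of the original bridge script ----------------
# The endpoint in this article is public. If token authentication is switched
# on for the endpoint, the bridge has to send one extra header with each call.
# Assumptions (check the Nebius docs before relying on them): the endpoint
# accepts a standard "Authorization: Bearer ..." header, and the token is kept
# in an environment variable. The name NEBIUS_ENDPOINT_TOKEN and the helper
# build_auth_headers are hypothetical, chosen only for this illustration.
import os

def build_auth_headers(env=os.environ):
    """Return extra HTTP headers for the request, or {} when no token is set."""
    token = env.get("NEBIUS_ENDPOINT_TOKEN", "").strip()
    if not token:
        return {}
    return {"Authorization": f"Bearer {token}"}

# Possible usage inside call_tool: pass headers=build_auth_headers() to requests.post.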

async def main():
    async with stdio_server() as (read, write):
        await app.run(read, write, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())</code></code></pre><p>To tell Claude Desktop about this script I edited <code>~/Library/Application Support/Claude/claude_desktop_config.json</code> and added the <code>mcpServers</code> section. The preferences that were already in the file stayed untouched.</p><pre><code><code>{
  "mcpServers": {
    "nebius-serverless": {
      "command": "/Users/mesutoezdil/mcp-bridge-env/bin/python3",
      "args": ["/Users/mesutoezdil/mcp-bridge-env/bridge.py"]
    }
  },
  "preferences": {
    "coworkScheduledTasksEnabled": true,
    "ccdScheduledTasksEnabled": true
  }
}</code></code></pre><p>Then I restarted Claude Desktop so it would pick up the new config.</p><pre><code><code>pkill -a Claude &amp;&amp; sleep 2 &amp;&amp; open -a Claude</code></code></pre><p>I checked the log file to confirm the server had loaded correctly.</p><pre><code><code>cat ~/Library/Logs/Claude/mcp-server-nebius-serverless.log</code></code></pre><p>The log showed the bridge initializing, completing a handshake with Claude, and responding to a <code>tools/list</code> request with the <code>embed_text</code> tool. Connected, no errors.</p><h2>Claude calls the tool</h2><p>I opened a new chat in Claude Desktop and typed one message.</p><p>Use the <code>embed_text</code> tool to embed this sentence: Nebius serverless MCP is working</p><p>Claude called the tool. No further prompting, no extra config.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eGjV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eGjV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png 424w, https://substackcdn.com/image/fetch/$s_!eGjV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png 848w, https://substackcdn.com/image/fetch/$s_!eGjV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png 1272w, 
https://substackcdn.com/image/fetch/$s_!eGjV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eGjV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png" width="1456" height="1121" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1121,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:338841,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/197141706?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eGjV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png 424w, https://substackcdn.com/image/fetch/$s_!eGjV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png 848w, 
https://substackcdn.com/image/fetch/$s_!eGjV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png 1272w, https://substackcdn.com/image/fetch/$s_!eGjV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27f581fc-76ec-4edc-8a56-3949b980a7d6_2458x1892.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Claude Desktop chat showing embed_text tool call result, 4096 dimensions, first 5 values, 564ms 
latency</figcaption></figure></div><p>4096 dimensions. 564ms end to end.</p><p>That 564ms covered the MCP protocol on my Mac, the bridge script, the network from Germany to eu-north1, the container receiving the request, the Token Factory call, and everything back. For a tool running on a server I never configured, on a platform that handled all the infrastructure, that is a number I am comfortable with.</p><h2>What the numbers mean</h2><p>The Token Factory call from inside the Nebius VM took 85ms. The round trip from my laptop to the endpoint took 139ms. The full Claude Desktop round trip took 564ms. The 425ms gap between 139ms and 564ms is MCP protocol and bridge overhead on my Mac. The Nebius infrastructure itself is fast.</p><p>The endpoint costs 0.14 per hour. You stop it from the console when you do not need it. There is no state to worry about, no teardown procedure.</p><h2>What this actually changes</h2><p>In my HAMi work I keep coming back to the same point. The problem in GPU scheduling is not the scheduler itself. It is how the GPU is modeled in the first place. You cannot fix a modeling problem by tuning a scheduler. The model has to change.</p><p>The same idea applies here. Most MCP tool work focuses on what the tool does. But underneath that question is another one that rarely gets asked: where does the tool live, and who keeps it alive? If the answer is always a server you manage, that puts a ceiling on what you can build with MCP. Every tool becomes infrastructure debt.</p><p>Running the server on Nebius Endpoints lifts that ceiling. I still write the tool. I still decide what it does. I just do not manage the machine anymore.</p><p>The one thing I would add before using this in production is authentication. The endpoint is currently public, so anyone who has the address can call it. Nebius Endpoints has a token authentication toggle in the settings. One click, then one extra header in the bridge script.</p><p>The code from this article is at <code>github.com/mesutoezdil/mcp-serverless</code>. 
Server, Dockerfile, bridge script, all there. If you hit the Container Registry authentication wall, use Docker Hub instead.</p>]]></content:encoded></item><item><title><![CDATA[I taught a small LLM to read my HAMi GPU cluster]]></title><description><![CDATA[HAMi already had the data. It just needed a translator.]]></description><link>https://mesutoezdil.substack.com/p/i-taught-a-small-llm-to-read-my-hami</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/i-taught-a-small-llm-to-read-my-hami</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Mon, 04 May 2026 10:12:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!r4l8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One K8s node and one <a href="https://www.nvidia.com/en-us/data-center/l40s/">NVIDIA L40S</a>. <a href="https://project-hami.io/">HAMi</a> in the middle. A 3B parameter model on the other end. I wanted to ask a local LLM &#8220;what is going on with my <a href="https://github.com/mesutoezdil/Systematic-CUDA-Learning">GPU</a> right now&#8221; and get a sensible answer back. No PromQL and no SSH.</p><p>The missing bit was a translator. HAMi already publishes everything I need over <a href="https://prometheus.io/">Prometheus</a> on port 31993. An LLM does not speak Prometheus. So I wrote a small server in <a href="https://go.dev/">Go</a> that sits in between and speaks the <a href="https://modelcontextprotocol.io/docs/getting-started/intro">Model Context Protocol</a>. 
The full source, with build and run instructions, is in my <a href="http://github.com/mesutoezdil/hami-mcp">repo</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/mesutoezdil/hami-mcp&quot;,&quot;text&quot;:&quot;My GitHub&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://github.com/mesutoezdil/hami-mcp"><span>My GitHub</span></a></p><p>This article walks through what I built, what bit me on the way, and how to reproduce it on your own cluster.</p><h2>Why I bother</h2><p>It is curiosity, mostly, plus an old habit. I keep finding pairs of projects that do not know about each other yet, and I want to see what happens when they meet. <a href="http://github.com/Project-HAMi/HAMi">HAMi</a> was already on the cluster I was using. <a href="http://modelcontextprotocol.io">MCP</a> was on every other tab in my browser this month. Adapting the one to the other made for a better afternoon than reading another tutorial on either.</p><p>The work is mostly trial and error. The bugs along the way are not a tax I paid, they are half the reason I write any of this down. <br></p><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text"><em>Before diving into the details, I also reproduced the same setup on Nebius GPU instances to validate how this behaves in a cloud environment. The goal was to compare my on-prem NVIDIA L40S setup with an on-demand GPU instance and verify that provisioning, NVIDIA runtime configuration, and GPU observability behave consistently without additional adjustments. 
The instance was ready within minutes with drivers and runtime support available out of the box, allowing me to follow the exact same steps and confirm consistent behavior across environments.</em></pre></div><h2>What we are building, in one picture</h2><p>Two clients touch HAMi in this post. One is a Go program that runs the chain end to end from the command line. The other is MCP Inspector, an official browser based debugger for MCP servers. They share the same <code>hami-mcp-server</code> binary in the middle and they read from the same <code>:31993/metrics</code> endpoint that HAMi already exposes. The shape looks like this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r4l8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r4l8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png 424w, https://substackcdn.com/image/fetch/$s_!r4l8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png 848w, https://substackcdn.com/image/fetch/$s_!r4l8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png 1272w, https://substackcdn.com/image/fetch/$s_!r4l8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!r4l8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png" width="1456" height="698" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:698,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:128814,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/196105875?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r4l8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png 424w, https://substackcdn.com/image/fetch/$s_!r4l8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png 848w, https://substackcdn.com/image/fetch/$s_!r4l8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png 1272w, https://substackcdn.com/image/fetch/$s_!r4l8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59aa8d9e-9fad-4495-8ae8-e2c69ee53a31_1465x702.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The Go binary in the middle is the same in both paths. The thing that changes is who launches it. In the command line path it is our own test client. In the browser path it is MCP Inspector. 
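</p><p>Whichever launcher starts the binary, what it writes to the server&#8217;s stdin has the same shape: one JSON-RPC 2.0 object per line. A minimal Go sketch of those messages follows, assuming the standard MCP stdio framing. The <code>protocolVersion</code> value and the <code>sketch-client</code> name are placeholders of mine, not values taken from the repo.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcMessage is a JSON-RPC 2.0 request or notification.
// A nil ID marks a notification, which expects no response.
type rpcMessage struct {
	JSONRPC string `json:"jsonrpc"`
	ID      any    `json:"id,omitempty"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// frame encodes one message as a single line: one JSON object
// followed by a newline, with no newlines inside the object.
func frame(m rpcMessage) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

// handshake returns the three messages a launcher would write to the
// server's stdin: initialize, the initialized notification, one tool call.
func handshake(tool string) []rpcMessage {
	return []rpcMessage{
		{JSONRPC: "2.0", ID: 1, Method: "initialize", Params: map[string]any{
			"protocolVersion": "2024-11-05", // assumption: use the version your client supports
			"capabilities":    map[string]any{},
			"clientInfo":      map[string]any{"name": "sketch-client", "version": "0.0.0"}, // hypothetical
		}},
		{JSONRPC: "2.0", Method: "notifications/initialized"},
		{JSONRPC: "2.0", ID: 2, Method: "tools/call", Params: map[string]any{
			"name":      tool,
			"arguments": map[string]any{},
		}},
	}
}

func main() {
	for _, m := range handshake("get_cluster_summary") {
		line, err := frame(m)
		if err != nil {
			panic(err)
		}
		fmt.Print(line)
	}
}
```

<p>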
The protocol it speaks (<em>stdio JSON-RPC</em>) does not change either way.</p><p>Why two paths? The command line path is the proof that an LLM can read HAMi without anyone wiring it up by hand. The browser path is the proof that the server is a generic MCP server, useful to tools we did not write.</p><h2>What was on the box</h2><p>I started by checking the GPU and the HAMi pods. The first command asks the NVIDIA driver for free and total memory on every GPU.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;f1d2362d-45a0-4184-90d2-4cfeb91d64d0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ nvidia-smi --query-gpu=name,memory.free,memory.total --format=csv
name, memory.free [MiB], memory.total [MiB]
NVIDIA L40S, 45458 MiB, 46068 MiB</code></pre></div><p>One L40S with 46068 MiB total (about 45 GiB) and 45458 MiB free. Nothing was using the card.</p><p>The second command lists the pods in the <code>kube-system</code> namespace and filters to the ones HAMi runs.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;52f1acdd-b44e-4f24-a1a2-975d2a3f3724&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ kgp -n kube-system | grep hami

hami-device-plugin-5n2mv           2/2     Running   0   15m
hami-scheduler-7bbbfc4f7f-d6n29    2/2     Running   2   40h</code></pre></div><p>Two pods, both up. The device plugin is the one that exposes the metrics endpoint we are about to scrape.</p><h2>The first look at HAMi metrics</h2><p>This command pulls the raw Prometheus text from the device plugin and prints the first ten lines.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b8f7acc3-561e-4bfb-af76-4c158c12a379&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ curl -s http://localhost:31993/metrics | head -10
# HELP GPUDeviceCoreAllocated Device core allocated for a certain GPU
# TYPE GPUDeviceCoreAllocated gauge
GPUDeviceCoreAllocated{deviceidx="0",deviceuuid="GPU-4cdfc14a...",nodeid="nebius-tarantula",zone="vGPU"} 0
# HELP GPUDeviceCoreLimit Device memory core limit for a certain GPU
# TYPE GPUDeviceCoreLimit gauge
GPUDeviceCoreLimit{deviceidx="0",deviceuuid="GPU-4cdfc14a...",nodeid="nebius-tarantula",zone="vGPU"} 100</code></pre></div><p>Each metric family comes as a <code># HELP</code> line, a <code># TYPE</code> line, and one sample line per label set. The names are reasonable once you read them twice. There are about eight metric families on a quiet cluster. The labels are where things get sneaky, but I will come back to that.</p><p>If your cluster is not local, replace <code>localhost:31993</code> with the right host or run <code>kubectl port-forward</code> against the device plugin pod.</p><h2>The four &#8220;tools&#8221;</h2><p>I wanted four tool calls.</p><p>(i) A cluster summary. How many nodes, how many GPUs, how much vGPU memory is checked out, how much is free, who is using the most.</p><p>(ii) Per device numbers. The same data broken down by GPU, with an optional node filter.</p><p>(iii) Per pod attribution. Which pod is sitting on which GPU and how much of it.</p><p>(iv) A free form metric query. Give me one metric with a label filter.</p><p>Each tool is just a Go function. It scrapes the HAMi endpoint, parses the Prometheus text, and returns JSON. The MCP library on top handles the JSON-RPC framing over stdio. That is the whole architecture.</p><p>The full implementation is in <code>main.go</code> in the repo. Around 500 lines, four handlers, no caching.</p><h2>Two bugs that bit me</h2><p>I did not get this right on the first try. Two things are worth flagging because they will likely bite anyone else who tries.</p><h3>The Prometheus parser panicked on a clean construction</h3><p>I wrote the obvious code. Declare a <code>TextParser</code>, hand it the response body, get back metric families. On the first call, the program panicked with: <code>Invalid name validation scheme requested: unset</code>.</p><p>The cause is in <code>prometheus/common</code> v0.67. The parser keeps a private validation scheme that defaults to a useless zero value. 
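</p><p>The failure mode is ordinary Go zero value behaviour rather than anything Prometheus specific. The sketch below uses hypothetical stand-in types (<code>scheme</code>, <code>parser</code>, <code>newParser</code>), not the real <code>prometheus/common</code> code, to show why a zero initialised struct with a private enum field fails while one built through a constructor does not.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// scheme is a hypothetical stand-in for a validation-scheme enum whose
// zero value means "unset". It is NOT the real prometheus/common type.
type scheme int

const (
	unset  scheme = iota // zero value: what a zero-initialised struct gets
	legacy               // hypothetical non-zero option
	utf8                 // hypothetical non-zero option
)

// parser is a hypothetical parser that keeps its scheme in a private field,
// so callers outside the package cannot set it on a zero value.
type parser struct {
	scheme scheme
}

// newParser is the constructor path: it sets the private field explicitly.
func newParser(s scheme) parser {
	return parser{scheme: s}
}

// check returns an error when the scheme was never set.
func (p parser) check() error {
	if p.scheme == unset {
		return errors.New("invalid name validation scheme requested: unset")
	}
	return nil
}

func main() {
	var zero parser // zero-initialised, as in the first attempt
	fmt.Println("zero value:", zero.check())
	fmt.Println("constructor:", newParser(utf8).check())
}
```

<p>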
There is a global called <code>model.NameValidationScheme</code> that looks like it would help. It does not, because the parser never reads it.</p><p>The fix is a one liner. Build the parser through <code>expfmt.NewTextParser(model.UTF8Validation)</code> instead of zero initialising it. The repo has the working version near the top of <code>scrape()</code> in <code>main.go</code>.</p><h3>Two label families that look the same but are not</h3><p>This one was sneakier. The first version of my cluster summary keyed devices on <code>(nodeid, deviceuuid)</code>. Made sense. Then I deployed a pod that asked HAMi for a vGPU, scraped again, and the device count doubled.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;1ce90c1b-fa0d-4012-a19b-406ec1e48f8c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">"device_count": 2,
"devices": [
  {"node": "nebius-tarantula", "device_uuid": "GPU-4cdfc14a...", "memory_allocated_mb": 8192},
  {"node": "",                 "device_uuid": "GPU-4cdfc14a...", "memory_allocated_mb": 0}
]</code></pre></div><p>The phantom device has an empty node. That was the clue. When I diffed the raw metrics from before and after the pod landed, two new series had appeared.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c7c14807-5ed5-40ac-abdd-c1b72bed8840&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">vGPUCoreAllocated{containeridx="0",deviceuuid="...",nodename="nebius-tarantula",podname="hami-demo-workload",...} 30
vGPUMemoryAllocated{containeridx="0",deviceuuid="...",nodename="nebius-tarantula",podname="hami-demo-workload",...} 8.589934592e+09</code></pre></div><p>Look at the label keys. The device level metrics use <code>nodeid</code>. The new per pod metrics use <code>nodename</code>. Same logical thing, different key. My code read <code>s.Labels["nodeid"]</code> on a per pod sample, got an empty string, and the empty string went into the dedup tuple as a new device.</p><p>The fix is small. Whitelist the metric names that the cluster summary cares about, and let the per pod metrics flow through to a different tool. The whitelist and the rest of the fix are in <code>handleGetClusterSummary</code> in <code>main.go</code>.</p><p>The lesson I will keep: when a Prometheus exporter speaks two label dialects, never write a single dedup key that spans both.</p><p>A bonus: the per pod series was also the first place I realised HAMi already publishes pod attribution. The pod label is what an operator actually wants when something is melting on a shared GPU.</p><h2>A workload to make the metrics interesting</h2><p>A clean cluster reports zero everywhere. To see the tools do real work, you need a pod that holds a HAMi reservation. The repo has a manifest at <code>k8s-vgpu-workload.yaml</code> that requests one GPU, 8 GiB of memory, and 30 percent of the cores, then sleeps for half an hour.</p><p>The first command tells Kubernetes to create the pod from that manifest. The second command checks that the pod is actually running.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;51d30390-2a34-49e3-8cc2-12259407cd9b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ k apply -f k8s-vgpu-workload.yaml
pod/hami-demo-workload created

$ kgp hami-demo-workload
NAME                 READY   STATUS    RESTARTS   AGE
hami-demo-workload   1/1     Running   0          8s</code></pre></div><p>Now check what HAMi recorded on the pod. The next command prints the pod&#8217;s full description and keeps only the lines that contain the HAMi annotation prefix.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6c3a173a-220a-464f-92b6-c94cb7a4c14e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ k describe pod hami-demo-workload | grep hami.io
hami.io/bind-phase:              success
hami.io/vgpu-devices-allocated:  GPU-4cdfc14a-c633-e61e-9235-56118c547d80,NVIDIA,8192,30:;
hami.io/vgpu-node:               nebius-tarantula</code></pre></div><p>The middle line carries the most information. Reading the comma separated fields left to right, HAMi allocated the GPU with UUID <code>GPU-4cdfc14a...</code>, of type <code>NVIDIA</code>, with <code>8192</code> MiB of memory and <code>30</code> percent of the cores. The annotation is how HAMi keeps the device plugin and the scheduler in sync.</p><p>When you are done with the pod, remove it with <code>k delete -f k8s-vgpu-workload.yaml</code>.</p><h2>Hand testing one tool</h2><p>The repo includes a small shell helper at <code>test_stdio.sh</code>. It sends the MCP <code>initialize</code> handshake, the <code>notifications/initialized</code> follow up, and one <code>tools/call</code>, then prints the JSON-RPC responses on stdout.</p><p>The next command runs the helper for <code>get_cluster_summary</code>, takes the last response line, and uses <code>jq</code> to pull out the human readable text payload.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;1fb5698c-70fa-47f8-ae67-de253e8f7874&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ ./test_stdio.sh ./hami-mcp-server get_cluster_summary '{}' \
    | tail -1 \
    | jq -r '.result.content[0].text'

{
  "core_utilization_pct": 30,
  "device_count": 1,
  "devices": [
    {
      "node": "nebius-tarantula",
      "device_uuid": "GPU-4cdfc14a-c633-e61e-9235-8888888888",
      "device_type": "NVIDIA L40S",
      "memory_allocated_mb": 8192,
      "memory_limit_mb": 46068,
      "memory_percent": 0.17782408613354173,
      "core_allocated": 30,
      "core_limit": 100,
      "shared_containers": 1
    }
  ],
  "hami_build_date": "20260417-04:22:41",
  "hami_version": "v2.8.1",
  "memory_utilization_pct": 17.78240861335417,
  "node_count": 1,
  "total_memory_allocated_mb": 8192,
  "total_memory_free_mb": 37876,
  "total_memory_limit_mb": 46068,
  "total_shared_containers": 1
}</code></pre></div><p>The numbers match the demo pod request. 8192 MiB out of 46068 MiB, 30 percent of the cores, one shared container. The HAMi version is right there in the JSON, which is handy when you need to file a bug report.</p><h2>Picking a local LLM</h2><p>I started with vLLM and NVIDIA&#8217;s <a href="https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1">Nemotron Nano 8B</a>. Two problems showed up before I got the image on disk. First, <code>vllm/vllm-openai</code> is around 10 GiB compressed, and decompresses to about twice that. The host had 17 GiB free. Second, even if it had fit, the L40S is now sliced by HAMi, and a second inference engine running outside HAMi accounting needs <code>--gpu-memory-utilization</code> trimmed to leave room for the demo pod&#8217;s reservation.</p><p>I switched to Ollama with <code>llama3.2:3b</code> quantised to Q4. Image around 6 GiB, model around 2 GiB. Ollama exposes an OpenAI compatible chat completions endpoint, so the test client does not care which engine is on the other side.</p><p>The next command starts Ollama as a Docker container with full GPU access. We bind it to <code>127.0.0.1:11434</code> on purpose. Ollama has no authentication, so a public bind would let any internet host call your local model. The e2e client only needs to reach Ollama from the same host, so localhost is fine.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;446a6fa0-3a26-4be4-8d49-58d68719f492&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ docker run -d --name ollama --gpus all --network host -e OLLAMA_HOST=127.0.0.1:11434 -v ollama-models:/root/.ollama ollama/ollama:latest</code></pre></div><p>Docker answers with a long hex string, which is the new container ID. It prints the ID because we ran with <code>-d</code> (detached). 
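</p><p>Because the client only depends on the OpenAI style chat completions format, the request it sends later can be sketched without any engine specific code. This is a minimal sketch, assuming the standard <code>/v1/chat/completions</code> path and a non-streaming call. The <code>buildChatRequest</code> helper and the prompt are hypothetical, and the real client in <code>cmd/e2e/main.go</code> may differ. It only builds the request, so it runs without a model on disk.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatMessage is one entry in the OpenAI-style "messages" array.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest is the minimal body for a non-streaming chat completion.
type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// buildChatRequest (hypothetical helper) returns the URL and JSON body for
// one chat completion call. Nothing here is Ollama-specific: swapping the
// base URL is enough to target another OpenAI-compatible engine.
func buildChatRequest(baseURL, model, prompt string) (string, []byte, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   false,
	})
	if err != nil {
		return "", nil, err
	}
	return baseURL + "/v1/chat/completions", body, nil
}

func main() {
	// The base URL matches the localhost bind used in the docker command.
	url, body, err := buildChatRequest(
		"http://127.0.0.1:11434",
		"llama3.2:3b",
		"Summarise this cluster state: {}", // placeholder prompt
	)
	if err != nil {
		panic(err)
	}
	fmt.Println("POST", url)
	fmt.Println(string(body))
}
```

<p>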
The next command tells the running container to pull the model weights.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;8f0a9c5b-161d-41bf-a9a4-d650987ee681&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ docker exec ollama ollama pull llama3.2:3b
pulling dde5aa3fc5ff: 100%  2.0 GB
...
success</code></pre></div><p>The 2.0 GB layer is the model weights. The smaller blobs are the tokenizer, the chat template, and so on. After success, Ollama has the model on disk and is ready to serve it.</p><p>The last command asks the OpenAI compatible endpoint which models it can serve, to confirm the pull worked.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;a3263c5f-9548-4bfc-9271-487679ed9c82&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ curl -s http://localhost:11434/v1/models
{"object":"list","data":[{"id":"llama3.2:3b","object":"model",...}]}</code></pre></div><p>The Ollama logs confirm it sees the L40S directly through the NVIDIA container runtime, with the full 45 GiB available. That is because Docker bypassed HAMi entirely. HAMi only accounts for what the Kubernetes scheduler hands out. The demo pod that requested 8 GiB shows up in HAMi metrics. The Ollama container does not. That is correct behaviour, but it confuses people the first time.</p><h2>The end to end run</h2><p>The repo has a small Go program at <code>cmd/e2e/main.go</code> that launches <code>hami-mcp-server</code> as a subprocess, performs the MCP handshake, calls two tools, builds a prompt out of their JSON, and asks the LLM to summarise. Run it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;0ed0e227-abec-461e-a46d-e10a8a3965be&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ ./e2e --server ./hami-mcp-server --llm-url http://localhost:11434 --model llama3.2:3b

==&gt; Starting hami-mcp-server
==&gt; tools/call get_cluster_summary
{
  "core_utilization_pct": 30,
  "device_count": 1,
  "devices": [...],
  "memory_utilization_pct": 17.78240861335417,
  "total_memory_allocated_mb": 8192,
  "total_memory_free_mb": 37876,
  "total_memory_limit_mb": 46068
}

==&gt; tools/call get_vgpu_allocation
{
  "allocations": [
    {
      "namespace": "default",
      "pod_name": "hami-demo-workload",
      "node": "nebius-tarantula",
      "device_uuid": "GPU-4cdfc14a-c633-e61e-9235-7878787878787878",
      "container_index": "0",
      "memory_allocated_mb": 8192,
      "core_allocated": 30
    }
  ],
  "match_count": 1
}

==&gt; Asking llama3.2:3b at http://localhost:11434

==&gt; Model answer
The HAMi cluster has a utilization rate of 30% for its single NVIDIA L40S
device, with 30 cores allocated out of a limit of 100. The total memory
allocation is relatively low at 8192 MB, leaving 37876 MB free. There are
no concerns based on the provided metrics, as the utilization rates and
memory allocations appear to be within expected ranges for this type of
GPU. However, it's worth noting that the cluster has only one device,
which may limit its scalability. Overall, the cluster appears to be
running within a healthy range.</code></pre></div><p>The numbers in the answer match the numbers in the JSON, with one slip: the model reads the 30 percent core share as 30 cores. The scalability comment is fair; a real on call engineer would say the same thing.</p><p>I did not write a special prompt to coax this out. The JSON had clear field names, and a 3B Q4 model was enough to write something readable on top of it.</p><h2>The same plumbing, this time in a browser</h2><p>The end to end Go client is a good smoke test, but it bakes in one LLM and one prompt. For day to day debugging you want to poke at the tools by hand and see the raw JSON. MCP Inspector is the official browser based MCP debugger that lets you do exactly that. It is a small Node.js app. You point it at the binary, it opens a browser tab, and every tool gets a button you can click.</p><p>The next command starts Inspector and points it at our binary. <code>HOST=0.0.0.0</code> makes it listen on every network interface, which is what lets a browser on a different machine reach it. <code>ALLOWED_ORIGINS</code> is its <code>CORS</code> allow list, set to the URL the browser loads the UI from. <code>MCP_AUTO_OPEN_ENABLED=false</code> skips the local browser auto open, since this VM has no graphical session. <code>npx -y</code> downloads the package and runs it without asking for confirmation.</p><p>A note before you run this. Binding Inspector to <code>0.0.0.0</code> puts a debug tool on the public internet. Token authentication is on by default and it helps, but a real deployment should sit behind a VPN or an SSH tunnel. The form below is fine for a one off demo on a throwaway VM.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;cf868c3f-222f-48da-bf8a-e08132fc780b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">$ HOST=0.0.0.0 \
  ALLOWED_ORIGINS="http://YOUR_VM_IP:6274" \
  MCP_AUTO_OPEN_ENABLED=false \
  npx -y @modelcontextprotocol/inspector ./hami-mcp-server

Starting MCP inspector...
&#10003; Proxy server listening on 0.0.0.0:6277
&#10003; Session token: &lt;64 hex chars, regenerated on every restart&gt;
&#10003; MCP Inspector is up and running at:
   http://0.0.0.0:6274/?MCP_PROXY_AUTH_TOKEN=&lt;token&gt;</code></pre></div><p>(You do not think I&#8217;m sharing my personal info here, right?)</p><p>Two ports, one role each. <code>6274</code> is the React UI the browser loads. <code>6277</code> is the proxy the UI then calls into. The proxy spawns <code>hami-mcp-server</code> as a subprocess and talks to it over stdio. The session token is a one shot secret printed on startup, treat it like a password and do not paste it into a public chat.</p><p>Open the printed URL in your browser, with your VM&#8217;s IP in place of <code>0.0.0.0</code>. You see a form with a Command field and an Arguments field. Paste the absolute path of the binary into Command, leave Arguments empty, click <code>Connect</code>. The right hand pane lists the four tools the server registers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X0XC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X0XC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png 424w, https://substackcdn.com/image/fetch/$s_!X0XC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png 848w, https://substackcdn.com/image/fetch/$s_!X0XC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png 1272w, 
https://substackcdn.com/image/fetch/$s_!X0XC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X0XC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png" width="726" height="375.46565934065933" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:1456,&quot;resizeWidth&quot;:726,&quot;bytes&quot;:495533,&quot;alt&quot;:&quot;MCP Inspector connected to hami-mcp-server. The server shows as &#8220;Connected&#8221; in the bottom left - version 0.1.0.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/196105875?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MCP Inspector connected to hami-mcp-server. The server shows as &#8220;Connected&#8221; in the bottom left - version 0.1.0." title="MCP Inspector connected to hami-mcp-server. The server shows as &#8220;Connected&#8221; in the bottom left - version 0.1.0." 
srcset="https://substackcdn.com/image/fetch/$s_!X0XC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png 424w, https://substackcdn.com/image/fetch/$s_!X0XC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png 848w, https://substackcdn.com/image/fetch/$s_!X0XC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png 1272w, https://substackcdn.com/image/fetch/$s_!X0XC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0422dbed-6920-495f-b144-7f9dcab3dac2_2930x1516.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>MCP Inspector connected to hami-mcp-server. The server shows as &#8220;Connected&#8221; in the bottom left - version 0.1.0.</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!edQI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!edQI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png 424w, https://substackcdn.com/image/fetch/$s_!edQI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png 848w, https://substackcdn.com/image/fetch/$s_!edQI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png 1272w, https://substackcdn.com/image/fetch/$s_!edQI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!edQI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png" width="1456" height="754" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:754,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:193628,&quot;alt&quot;:&quot;The four tools: get_cluster_summary, get_gpu_metrics, get_vgpu_allocation, run_promql - each with a description pulled directly from the Go source.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/196105875?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The four tools: get_cluster_summary, get_gpu_metrics, get_vgpu_allocation, run_promql - each with a description pulled directly from the Go source." title="The four tools: get_cluster_summary, get_gpu_metrics, get_vgpu_allocation, run_promql - each with a description pulled directly from the Go source." 
srcset="https://substackcdn.com/image/fetch/$s_!edQI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png 424w, https://substackcdn.com/image/fetch/$s_!edQI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png 848w, https://substackcdn.com/image/fetch/$s_!edQI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png 1272w, https://substackcdn.com/image/fetch/$s_!edQI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3bee424-09f1-4e2d-a82f-b53491a95a17_1917x993.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>The four tools: get_cluster_summary, get_gpu_metrics, get_vgpu_allocation, run_promql - each with a description pulled directly from the Go source.</em></figcaption></figure></div><p>Click on <code>get_cluster_summary</code>, leave the metadata pairs empty, click <code>Run Tool</code>. Inspector forwards a JSON-RPC <code>tools/call</code> down to the binary. The binary scrapes HAMi and returns JSON. The panel renders it. With our demo pod running, the result has <code>memory_allocated_mb: 8192</code>, <code>core_allocated: 30</code>, <code>shared_containers: 1</code>. Run <code>k delete pod hami-demo-workload</code>, click <code>Run Tool</code> again, and every number goes back to zero.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TkYQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TkYQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png 424w, https://substackcdn.com/image/fetch/$s_!TkYQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png 848w, 
https://substackcdn.com/image/fetch/$s_!TkYQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png 1272w, https://substackcdn.com/image/fetch/$s_!TkYQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TkYQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png" width="1456" height="755" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:755,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:211619,&quot;alt&quot;:&quot;get_cluster_summary result with the demo pod running: 8192 MB allocated, 30 cores, 17.78% memory utilization, HAMi v2.8.1.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/196105875?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="get_cluster_summary result with the demo pod running: 8192 MB allocated, 30 cores, 17.78% memory utilization, HAMi v2.8.1." title="get_cluster_summary result with the demo pod running: 8192 MB allocated, 30 cores, 17.78% memory utilization, HAMi v2.8.1." 
srcset="https://substackcdn.com/image/fetch/$s_!TkYQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png 424w, https://substackcdn.com/image/fetch/$s_!TkYQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png 848w, https://substackcdn.com/image/fetch/$s_!TkYQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png 1272w, https://substackcdn.com/image/fetch/$s_!TkYQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81fd452-03e8-4123-a943-dfcb2d6a1dda_1917x994.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">get_cluster_summary result with the demo pod running: 8192 MB allocated, 30 cores, 17.78% memory utilization, HAMi v2.8.1.</figcaption></figure></div><p><code>get_vgpu_allocation</code> is the more interesting one. With the pod running it returns a single allocation entry. The entry names the namespace, the pod, the device UUID, the 8 GiB of memory, and the 30 percent of cores. Pass <code>pod_name: hami-demo-workload</code> to the call and the same row comes back. Pass a name that does not exist and you get an empty list with a note saying no matching pod has an active vGPU reservation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eCpi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eCpi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png 424w, https://substackcdn.com/image/fetch/$s_!eCpi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png 848w, 
https://substackcdn.com/image/fetch/$s_!eCpi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png 1272w, https://substackcdn.com/image/fetch/$s_!eCpi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eCpi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png" width="1456" height="756" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:216578,&quot;alt&quot;:&quot;get_vgpu_allocation result showing hami-demo-workload in the default namespace: GPU-4cdfc14a..., 8192 MB, 30 cores.  r&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/196105875?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="get_vgpu_allocation result showing hami-demo-workload in the default namespace: GPU-4cdfc14a..., 8192 MB, 30 cores.  r" title="get_vgpu_allocation result showing hami-demo-workload in the default namespace: GPU-4cdfc14a..., 8192 MB, 30 cores.  
r" srcset="https://substackcdn.com/image/fetch/$s_!eCpi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png 424w, https://substackcdn.com/image/fetch/$s_!eCpi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png 848w, https://substackcdn.com/image/fetch/$s_!eCpi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png 1272w, https://substackcdn.com/image/fetch/$s_!eCpi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2422f3-8c38-46fc-8ad7-09b1b9646a29_1919x996.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">get_vgpu_allocation result showing hami-demo-workload in the default namespace: GPU-4cdfc14a..., 8192 MB, 30 cores.</figcaption></figure></div><p><code>run_promql</code> lets you write a metric name plus optional label matchers. Put <code>nodeGPUMemoryPercentage{nodeid="nebius-tarantula"}</code> into the query field and the response is one sample with <code>value: 0.17782...</code>, the same 17.78 percent the cluster summary reports, just with more decimal places.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cQRe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cQRe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png 424w, https://substackcdn.com/image/fetch/$s_!cQRe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png 848w, https://substackcdn.com/image/fetch/$s_!cQRe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cQRe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cQRe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png" width="1456" height="753" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:246095,&quot;alt&quot;:&quot;run_promql with nodeGPUMemoryPercentage{nodeid=\&quot;nebius-tarantula\&quot;}: one matched result, value 0.17782408613354173.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/196105875?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="run_promql with nodeGPUMemoryPercentage{nodeid=&quot;nebius-tarantula&quot;}: one matched result, value 0.17782408613354173." title="run_promql with nodeGPUMemoryPercentage{nodeid=&quot;nebius-tarantula&quot;}: one matched result, value 0.17782408613354173." 
srcset="https://substackcdn.com/image/fetch/$s_!cQRe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png 424w, https://substackcdn.com/image/fetch/$s_!cQRe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png 848w, https://substackcdn.com/image/fetch/$s_!cQRe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png 1272w, https://substackcdn.com/image/fetch/$s_!cQRe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F074d4e4d-4dbb-4265-aad4-15f4a8b4c810_1919x992.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">run_promql with nodeGPUMemoryPercentage{nodeid="nebius-tarantula"}: one matched result, value 0.17782408613354173.</figcaption></figure></div><p>What this exercise proves is small but useful. The MCP server we wrote is not tied to one client. Whatever MCP capable tool you point at this binary will see the same four tools and the same JSON shapes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!axuo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!axuo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png 424w, https://substackcdn.com/image/fetch/$s_!axuo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png 848w, https://substackcdn.com/image/fetch/$s_!axuo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png 1272w, 
https://substackcdn.com/image/fetch/$s_!axuo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!axuo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png" width="1456" height="755" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:755,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:231708,&quot;alt&quot;:&quot;The full Inspector view with the history panel showing all tool calls from the session.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/196105875?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The full Inspector view with the history panel showing all tool calls from the session." title="The full Inspector view with the history panel showing all tool calls from the session." 
srcset="https://substackcdn.com/image/fetch/$s_!axuo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png 424w, https://substackcdn.com/image/fetch/$s_!axuo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png 848w, https://substackcdn.com/image/fetch/$s_!axuo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png 1272w, https://substackcdn.com/image/fetch/$s_!axuo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F425f572e-7cc2-49a4-85ca-9957ae601079_1920x996.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The full Inspector view with the history panel showing all tool calls from the session.</figcaption></figure></div><p>I work this way on purpose. Adapting two real systems to play together is more interesting to me than building greenfield, and a real bug story is more useful than a marketing pitch. The post you just read is one of those exercises. The rest of them are at <a href="https://github.com/mesutoezdil/hami-mcp">github.com/mesutoezdil</a>.</p><h2>Closing</h2><p>Any MCP-capable client can now read this cluster. The clients do not need to know how HAMi works. They just call <code>get_cluster_summary</code> or one of the other three tools and get back a small JSON document.</p><p>On a one-node home lab this is more machinery than the problem deserves. The value shows up when you have several GPU nodes shared across teams and you want a uniform way to ask about them. The server itself is around 500 lines of Go. The hard parts were the two label families and the parser footgun. The rest was mechanical.</p>]]></content:encoded></item><item><title><![CDATA[A 2-Line PR to the MCP Registry: When Is Small Worth It?]]></title><description><![CDATA[I'm not sure either one will get merged. 
Here's why I opened them anyway.]]></description><link>https://mesutoezdil.substack.com/p/a-2-line-pr-to-the-mcp-registry-when</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/a-2-line-pr-to-the-mcp-registry-when</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Sun, 03 May 2026 14:22:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zABN!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F645d6c03-4d5b-4067-ab03-473c24075b55_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Anyone trying to open their first PR to an open source repo runs into the same problem. Doing a big feature is too risky. </p><p>You wait weeks for review and it can still get closed. Doing a tiny cleanup feels too small. The maintainer might reply &#8220;we didn&#8217;t need this&#8221; and shut it. </p><p>So where&#8217;s the right middle?</p><p>I&#8217;ll talk about this through two small refactor PRs I <a href="https://github.com/modelcontextprotocol/registry/pulls/mesutoezdil">opened</a> to <code>modelcontextprotocol/registry</code>. </p><p>Both are still in review as I write this. Whichever way they go, I wanted to write down why I opened them and where their limits are.</p><p>PR #1240: <a href="https://github.com/modelcontextprotocol/registry/pull/1240">https://github.com/modelcontextprotocol/registry/pull/1240</a> </p><p>PR #1241: <a href="https://github.com/modelcontextprotocol/registry/pull/1241">https://github.com/modelcontextprotocol/registry/pull/1241</a></p><h3>First PR: Delete One Line</h3><p>Inside <code>internal/database/postgres.go</code> there was this line in <code>ListServers</code>:</p><pre><code><code>_ = argIndex // Silence unused variable warning</code></code></pre><p>This is a classic Go pattern. You assign a value to <code>argIndex</code> somewhere and then don&#8217;t use it later. 
The Go compiler rejects that with a &#8220;declared and not used&#8221; error. </p><p>To get past it you write <code>_ = argIndex</code> and move on.</p><p>The thing is, <code>argIndex</code> is already used above (for cursor pagination), so the compiler has nothing to complain about. It only looks &#8220;unused&#8221; because the value from the last increment is never read. </p><p>So the suppression isn&#8217;t needed. The variable does get used. Only the final assignment is dead.</p><p>The fix is one line:</p><pre><code><code>        whereConditions = append(whereConditions, cursorCondition)
        args = append(args, cursorArgs...)
    }
-   _ = argIndex // Silence unused variable warning

    // Build the WHERE clause
    whereClause := ""</code></code></pre><p>The diff:</p><pre><code><code>$ gh pr diff 1240 --repo modelcontextprotocol/registry

 1 file changed, 1 deletion(-)</code></code></pre><p>What&#8217;s the risk here? The maintainer might say &#8220;we wanted to keep that comment around for context&#8221; and close it. Or maybe they&#8217;re planning to use <code>argIndex</code> for a future flow I don&#8217;t know about. I see it as dead in the current code but I don&#8217;t know the history.</p><h3>Second PR: Pull a Repeated Pattern Into a Helper</h3><p>In the auth handlers I noticed the same pattern in three different files:</p><pre><code><code>if resp.StatusCode != http.StatusOK {
    body, _ := io.ReadAll(resp.Body)
    return nil, fmt.Errorf("GitHub API error (status %d): %s", resp.StatusCode, body)
}</code></code></pre><p>Same exact thing in three places. Twice in <code>github_at.go</code> (<code>getGitHubUser</code>, <code>getGitHubUserOrgs</code>). Once in <code>github_oidc.go</code> (<code>fetchJWKS</code>). </p><p>All three read the response body, swallow the error, and stuff the body into the error message.</p><p>I added a helper to <code>common.go</code>:</p><pre><code><code>func readBody(r io.Reader) string {
    b, _ := io.ReadAll(r)
    return string(b)
}</code></code></pre><p>Then I cut each call site down to one line:</p><pre><code><code> if resp.StatusCode != http.StatusOK {
-    body, _ := io.ReadAll(resp.Body)
-    return nil, fmt.Errorf("GitHub API error (status %d): %s", resp.StatusCode, body)
+    return nil, fmt.Errorf("GitHub API error (status %d): %s", resp.StatusCode, readBody(resp.Body))
 }</code></code></pre><p>I also removed the <code>"io"</code> import from all three files. Go imports are per file, and after the change those files no longer reference <code>io</code>; the <code>io.ReadAll</code> call lives in <code>common.go</code>. Diff:</p><pre><code><code>$ gh pr diff 1241 --repo modelcontextprotocol/registry

 4 files changed, 8 insertions(+), 9 deletions(-)</code></code></pre><h3>This Helper Has a Known Weakness</h3><p><code>readBody</code> swallows the error: <code>b, _ := io.ReadAll(r)</code>. If the body can&#8217;t be read in full (broken connection, truncated response), <code>b</code> ends up empty or partial and the error gets lost.</p><p>I left it that way on purpose because the original code did the same thing. </p><p>The PR is a refactor, not a behavior change. But the &#8220;right&#8221; version would return <code>(string, error)</code> and make every caller deal with it:</p><pre><code><code>func readBody(r io.Reader) (string, error) {
    b, err := io.ReadAll(r)
    if err != nil {
        return "", err
    }
    return string(b), nil
}</code></code></pre><p>I didn&#8217;t do that because the scope would balloon. You&#8217;d have to add error handling at three call sites. That stops being a small cleanup. </p><p>If a maintainer asks for it in review I&#8217;ll do it as a follow-up PR. </p><p>For now the PR keeps the existing behavior and just trims the duplicate lines.</p><p>This is a trade-off you make on every PR like this. Keep it small and you have a better chance of a merge, but it&#8217;s not &#8220;fully right&#8221;. </p><p>Make it bigger and it&#8217;s right, but a maintainer might call it scope creep and close it. You weigh this for each one.</p><h3>What Does This Look Like to a Maintainer?</h3><p>I learned this the hard way over a few merged and a few rejected PRs. It can go two ways.</p><p>The good way. &#8220;Clean cleanup, code reads a bit better, merging.&#8221; Single-line fixes like #1240 and dedup helpers like #1241 usually get through. Especially if the change really doesn&#8217;t break behavior and there&#8217;s no big counter-argument.</p><p>The bad way. &#8220;Why does this PR exist? Did we need a review cycle for one line?&#8221; Some maintainers see PRs this small as noise. If the repo isn&#8217;t very active or the maintainer has no bandwidth, small PRs sit in a queue and rot.</p><p>I don&#8217;t know which way these will go. As I write this both are in &#8220;Review required&#8221;. If they get closed, that&#8217;s also useful. If they get merged, the repo is a tiny bit cleaner.</p><h3>When Does This Kind of PR Make Sense?</h3><p>A few conditions.</p><p>If you&#8217;re learning the repo. A small PR forces you to read the codebase. What patterns are here, what&#8217;s the lint config, how are tests written, what&#8217;s the naming style. That work has value even if the PR doesn&#8217;t merge.</p><p>If you have a bigger PR coming. A small PR introduces you to the maintainer. 
A week later when a real feature PR shows up, &#8220;oh this person contributed before&#8221; is a useful memory for them.</p><p>If there&#8217;s actually a meaningful cleanup. &#8220;There&#8217;s a typo here&#8221; is meaningful. &#8220;There&#8217;s an unused line&#8221; is borderline. &#8220;Let&#8217;s reformat every line in the file&#8221; is not.</p><p>If the repo is active. If they merge PRs every week, your small PR has a fair shot. If it&#8217;s a half-dead repo, you&#8217;ll wait a long time and probably waste it.</p><p><code>modelcontextprotocol/registry</code> is active. There are weekly merges. The duplicate auth pattern was really sitting there. When I opened the PRs they felt in tune with the way the repo moves. Even so, the result is up to review.</p><p>Both PRs are still open. Whichever way they go, I&#8217;ll update this post.</p>]]></content:encoded></item><item><title><![CDATA[How MCP Really Works in Prod: kagent, 124 Tools, 2 Sessions, 1 HTTP Header]]></title><description><![CDATA[I deployed an MCP server on k8s and traced every wire protocol call to understand what actually happens between agents and tools.]]></description><link>https://mesutoezdil.substack.com/p/how-mcp-really-works-in-prod-kagent</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/how-mcp-really-works-in-prod-kagent</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Mon, 27 Apr 2026 11:13:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Gid5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I spent a day deploying <a href="https://kagent.dev/">kagent</a> on k8s to understand how <a href="https://modelcontextprotocol.io/docs/getting-started/intro">MCP</a> really works. 
The real version, with wire protocol traces, session management, and agent-to-agent calls.</p><p>Most MCP tutorials stop at the config file. I wanted to see the HTTP requests, the JSON-RPC calls, the session lifecycle. So I dug in.</p><p>MCP is everywhere now. Anthropic <a href="https://www.anthropic.com/news/model-context-protocol">launched</a> it, and everyone is building servers. But when you read the docs, you get quickstart guides. Install this, configure that, done.</p><p>What none of them tell you: how does the session actually start? Where does the session ID come from? What does a real tool call look like on the wire?</p><p>I needed to know this because I work with k8s. If I am going to run MCP servers in prod, I need to understand what is actually happening. Not just &#8220;it works on my laptop&#8221;.</p>
<h2>The Setup</h2><p>I used <a href="https://nebius.com/?utm_term=nebius%20ai%20cloud&amp;utm_campaign=FY26_DM_NB_PSE_PUR_GG_EMEA_brand&amp;utm_source=google&amp;utm_medium=cpc&amp;hsa_acc=3900112445&amp;hsa_cam=21863834140&amp;hsa_grp=190401967102&amp;hsa_ad=797256119432&amp;hsa_src=g&amp;hsa_tgt=kwd-2322087582896&amp;hsa_kw=nebius%20ai%20cloud&amp;hsa_mt=p&amp;hsa_net=adwords&amp;hsa_ver=3&amp;gad_source=1&amp;gad_campaignid=21863834140&amp;gbraid=0AAAAA-WCWHXqqwuYdc_UDPUsa5LfedKX0&amp;gclid=CjwKCAjwzLHPBhBTEiwABaLsSlEFcGx-lQFDNp3ESx9iGuK5Xir7Gek7uRdEMLJXyrBbhLHTgFo63RoC7C8QAvD_BwE">Nebius AI Cloud</a>. They have GPU instances (L40S with 46GB VRAM). I did not actually use the GPU for this test. kagent runs fine on CPU. But in prod, if your MCP tools need to run image generation or video analysis, you will need GPU scheduling.</p><p>For multi-tenant GPU sharing, I contribute to Project HAMi (a CNCF sandbox project). It splits GPUs across pods. Different topic, but worth mentioning if you are thinking about AI workloads on k8s. The VM: 8 vCPUs, 32GB RAM, Ubuntu 24.04. Fresh install.</p><h3>Installing k3s</h3><p><a href="https://k3s.io/">k3s</a> is lightweight k8s. One command, 15 seconds:</p><pre><code><code>curl -sfL https://get.k3s.io | sh -</code></code></pre><p>From my terminal:</p><pre><code><code>[INFO]  systemd: Starting k3s
real    0m15.794s</code></code></pre><p>I picked k3s because a 5-node cluster is overkill here. It gives me a working k8s API in under 20 seconds.</p><h3>Installing kagent</h3><p>kagent is a k8s-native agent framework that includes MCP servers. Version 0.9.0 just came out. First, the CRDs:</p><pre><code><code>time helm install kagent-crds \
  oci://ghcr.io/kagent-dev/kagent/helm/kagent-crds \
  --namespace kagent \
  --create-namespace</code></code></pre><p>From my terminal:</p><pre><code><code>Pulled: ghcr.io/kagent-dev/kagent/helm/kagent-crds:0.9.0
NAME: kagent-crds
LAST DEPLOYED: Sat Apr 25 15:53:23 2026
NAMESPACE: kagent
STATUS: deployed
REVISION: 1
real    0m5.476s</code></code></pre><p>Check what got installed (<code>k</code> here is a shell alias for <code>kubectl</code>):</p><pre><code><code>k get crds | grep kagent.dev</code></code></pre><p>Output:</p><pre><code><code>agents.kagent.dev                              2026-04-25T15:53:24Z
mcpservers.kagent.dev                          2026-04-25T15:53:24Z
memories.kagent.dev                            2026-04-25T15:53:24Z
modelconfigs.kagent.dev                        2026-04-25T15:53:24Z
modelproviderconfigs.kagent.dev                2026-04-25T15:53:24Z
remotemcpservers.kagent.dev                    2026-04-25T15:53:24Z
sandboxagents.kagent.dev                       2026-04-25T15:53:24Z
toolservers.kagent.dev                         2026-04-25T15:53:24Z</code></code></pre><p>Eight CRDs. One new CRD in 0.9.0 is <code>sandboxagents.kagent.dev</code>. I have not dug into what that does yet, but it is new since the last version.</p><p>Then the main kagent install:</p><pre><code><code>export NEBIUS_API_KEY="your-api-key-here"

time helm install kagent \
  oci://ghcr.io/kagent-dev/kagent/helm/kagent \
  --namespace kagent \
  --set providers.default=openAI \
  --set providers.openAI.apiKey="$NEBIUS_API_KEY" \
  --set providers.openAI.endpoint=https://api.studio.nebius.ai/v1 \
  --set providers.openAI.model=meta-llama/Llama-3.3-70B-Instruct \
  --wait --timeout=10m</code></code></pre><p>I am using Nebius Token Factory (OpenAI-compatible API) with Llama 3.3 70B. Why not OpenAI directly? Because I wanted to test with an open model, and Nebius pricing is good ($2.56 for this entire test session).</p><p>The install timed out at 10 minutes:</p><pre><code><code>Error: INSTALLATION FAILED: context deadline exceeded
real    10m2.775s
</code></code></pre><p>This is not a real failure. Helm&#8217;s <code>--wait</code> flag waits until all pods are Ready and gives up when the timeout expires. One pod (kmcp-controller-manager) was still pulling its image at the 10-minute mark. After another minute, everything was running.</p><p>Check the pods (<code>kgp</code> is a shell alias for <code>kubectl get pods</code>):</p><pre><code><code>kgp -n kagent</code></code></pre><p>Output (17 pods total):</p><pre><code><code>NAME                                              READY   STATUS    RESTARTS   AGE
argo-rollouts-conversion-agent-985c94668-4zm2t    1/1     Running   0          13m
cilium-debug-agent-5bdfbbf8b5-8qw5z               1/1     Running   0          13m
cilium-manager-agent-7c85d7b7df-9qrlh             1/1     Running   0          13m
cilium-policy-agent-5dd67dd9fc-skf4m              1/1     Running   0          13m
helm-agent-589bf45db8-j48tr                       1/1     Running   0          13m
istio-agent-6b44cd4fd-vcd9r                       1/1     Running   0          13m
k8s-agent-895864cb6-756sb                         1/1     Running   0          13m
kagent-controller-87fb569dc-wv48f                 1/1     Running   4          26m
kagent-grafana-mcp-dc4d9d79d-4f8tx                1/1     Running   0          26m
kagent-kmcp-controller-manager-777746db7c-kvcpx   1/1     Running   0          26m
kagent-postgresql-68f97986df-g9xqs                1/1     Running   0          26m
kagent-querydoc-7dbd595b58-db4cb                  1/1     Running   0          26m
kagent-tools-77c6f575c8-t7cwt                     1/1     Running   0          26m
kagent-ui-df66c64bd-pzht2                         1/1     Running   0          26m
kgateway-agent-5cc8d7fdc6-sfx9d                   1/1     Running   0          13m
observability-agent-78c9d59d54-5kck2              1/1     Running   0          13m
promql-agent-7685b45dc-h9cps                      1/1     Running   0          13m</code></code></pre><p>All Running. Good.</p><h3>The Manual Patch Nobody Tells You About</h3><p>Something the docs do not mention. The Helm install creates the API key secret correctly. But the value passed with <code>--set providers.openAI.endpoint</code> does not end up in the <code>baseUrl</code> field of the ModelConfig resource.</p><p>Check it:</p><pre><code><code>k get modelconfig default-model-config -n kagent -o yaml</code></code></pre><p>You will see:</p><pre><code><code>spec:
  apiKeySecret: kagent-openai
  apiKeySecretKey: OPENAI_API_KEY
  model: meta-llama/Llama-3.3-70B-Instruct
  provider: OpenAI</code></code></pre><p>No <code>openAI.baseUrl</code> field. That means kagent will try to call <code>https://api.openai.com/v1</code> instead of Nebius.</p><p>The fix:</p><pre><code><code>k patch modelconfig default-model-config -n kagent --type=merge -p '{
  "spec": {
    "openAI": {
      "baseUrl": "https://api.studio.nebius.ai/v1"
    }
  }
}'</code></code></pre><p>Output:</p><pre><code><code>modelconfig.kagent.dev/default-model-config patched</code></code></pre><p>Check again:</p><pre><code><code>k get modelconfig default-model-config -n kagent -o jsonpath='{.spec.openAI.baseUrl}'</code></code></pre><p>Output:</p><pre><code><code>https://api.studio.nebius.ai/v1</code></code></pre><p>Good. This is a known limitation in kagent 0.9.0. The Helm chart sets the secret but not the CRD field. You need the manual patch. I am mentioning this because if you deploy kagent with a non-OpenAI provider, you will hit this.</p><h2>The MCP Wire Protocol</h2><p>This is the part nobody writes about. What does MCP look like on the wire?</p><p>Here is a simple diagram showing the flow:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gid5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gid5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png 424w, https://substackcdn.com/image/fetch/$s_!Gid5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png 848w, https://substackcdn.com/image/fetch/$s_!Gid5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Gid5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gid5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png" width="1388" height="1500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1500,&quot;width&quot;:1388,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184585,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/195456083?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gid5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png 424w, https://substackcdn.com/image/fetch/$s_!Gid5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png 848w, 
https://substackcdn.com/image/fetch/$s_!Gid5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!Gid5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbba6479a-512b-4968-bb4c-4062d4c0782c_1388x1500.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I port-forwarded the kagent-tools service to my laptop:</p><pre><code><code>k port-forward -n kagent svc/kagent-tools 8084:8084 --address 127.0.0.1 &amp;</code></code></pre><p>kagent-tools is an MCP server. It exposes 124 k8s-related tools (we will get to that).</p><h3>Step 1 &gt; Initialize a Session</h3><p>MCP uses JSON-RPC 2.0. First request: <code>initialize</code>.</p><pre><code><code>curl -i -X POST http://localhost:8084/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "capabilities": {},
      "clientInfo": {"name": "test", "version": "1.0"}
    }
  }'</code></code></pre><p>The response from my terminal:</p><pre><code><code>HTTP/1.1 200 OK
Content-Type: application/json
Mcp-Session-Id: mcp-session-bed70407-ca06-4689-a2a6-e1ab213add6a
Date: Sat, 25 Apr 2026 16:19:30 GMT
Content-Length: 175

{"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2024-11-05","capabilities":{"tools":{"listChanged":true}},"serverInfo":{"name":"kagent-tools-server","version":"0.1.3"}}}</code></code></pre><p>Look at the header: <code>Mcp-Session-Id</code>.</p><p>The session ID is in an HTTP header. Not in the JSON response. This matters because if you are building an MCP client, you cannot just parse the JSON and expect to find the session ID. You need to read the response headers and send the same <code>Mcp-Session-Id</code> value back on every later request.</p><p>The MCP specification describes this header in its transport section, but the quickstart guides do not show you the actual HTTP exchange.</p><h3>Step 2 &gt; List Available Tools</h3><p>Now that I have a session ID, I can call <code>tools/list</code>:</p><pre><code><code>curl -s -X POST http://localhost:8084/mcp \
  -H "Content-Type: application/json" \
  -H "Mcp-Session-Id: mcp-session-bed70407-ca06-4689-a2a6-e1ab213add6a" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/list",
    "params": {}
  }' &gt; /tmp/tools-list.json

cat /tmp/tools-list.json | jq '.result.tools | length'</code></code></pre><p>Output:</p><pre><code><code>124</code></code></pre><p>124 tools. Let me show you what one looks like:</p><pre><code><code>cat /tmp/tools-list.json | jq '.result.tools[0]'</code></code></pre><p>Output:</p><pre><code><code>{
  "annotations": {
    "readOnlyHint": false,
    "destructiveHint": true,
    "idempotentHint": false,
    "openWorldHint": true
  },
  "description": "Check the logs of the Argo Rollouts Gateway API plugin",
  "inputSchema": {
    "type": "object",
    "properties": {
      "namespace": {
        "description": "The namespace of the plugin resources",
        "type": "string"
      },
      "timeout": {
        "description": "Timeout for log collection in seconds",
        "type": "string"
      }
    }
  },
  "name": "argo_check_plugin_logs"
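}</code></code></pre><p>The <code>annotations</code> are interesting. MCP has a safety hint system. Each tool tells you:</p><ul><li><p><code>destructiveHint: true</code> means this tool may make destructive changes</p></li><li><p><code>idempotentHint: false</code> means calling it again with the same arguments may have additional effects</p></li><li><p><code>readOnlyHint: false</code> means it may modify its environment, so it is not read-only</p></li><li><p><code>openWorldHint: true</code> means it may interact with systems outside the server</p></li></ul><p>These four values are also the defaults in the MCP spec, which is probably why a log-reading tool shows up as destructive here.</p><p>These hints let LLMs (or orchestrators) make smarter decisions about when to call tools. They are hints, not guarantees.</p><p>The 124 tools are organized by category:</p><pre><code><code>Argo Rollouts:      3 tools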
Cilium networking: 15 tools
Helm:               8 tools
Istio:             12 tools
Kubernetes core:   45 tools
Gateway API:        6 tools
Prometheus/PromQL: 35 tools</code></code></pre><h3>Step 3 &gt; Call a Tool</h3><p>Let me actually execute something. I will call <code>k8s_get_resources</code> to list namespaces:</p><pre><code><code>curl -s -X POST http://localhost:8084/mcp \
  -H "Content-Type: application/json" \
  -H "Mcp-Session-Id: mcp-session-bed70407-ca06-4689-a2a6-e1ab213add6a" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "k8s_get_resources",
      "arguments": {
        "resource_type": "namespace"
      }
    }
  }' | jq '.'</code></code></pre><p>From my terminal:</p><pre><code><code>{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "NAME              STATUS   AGE\ndefault           Active   28m\nkagent            Active   27m\nkube-node-lease   Active   28m\nkube-public       Active   28m\nkube-system       Active   28m\n"
      }
    ]
  }
}</code></code></pre><p>That is a real <code>k get ns</code> executed inside the MCP server pod. The tool server has cluster-admin RBAC, so it can read, and change, anything in the cluster.</p><p>MCP itself is just JSON-RPC over HTTP. What feels like magic is the LLM picking the right tool and formatting arguments correctly.</p><h3>The Transport: Server-Sent Events</h3><p>One more thing I noticed. When I made a plain GET request to the MCP endpoint:</p><pre><code><code>curl -v http://localhost:8084/mcp 2&gt;&amp;1 | head -25</code></code></pre><p>Response headers:</p><pre><code><code>HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: keep-alive
Content-Type: text/event-stream
Date: Sat, 25 Apr 2026 16:16:03 GMT
Transfer-Encoding: chunked</code></code></pre><p><code>text/event-stream</code> is Server-Sent Events (SSE). MCP&#8217;s STREAMABLE_HTTP transport uses SSE for the server-to-client stream.</p><p>Makes sense for streaming. The MCP server can push notifications, progress updates, and its own requests to the client over this stream, without the client having to poll. The LLM tokens themselves do not travel here. The model sits on the agent side, not in the MCP server.</p><h2>Agent-to-Agent Communication</h2><p>kagent has built-in support for agent-to-agent (<a href="https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/">A2A</a>) delegation. I wanted to see how this works with MCP.</p><p>Here is what happens when one agent delegates to another:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yxro!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yxro!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png 424w, https://substackcdn.com/image/fetch/$s_!yxro!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png 848w, https://substackcdn.com/image/fetch/$s_!yxro!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png 1272w, https://substackcdn.com/image/fetch/$s_!yxro!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yxro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png" width="1116" height="1262" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1262,&quot;width&quot;:1116,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153843,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/195456083?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yxro!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png 424w, https://substackcdn.com/image/fetch/$s_!yxro!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png 848w, https://substackcdn.com/image/fetch/$s_!yxro!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png 1272w, https://substackcdn.com/image/fetch/$s_!yxro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0c26c44-d956-4e43-be4a-050c3da7aeab_1116x1262.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I created a custom agent called <code>sre-orchestrator</code>:</p><pre><code><code>cat &lt;&lt;'EOF' | k apply -f -
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: sre-orchestrator
  namespace: kagent
spec:
  type: Declarative
  description: "SRE Orchestrator that delegates to specialized agents"
  declarative:
    modelConfig: default-model-config
    systemMessage: |
      You are an SRE orchestrator. When asked about metrics or PromQL,
      delegate to promql-agent. When asked about cluster state, use K8s tools directly.
    tools:
    - type: McpServer
      mcpServer:
        name: kagent-tool-server
        kind: RemoteMCPServer
        apiGroup: kagent.dev
        toolNames:
        - k8s_get_resources
        - k8s_describe_resource
        - k8s_get_pod_logs
    - type: Agent
      agent:
        name: promql-agent
EOF</code></code></pre><p>Output:</p><pre><code><code>agent.kagent.dev/sre-orchestrator created</code></code></pre><p>Wait for it to be ready:</p><pre><code><code>k wait --for=condition=Ready agent/sre-orchestrator -n kagent --timeout=120s</code></code></pre><p>Output:</p><pre><code><code>agent.kagent.dev/sre-orchestrator condition met</code></code></pre><p>This agent has two types of tools:</p><ol><li><p>Direct MCP tools (k8s_get_resources, k8s_describe_resource, k8s_get_pod_logs)</p></li><li><p>Another agent (promql-agent)</p></li></ol><p>I gave it a task:</p><pre><code><code>kagent invoke --agent sre-orchestrator \
  --task "First, list all pods in kagent namespace. Then ask promql-agent to write a PromQL query for CPU usage of those pods." \
  --stream</code></code></pre><p>What happened (simplified from the JSON stream):</p><ol><li><p>sre-orchestrator called <code>k8s_get_resources</code> with namespace=kagent, resource_type=pod</p></li><li><p>Got back 18 pod names</p></li><li><p>Called <code>kagent__NS__promql_agent</code> (agent delegation)</p></li><li><p>promql-agent generated this PromQL query:</p></li></ol><pre><code><code>sum(rate(container_cpu_usage_seconds_total{pod=~"argo-rollouts-conversion-agent-985c94668-4zm2t|cilium-debug-agent-5bdfbbf8b5-8qw5z|..."}[5m])) by (pod)</code></code></pre><ol start="5"><li><p>Response came back to sre-orchestrator</p></li></ol><p>Two separate sessions. That surprised me.</p><p>From the JSON stream I saw:</p><pre><code><code>Parent session: 2640382e-4884-417e-95ac-20d9af90085a (sre-orchestrator)
Sub-session:    b0354a88-bdef-4290-8442-c1f88138a9fa (promql-agent)</code></code></pre><p>I checked the PostgreSQL database to confirm this:</p><pre><code><code>k exec -n kagent deployment/kagent-postgresql -it -- psql -U kagent -d kagent -c "
SELECT 
  id,
  LEFT(id, 8) as short_id,
  agent_id,
  created_at,
  updated_at
FROM session 
WHERE id IN (
  '2640382e-4884-417e-95ac-20d9af90085a',
  'b0354a88-bdef-4290-8442-c1f88138a9fa'
)
ORDER BY created_at;
"</code></code></pre><p>My output:</p><pre><code><code>                  id                  | short_id |           agent_id           |          created_at           |          updated_at
--------------------------------------+----------+------------------------------+-------------------------------+-------------------------------
 2640382e-4884-417e-95ac-20d9af90085a | 2640382e | kagent__NS__sre_orchestrator | 2026-04-25 16:22:35.674079+00 | 2026-04-25 16:22:35.674079+00
 b0354a88-bdef-4290-8442-c1f88138a9fa | b0354a88 | kagent__NS__promql_agent     | 2026-04-25 16:22:49.573249+00 | 2026-04-25 16:22:49.573249+00
</code></code></pre><p>The sub-session was created 14 seconds after the parent. Two separate database records. Complete isolation.</p><p>Token accounting is also separate (from the JSON stream metadata):</p><ul><li><p>Parent session: 3,707 tokens total</p></li><li><p>Sub-session: 2,407 tokens (prompt: 2,081, completion: 326)</p></li></ul><p>When one agent delegates to another, the sessions do not share state. The parent only sees the final response from the child, not the internal reasoning or tool calls.</p><p>Worth knowing if you are debugging a multi-agent flow. You will need to dig into both sessions, not just the parent.</p><h2>The Agent Card <a href="https://agent2agent.info/docs/concepts/agentcard/">Protocol</a></h2><p>A2A has a discovery mechanism. Every agent exposes a JSON file at <code>/.well-known/agent-card.json</code>.</p><p>I port-forwarded the k8s-agent service:</p><pre><code><code>k port-forward -n kagent svc/k8s-agent 8080:8080 --address 127.0.0.1 &amp;
sleep 2
curl -s http://localhost:8080/.well-known/agent-card.json | jq '.'</code></code></pre><p>The response:</p><pre><code><code>{
  "capabilities": {
    "pushNotifications": false,
    "stateTransitionHistory": true,
    "streaming": true
  },
  "defaultInputModes": ["text"],
  "defaultOutputModes": ["text"],
  "description": "An Kubernetes Expert AI Agent specializing in cluster operations, troubleshooting, and maintenance.",
  "name": "k8s_agent",
  "preferredTransport": "JSONRPC",
  "protocolVersion": "0.3.0",
  "skills": [
    {
      "description": "The ability to analyze and diagnose Kubernetes Cluster issues.",
      "examples": [
        "What is the status of my cluster?",
        "How can I troubleshoot a failing pod?",
        "What are the resource limits for my nodes?"
      ],
      "id": "cluster-diagnostics",
      "name": "Cluster Diagnostics",
      "tags": ["cluster", "diagnostics"]
    },
    {
      "description": "The ability to manage and optimize Kubernetes resources.",
      "examples": [
        "Scale my deployment X to 3 replicas.",
        "Optimize resource requests for my pods."
      ],
      "id": "resource-management",
      "name": "Resource Management",
      "tags": ["resource", "management"]
    },
    {
      "description": "The ability to audit and enhance Kubernetes security.",
      "examples": [
        "Check for RBAC misconfigurations.",
        "Audit my network policies."
      ],
      "id": "security-audit",
      "name": "Security Audit",
      "tags": ["security", "audit"]
    }
  ],
  "url": "http://k8s-agent.kagent:8080",
  "version": ""
}</code></code></pre><p>The agent advertises:</p><ul><li><p>What it can do (three skills: cluster-diagnostics, resource-management, security-audit)</p></li><li><p>Example queries it understands</p></li><li><p>Protocol version (0.3.0)</p></li><li><p>Preferred transport (JSONRPC)</p></li></ul><p>Compare this to promql-agent:</p><pre><code><code>k port-forward -n kagent svc/promql-agent 8082:8080 --address 127.0.0.1 &amp;
sleep 2
curl -s http://localhost:8082/.well-known/agent-card.json | jq '.skills'</code></code></pre><p>Output:</p><pre><code><code>[
  {
    "description": "Translates a natural language description of monitoring needs into a precise and performant PromQL query, providing an explanation, assumptions, and alternatives.",
    "id": "generate-promql-query",
    "name": "Generate PromQL Query",
    "tags": ["promql", "prometheus", "query-generation", "metrics"]
  },
  {
    "description": "Explains how an existing PromQL query works, helps debug issues, or suggests refinements for better performance or accuracy.",
    "id": "explain-debug-promql-query",
    "name": "Explain or Debug PromQL Query",
    "tags": ["promql", "debug-query", "optimization"]
  },
  {
    "description": "Provides information about Prometheus data models, PromQL functions, syntax, common patterns, and best practices for writing effective queries.",
    "id": "promql-concepts-best-practices",
    "name": "PromQL Concepts and Best Practices",
    "tags": ["promql", "concepts", "best-practices", "tutorial"]
  }
]</code></code></pre><p>And my custom sre-orchestrator:</p><pre><code><code>k port-forward -n kagent svc/sre-orchestrator 8081:8080 --address 127.0.0.1 &amp;
sleep 2
curl -s http://localhost:8081/.well-known/agent-card.json | jq '.skills'</code></code></pre><p>Output:</p><pre><code><code>[]</code></code></pre><p>Empty. I did not define any skills for it. That is fine. Skill definitions are optional. But if you are building a multi-agent system, defining skills helps other agents understand what to delegate.</p><h2>Things That Did Not Work</h2><p>Not everything went smoothly.</p><h3>Llama 3.3 70B and Tool Calling</h3><p>I tried a general question first:</p><pre><code><code>kagent invoke --agent k8s-agent \
  --task "What is the status of this cluster? How many nodes and namespaces?" \
  --stream</code></code></pre><p>The agent called <code>k8s_get_resources</code> with these parameters (from the JSON stream):</p><pre><code><code>{
  "namespace": "null",
  "resource_name": "null"
}</code></code></pre><p>Not JSON <code>null</code>. The string <code>"null"</code>.</p><p>The tool server tried to run <code>k get node null -n null</code>. From my terminal:</p><pre><code><code>[Kubernetes] get node null -n null -o wide failed: exit status 1</code></code></pre><p>The agent retried 9 times with the same parameters. Same error every time. Then it gave up.</p><p>But when I made the question specific:</p><pre><code><code>kagent invoke --agent k8s-agent \
  --task "List all pods in the kagent namespace" \
  --stream</code></code></pre><p>It worked perfectly. The parameters this time:</p><pre><code><code>{
  "namespace": "kagent",
  "resource_type": "pod"
}</code></code></pre><p>The tool executed successfully and returned 18 pods.</p><p>The problem is Llama 3.3 70B tool calling accuracy. It is not as sharp as GPT-4. When the query is vague, it guesses wrong. When the query is specific, it is fine.</p><p>If you run Llama in prod, write specific prompts. I learned that the hard way. Or just use Claude or GPT-4 and stop worrying about it.</p><h2>Wrapping up</h2><p>MCP itself is HTTP, JSON-RPC, and a session header. Nothing exotic. The problem is that most tutorials stop at the config file, so you never see the actual exchange. </p><p>A few things stuck with me.</p><p><strong>&#187; </strong>Sessions live in the Mcp-Session-Id header, not the JSON body. If you are writing a client, parse headers. I almost missed this.</p><p><strong>&#187; </strong>Tool schemas have annotations like destructiveHint and readOnlyHint. That is genuinely useful. You can build guardrails so the LLM does not fire destructive tools without confirmation.</p><p><strong>&#187; </strong>Agent-to-agent delegation creates a new session under the hood. Parent and child are isolated. Separate database rows, separate token counts. Worth knowing if you are debugging a multi-agent flow.</p><p><strong>&#187;</strong> Tool calling is more LLM-dependent than I expected. Llama 3.3 70B handles specific prompts fine but trips on vague ones. For prod, I would probably reach for Claude or GPT-4 unless my prompts were really locked down.</p><p><strong>&#187;</strong> And the kagent Helm chart in 0.9.0 does not populate the baseUrl in ModelConfig. 
If you use Nebius or anything other than OpenAI, you have to patch it manually after install. Took me 20 minutes to figure that out.</p><h2>Next?</h2><p>I tested MCP with CPU-only agents. But what if your tools need GPUs? Imagine an MCP tool that runs Stable Diffusion, or does real-time video analysis.</p><p>On k8s, you will need GPU scheduling. I work on Project HAMi for this. It is a CNCF sandbox project that does multi-tenant GPU sharing. HAMi plus MCP could be interesting: MCP tools request GPU fractions, HAMi schedules them across pods.</p><p>Maybe a future article.</p><p>All the commands above are from a real session on a Nebius VM. Whole thing took 30 minutes on a k3s cluster. If you want to try it yourself, k3s and any OpenAI-compatible LLM API will work. I used Nebius for cost reasons, but OpenAI, Anthropic, or local Ollama all work.</p><p><a href="https://www.lfopensource.cn/mcp-dev-summit-shanghai/program/cfp/">The MCP Dev Summit</a> is happening in Shanghai in September. I might submit this as a talk. If you are going, let me know.</p>]]></content:encoded></item><item><title><![CDATA[kagent + HAMi on Nebius: 2 CNCF Projects, 1 GPU, 0 OpenAI]]></title><description><![CDATA[I spent a day testing two open-source CNCF projects on a real GPU instance. Every command below actually ran. 
Every output is real.]]></description><link>https://mesutoezdil.substack.com/p/kagent-hami-on-nebius-2-cncf-projects</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/kagent-hami-on-nebius-2-cncf-projects</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Mon, 20 Apr 2026 11:20:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zABN!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F645d6c03-4d5b-4067-ab03-473c24075b55_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>A quick note before we start: this is not a summary of documentation. Every command you see here I typed myself on a Nebius VM. Every output came from that machine.</strong></p><p>When something failed, I debugged it. When something worked, I explained why. The errors in this article are real errors I hit. The fixes are things I actually tried. If you run these same commands on the same setup, you should get the same results. The full repo with all manifests and the setup script is here:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://github.com/mesutoezdil/kagentWithHami&quot;,&quot;text&quot;:&quot;See my repo on GitHub&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://github.com/mesutoezdil/kagentWithHami"><span>See my repo on GitHub</span></a></p><p><strong>A note on scope:</strong> This article covers the highlights. The full setup, all manifests, the complete troubleshooting guide, and the setup script live in the GitHub repo. If you want to run this yourself, start there.</p><p><strong>If you are new to HAMi:</strong> I wrote a separate deep-dive on running HAMi in a real k8s env. That covers the installation from scratch and explains how the GPU virtualization layer works internally. 
Read that first if you have not set up HAMi before: <a href="https://medium.com/@mesutoezdil/hami-in-a-real-kubernetes-environment-e8eaa872f388">https://medium.com/@mesutoezdil/hami-in-a-real-kubernetes-environment-e8eaa872f388</a>.</p><p><strong>If you want to see how I approach testing a GPU observability tool:</strong> <a href="https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero">https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero</a></p><h2>What this is about</h2><p><strong><a href="https://kagent.dev/">kagent</a></strong> turns AI agents into k8s resources. Your system prompt, your tools, your model config: all of it lives as a CRD. You version it in Git, deploy it with Helm, and inspect it with <code>kubectl</code> just like everything else in your cluster.</p><p><strong><a href="https://project-hami.io/">HAMi</a></strong> virtualizes GPUs at the k8s scheduler level. One physical NVIDIA L40S becomes ten virtual GPUs in k8s, each with hard memory limits enforced at the CUDA driver level.</p><p><strong><a href="https://nebius.com/">Nebius Token Factory</a></strong> is an OpenAI-compatible inference API. I used Llama 3.3 70B for all tests. No OpenAI account needed.</p><p>The question I wanted to answer: can an AI agent manage GPU-virtualized workloads from inside a k8s cluster, using only open-source models?</p><p>The answer is yes!</p><h2>The machine</h2><pre><code><code>GPU:  1x NVIDIA L40S (46GB VRAM)
CPU:  8 vCPUs
RAM:  32GB
OS:   Ubuntu 24.04 LTS for NVIDIA GPUs (CUDA 13)</code></code></pre><pre><code><code>nvidia-smi</code></code></pre><pre><code><code>| NVIDIA-SMI 580.126.09    CUDA Version: 13.0      |
|   0  NVIDIA L40S    0MiB / 46068MiB    0%        |</code></code></pre><p>46GB VRAM sitting completely idle. By the end of this, it runs as ten virtual GPUs.</p><h2>1 &gt; k3s and Helm</h2><p>k3s is the right choice for a single-node setup. One binary, starts in seconds.</p><pre><code><code>curl -sfL https://get.k3s.io | sh -

mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown ubuntu:ubuntu ~/.kube/config
export KUBECONFIG=~/.kube/config
alias k=kubectl
kubectl config set-context --current --namespace=kagent</code></code></pre><pre><code><code>k get node</code></code></pre><pre><code><code>NAME               STATUS   ROLES           AGE   VERSION
nebius-tarantula   Ready    control-plane   48s   v1.34.6+k3s1</code></code></pre><pre><code><code>curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash</code></code></pre><h2>2 &gt; Installing kagent</h2><p>kagent ships two Helm charts. CRDs go first, then the main chart. This is intentional. It lets you upgrade CRDs independently without touching running agents.</p><pre><code><code>kubectl create namespace kagent

helm install kagent-crds \
  oci://ghcr.io/kagent-dev/kagent/helm/kagent-crds \
  --namespace kagent</code></code></pre><pre><code><code>Pulled: ghcr.io/kagent-dev/kagent/helm/kagent-crds:0.8.6
STATUS: deployed</code></code></pre><p>Now the main chart. This is where I point it at Nebius Token Factory instead of OpenAI:</p><pre><code><code>helm install kagent \
  oci://ghcr.io/kagent-dev/kagent/helm/kagent \
  --namespace kagent \
  --set providers.default=openAI \
  --set providers.openAI.apiKey=$NEBIUS_API_KEY \
  --set providers.openAI.model=meta-llama/Llama-3.3-70B-Instruct</code></code></pre><p><code>providers.default=openAI</code> does not mean I am using OpenAI. Nebius implements the OpenAI API specification. The provider field picks which API client to use, not which company to pay.</p><p>One thing the Helm chart does not handle automatically: setting the base URL in the ModelConfig. I had to patch it manually after install:</p><pre><code><code>kubectl create secret generic kagent-openai \
  --from-literal=OPENAI_API_KEY=$NEBIUS_API_KEY \
  -n kagent --dry-run=client -o yaml | kubectl apply -f -

kubectl patch modelconfig default-model-config --type=merge -p '{
  "spec": {
    "openAI": {"baseUrl": "https://api.studio.nebius.ai/v1"},
    "model": "meta-llama/Llama-3.3-70B-Instruct"
  }
}'
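
# Optional read-back, not part of the original session: a sketch that prints the
# patched field so a typo in the URL shows up before the restart. It assumes the
# current context namespace is kagent (set in the k3s step) and reuses the field
# path from the patch above.
kubectl get modelconfig default-model-config \
  -o jsonpath='{.spec.openAI.baseUrl}{"\n"}'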

kubectl rollout restart deployment -n kagent</code></code></pre><p>After that, everything looks right:</p><pre><code><code>k get agents,modelconfigs,remotemcpservers</code></code></pre><pre><code><code>NAME                                      TYPE          RUNTIME  READY  ACCEPTED
agent.kagent.dev/k8s-agent                Declarative   python   True   True
agent.kagent.dev/helm-agent               Declarative   python   True   True
agent.kagent.dev/istio-agent              Declarative   python   True   True
agent.kagent.dev/cilium-debug-agent       Declarative   python   True   True
... (10 agents total)

NAME                                         PROVIDER  MODEL
modelconfig.kagent.dev/default-model-config  OpenAI    meta-llama/Llama-3.3-70B-Instruct

NAME                                               PROTOCOL         ACCEPTED
remotemcpserver.kagent.dev/kagent-tool-server      STREAMABLE_HTTP  True
remotemcpserver.kagent.dev/kagent-grafana-mcp      STREAMABLE_HTTP  True</code></code></pre><p>Ten agents, all ready. ModelConfig pointing at Nebius. Two MCP servers registered with 175 tools between them.</p><h2>3 &gt; Installing HAMi</h2><p>Without HAMi, k8s cannot see the GPU at all:</p><pre><code><code>k get node nebius-tarantula \
  -o jsonpath='{.status.allocatable}' | python3 -m json.tool</code></code></pre><pre><code><code>{"cpu": "8", "memory": "32865164Ki", "pods": "110"}</code></code></pre><p>No <code>nvidia.com/gpu</code>. From k8s&#8217; perspective, this is just a CPU node.</p><p>Before installing HAMi, there is one fix that the documentation does not make obvious. The <code>nvidia-ctk</code> tool writes a containerd config in version 1 format. k3s uses version 3. These two formats conflict, and it causes HAMi&#8217;s device plugin to crash on startup. The fix is to delete the conflicting file:</p><pre><code><code>sudo rm -f /etc/containerd/conf.d/99-nvidia.toml
sudo systemctl restart containerd
sudo systemctl restart k3s
sleep 20</code></code></pre><p>k3s already has the nvidia runtime in its own containerd config. The deleted file was silently overriding it with something incompatible. Now install:</p><pre><code><code>helm repo add hami-charts https://project-hami.github.io/HAMi/
helm repo update

kubectl label node nebius-tarantula gpu=on
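
# Not from the original session: a small sketch for deriving the imageTag value
# used below instead of typing it by hand. It assumes the k3s version string has
# the form shown by `k get node` earlier (v1.34.6+k3s1); strip_build_suffix is a
# hypothetical helper name.
strip_build_suffix() { printf '%s\n' "${1%%+*}"; }
strip_build_suffix "v1.34.6+k3s1"   # prints v1.34.6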

helm install hami hami-charts/hami \
  --set scheduler.kubeScheduler.imageTag=v1.34.6 \
  --set devicePlugin.nvidiaDriverRoot=/usr \
  -n kube-system</code></code></pre><p>Two things matter here. The <code>imageTag</code> must exactly match your k8s version. HAMi extends the default scheduler, so version compatibility is strict. The <code>nvidiaDriverRoot=/usr</code> tells HAMi where to find NVIDIA libraries on the host.</p><pre><code><code>k get pods -n kube-system | grep hami</code></code></pre><pre><code><code>hami-device-plugin-cw6t5          2/2   Running
hami-scheduler-84f8888fc5-fqrp8   2/2   Running</code></code></pre><p>Both pods <code>2/2</code>. Now check the node:</p><pre><code><code>k get node nebius-tarantula \
  -o jsonpath='{.status.allocatable}' | python3 -m json.tool</code></code></pre><pre><code><code>{
  "cpu": "8",
  "memory": "32865164Ki",
  "nvidia.com/gpu": "10",
  "pods": "110"
}</code></code></pre><p><code>"nvidia.com/gpu": "10"</code>. One physical GPU, ten virtual GPUs. The node annotation shows the full registration:</p><pre><code><code>[{
  "id": "GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7",
  "count": 10,
  "devmem": 46068,
  "type": "NVIDIA L40S",
  "mode": "hami-core",
  "health": true
}]</code></code></pre><h2>4 &gt; First agent invocation</h2><pre><code><code>kubectl port-forward svc/kagent-controller 8083:8083 -n kagent &amp;

kagent invoke --agent k8s-agent \
  --task "What is the status of this cluster? List all running pods." \
  --stream</code></code></pre><p>Watching the stream, you can see exactly what happens. The LLM receives the task, decides to call <code>k8s_get_resources</code>, gets back the pod table, and summarizes:</p><pre><code><code>"The cluster has 25 running pods across different namespaces,
including kagent and kube-system."</code></code></pre><p>Total: 6,131 tokens.</p><h2>5 &gt; The GPU check</h2><p>Before HAMi was installed, I asked the agent about GPU availability:</p><pre><code><code>kagent invoke --agent k8s-agent \
  --task "What GPUs are available on the node?" --stream</code></code></pre><pre><code><code>"The node does not have any GPUs available."</code></code></pre><p>After HAMi:</p><pre><code><code>kagent invoke --agent k8s-agent \
  --task "Check nebius-tarantula allocatable resources. How many GPUs?" --stream</code></code></pre><pre><code><code>"The node nebius-tarantula has 10 GPUs available, type NVIDIA L40S."</code></code></pre><p>The agent reads HAMi&#8217;s k8s annotations and understands what they mean.</p><h2>6 &gt; The self-inspection test</h2><p>This one was genuinely interesting. I asked the agent to describe itself using the k8s API:</p><pre><code><code>kagent invoke --agent k8s-agent \
  --task "Describe yourself using the Kubernetes API. What CRD defines you?" \
  --stream</code></code></pre><p>The agent tried the default namespace first and failed. Then it listed all agents across namespaces, found itself in kagent, ran describe on <code>k8s-agent -n kagent</code>, read its own CRD including the system prompt and tool list, and explained its own architecture.</p><p>An agent reading its own k8s definition. Not from training data. From a live API call.</p><h2>7 &gt; Creating a custom agent</h2><p>Beyond the ten built-in agents, you can define your own. I created an SRE orchestrator that delegates to the promql-agent for metrics queries:</p><pre><code><code>apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: sre-orchestrator
  namespace: kagent
spec:
  type: Declarative
  declarative:
    modelConfig: default-model-config
    systemMessage: |
      You are an SRE orchestrator. Use k8s tools for cluster state.
      Delegate to promql-agent for metrics and PromQL queries.
    tools:
    - type: McpServer
      mcpServer:
        name: kagent-tool-server
        kind: RemoteMCPServer
        apiGroup: kagent.dev
        toolNames:
        - k8s_get_resources
        - k8s_describe_resource
        - k8s_get_pod_logs
    - type: Agent
      agent:
        name: promql-agent</code></code></pre><pre><code><code>kubectl apply -f sre-orchestrator.yaml
kubectl get agent sre-orchestrator</code></code></pre><pre><code><code>NAME               TYPE          RUNTIME  READY  ACCEPTED
sre-orchestrator   Declarative   python   True   True</code></code></pre><p>Ready in under two minutes. The <code>type: Agent</code> entry is what makes A2A work. It lets the orchestrator call promql-agent as if it were a tool.</p><h2>8 &gt; Agent talking to agent</h2><pre><code><code>kagent invoke --agent sre-orchestrator \
  --task "Check pods in kagent namespace then ask promql-agent for CPU usage query." \
  --stream</code></code></pre><p>Two different session IDs appear in the stream:</p><pre><code><code>sre-orchestrator session:  6e4e0146-9a78-41a6-b2af-a8f3f2dd79db
promql-agent sub-session:  2ab4bf39-379d-404c-8fda-0e6c9e28c5da</code></code></pre><p>Each agent ran in its own process with its own context window. The orchestrator only saw the final answer from promql-agent, not its internal reasoning. Both sessions were stored independently in PostgreSQL.</p><p>The actual tool call in the stream:</p><pre><code><code>{
  "name": "kagent__NS__promql_agent",
  "args": {"request": "CPU usage query for pods in kagent namespace"},
  "metadata": {
    "kagent_subagent_session_id": "2ab4bf39-379d-404c-8fda-0e6c9e28c5da"
  }
}</code></code></pre><h2>9 &gt; Agent creates a HAMi GPU pod</h2><pre><code><code>kagent invoke --agent k8s-agent \
  --task "Create a pod named gpu-test-1 requesting 1 nvidia.com/gpu with HAMi annotation nvidia.com/gpumem: 20000. Use ubuntu:22.04." \
  --stream</code></code></pre><p>The agent generated and applied this:</p><pre><code><code>metadata:
  name: gpu-test-1
  annotations:
    nvidia.com/gpumem: "20000"
spec:
  containers:
  - image: ubuntu:22.04
    resources:
      limits:
        nvidia.com/gpu: 1</code></code></pre><p>HAMi picked it up immediately:</p><pre><code><code>hami.io/bind-phase: success
hami.io/vgpu-devices-allocated: GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7,NVIDIA,46068,100:;</code></code></pre><p>I then created a second pod requesting 15,000 MiB. Both landed on the same physical GPU. 20,000 + 15,000 = 35,000 MiB, well under the 46,068 physical limit.</p><p>Then I asked the agent to explain what just happened:</p><pre><code><code>kagent invoke --agent k8s-agent \
  --task "Look at gpu-test-1 and gpu-test-2. Explain how HAMi is sharing the GPU." \
  --stream</code></code></pre><pre><code><code>"GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7 is shared between both pods.
gpu-test-1 has 20,000 MiB, gpu-test-2 has 15,000 MiB from the same NVIDIA L40S."</code></code></pre><h2>10 &gt; Overcommit protection</h2><p>I created a pod requesting 11 virtual GPUs when only 10 exist:</p><pre><code><code>resources:
  limits:
    nvidia.com/gpu: 11</code></code></pre><pre><code><code>k describe pod gpu-overcommit-test | tail -5</code></code></pre><pre><code><code>Warning  FailedScheduling  hami-scheduler  0/1 nodes: 1 NodeUnfitPod
Warning  FilteringFailed   hami-scheduler  no available node, 1 nodes do not meet</code></code></pre><p>Pod stays Pending forever. HAMi will not schedule what it cannot fulfill. No partial allocation, no silent failure.</p><h2>11 &gt; HAMi metrics</h2><pre><code><code>curl http://localhost:31992/metrics</code></code></pre><pre><code><code>HostCoreUtilization{devicetype="NVIDIA-NVIDIA L40S",zone="vGPU"} 0
HostGPUMemoryUsage{devicetype="NVIDIA-NVIDIA L40S",zone="vGPU"} 6.40090112e+08
hami_build_info{version="v2.8.1",go_version="go1.25.5"} 1</code></code></pre><p>610 MiB consumed. That is the CUDA runtime overhead from idle pods. Utilization is 0% because <code>sleep infinity</code> does not run any GPU compute. Standard Prometheus format, works with any existing monitoring setup.</p><h2>12 &gt; kagent CLI</h2><pre><code><code>kagent get agent</code></code></pre><pre><code><code>+----+-----------------------------------+----------------------+------------------+
| #  | NAME                              | CREATED              | DEPLOYMENT_READY |
+----+-----------------------------------+----------------------+------------------+
| 1  | kagent/k8s-agent                  | 2026-04-18T07:27:18Z | true             |
| 4  | kagent/sre-orchestrator           | 2026-04-18T07:45:56Z | true             |
...</code></code></pre><pre><code><code>kagent get session</code></code></pre><pre><code><code>+----+------------------+------------------------------+----------------------+
| #  | ID               | AGENT                        | CREATED              |
+----+------------------+------------------------------+----------------------+
| 6  | 6e4e0146-...     | kagent__NS__sre_orchestrator | 2026-04-18T07:46:36Z |
| 7  | 2ab4bf39-...     | kagent__NS__promql_agent     | 2026-04-18T07:46:40Z |</code></code></pre><p>Row 7 is the A2A sub-session created by row 6. The 4-second gap is the delegation latency. PostgreSQL stores everything.</p><h2>The A2A agent card</h2><p>Every kagent agent exposes its capabilities at <code>/.well-known/agent-card.json</code>:</p><pre><code><code>curl http://10.42.0.68:8080/.well-known/agent-card.json | python3 -m json.tool</code></code></pre><pre><code><code>{
  "name": "k8s_agent",
  "protocolVersion": "0.3.0",
  "preferredTransport": "JSONRPC",
  "capabilities": {"streaming": true, "stateTransitionHistory": true},
  "skills": [
    {"id": "cluster-diagnostics", "name": "Cluster Diagnostics"},
    {"id": "resource-management", "name": "Resource Management"},
    {"id": "security-audit", "name": "Security Audit"}
  ],
  "url": "http://k8s-agent.kagent:8080"
}</code></code></pre><p>This is how agents discover each other in multi-agent systems.</p><h2>What did not work</h2><p><strong>Memory CRD</strong> only supports Pinecone as the vector database backend. There is no built-in option. Cross-session agent memory requires an external Pinecone account.</p><p><strong>kmcp init</strong> runs without error and creates nothing. The project scaffolding feature is listed in the CLI help but does not actually work in v0.8.6.</p><p><strong>Ubuntu + HAMi + sleep:</strong> HAMi injects <code>libvgpu.so</code> via <code>/etc/ld.so.preload</code>. If your image does not have CUDA libraries, even the <code>sleep</code> binary fails to start. Use CUDA-enabled base images for GPU workloads.</p><p><strong>HAMi WebUI</strong> is not bundled in the main Helm chart. The Prometheus metrics endpoint works fine but the graphical dashboard requires a separate installation.</p><h2>Why this setup makes sense</h2><p>Your deployment specs live in Git. Your network policies live in Git. Your RBAC rules live in Git. There is no reason your AI agent system prompts should be any different. kagent makes that possible without building custom tooling.</p><p>HAMi fixes GPU waste without requiring any changes to your applications. It works at the CUDA driver level, invisible to the workload.</p><p>Together, they give you AI agents that can see and manage GPU-virtualized infrastructure from inside the cluster, using open-source models, with no dependency on any closed AI provider.</p><p>All manifests, the setup script, and eight specific troubleshooting cases with root causes and verified fixes are in the repo: <strong><a href="https://github.com/mesutoezdil/kagentWithHami">https://github.com/mesutoezdil/kagentWithHami</a></strong></p><div><hr></div><p><em>One last thing: if any command here does not work for you, open an issue in the repo. I read them. These are not generated examples. 
They are things I actually ran, and I can help debug if something behaves differently on your setup.</em></p>]]></content:encoded></item><item><title><![CDATA[I Tested Every Feature of Ingero v0.9.1 on a Real NVIDIA L40S (Here’s Every Command, Every Output, Every Finding)]]></title><description><![CDATA[Platform: NVIDIA L40S 46 GB &#183; CUDA 13.0 &#183; Driver 580.126.09 &#183; Kernel 6.11.0-1016-nvidia &#183; Ubuntu 24.04 &#183; Tested: April 12, 2026 on a real cloud GPU instance, not simulated]]></description><link>https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Mon, 13 Apr 2026 09:34:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1xC4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article was too long for Substack, so I published it on Medium.</p><p>I tested Ingero v0.9.1 on a real NVIDIA L40S GPU from start to finish. 
</p><p>I ran the commands, checked the outputs, and looked at how it behaves in real situations.</p><p>In this post, I cover:</p><ul><li><p>How it traces CUDA at kernel level with eBPF</p></li><li><p>Why nvidia-smi can look fine while performance is not</p></li><li><p>How CPU, disk I/O, and GPU latency are connected</p></li><li><p>The difference between Driver API and Runtime API</p></li><li><p>Real root causes and how to fix them</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1xC4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1xC4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif 424w, https://substackcdn.com/image/fetch/$s_!1xC4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif 848w, https://substackcdn.com/image/fetch/$s_!1xC4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif 1272w, https://substackcdn.com/image/fetch/$s_!1xC4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1xC4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif" width="1200" height="672" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:672,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:561784,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/194051625?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1xC4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif 424w, https://substackcdn.com/image/fetch/$s_!1xC4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif 848w, https://substackcdn.com/image/fetch/$s_!1xC4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif 1272w, https://substackcdn.com/image/fetch/$s_!1xC4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fcd3ef5-08c7-4216-ab38-6c6b3feab647_1200x672.gif 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div></li></ul><p>Everything is based on actual runs, not theory.</p><p>Read here: https://medium.com/@mesutoezdil/i-tested-every-feature-of-ingero-v0-9-1-94ae6de6eb7a</p>]]></content:encoded></item><item><title><![CDATA[HAMi in a Real Kubernetes Environment]]></title><description><![CDATA[GPUs are expensive.]]></description><link>https://mesutoezdil.substack.com/p/hami-in-a-real-kubernetes-environment</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/hami-in-a-real-kubernetes-environment</guid><dc:creator><![CDATA[AR-Kube (Mesut 
Oezdil)]]></dc:creator><pubDate>Fri, 03 Apr 2026 11:56:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1cAR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>GPUs are expensive. For example, an NVIDIA <a href="https://www.nvidia.com/en-us/data-center/l40s/">L40S</a> costs around $10,000, and an <a href="https://www.nvidia.com/en-us/data-center/h100/">H100</a> can go up to $30,000. But in most k8s setups, GPU usage is very simple: a pod either gets the whole GPU or nothing.</p><p>This made sense in the past. Back then, GPU workloads were mostly long training jobs that really needed the full power of a GPU. But things have changed.</p><p><em>While we&#8217;re at it, if you want to learn (GPU and) CUDA (starting from zero), you can star the <a href="https://github.com/mesutoezdil/Systematic-CUDA-Learning-from-0-to-hero-.git">repo</a> I created for this and contribute as well. 
</em>And you can follow me on <a href="https://www.linkedin.com/in/mesut-oezdil/">LinkedIn</a> if you&#8217;d like.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1cAR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1cAR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png 424w, https://substackcdn.com/image/fetch/$s_!1cAR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png 848w, https://substackcdn.com/image/fetch/$s_!1cAR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png 1272w, https://substackcdn.com/image/fetch/$s_!1cAR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1cAR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png" width="1456" height="785" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:785,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:517252,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/193060113?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1cAR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png 424w, https://substackcdn.com/image/fetch/$s_!1cAR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png 848w, https://substackcdn.com/image/fetch/$s_!1cAR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png 1272w, https://substackcdn.com/image/fetch/$s_!1cAR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed6f642c-cfaa-41d2-b6c0-deb16280103c_2830x1526.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">project-hami.io</figcaption></figure></div><p>Today, inference workloads are smaller. Experiments are shorter. Teams share the same infrastructure.</p><p>In reality, a single pod often uses only about 15&#8211;20% of the GPU memory, and the rest just stays unused. But k8s still locks the whole GPU for that one pod, so nobody else can use it until the job is done.</p><p>The easy solution is to buy more GPUs. 
The smarter solution is to share what you already have.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://mesutoezdil.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://mesutoezdil.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>So what does <a href="https://project-hami.io/">HAMi</a> actually do?</p><p>HAMi is a k8s extension that lets you share GPUs instead of giving the whole card to just one pod. Instead of treating a GPU as one big block, it allows you to decide exactly how much memory and compute each workload should get. </p><p>So, in a nutshell, that&#8217;s it. Of course, there&#8217;s more to it than that.</p><p>Inside the container, it still looks like the pod has its own GPU with the amount of memory you assigned. It doesn&#8217;t see other workloads running on the same physical GPU.</p><p>But from the cluster side, the rest of the GPU is still free and can be used by other pods.</p><p>Two resources make this possible:</p><p><strong>nvidia.com/gpumem</strong> &#8594; This is the GPU memory in MiB. If a pod asks for 10240, it will see exactly 10240 MiB inside the container, no matter how big the actual GPU is.</p><p><strong>nvidia.com/gpucores</strong> &#8594; This controls how much compute power the pod can use. If a pod requests 50, it can use up to 50% of the GPU.</p><p>This is not the same as GPU time-slicing. Time-slicing just shares the GPU over time and doesn&#8217;t really isolate memory.</p><p>HAMi is stricter. It sets hard limits. If a workload tries to use more memory than it was given, it simply fails. 
It doesn&#8217;t spill over into someone else&#8217;s share.</p><p>That covers what we&#8217;re going to set up today.</p><p><strong>Why I set this up on my own server</strong></p><p>I run AI workloads on my own setup: an NVIDIA L40S on k3s. The card has 48 GB of VRAM, and honestly, most of my daily workloads don&#8217;t even come close to using all of it.</p><p>I usually run several things at once: inference services, evaluation jobs, and small fine-tuning experiments, all competing for the same GPU.</p><p>HAMi felt like the right solution. But the official docs assume you already know a lot, so they&#8217;re not that easy to follow in practice.</p><p>So I went through the setup myself, ran into the real issues, and wrote down what actually happens.</p><p>I tested everything step by step on my own setup: an NVIDIA L40S with k3s, driver 580.126.09, and CUDA 13.0.</p><p>This isn&#8217;t theory. Every command here was actually run on real hardware. If something worked, I noted it. If it failed, I noted that too.</p><p>The goal is simple but strict: to make sure HAMi is not just installed, but actually working.</p><p><strong>So what does &#8220;working&#8221; actually look like?</strong></p><p>Before we start, it&#8217;s important to be clear about this. A lot of people say &#8220;it works&#8221; too early. 
A setup is not working just because:</p><ul><li><p>Pods are running</p></li><li><p><code>helm install</code> finished successfully</p></li><li><p>The scheduler pod is up.</p></li></ul><p>A setup is really working when:</p><ul><li><p>You can actually access the GPU inside a container</p></li><li><p>K8s correctly shows the GPU resources</p></li><li><p>The HAMi scheduler doesn&#8217;t block your workloads</p></li><li><p>The limits you set are really enforced inside the container</p></li></ul><p>Everything in this guide is about proving exactly that.</p><p><strong>Environment</strong></p><p>This guide was tested on a clean setup with k3s running on a recent stable k8s version, using an NVIDIA L40S GPU with driver 580.126.09 and CUDA 13.0.</p><p>Everything described here was actually executed on this env.</p><blockquote><p><em><strong>Note:</strong> Different k8s distributions behave differently, especially k3s vs. kubeadm. This guide confirms HAMi works in a lightweight k3s env. The official HAMi docs require kubectl v1.16+, Helm v3+, CUDA v10.2+, and NVIDIA Driver v440+.</em></p></blockquote><p>Before starting, NVIDIA drivers must already be installed on your GPU nodes. The next section covers the container runtime config. This is the part most guides skip, and it&#8217;s where most failures actually happen.</p><h3>Step 1 &gt; Configure the NVIDIA container runtime</h3><p>HAMi and GPU workloads in general need the NVIDIA container runtime to be set as the default runtime on each GPU node.</p><p>If this isn&#8217;t configured, containers won&#8217;t see the GPU at all, no matter what k8s says.</p><p>Run the following on every GPU node in your cluster.</p><p>Install <code>nvidia-container-toolkit</code>:</p><pre><code>distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list</code></pre><pre><code>sudo apt-get update &amp;&amp; sudo apt-get install -y nvidia-container-toolkit</code></pre><p><strong>Configure Docker (if using Docker as your container runtime)</strong></p><p>Edit <code>/etc/docker/daemon.json</code> to set <code>nvidia</code> as the default runtime:</p><pre><code>{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}</code></pre><p>Then restart Docker:</p><pre><code>sudo systemctl daemon-reload &amp;&amp; sudo systemctl restart docker</code></pre><p><strong>Configure containerd (if using containerd)</strong></p><p>Edit <code>/etc/containerd/config.toml</code>:</p><pre><code>version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"</code></pre><p>Then restart containerd:</p><pre><code>sudo systemctl daemon-reload &amp;&amp; sudo systemctl restart containerd</code></pre><p>With the runtime in place, we can now verify that GPU workloads actually run before we add HAMi on top.</p><h3>Step 2 &gt; Validate the GPU stack</h3><p>Before touching HAMi, you first need to make sure k8s can actually run a GPU workload. </p><p>This step is not optional. If it fails, no amount of HAMi setup will fix it, and in most cases what looks like a &#8220;HAMi issue&#8221; is really a problem with the GPU runtime or drivers underneath.</p><p>Create this test pod:</p><pre><code>apiVersion: v1
kind: Pod
metadata:
  name: cuda-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1</code></pre><p>Apply and verify:</p><pre><code>kubectl delete pod cuda-test --ignore-not-found
kubectl apply -f cuda-test.yaml
# "Succeeded" is a pod phase, not a condition, so wait on .status.phase (needs kubectl v1.23+)
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cuda-test --timeout=60s || true
kubectl logs cuda-test</code></pre><p>What&#8217;s happening here is simple: K8s looks for a node that exposes <code>nvidia.com/gpu</code>, the device plugin reports the available GPUs, the container runtime makes the GPU accessible inside the container, and then <code>nvidia-smi</code> runs there.</p><p>If you can see GPU info in the logs, it means your whole GPU stack (drivers, container runtime, device plugin, and k8s scheduling) is working correctly.</p><p>Do not move forward until this step works.</p><h4>Troubleshooting Step 2</h4><p>As I mentioned earlier, I encountered a lot of errors in my own tests, so I&#8217;m sharing these solutions in the hope that they&#8217;ll be helpful to you as well. If <code>nvidia-smi</code> failed, work through these in order.</p><p><strong>i. Remove conflicting NVIDIA device plugins</strong></p><p>If you&#8217;ve tried multiple GPU setups before, stale device plugins may be interfering:</p><pre><code>kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system --ignore-not-found</code></pre><p><strong>ii. Ensure the node is labeled correctly</strong></p><p>Some setups require explicit GPU labeling on the node:</p><pre><code>kubectl label node $(hostname) nvidia.com/gpu.present=true --overwrite</code></pre><p><strong>iii. Verify GPU access outside k8s</strong></p><p>Isolate whether the problem is k8s or the runtime itself:</p><pre><code>sudo ctr run --rm --gpus 0 docker.io/nvidia/cuda:12.2.0-base-ubuntu22.04 test nvidia-smi</code></pre><p>If this works but the k8s pod fails, the issue is in the device plugin or container runtime config, not the driver.</p><p><strong>iv. Re-run the test</strong></p><pre><code>kubectl delete pod cuda-test --ignore-not-found
kubectl apply -f cuda-test.yaml
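# Added sketch, not part of the original run: wait for the pod to finish before reading logs,
# otherwise the logs can still be empty. Assumes kubectl v1.23+ for --for=jsonpath.
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cuda-test --timeout=60s || true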
kubectl logs cuda-test</code></pre><p>Once you see clean <code>nvidia-smi</code> output, you&#8217;re ready to continue.</p><h3>Step 3 &gt; Verify GPU resources on the node</h3><p>Now that we know the GPU works, let&#8217;s confirm that k8s is correctly advertising it as a schedulable resource.</p><pre><code>kubectl get nodes -o jsonpath='{.items[*].status.allocatable}' | grep -i nvidia</code></pre><p>Or without filtering, to see the full picture:</p><pre><code>kubectl get nodes -o jsonpath='{.items[*].status.allocatable}'</code></pre><p>Expected output includes:</p><pre><code>nvidia.com/gpu: 1</code></pre><p>At this stage, you won&#8217;t see <code>nvidia.com/gpucores</code> or <code>nvidia.com/gpumem</code> yet. These are HAMi-specific resources and only show up after it&#8217;s installed. Right now, we&#8217;re just making sure the GPU is visible to the scheduler.</p><p>Once that baseline is confirmed, we&#8217;re ready to bring HAMi into the picture.</p><h3>Step 4 &gt; Label your GPU nodes</h3><p>HAMi&#8217;s scheduler only works with nodes that have the <code>gpu=on</code> label. This is a specific requirement. If the label is missing, the scheduler will simply ignore the node, and your GPU workloads either won&#8217;t get scheduled or will run without HAMi&#8217;s resource control at all.</p><pre><code>kubectl label nodes $(hostname) gpu=on</code></pre><p>If you have multiple GPU nodes, run this for each one, replacing <code>$(hostname)</code> with the actual node name. It&#8217;s easy to skip this step and then spend an hour wondering why HAMi isn&#8217;t enforcing anything.</p><h3>Step 5 &gt; Install HAMi</h3><p>HAMi builds on top of how k8s already handles GPUs. It adds its own scheduler logic, introduces new GPU resources like <code>gpucores</code> and <code>gpumem</code>, and makes it possible to share a single GPU across multiple workloads. 
It doesn&#8217;t replace the existing GPU stack; it depends on it.</p><p>Before you install it, first check your k8s server version. The HAMi scheduler image tag needs to match that version exactly.</p><pre><code>kubectl version</code></pre><p>Then add the HAMi Helm repository and install:</p><pre><code>helm repo add hami-charts https://project-hami.github.io/HAMi/</code></pre><pre><code>helm install hami hami-charts/hami \
  --set scheduler.kubeScheduler.imageTag=v1.XX.X \
  -n kube-system</code></pre><p>Replace <code>v1.XX.X</code> with your actual k8s server version. For example, if your cluster is running <code>1.28.4</code>, use <code>--set scheduler.kubeScheduler.imageTag=v1.28.4</code>. On k3s, <code>kubectl version</code> reports a server version with a suffix such as <code>v1.28.4+k3s1</code>; use only the upstream part (<code>v1.28.4</code>), because the suffix is not part of the scheduler image tag.</p><p>Getting this version wrong is one of the most common HAMi installation failures. If there&#8217;s a mismatch, the scheduler will start but refuse to bind to k8s correctly, and it won&#8217;t always fail loudly.</p><h3>Step 6 &gt; Verify HAMi components</h3><p>Once the Helm install completes, check that both the device plugin and scheduler are running:</p><pre><code>kubectl get pods -n kube-system | grep hami</code></pre><p>You should see both <code>vgpu-device-plugin</code> and <code>vgpu-scheduler</code> in a Running state. If one of them is in <code>CrashLoopBackOff</code>, the most common reason is that the scheduler image version doesn&#8217;t match your k8s version from the previous step.</p><p>To make sure the scheduler is actually doing something, not just sitting there, check its logs.</p><pre><code>kubectl logs -n kube-system -l app=hami-scheduler</code></pre><p>Look for signs that it&#8217;s actually making decisions: scheduling actions, resource checks, and GPU allocation attempts.</p><p>If you want to go a bit deeper, you can also check the node-level resource state.</p><pre><code>kubectl get nodes -o wide
kubectl describe node $(hostname)</code></pre><p>With both components confirmed running, let&#8217;s prove the whole chain works end to end.</p><h3>Step 7 &gt; Re-run the GPU workload under HAMi</h3><p>This step answers a direct question: did installing HAMi break anything that was already working?</p><pre><code>kubectl delete pod cuda-test --ignore-not-found
kubectl apply -f cuda-test.yaml
# "Succeeded" is a pod phase, not a condition, so wait on .status.phase (needs kubectl v1.23+)
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cuda-test --timeout=60s || true
kubectl logs cuda-test</code></pre><p>You should see the same clean <code>nvidia-smi</code> output as before. If you do, the scheduling chain is intact, no regression was introduced, and HAMi is coexisting cleanly with your existing GPU setup.</p><h3>Step 8 &gt; Submit a vGPU workload and verify resource limits</h3><p>This is the step that proves HAMi is actually doing something. We&#8217;re going to submit a workload that requests a specific amount of GPU memory, then verify from inside the container that the limit is enforced.</p><p>Create the following pod:</p><pre><code>apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 1          # requesting 1 vGPU
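          # nvidia.com/gpucores: 50  # optional sketch, untested in this guide: would cap compute at 50% of the GPU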
          nvidia.com/gpumem: 10240   # limiting to 10240 MiB of GPU memory</code></pre><p>Apply it:</p><pre><code>kubectl apply -f gpu-pod.yaml
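# Added sketch, not part of the original run: block until the container is Ready,
# so the exec step that follows does not race the image pull. The 120s timeout is an assumption.
kubectl wait --for=condition=Ready pod/gpu-pod --timeout=120s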
kubectl get pod gpu-pod</code></pre><p>Once it&#8217;s running, exec into it and check what the GPU looks like from inside:</p><pre><code>kubectl exec -it gpu-pod -- nvidia-smi</code></pre><p>If HAMi is working correctly, you&#8217;ll see output similar to this sample, which comes from a 32 GB Tesla V100. The model, driver, and CUDA version lines will reflect your own hardware.</p><p>Note the memory cap at 10240 MiB, not the full GPU memory:</p><pre><code>[HAMI-core Msg]: Initializing.....
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15    Driver Version: 550.54.15    CUDA Version: 12.4               |
|-----------------------------------------+------------------------+----------------------+
|   0  Tesla V100-PCIE-32GB          On   |   00000000:3E:00.0 Off |                   0 |
| N/A   29C    P0    24W / 250W           |       0MiB / 10240MiB  |      0%     Default |
+-----------------------------------------+------------------------+----------------------+</code></pre><p>The key detail here is <code>0MiB / 10240MiB</code></p><p>The container only sees 10240 MiB of GPU memory, not the full 32 GB of the physical card. That means HAMi&#8217;s resource control is actually working.</p><p>This is exactly what we&#8217;ve been building toward. From the container&#8217;s point of view, it has its own dedicated slice of the GPU. </p><p>At the same time, other pods can use the remaining capacity on the same card, without any of them being aware of each other.</p><h3>What you&#8217;ve actually verified</h3><p>At this point, your understanding should have changed a bit.</p><p>Before, the thinking was simple: test GPU &#8594; install HAMi &#8594; done.</p><p>Now it&#8217;s more structured: set up the runtime &#8594; make sure the GPU really works &#8594; label the node &#8594; install HAMi with the correct version &#8594; check the scheduler &#8594; finally verify a real vGPU workload with enforced limits.</p><p>By going through this process, you&#8217;ve proven a few important things.</p><p>The NVIDIA container runtime is correctly set up. The GPU stack works end-to-end in k8s. Your GPU nodes are properly labeled and picked up by the HAMi scheduler.</p><p>HAMi itself is installed with the right version. It doesn&#8217;t break existing GPU workloads. And most importantly, resource limits like <code>gpumem</code> are actually enforced inside the container.</p><p>What this guide doesn&#8217;t cover is also important. We didn&#8217;t go into <code>gpucores</code> for limiting compute, or fairness and multi-tenant scheduling, or more advanced setups like multi-GPU and MIG.</p><p>All of those build on top of the foundation you&#8217;ve just validated.</p><h3>Common failure scenarios</h3><p>Not everything in life goes according to plan. 
With a stack this layered, it&#8217;s only natural to run into problems in the real world.</p><p><strong>GPU not visible in k8s</strong></p><pre><code>kubectl get nodes -o jsonpath='{.items[*].status.allocatable}'</code></pre><p>If <code>nvidia.com/gpu</code> is missing, the device plugin is not working or the container runtime is misconfigured. Revisit Step 1.</p><p><strong>Pod stuck in ContainerCreating</strong></p><p>This almost always means runtime hook issues or missing NVIDIA libraries. The pod was scheduled; it just can&#8217;t start. Revisit the <code>containerd</code> or Docker runtime config in Step 1.</p><p><strong>nvidia-smi works outside k8s but not inside a pod</strong></p><p>The container runtime is not wired into k8s correctly. The device plugin cannot pass GPU access into the pod. Fix Step 1 before continuing.</p><p><strong>HAMi scheduler in CrashLoopBackOff</strong></p><p>This is almost certainly a version mismatch between the scheduler image tag and your k8s server version. Re-run <code>kubectl version</code>, then reinstall with the correct <code>imageTag</code>.</p><p><strong>gpu-pod stays Pending after HAMi install</strong></p><p>Check that the node has the <code>gpu=on</code> label. Without it, HAMi&#8217;s scheduler ignores the node entirely.</p><h3>Common misconceptions</h3><p>Here are a few points that will keep you from jumping to the wrong conclusion.</p><p><strong>&#8220;HAMi is not working.&#8221;</strong> If <code>nvidia-smi</code> runs in a plain GPU pod, the base stack is fine. HAMi is a scheduler extension, so verify the scheduler image version and node labels before assuming HAMi itself is broken.</p><p><strong>&#8220;The scheduler is running, so GPU sharing is active.&#8221;</strong> Scheduler presence confirms deployment. 
Actual resource enforcement is only confirmed by exec-ing into a running vGPU pod and checking memory limits, as shown in Step 8.</p><p><strong>&#8220;I don&#8217;t need to set the imageTag. HAMi figures it out.&#8221;</strong> In most recent versions this is true, but version mismatches are still the top cause of silent failures. When in doubt, set it explicitly.</p><h3>Troubleshooting order</h3><p>When something goes wrong, always debug in this sequence:</p><ol><li><p>NVIDIA drivers and nvidia-container-toolkit</p></li><li><p>Container runtime config (Docker or containerd)</p></li><li><p>Device plugin status</p></li><li><p>Node resource advertisement and <code>gpu=on</code> label</p></li><li><p>HAMi scheduler version and logs</p></li></ol><p>When you&#8217;re done, clean up the test pods:</p><pre><code>kubectl delete pod cuda-test
kubectl delete pod gpu-pod</code></pre><h2>End</h2><p>At the end of the day, HAMi is not some magic layer you just install and forget. It only works as well as the foundation underneath it. If your GPU stack is solid, HAMi gives you a clean and practical way to actually use your hardware efficiently instead of letting it sit idle. </p><p>Start simple. Validate each layer. Don&#8217;t assume anything is working until you&#8217;ve seen it yourself inside the container.</p><p><strong>I&#8217;d like to express my thanks to <a href="https://www.jetbrains.com/">JetBrains</a> for their support.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DhDm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DhDm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png 424w, https://substackcdn.com/image/fetch/$s_!DhDm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png 848w, https://substackcdn.com/image/fetch/$s_!DhDm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png 1272w, https://substackcdn.com/image/fetch/$s_!DhDm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!DhDm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png" width="1192" height="256" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:256,&quot;width&quot;:1192,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:35569,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/193060113?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DhDm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png 424w, https://substackcdn.com/image/fetch/$s_!DhDm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png 848w, https://substackcdn.com/image/fetch/$s_!DhDm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png 1272w, https://substackcdn.com/image/fetch/$s_!DhDm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb2adeab-bc3d-4cb9-b3b1-f0dee6eace3d_1192x256.png 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[Traces, Metrics, and Logs: What They Actually Do]]></title><description><![CDATA[Why traces, metrics, and logs work better together than apart]]></description><link>https://mesutoezdil.substack.com/p/traces-metrics-and-logs-what-they</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/traces-metrics-and-logs-what-they</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Mon, 09 Feb 2026 10:11:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!W0Ow!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9153862-886a-444c-a73a-19650e3c8bd6_800x429.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve been quiet on <a href="https://www.linkedin.com/in/mesut-oezdil/">LinkedIn</a> for a while. </p><p>Haven&#8217;t posted articles anywhere. Not because I wasn&#8217;t making stuff. The opposite actually. </p><p>Shallow prompt posts and recycled &#8220;AI&#8221; stuff had turned my feed into noise. </p><p>People sharing prompt outputs like they just invented the wheel and used their own brain was tiring me out. </p><p>I&#8217;m back though, with one simple rule: real content, stuff that works, stuff you can actually use again.</p><p>So yeah, I started posting about <a href="https://opentelemetry.io/">OTel</a>. </p><p>This is probably my first real article about it. Let&#8217;s just jump in.</p><p>I think we can all agree on this: when systems get messy, trying to figure out what&#8217;s wrong from one graph or one log line gets really hard. </p><p>You know that feeling during an outage when you open a bunch of dashboards and still can&#8217;t find a clear answer? Super familiar. </p><p>That&#8217;s why teams now look at three things together: traces, metrics, and logs. 
I posted the basic stuff about these before with the <a href="https://www.linkedin.com/posts/mesut-oezdil_otelday-traces-observability-share-7424401070962937856-KN37?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAD40WRwBqEbaK2JN941f2VqsR4RL0jnH00Y">#OTelDay-2</a> tag.</p><p>But it&#8217;s not really about definitions. </p><p>The real deal is: what question does each one answer? </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://mesutoezdil.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://mesutoezdil.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>When do you use which one? And how do they tell a story together? </p><p>I think that&#8217;s the best way to learn. Hard to find the right answer if you don&#8217;t ask the right question first.</p><p>A trace shows one request&#8217;s trip through your system. </p><p>The request goes through services. Sits in a queue. Hits the database. </p><p>Sometimes calls an outside API. Each step on that trip is a span. </p><p>Each span usually has the name of what it&#8217;s doing. </p><p>When it started, how long it took, did it work, did it break... All that lives on the span. And there&#8217;s also tags. They give you context. Like <code>db.system = PostgreSQL</code> or <code>http.route = /checkout</code>.</p><p>Now we can clearly see what traces do. </p><p>Traces answer these questions: &#8220;Which step was slow?&#8221; &#8220;Where exactly did it break?&#8221; &#8220;How did this mess up the whole thing?&#8221;</p><p>You look at traces when you want to see what caused what. </p><p>Same when you want to understand the order stuff happened. 
That&#8217;s what gives you the big picture.</p><p>In my <a href="https://www.linkedin.com/posts/mesut-oezdil_otelday-otel-jaeger-share-7424021328342458369-IsTY?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAD40WRwBqEbaK2JN941f2VqsR4RL0jnH00Y">#OTelDay-1</a> post I really pushed one thing: <strong>context propagation</strong>. </p><p>Basically the same trace ID following the request through all the services. </p><p>If that breaks, the trace breaks. The story stays half-told.</p><p>Another key thing is grabbing the right tags from the start. This is the info that actually helps when stuff breaks. </p><p>Customer type, region, order ID, stuff like that. </p><p>That way you don&#8217;t drown in hundreds of traces. You narrow it down fast.</p><p>So here&#8217;s what we&#8217;re learning: traces show you cause and effect and order. </p><p>Context needs to follow through. Important tags need to be grabbed early. </p><p>So instead of tracking every single line of code, focus on the key spots. </p><p>Incoming requests, outgoing calls, database stuff. Traces tell us one user&#8217;s trip.</p><p>But how do we know when something&#8217;s hitting lots of users at once? </p><p>That&#8217;s where metrics come in. The basic question metrics ask is: &#8220;Is this normal?&#8221;</p><p>Metrics are numbers over time. They tell you there&#8217;s a spike. Traces show you where.</p><p>One of the biggest mistakes with metrics is setting alerts based on averages. </p><p>Because averages hide user pain. A few users can have a terrible time while tons of fast requests keep the average looking fine. </p><p>Another mistake is making random metrics for each service. Way better to have a shared set of core metrics across services.</p><p>We&#8217;ve asked the right questions so far. But there&#8217;s still something missing: details.</p><p>When we want to know exactly what went down, logs come in. 
Logs answer: &#8220;What exactly happened?&#8221;</p><p>Logs have parameters, choices that were made, error dumps, security stuff. </p><p>When they&#8217;re structured and linked to trace and span IDs, they&#8217;re super powerful.</p><p>They give you clarity without all the noise.</p><p>Logging mistakes happen a lot too. Messy and super detailed logs are hard to search and expensive. </p><p>Another bad one is logging sensitive stuff. That&#8217;s why you gotta mask things where needed.</p><p>Now we know what these three things do. </p><p>But the real magic is when they work together. During an outage, it usually goes like this:</p><p>First, metrics sound the alarm. Like p95 latency for <code>/checkout</code> jumps from 300 milliseconds to 2.2 seconds.</p><p>Then you check traces. You filter for that same time window and <code>/checkout</code>. You see most slow traces are spending 1.8 seconds on the payment step.</p><p>Next you pull up logs for those same trace IDs. You notice timeouts on calls to the payment gateway whenever <code>fraud_check = true</code>.</p><p>Finally you check dependency metrics. You confirm outside API errors only went up in East US.</p><p>Each step clears things up a bit more.</p><p>Bottom line: metrics sound the alarm. Traces show where the problem is. </p><p>Logs confirm what happened. Together they make a messy situation something you can actually solve step by step. </p><p>For this to work smoothly across teams, you need standards. That&#8217;s exactly why OpenTelemetry&#8217;s semantic conventions matter. I talked about this more in my <a href="https://www.linkedin.com/posts/mesut-oezdil_otelday-metric-metric-share-7424837159116918784-xaR8?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAD40WRwBqEbaK2JN941f2VqsR4RL0jnH00Y">#OTelDay-3</a> post.</p><p>Let me wrap up with a quick rule: </p><p>Check metrics first. To understand how big it is. </p><p>Then go to traces. To find the bottleneck. 
</p><p>Dig into logs if you need details.</p><p>And one last thing: don&#8217;t trust averages for alerts. Averages hide user pain. </p><p>Percentiles show what your slowest users are really going through. Most of the time, that&#8217;s where the real story is.</p><p>Now let me quickly recap last week&#8217;s posts so everything really sticks.</p><p><code>#OTelDay-0</code> </p><p>I wanted us to ask ourselves: &#8220;If you already have metrics and logs, why are you still blind in prod?&#8221; A reminder that OTel is awesome and will solve a ton of problems by giving us a common language for observability.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/gif&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9153862-886a-444c-a73a-19650e3c8bd6_800x429.gif&quot;}],&quot;caption&quot;:&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/gif&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9153862-886a-444c-a73a-19650e3c8bd6_800x429.gif&quot;}},&quot;isEditorNode&quot;:true}"></div><p><code>#OTelDay-1</code> </p><p>Metrics check the system&#8217;s pulse. Response times and error rates show us the general health. Logs give clearer answers to &#8220;What happened?&#8221; They show what went down behind the scenes. Traces show a request&#8217;s journey between services. Where it slowed down, where it got stuck. And the most important piece connecting everything: context propagation. 
Keeps traces from breaking in distributed systems.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/gif&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45cd0aae-1104-4684-875f-672b63d047ad_794x1096.gif&quot;}],&quot;caption&quot;:&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/gif&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45cd0aae-1104-4684-875f-672b63d047ad_794x1096.gif&quot;}},&quot;isEditorNode&quot;:true}"></div><p><code>#OTelDay-2</code> </p><p>OTel is an open-source standard for collecting telemetry data. Instead of every framework or vendor doing observability their own way, OTel gives us a shared model. Shows how events in the system connect to each other in a clear and consistent way. OTel isn&#8217;t a logging tool. Doesn&#8217;t replace your logs or monitoring system. Doesn&#8217;t store data. Doesn&#8217;t show dashboards. OTel is a standard. Defines what telemetry data should look like and how to collect it. So different tools can work together smoothly.</p><p><code>#OTelDay-3</code> </p><p>If your measurement language isn&#8217;t clear, your decisions won&#8217;t be clear either. This is exactly where semantic conventions become critical. With semantic conventions you can put services side by side. Metric names don&#8217;t get messed up over time. 
You don&#8217;t have to rewrite everything when you switch vendors.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/gif&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1d92b10-fd49-4f9a-b97b-5bc4b47e5f43_800x682.gif&quot;}],&quot;caption&quot;:&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/gif&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1d92b10-fd49-4f9a-b97b-5bc4b47e5f43_800x682.gif&quot;}},&quot;isEditorNode&quot;:true}"></div><p>That&#8217;s the rundown. </p><p>Keep it simple, keep it real, and your systems will thank you.</p><p>Got questions? I&#8217;m here.</p><p></p>]]></content:encoded></item><item><title><![CDATA[OpenShift-3]]></title><description><![CDATA[OpenShift Networking (an introductory effort)]]></description><link>https://mesutoezdil.substack.com/p/openshift-3</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/openshift-3</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Tue, 25 Mar 2025 13:42:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qZaX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h4><strong>&gt; OpenShift Network Architecture</strong></h4><p>In this article I will give an introduction to OpenShift&#8217;s network design. Let me say at the outset that I have no (professional) affiliation with RedHat or OpenShift. I share these articles and related posts on <a href="https://www.linkedin.com/in/mesut-oezdil/">LinkedIn</a> with the intention of helping myself first and then others. I think that&#8217;s obvious, so let&#8217;s get to the point. 
</p><p>OpenShift&#8217;s network design helps containers, services, and users communicate smoothly inside and outside the cluster. It expands on k8s networking with extra tools for control and visibility. A lot of things we know from k8s we won&#8217;t repeat here, but we will still point out some reminders. Each Pod in OpenShift gets its own IP and communicates through a flat network called the Cluster Network. K8s Services hide these IP addresses behind stable IPs and DNS names.</p><p>OpenShift handles internal networking with CNI plugins. OpenShift SDN used to be the default, but from OpenShift 4.6 onward, OVN-Kubernetes is recommended. Both plugins meet basic networking needs, but OVN-Kubernetes provides better scalability, security, and flexibility. Let&#8217;s take a closer look at what this all means.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qZaX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qZaX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif 424w, https://substackcdn.com/image/fetch/$s_!qZaX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif 848w, https://substackcdn.com/image/fetch/$s_!qZaX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!qZaX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qZaX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif" width="981" height="1058" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1058,&quot;width&quot;:981,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:750550,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/159824568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qZaX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif 424w, https://substackcdn.com/image/fetch/$s_!qZaX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif 848w, 
https://substackcdn.com/image/fetch/$s_!qZaX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif 1272w, https://substackcdn.com/image/fetch/$s_!qZaX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe6d06a3-700a-4e70-8a35-ef445972d597_981x1058.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h4><strong>Internal Communication (Pod-to-Pod Communication)</strong></h4><p>OpenShift uses CNI plugins to manage internal pod 
networking. CNI plugins handle IP assignment, routing, and security for OpenShift pods. Each pod receives a unique IP address and communicates over the Cluster Network. K8s Services put a stable Cluster IP in front of those pods, so clients can reach them on any node.</p><p>Supported CNI plugins include OpenShift SDN (the default in older versions) and OVN-Kubernetes (generally available since OpenShift 4.6 and the default for new installations since 4.12). These plugins provide pod networking, traffic policies, and service connectivity. Well, what is this SDN?</p><p>&#9758; <strong>Software-Defined Networking (SDN)</strong></p><p>OpenShift SDN is a built-in CNI plugin that manages pod communication and enforces policies using software-defined overlays. It offers three operational modes:</p><p><em>Network Policy Mode</em> &#8674; The default. Pods can communicate freely until project administrators restrict traffic with NetworkPolicy objects</p><p><em>Multitenant Mode</em> &#8674; Enforces isolation between projects (namespaces)</p><p><em>Subnet Mode</em> &#8674; A flat pod network in which every pod can communicate with every other pod and service</p><p>OpenShift SDN is now deprecated in favor of OVN-K8s, which has been the default CNI plugin for new clusters since OpenShift 4.12. SDN is simple to use and good for small-to-medium systems but might struggle in larger setups. OpenShift SDN&#8217;s centralised control plane can become a bottleneck in very large clusters (on the order of 1000+ nodes).</p><p>Open vSwitch (OVS) powers SDN, creating virtual bridges and supporting overlay technologies like VXLAN or GENEVE. Features of OVS include:</p><p>&#8677; Virtual bridging of pod traffic</p><p>&#8677; Flow control using OpenFlow rules</p><p>&#8677; MAC address management</p><p>&#8677; Overlay networking using VXLAN or GENEVE</p><p>VXLAN is used with OpenShift SDN to expand virtual networks. 
GENEVE is supported by OVN-K8s and offers enhanced flexibility compared to VXLAN.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jPCf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jPCf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif 424w, https://substackcdn.com/image/fetch/$s_!jPCf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif 848w, https://substackcdn.com/image/fetch/$s_!jPCf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif 1272w, https://substackcdn.com/image/fetch/$s_!jPCf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jPCf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif" width="901" height="642" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:901,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147378,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/159824568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jPCf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif 424w, https://substackcdn.com/image/fetch/$s_!jPCf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif 848w, https://substackcdn.com/image/fetch/$s_!jPCf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif 1272w, https://substackcdn.com/image/fetch/$s_!jPCf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ce9c5c-fdd1-4daa-aac1-075563826f81_901x642.gif 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>&#9758; <strong>OVN-Kubernetes</strong></p><p>OVN-K8s has replaced OpenShift SDN as the default CNI provider, offering enhanced scalability, security, and flexibility for modern OpenShift deployments. 
Built on Open Virtual Network (OVN), this plugin introduces a logical networking architecture that leverages:</p><p>&#8677; Logical switches/routers (managed through OVSDB)</p><p>&#8677; Distributed control plane (improves scalability vs SDN&#8217;s centralised model)</p><p>&#8677; GENEVE encapsulation (more flexible than VXLAN)</p><p>The traditional OpenShift SDN modes (Cluster/Multitenant/Subnet) are superseded by OVN&#8217;s unified architecture, which provides:</p><p>&#8677; Automated IP address management (no manual subnet allocation)</p><p>&#8677; Distributed east-west routing (no single chokepoint)</p><p>&#8677; Integration with OpenStack/VM workloads (via OVN provider)</p><p>While OVN introduces minimal overhead (~5-8% vs SDN), its distributed architecture scales better beyond 500 nodes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!79Fm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!79Fm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif 424w, https://substackcdn.com/image/fetch/$s_!79Fm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif 848w, https://substackcdn.com/image/fetch/$s_!79Fm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!79Fm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!79Fm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif" width="900" height="642" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:284482,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/159824568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!79Fm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif 424w, https://substackcdn.com/image/fetch/$s_!79Fm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif 848w, https://substackcdn.com/image/fetch/$s_!79Fm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif 
1272w, https://substackcdn.com/image/fetch/$s_!79Fm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F562d6595-39ca-4fb2-881b-243a31bd223a_900x642.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>External Communication</h4><p>I hope that the internal communication has become clear in a nutshell. In fact, for those familiar with the k8s structure, the external structure will be similarly obvious. OpenShift exposes services to external users using three main mechanisms:</p><p><em>NodePort</em>: Binds services to a static port on each worker node. 
 Suitable for development or testing but not recommended for production. The port is allocated from the 30000-32767 range by default.</p><p><em>LoadBalancer</em>: Integrates with cloud or on-prem load balancers (e.g., AWS ELB, Azure LB). Distributes external traffic to internal services.</p><p><em>Ingress Controller</em>: A HAProxy-based controller that handles HTTP/HTTPS routing via OpenShift Routes. It&#8217;s the default and most flexible option for exposing web applications, and it enables advanced L7 routing via Route objects.</p><p>The Ingress Controller receives incoming requests and maps them to the appropriate service within the cluster, using the requested hostname and the rules defined in Routes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f1qm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f1qm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif 424w, https://substackcdn.com/image/fetch/$s_!f1qm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif 848w, https://substackcdn.com/image/fetch/$s_!f1qm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!f1qm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f1qm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif" width="912" height="965" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:965,&quot;width&quot;:912,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:470517,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/159824568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f1qm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif 424w, https://substackcdn.com/image/fetch/$s_!f1qm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif 848w, https://substackcdn.com/image/fetch/$s_!f1qm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif 
1272w, https://substackcdn.com/image/fetch/$s_!f1qm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae4558da-13dc-47e5-bd39-34bd7cb9bfc0_912x965.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Routing in OpenShift</strong></h4><p>OpenShift leverages DNS-based service discovery and Route objects to manage both internal and external application access. Routes (custom resources in route.openshift.io/v1) expose services externally via the Ingress Controller. 
Each Service automatically receives an internal DNS name in the following format:</p><pre><code><code>my-service.my-namespace.svc.cluster.local</code></code></pre><p>OpenShift Routes expose services externally through hostnames (e.g., myapp.apps.example.com). When a Route resource is created, OpenShift&#8217;s Ingress Controller (HAProxy-based by default) processes the hostname and path-based routing rules in the Route definition and forwards external traffic to the appropriate backend service. The OpenShift DNS Operator, however, manages internal cluster DNS resolution (*.cluster.local) rather than external DNS records. Creating an external DNS entry such as myapp.apps.example.com usually requires one of the following:</p><p>&#8677; Manual DNS configuration (if external DNS is managed separately)</p><p>&#8677; Integration with the OpenShift ExternalDNS Operator (if automation is desired)</p><p>By default, OpenShift doesn&#8217;t create external DNS entries for Routes unless you configure an additional operator such as the ExternalDNS Operator. 
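</p><p>To make the hostname and path rules a bit more concrete, here is a minimal sketch of what a Route manifest could look like. The names <code>myapp</code>, <code>my-namespace</code> and <code>my-service</code>, the port <code>8080</code>, and the edge TLS settings are assumptions picked for illustration, not values from a real cluster.</p><pre><code># Hypothetical Route: every name, port and TLS choice here is an illustrative assumption
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: myapp
  namespace: my-namespace
spec:
  host: myapp.apps.example.com   # hostname the Ingress Controller matches on
  path: /                        # optional path-based rule
  to:
    kind: Service
    name: my-service             # backend Service that receives the traffic
    weight: 100
  port:
    targetPort: 8080             # Service port to forward to
  tls:
    termination: edge            # TLS ends at the router
    insecureEdgeTerminationPolicy: Redirect
</code></pre><p>Saved as <code>route.yaml</code>, this could be applied with <code>oc apply -f route.yaml</code>. The Ingress Controller then picks up the host and path rules, while the DNS record for the host still has to exist, as described above.</p><p>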
Last but not least, when a Route resource is created:</p><p>&#8677; The Ingress Controller (default is HAProxy-based) inspects the Route definition,</p><p>&#8677; Based on the specified hostname and optional path rules, it routes incoming external traffic to the correct backend service within the cluster,</p><p>&#8677; External DNS records (e.g., myapp.apps.example.com) are typically managed separately, either manually or via OpenShift&#8217;s ExternalDNS Operator.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wKjC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wKjC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif 424w, https://substackcdn.com/image/fetch/$s_!wKjC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif 848w, https://substackcdn.com/image/fetch/$s_!wKjC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif 1272w, https://substackcdn.com/image/fetch/$s_!wKjC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wKjC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif" width="961" height="984" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:984,&quot;width&quot;:961,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:404130,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/159824568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wKjC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif 424w, https://substackcdn.com/image/fetch/$s_!wKjC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif 848w, https://substackcdn.com/image/fetch/$s_!wKjC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif 1272w, https://substackcdn.com/image/fetch/$s_!wKjC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab5ea4f3-df9b-450a-8ef5-fce1f15ace32_961x984.gif 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><h4>Service Mesh</h4><p>It is also important to address this point. OpenShift provides robust support for Service Mesh, enabling secure and observable communication between microservices within the cluster. 
The primary implementation is Red Hat OpenShift Service Mesh (OSSM), which is built on the Istio service mesh framework.</p><p>Service Mesh capabilities include:</p><p><em>Traffic Management</em> &#8674; Intelligent load balancing, advanced routing rules, circuit breaking, retries, and traffic shifting.</p><p><em>Security</em> &#8674; Secure service-to-service communication enforced via mutual TLS encryption.</p><p><em>Observability</em> &#8674; Integrated monitoring capabilities including metrics collection, distributed tracing, and logging.</p><p>How does it work? Each pod participating in the service mesh contains an Envoy sidecar proxy, which intercepts and manages all incoming and outgoing network traffic. The Istio Control Plane (primarily Istiod in modern Istio versions) centrally manages and distributes configuration and policy settings to the Envoy proxies. OpenShift Service Mesh uses Kubernetes Operators to handle the deployment, configuration, and lifecycle management of all mesh components, which reduces operational complexity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E5Yn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E5Yn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif 424w, https://substackcdn.com/image/fetch/$s_!E5Yn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif 848w, 
https://substackcdn.com/image/fetch/$s_!E5Yn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif 1272w, https://substackcdn.com/image/fetch/$s_!E5Yn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E5Yn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif" width="931" height="642" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:931,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:151550,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/159824568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E5Yn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif 424w, https://substackcdn.com/image/fetch/$s_!E5Yn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif 
848w, https://substackcdn.com/image/fetch/$s_!E5Yn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif 1272w, https://substackcdn.com/image/fetch/$s_!E5Yn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296da915-0396-4ea8-8f87-1e0088e6e6cc_931x642.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>Egress (Outbound Traffic) Management</h4><p>Egress defines how Pods within your OpenShift cluster access external networks such 
as the internet or third-party services. It determines which IP addresses are used for outbound traffic and controls which domains or IP ranges can be reached.</p><p>In some scenarios, Pods must consistently use a fixed external IP address, for example to comply with firewall rules or access control lists enforced by external APIs or databases that only allow traffic from known IPs.</p><p>In OpenShift, egress IP management is handled by CNI plugins like OpenShift SDN or OVN-K8s. These plugins allow administrators to define which node handles outbound traffic and which external IP addresses are assigned. Egress firewall rules can also be applied to limit access to specific domains or IPs. For instance, you may want to allow traffic only to trusted domains such as *.mycompany.com while blocking all other external access.</p><p>Common use cases include:</p><p>&#8677; Restricting traffic to specific external services or destinations</p><p>&#8677; Allowing only approved IP ranges for outbound communication</p><p>&#8677; Assigning fixed public IPs to Pods for predictable firewall rule configuration</p><p>OVN-Kubernetes offers built-in support for egress firewall policies, allowing fine-grained control over outbound traffic behavior per namespace.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gAXV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gAXV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif 424w, 
https://substackcdn.com/image/fetch/$s_!gAXV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif 848w, https://substackcdn.com/image/fetch/$s_!gAXV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif 1272w, https://substackcdn.com/image/fetch/$s_!gAXV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gAXV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif" width="962" height="789" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:789,&quot;width&quot;:962,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:345151,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/159824568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gAXV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif 
424w, https://substackcdn.com/image/fetch/$s_!gAXV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif 848w, https://substackcdn.com/image/fetch/$s_!gAXV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif 1272w, https://substackcdn.com/image/fetch/$s_!gAXV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e30511-be4c-4890-9cd1-39408d8c94ad_962x789.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>DNS Management</strong></h4><p>OpenShift provides DNS resolution for cluster components such as Services, Pods, and Routes. This integrated DNS functionality underpins service discovery, internal routing, and external exposure of workloads.</p><p>Internally, OpenShift assigns a predictable DNS record to every Service within a namespace using a standardized naming convention:</p><pre><code>service-name.namespace.svc.cluster.local</code></pre><p>These internal fully qualified domain names (FQDNs) ensure that Pods can reliably discover and communicate with other services within the cluster. In OpenShift 4, the DNS Operator manages internal DNS resolution by deploying and managing <em>CoreDNS</em>, which answers name queries from Pods inside the cluster.</p><p>Externally, applications are exposed through Route objects. When a Route is created without an explicit hostname, OpenShift generates one following a structured convention:</p><pre><code>&lt;route-name&gt;-&lt;namespace&gt;.apps.&lt;cluster-domain&gt;</code></pre><p>For example, creating a route named <em>myapp</em> in the <em>dev</em> namespace on a cluster with the base domain ocp4.example.com results in:</p><pre><code>myapp-dev.apps.ocp4.example.com</code></pre><p>However, although OpenShift generates this hostname, making it resolvable externally typically requires further action. External DNS entries need to be configured either manually or via OpenShift&#8217;s ExternalDNS Operator. 
The ExternalDNS Operator integrates with external DNS providers, such as AWS Route 53 or Azure DNS, and automates public DNS record management.</p><p>To accommodate enterprise-level requirements, OpenShift supports a range of DNS customization options. Organizations can use custom root domains or wildcard DNS entries for more control over application routing and branding. Custom DNS management can be carried out manually or through automated API-driven approaches, provided proper DNS delegation and TLS certificate management procedures are in place.</p><p>In OpenShift 4.x, DNS configuration is managed natively through custom resources backed by Kubernetes Custom Resource Definitions (CRDs). This allows administrators to define DNS forwarding rules, create private DNS zones, specify upstream resolvers, and use more advanced DNS capabilities such as split-horizon DNS or namespace-specific DNS policies. Internal DNS queries within the cluster rely exclusively on the reserved cluster.local domain, whereas external DNS resolution depends on the cluster&#8217;s configured base domain. 
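</p><p>As one hedged illustration of the forwarding rules and upstream resolvers mentioned above, the DNS Operator&#8217;s cluster-wide <em>default</em> resource can carry per-zone forwarding entries. The server name, the zone, and the resolver address below are placeholders chosen for this sketch:</p><pre><code># Sketch only: server name, zone, and upstream IP are hypothetical.
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default                  # the DNS Operator's cluster-wide resource
spec:
  servers:
  - name: corp-dns               # hypothetical label for this forwarding rule
    zones:
    - corp.example.com           # queries for this zone are forwarded
    forwardPlugin:
      upstreams:
      - 10.0.0.10                # assumed internal resolver address
</code></pre><p>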
All DNS operations strictly adhere to OpenShift&#8217;s RBAC policies and namespace isolation principles, ensuring secure, compliant, and reliable DNS resolution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ei_0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ei_0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif 424w, https://substackcdn.com/image/fetch/$s_!Ei_0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif 848w, https://substackcdn.com/image/fetch/$s_!Ei_0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif 1272w, https://substackcdn.com/image/fetch/$s_!Ei_0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ei_0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif" width="993" height="734" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:734,&quot;width&quot;:993,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:715025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/159824568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ei_0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif 424w, https://substackcdn.com/image/fetch/$s_!Ei_0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif 848w, https://substackcdn.com/image/fetch/$s_!Ei_0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif 1272w, https://substackcdn.com/image/fetch/$s_!Ei_0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc737f0e-af68-4b55-b5b7-91addccfb6cc_993x734.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>Network Structure in terms of Security</h4><p>Network security within OpenShift is enforced primarily through NetworkPolicies, a native Kubernetes mechanism designed to manage and control Pod-to-Pod and external communication. By default, many Container Network Interface (CNI) implementations allow unrestricted communication between Pods across namespaces and nodes. NetworkPolicies provide administrators the capability to override this behavior, implementing fine-grained rules based on criteria such as Pod labels, namespaces, IP address blocks, ports, and protocols.</p><p>A common scenario involves securing database services to ensure only authorized application components can access them. For instance, consider an environment where you have an application Pod labeled app=backend and another administrative Pod labeled app=admin. 
To ensure that only the backend application Pods can access a database Pod labeled app=database on port 5432/TCP, you would create a NetworkPolicy that selects the database Pods and allows ingress on that port only from Pods labeled app=backend. Once a Pod is selected by an ingress policy, any traffic the policy does not explicitly allow is dropped, so the admin Pod and all other sources are denied.</p><p>NetworkPolicies in OpenShift are expressed through standard k8s YAML manifests and can be deployed using the command-line interface (oc apply -f &lt;yaml_file&gt;). Enforcement of these policies depends on the networking layer&#8217;s support; plugins such as OVN-K8s implement the Kubernetes NetworkPolicy API. Additionally, administrators can manage NetworkPolicies visually through the OpenShift Web Console, which makes it easier to define and review Pod and service traffic rules.</p><p>It&#8217;s crucial to recognize that the effectiveness and default behavior of NetworkPolicies can vary significantly depending on the underlying CNI plugin in use. Certain plugins operate under a permissive <em>allow all by default</em> model, whereas others implement a more restrictive <em>deny by default</em> strategy. In the restrictive model, explicit rules must be defined to allow necessary traffic, which directly shapes the cluster&#8217;s default security posture. Understanding these distinctions is vital when planning secure deployments, especially in multi-tenant or regulated environments.</p><h4><strong>Conclusion</strong></h4><p>So, that&#8217;s OpenShift networking in a nutshell! We&#8217;ve seen how OpenShift builds on standard k8s networking, adding powerful features like OVN-K8s for scalability, Service Mesh for secure microservices communication, and flexible DNS management for internal and external service discovery. NetworkPolicies help you define clear security boundaries, while Egress management ensures controlled external access. 
At the end of the day, getting comfortable with these concepts will make managing your OpenShift cluster easier and more secure, letting you focus less on infrastructure headaches and more on building great apps.</p>]]></content:encoded></item><item><title><![CDATA[OpenShift-2]]></title><description><![CDATA[OpenShift Ecosystem]]></description><link>https://mesutoezdil.substack.com/p/openshift-2</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/openshift-2</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Tue, 11 Mar 2025 10:55:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Zvro!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>OpenShift is a strong platform because it offers flexibility and enterprise-level features. But its different versions and deployment models can be confusing. In this article, I&#8217;ll explain OKD, OCP, OSD, ROSA, ARO, and other OpenShift options to help you understand which one is the best fit for different needs.</p><h4>OKD &gt; The Open-Source Base of OpenShift</h4><p>Let&#8217;s start with the <em>core</em> of OpenShift. OKD is the open-source, community-supported version of OpenShift. It was previously called &#8220;OpenShift Origin&#8221; and serves as the base for Red Hat&#8217;s commercial products. Red Hat first builds OKD, then adds extra security, management tools, and support to create its enterprise solutions.</p><p>Why is it called &#8220;OKD&#8221;? It&#8217;s <strong>not</strong> an abbreviation. Red Hat isn&#8217;t allowed to use &#8220;Kubernetes&#8221; in its product names because of the Linux Foundation&#8217;s rules. So, they named the community version &#8220;OKD.&#8221; It runs on physical servers, virtual machines, and private cloud setups, relying completely on community support. 
It also receives experimental updates and features first. If these features prove stable, they often appear later in Red Hat&#8217;s commercial versions. This makes OKD very flexible for testing new ideas, but it does <strong>not</strong> have official Red Hat support. Companies that need a strong Service Level Agreement usually pick a commercial version.</p><h4>OCP &gt; OpenShift Container Platform by Red Hat</h4><p>If you want to use OpenShift in a business setting, OCP is the usual choice. You can think of OCP as the professional version of OKD, provided by Red Hat. It comes with better security, enterprise support, and extra management tools, which is why many companies trust OCP to run securely on physical servers, virtual machines, and private cloud environments. All managed OpenShift services (OSD, ROSA, ARO) are built on OCP, which makes it the core of the enterprise OpenShift experience. Along with official support and regular updates, Red Hat includes security features such as <a href="https://docs.openshift.com/container-platform/4.8/authentication/using-rbac.html">RBAC</a> and <a href="https://docs.openshift.com/container-platform/3.11/admin_guide/manage_scc.html">Security Context Constraints</a>, which let teams manage permissions precisely and help keep containers more secure.</p><p>Building on those enterprise capabilities, newer versions of OpenShift (4.x) introduced an operator-based architecture. Operators automate tasks like installing software, monitoring services, and upgrading them inside the cluster, reducing manual work for admins. OpenShift also offers <a href="https://docs.openshift.com/container-platform/4.7/openshift_images/using_images/using-s21-images.html">Source-to-Image</a> (S2I), a feature that builds a container image directly from application source code. 
This streamlines the process of moving from source code to a running container, allowing developers to focus more on coding and less on configuration.</p><p>To complete the picture, OpenShift integrates with popular CI/CD tools such as Jenkins, Tekton, and Argo CD. This means you can set up pipelines that automatically build and deploy code: when developers push changes to a Git repository, the pipeline builds new container images and rolls out the updates. GitOps tools like Argo CD help keep infrastructure in sync with the code, which improves reliability. Together, these features give OpenShift an ecosystem that supports both operational efficiency and developer productivity.</p><h4>OSD, ROSA, and ARO &gt; Managed OpenShift Services</h4><p>If you don&#8217;t want to manage your own infrastructure and prefer a fully managed OpenShift experience, OSD, ROSA, and ARO are good options.</p><p><em>OSD (OpenShift Dedicated)</em> &#8594; A managed OpenShift version that runs on AWS and GCP, fully handled by Red Hat. You do not need to manage the cluster in-house; Red Hat operates it for you.</p><p><em>ROSA (Red Hat OpenShift Service on AWS)</em> &#8594; A fully integrated OpenShift service provided jointly by AWS and Red Hat. It works closely with AWS services while keeping the OpenShift experience.</p><p><em>ARO (Azure Red Hat OpenShift)</em> &#8594; A managed OpenShift service on Microsoft Azure, run by Red Hat and Microsoft. 
It integrates well with Azure resources and works smoothly with Microsoft services.</p><p>These services are great for companies that want to start using OpenShift quickly without dealing with infrastructure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Zvro!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Zvro!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif 424w, https://substackcdn.com/image/fetch/$s_!Zvro!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif 848w, https://substackcdn.com/image/fetch/$s_!Zvro!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif 1272w, https://substackcdn.com/image/fetch/$s_!Zvro!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Zvro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif" width="1251" height="1309" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1309,&quot;width&quot;:1251,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:974167,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158835633?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Zvro!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif 424w, https://substackcdn.com/image/fetch/$s_!Zvro!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif 848w, https://substackcdn.com/image/fetch/$s_!Zvro!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif 1272w, https://substackcdn.com/image/fetch/$s_!Zvro!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F706055b9-b4ea-4a7a-8236-04d4f63ad0e1_1251x1309.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With these managed services, you can focus on developing and scaling your apps. Red Hat and the cloud providers handle security patches, upgrades, and hardware management. This saves time and resources for your team.</p><h4>Edge Computing and Virtualization &gt; Expanding OpenShift&#8217;s Capabilities</h4><p>OpenShift is not just for big data centers and cloud environments. It also works well for edge computing and virtualization.</p><p><em>OpenShift Edge Computing</em> &#8594; Allows small-scale OpenShift setups for IoT, 5G, and retail apps. It provides a lightweight and optimized OpenShift experience for remote locations.</p><p><em>OpenShift Virtualization</em> &#8594; Lets you run virtual machines on K8s. 
This makes it easier to move traditional data center workloads to OpenShift and K8s environments.</p><p>These options are useful for businesses that want to support hybrid and multi-cloud setups. For example, a company with a main data center could run OCP for primary workloads, while using Edge clusters in different branch offices for local data processing. If the company still depends on some legacy VMs, it can run those inside OpenShift too, thanks to OpenShift Virtualization.</p><h4>How OpenShift Versions Connect</h4><p>Let&#8217;s look at how OpenShift versions relate to each other. The hierarchy is:</p><p><code>OKD &#8594; OCP &#8594; OSD, ROSA, ARO, Edge, Virtualization</code></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N5Gh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N5Gh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif 424w, https://substackcdn.com/image/fetch/$s_!N5Gh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif 848w, https://substackcdn.com/image/fetch/$s_!N5Gh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!N5Gh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N5Gh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif" width="1175" height="838" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:838,&quot;width&quot;:1175,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:251811,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158835633?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N5Gh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif 424w, https://substackcdn.com/image/fetch/$s_!N5Gh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif 848w, 
https://substackcdn.com/image/fetch/$s_!N5Gh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif 1272w, https://substackcdn.com/image/fetch/$s_!N5Gh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb008c36a-00ce-416a-91bd-1d458ac4e061_1175x838.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This means:</p><p>&#10003; OKD is the foundation of all OpenShift versions</p><p>&#10003; OCP is the commercial, Red Hat-supported 
version of OKD</p><p>&#10003; OSD, ROSA, and ARO are fully managed cloud services built on OCP</p><p>&#10003; Edge and Virtualization expand OpenShift&#8217;s use cases</p><p>This structure lets you pick the version or service that fits your organization. Some teams start with OKD and later switch to OCP. Others jump straight to a managed version like OSD or ARO. The choice depends on your team&#8217;s size, budget, and need for official support.</p><p>Many cloud providers offer their own managed K8s services, like Amazon EKS, Azure AKS, or Google GKE. OpenShift adds extra features on top of vanilla K8s, such as built-in security policies (Security Context Constraints), Source-to-Image (S2I) builds, and an operator-based architecture. If you need enterprise support, Red Hat&#8217;s OCP or the managed OpenShift services may be more suitable. But if you just want basic K8s with less overhead, EKS, AKS, or GKE might be enough.</p><h4>Conclusion</h4><p>OpenShift is a flexible K8s platform with many setup and management options. You can choose OKD, OCP, OSD, ROSA, ARO, Edge, or Virtualization depending on your goals.</p><p>&#10003; If you want a community version and can do without official support, go with OKD</p><p>&#10003; If you need enterprise support and extra features, OCP is the best choice</p><p>&#10003; If you want a fully managed experience, go for OSD, ROSA, or ARO</p><p>&#10003; If you need small clusters at remote sites or want to run VMs next to containers, consider Edge or Virtualization</p><p>No matter which one you pick, OpenShift can help you simplify container management and scale your applications. 
By combining Red Hat&#8217;s resources, a strong community, and K8s innovations, OpenShift provides a robust platform for modern software development.</p>]]></content:encoded></item><item><title><![CDATA[OpenShift-1]]></title><description><![CDATA[OpenShift CRC (CodeReady Containers) Installation, Setup, and Management Guide]]></description><link>https://mesutoezdil.substack.com/p/openshift-1</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/openshift-1</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Thu, 06 Mar 2025 08:52:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!HR-I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>OpenShift CRC lets you run OpenShift right on your macOS computer. It&#8217;s great for devs and system admins who want a quick and easy way to test things locally. In this guide, you&#8217;ll learn how to install, set up, and manage OpenShift CRC step-by-step, with clear examples and helpful tips along the way. 
Let&#8217;s get started!</p><h4>1 &gt; Prerequisites</h4><p>Before starting, ensure:</p><p>&#9758; You have macOS (&gt;= 13.x)</p><p>&#9758; You have at least 16GB RAM and 35GB disk space</p><p>&#9758; You have a Red Hat account to obtain the pull secret</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HR-I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HR-I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png 424w, https://substackcdn.com/image/fetch/$s_!HR-I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png 848w, https://substackcdn.com/image/fetch/$s_!HR-I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png 1272w, https://substackcdn.com/image/fetch/$s_!HR-I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HR-I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png" width="1456" height="774" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:774,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:672527,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HR-I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png 424w, https://substackcdn.com/image/fetch/$s_!HR-I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png 848w, https://substackcdn.com/image/fetch/$s_!HR-I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png 1272w, https://substackcdn.com/image/fetch/$s_!HR-I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9115524d-a5e9-4c5b-9b18-b491b540f64a_3832x2038.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Go to the <a href="https://access.redhat.com/products/red-hat-hybrid-cloud-console">URL</a> shown in the image above and create a user account, making sure to provide all the required information (name, address, phone number).</p><p>After you confirm your account through the verification email, you should see this page.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XlVP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!XlVP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png 424w, https://substackcdn.com/image/fetch/$s_!XlVP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png 848w, https://substackcdn.com/image/fetch/$s_!XlVP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!XlVP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XlVP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png" width="1456" height="663" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:230039,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XlVP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png 424w, https://substackcdn.com/image/fetch/$s_!XlVP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png 848w, https://substackcdn.com/image/fetch/$s_!XlVP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!XlVP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac92c05-3014-4693-aab0-4fd296560d0c_2758x1256.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On the same page, click on Cluster List on the left and see the info above.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RSDf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RSDf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png 424w, https://substackcdn.com/image/fetch/$s_!RSDf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png 848w, https://substackcdn.com/image/fetch/$s_!RSDf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png 1272w, https://substackcdn.com/image/fetch/$s_!RSDf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!RSDf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png" width="1456" height="544" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:363403,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RSDf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png 424w, https://substackcdn.com/image/fetch/$s_!RSDf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png 848w, https://substackcdn.com/image/fetch/$s_!RSDf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png 1272w, https://substackcdn.com/image/fetch/$s_!RSDf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F544afa27-fb26-4209-9740-e722c89a0433_3828x1430.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To use OpenShift locally, download the installer that matches your operating system.</p><p>Don&#8217;t forget to download the pull secret right afterwards.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V-vE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!V-vE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png 424w, https://substackcdn.com/image/fetch/$s_!V-vE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png 848w, https://substackcdn.com/image/fetch/$s_!V-vE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png 1272w, https://substackcdn.com/image/fetch/$s_!V-vE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V-vE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png" width="1456" height="553" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:553,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:125842,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V-vE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png 424w, https://substackcdn.com/image/fetch/$s_!V-vE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png 848w, https://substackcdn.com/image/fetch/$s_!V-vE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png 1272w, https://substackcdn.com/image/fetch/$s_!V-vE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4758be7c-d3b5-45d9-8277-114d570ccd06_2042x776.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So far we have downloaded the OpenShift Local file and the pull secret file. Now let&#8217;s go back to the terminal.</p><h4><strong>2 &gt; Installation of CRC</strong></h4><h5><strong>Step 1 &gt;&gt; Download and Install CRC</strong></h5><p>Install via Installer Package</p><pre><code><code>cd ~/Downloads # your files may have been downloaded elsewhere
sudo installer -pkg crc-macos-installer.pkg -target /</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SD45!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SD45!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png 424w, https://substackcdn.com/image/fetch/$s_!SD45!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png 848w, https://substackcdn.com/image/fetch/$s_!SD45!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png 1272w, https://substackcdn.com/image/fetch/$s_!SD45!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SD45!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png" width="1456" height="135" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:135,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108292,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SD45!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png 424w, https://substackcdn.com/image/fetch/$s_!SD45!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png 848w, https://substackcdn.com/image/fetch/$s_!SD45!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png 1272w, https://substackcdn.com/image/fetch/$s_!SD45!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0b742b-4135-4a14-8101-3f7f28396871_3426x318.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Or</em> install manually (it is easiest and healthiest to start with the above method)</p><pre><code><code># skip this part if you downloaded the files from the official website
curl -L -o ~/Downloads/crc-macos.tar.xz https://mirror.openshift.com/pub/openshift-v4/clients/crc/latest/crc-macos-amd64.tar.xz
cd ~/Downloads
tar -xvf crc-macos.tar.xz
sudo mv crc-macos-*/crc /usr/local/bin</code></code></pre><h4><strong>3 &gt; Initial CRC Setup</strong></h4><pre><code><code>crc setup -h # to take a look at some important info
crc setup</code></code></pre><p>This command prepares the environment by configuring the hypervisor, networking, and security settings.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!18g5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!18g5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png 424w, https://substackcdn.com/image/fetch/$s_!18g5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png 848w, https://substackcdn.com/image/fetch/$s_!18g5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png 1272w, https://substackcdn.com/image/fetch/$s_!18g5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!18g5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png" width="1456" height="704" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:704,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:651358,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!18g5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png 424w, https://substackcdn.com/image/fetch/$s_!18g5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png 848w, https://substackcdn.com/image/fetch/$s_!18g5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png 1272w, https://substackcdn.com/image/fetch/$s_!18g5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14e81ab3-515a-4eb0-b700-6a9065a28e1a_3454x1670.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4><strong>4 &gt; Pull Secret Configuration (Avoid Entering Every Time)</strong></h4><p>Download the pull-secret from Red Hat OpenShift Console (If you did not do it!)</p><p>&#128073; <a href="https://console.redhat.com/openshift/create/local">Link</a></p><p>Move it to a fixed location:</p><pre><code><code># so it doesn't ask you for a secret every time you log in
mv ~/Downloads/pull-secret.txt ~/.crc/pull-secret.txt 
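# optional, not part of the original steps: the pull secret holds registry credentials, so you may want it readable only by your user
chmod 600 ~/.crc/pull-secret.txt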
crc config set pull-secret-file ~/.crc/pull-secret.txt</code></code></pre><p>Verify:</p><pre><code><code>crc config view | grep pull-secret-file 
# you should see the configured path, for example: - pull-secret-file: /Users/yourUserName/.crc/pull-secret.txt</code></code></pre><h4><strong>5 &gt; Start OpenShift Cluster</strong></h4><pre><code><code>crc start</code></code></pre><p>This will:</p><p>&#9758; Start the OpenShift Virtual Machine</p><p>&#9758; Apply the pull-secret</p><p>&#9758; Configure networking</p><p>&#9758; Deploy OpenShift components</p><pre><code><code>&#10095; crc start
INFO Using bundle path /Users/yourUserName/.crc/cache/crc_vfkit_4.17.10_arm64.crcbundle
INFO Checking if running macOS version &gt;= 13.x
INFO Checking if running as non-root
INFO Checking if crc-admin-helper executable is cached
INFO Checking if running on a supported CPU architecture
INFO Checking if crc executable symlink exists
INFO Checking minimum RAM requirements
INFO Check if Podman binary exists in: /Users/yourUserName/.crc/bin/oc
INFO Checking if running emulated on Apple silicon
INFO Checking if vfkit is installed
INFO Checking if old launchd config for tray and/or daemon exists
INFO Checking if crc daemon plist file is present and loaded
INFO Checking SSH port availability
INFO Loading bundle: crc_vfkit_4.17.10_arm64...
INFO Creating CRC VM for OpenShift 4.17.10...
INFO Generating new SSH key pair...
INFO Generating new password for the kubeadmin user
INFO Starting CRC VM for openshift 4.17.10...
INFO CRC instance is running with IP 127.0.0.1
INFO CRC VM is running
INFO Updating authorized keys...
INFO Configuring shared directories
INFO Check internal and public DNS query...
WARN Failed public DNS query from the cluster: ssh command error:
command : curl --head https://quay.io
err     : Process exited with status 60:
INFO Check DNS query from host...
INFO Verifying validity of the kubelet certificates...
INFO Starting kubelet service
INFO Waiting for kube-apiserver availability... [takes around 2min]
INFO Adding user's pull secret to the cluster...
INFO Updating SSH key to machine config resource...
INFO Waiting until the user's pull secret is written to the instance disk...
INFO Changing the password for the kubeadmin user
INFO Updating cluster ID...
INFO Updating root CA cert to admin-kubeconfig-client-ca configmap...
INFO Starting openshift instance... [waiting for the cluster to stabilize]
INFO All operators are available. Ensuring stability...
INFO Operator console is progressing
INFO Operator authentication is not yet available
INFO Operator authentication is not yet available
INFO Operator authentication is not yet available
INFO Operator authentication is degraded
INFO Operator authentication is degraded
INFO Operator authentication is degraded
INFO All operators are available. Ensuring stability...
INFO Operators are stable (2/3)...
INFO Operators are stable (3/3)...
INFO Adding crc-admin and crc-developer contexts to kubeconfig...
Started the OpenShift cluster.

The server is accessible via web console at:
  https://console-openshift-console.apps-crc.testing

Log in as administrator:
  Username: kubeadmin
  Password: FZIsX-XXXXX-YYYYY-ZZZZZ

Log in as user:
  Username: developer
  Password: developer

Use the 'oc' command line interface:
  $ eval $(crc oc-env)
  $ oc login -u developer https://api.crc.testing:6443
</code></code></pre><h4><strong>6 &gt; Automate Terminal Commands in .zshrc</strong> (if you use bash, adapt the commands accordingly)</h4><p>To avoid repeating commands, add these to ~/.zshrc:</p><pre><code><code>echo 'eval $(crc oc-env)' &gt;&gt; ~/.zshrc
echo 'alias crcup="crc start"' &gt;&gt; ~/.zshrc
echo 'alias crcdown="crc stop"' &gt;&gt; ~/.zshrc
echo 'alias crcdel="crc delete &amp;&amp; crc setup"' &gt;&gt; ~/.zshrc
echo 'alias crcoc="eval $(crc oc-env)"' &gt;&gt; ~/.zshrc
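# replace the kubeadmin password with your own; both logins run in every new shell, and the developer login runs last, so it ends up as the active user
echo 'oc login -u kubeadmin -p FZIsX-XXXXX-YYYYY-ZZZZZ https://api.crc.testing:6443' &gt;&gt; ~/.zshrc
echo 'oc login -u developer -p developer https://api.crc.testing:6443' &gt;&gt; ~/.zshrc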
source ~/.zshrc</code></code></pre><p>Now you can:</p><p>&#9758; crcup &#8594; Start OpenShift</p><p>&#9758; crcdown &#8594; Stop OpenShift</p><p>&#9758; crcdel &#8594; Reset CRC</p><p>&#9758; crcoc &#8594; Load OpenShift CLI</p><h4><strong>7 &gt; Login to OpenShift (Admin &amp; Developer)</strong></h4><p>Admin Login (see the <em>Admin</em> vs. <em>Developer</em> comparison below)</p><pre><code><code>oc login -u kubeadmin -p FZIsX-XXXXX-YYYYY-ZZZZZ https://api.crc.testing:6443</code></code></pre><pre><code><code># output
Login successful.

You have access to 65 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "default".</code></code></pre><p>Developer Login</p><pre><code><code>oc login -u developer -p developer https://api.crc.testing:6443</code></code></pre><pre><code><code># output
Login successful.

You don't have any projects. You can try to create a new project, by running

    oc new-project &lt;projectname&gt;</code></code></pre><h4><strong>8 &gt; OpenShift Web Console &amp; Cluster Info</strong></h4><h5><strong>Access Web Console</strong></h5><p>&#128279; <a href="https://console-openshift-console.apps-crc.testing">Open</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7bE2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7bE2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png 424w, https://substackcdn.com/image/fetch/$s_!7bE2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png 848w, https://substackcdn.com/image/fetch/$s_!7bE2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png 1272w, https://substackcdn.com/image/fetch/$s_!7bE2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7bE2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png" width="1456" height="775" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:775,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:685476,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7bE2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png 424w, https://substackcdn.com/image/fetch/$s_!7bE2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png 848w, https://substackcdn.com/image/fetch/$s_!7bE2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png 1272w, https://substackcdn.com/image/fetch/$s_!7bE2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45ee5497-30ce-43be-b491-441f100a6896_3830x2038.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To get credentials:</p><pre><code><code>crc console --credentials</code></code></pre><p>List Nodes</p><pre><code><code>oc get nodes</code></code></pre><p>List Running Pods</p><pre><code><code>oc get pods --all-namespaces</code></code></pre><h4><strong>9 &gt; Stopping and Restarting OpenShift</strong></h4><p>Stop CRC (Shutdown OpenShift)</p><pre><code><code>crc stop</code></code></pre><p>Delete and Reset CRC</p><pre><code><code>crc delete
crc cleanup
crc setup</code></code></pre><h4><strong>10 &gt; Restarting OpenShift from Scratch (If needed)</strong></h4><p>After a restart:</p><pre><code><code>crcup
oc login -u kubeadmin -p FZIsX-XXXXX-YYYYY-ZZZZZ https://api.crc.testing:6443</code></code></pre><h4><strong>11 &gt; Differences Between Admin &amp; Developer Users</strong></h4><ul><li><p>Manage Nodes</p><ul><li><p>Admin (kubeadmin): &#9989; Yes &#8674; Developer: &#10060; No</p></li></ul></li><li><p>Create Projects</p><ul><li><p>Admin (kubeadmin): &#9989; Yes &#8674; Developer: &#9989; Yes</p></li></ul></li><li><p>Access Web UI</p><ul><li><p>Admin (kubeadmin): &#9989; Yes &#8674; Developer: &#9989; Yes</p></li></ul></li><li><p>View Cluster-Wide Logs</p><ul><li><p>Admin (kubeadmin): &#9989; Yes &#8674; Developer: &#10060; No</p></li></ul></li><li><p>Modify Cluster Configuration</p><ul><li><p>Admin (kubeadmin): &#9989; Yes &#8674; Developer: &#10060; No</p></li></ul></li></ul><h4><strong>12 &gt; Debugging &amp; Troubleshooting</strong></h4><p>Check Cluster Status</p><pre><code><code>crc status
oc get pods --all-namespaces</code></code></pre><p>If Cluster Fails to Start</p><pre><code><code>crc cleanup
crc setup
crc start</code></code></pre><p>Check Configs</p><pre><code><code>crc config view</code></code></pre><h4><strong>Stopping and Restarting CRC Like It's a New Day</strong></h4><p>Now, let&#8217;s test what happens when you stop or completely delete CRC and then restart it as if it were a new day.</p><p>&#10008; Stopping CRC (Shutdown)</p><p>To stop CRC without losing data:</p><pre><code><code>crc stop</code></code></pre><p>&#10003; This will shut down the OpenShift VM, but keep all configurations and data.</p><p>&#10003; The OpenShift web console will become inaccessible.</p><p>&#10003; You can start it again later without reinstalling anything.</p><p>Check in your browser: open the <a href="https://console-openshift-console.apps-crc.testing">web console</a>.</p><p>It should show a &#8220;site unreachable&#8221; error.</p><p><strong>! Completely Deleting CRC</strong></p><p>If you want to completely remove CRC, including all stored configurations and data:</p><pre><code><code>crc delete
crc cleanup</code></code></pre><p>This removes the CRC VM with all of its data and undoes the host changes made by crc setup, so you will need to run crc setup and crc start again next time.</p><p><strong>Restarting CRC as If It&#8217;s a New Day</strong></p><p>Let&#8217;s assume you turned off your computer and now want to start everything <em>from scratch</em> the next day.</p><p>Open a terminal and start CRC:</p><pre><code><code>crcup</code></code></pre><p>(This is an alias in .zshrc that runs crc start)</p><p>Log in to OpenShift:</p><pre><code><code>oc login -u kubeadmin -p FZIsX-XXXXX-YYYYY-ZZZZZ https://api.crc.testing:6443</code></code></pre><p>(Or log in as developer if needed)</p><p>Open the OpenShift web <a href="https://console-openshift-console.apps-crc.testing">console</a> again.</p><p>The dashboard should now be accessible.</p><h4>Want a Faster Startup?</h4><p>If you don&#8217;t want to reinstall everything, don&#8217;t delete CRC. Instead, just:</p><pre><code><code>crc stop  # stop CRC
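crc status  # optional check, not in the original steps: the VM should be reported as stopped before you start it again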
crc start  # start CRC again</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nmUR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nmUR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png 424w, https://substackcdn.com/image/fetch/$s_!nmUR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png 848w, https://substackcdn.com/image/fetch/$s_!nmUR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png 1272w, https://substackcdn.com/image/fetch/$s_!nmUR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nmUR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png" width="1456" height="1399" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1399,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:485773,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nmUR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png 424w, https://substackcdn.com/image/fetch/$s_!nmUR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png 848w, https://substackcdn.com/image/fetch/$s_!nmUR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png 1272w, https://substackcdn.com/image/fetch/$s_!nmUR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F033a1a02-2291-454e-b9ce-71105bd35aa8_2002x1924.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This way, you don&#8217;t have to set up everything from scratch each time.</p><h4><strong>CRC Storage Usage Analysis &amp; Growth Impact</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!63Ah!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!63Ah!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png 424w, 
https://substackcdn.com/image/fetch/$s_!63Ah!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png 848w, https://substackcdn.com/image/fetch/$s_!63Ah!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png 1272w, https://substackcdn.com/image/fetch/$s_!63Ah!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!63Ah!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png" width="1456" height="750" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:664894,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/158500411?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!63Ah!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png 424w, https://substackcdn.com/image/fetch/$s_!63Ah!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png 848w, https://substackcdn.com/image/fetch/$s_!63Ah!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png 1272w, https://substackcdn.com/image/fetch/$s_!63Ah!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47d0e9d7-8a88-495b-8f58-0fa744154376_3448x1776.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>Current Storage Usage</strong></h4><p>To check the total size of the CRC directory:</p><pre><code><code>du -sh ~/.crc</code></code></pre><p>For a per-directory breakdown:</p><pre><code><code>du -sh ~/.crc/*</code></code></pre><h4>Will Storage Usage Increase Over Time?</h4><p>Yes. As you create new pods, store images, and persist data, storage usage will increase. The main factors are:</p><p>&#9758; Pods &amp; Containers</p><ul><li><p>New pods require space for logs, configs, and runtime storage</p></li><li><p>Pods that pull new container images will increase the cache size</p></li></ul><p>&#9758; Persistent Volumes &amp; PVCs</p><ul><li><p>If you use Persistent Volume Claims (PVCs), CRC will allocate additional storage</p></li><li><p>Large databases or applications using storage-backed PVCs consume significant space</p></li></ul><p>&#9758; Downloaded Container Images</p><ul><li><p>When pulling images from <a href="http://quay.io">quay.io</a>, Docker Hub, or Red Hat Registry, they get stored inside CRC</p></li><li><p>More images = higher storage consumption</p></li></ul><p>&#9758; Logs &amp; Metrics Data</p><ul><li><p>OpenShift generates logs from different services (like kube-apiserver, etcd)</p></li><li><p>Continuous logging = gradual storage increase</p></li></ul><h3>How to Prevent Storage Bloat?</h3><p>To check node and pod resource usage (CPU and memory):</p><pre><code><code>oc adm top nodes
oc adm top pods</code></code></pre><p>To list Persistent Volume Claims:</p><pre><code><code>oc get pvc</code></code></pre><h3><strong>How to Free Up Space?</strong></h3><p>i. Delete Unused Pods</p><pre><code><code>oc delete pod &lt;pod-name&gt;</code></code></pre><p>To delete all pods in a namespace:</p><pre><code><code>oc delete pod --all -n &lt;namespace&gt;</code></code></pre><p>ii. Remove Unused Container Images</p><pre><code><code>podman images
podman rmi &lt;image-id&gt;</code></code></pre><p>To prune all unused images (along with stopped containers and unused networks):</p><pre><code><code>podman system prune -a</code></code></pre><p>iii. Delete Unused Persistent Volume Claims</p><pre><code><code>oc delete pvc &lt;pvc-name&gt;</code></code></pre><p>iv. Clean Logs &amp; CRC Cache</p><pre><code><code>rm -rf ~/.crc/logs/*
rm -rf ~/.crc/cache/*</code></code></pre><p>v. Reset CRC to Free Up Disk Space</p><pre><code><code>crc delete
crc cleanup
crc setup</code></code></pre><p>WARNING: This deletes the CRC instance and its data, so you will need to set up and configure the cluster again.</p><p>Add an alias for quick cleanup &amp; reset:</p><pre><code><code>echo 'alias crcreset="crc delete &amp;&amp; crc cleanup &amp;&amp; crc setup"' &gt;&gt; ~/.zshrc</code></code></pre><h3>Conclusion</h3><p>That&#8217;s it! Now you know how to install, set up, and manage OpenShift CRC on your Mac. With these simple steps, you&#8217;re ready to quickly create, test, and deploy applications locally.</p>]]></content:encoded></item><item><title><![CDATA[K8s-Security-4]]></title><description><![CDATA[Using CIS Benchmark]]></description><link>https://mesutoezdil.substack.com/p/k8s-security-4</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/k8s-security-4</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Wed, 26 Feb 2025 09:52:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!CO1L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Here again, I am going to talk about a seemingly simple and yet very important and powerful topic. Power requires safety. This article shows how to run <a href="https://www.cisecurity.org/cis-benchmarks">CIS Benchmark</a> checks on your cluster using the <a href="https://github.com/aquasecurity/kube-bench">kube-bench tool</a>. The goal is to find possible security issues and fix them so your cluster is more secure.</p><div><hr></div><pre><code><code>&#127926; I have successfully executed these commands in my own environment step by step. If you encounter any issues, please let me know in the comments.</code></code></pre><h4>1 &gt; Why CIS Benchmark?</h4><p>CIS Benchmark provides recommended rules for securing k8s. If you follow these rules, you reduce risks such as unauthorized access, data leaks, or misconfigurations. 
The kube-bench tool checks your cluster against these rules and gives you a report. In that report, you&#8217;ll see:</p><p>&#10004;&#65038; PASS: The item meets the recommendation</p><p>&#10004;&#65038; FAIL: The item does not meet the recommendation</p><p>&#10004;&#65038; WARN: The item might be risky or needs attention</p><p>&#10004;&#65038; INFO: The item is just for info</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CO1L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CO1L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif 424w, https://substackcdn.com/image/fetch/$s_!CO1L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif 848w, https://substackcdn.com/image/fetch/$s_!CO1L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif 1272w, https://substackcdn.com/image/fetch/$s_!CO1L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CO1L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif" width="1292" height="831" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:831,&quot;width&quot;:1292,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:619050,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/157949972?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CO1L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif 424w, https://substackcdn.com/image/fetch/$s_!CO1L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif 848w, https://substackcdn.com/image/fetch/$s_!CO1L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif 1272w, https://substackcdn.com/image/fetch/$s_!CO1L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4238ae-6fe3-47df-b425-ca3185086a43_1292x831.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>2 &gt; Prerequisites</h4><p>&#9758; A working k8s cluster (for example, one control-plane node and two worker nodes).</p><p>&#9758; The kubectl command to interact with the cluster. This is my env (on <a href="https://labs.iximiuz.com/dashboard">iximiuz</a>), where k is a shell alias for kubectl:</p><pre><code><code>k get pods
No resources found in default namespace.

k get nodes
NAME        STATUS   ROLES           AGE   VERSION
cplane-01   Ready    control-plane   25s   v1.32.1
node-01     Ready    &lt;none&gt;          13s   v1.32.1
node-02     Ready    &lt;none&gt;          13s   v1.32.1</code></code></pre><h4>3 &gt; Download the kube-bench YAML Files</h4><p>We need two YAML files to run the checks:</p><pre><code><code>wget -O ar-kube-control-plane.yaml https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job-master.yaml

wget -O ar-kube-node.yaml https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job-node.yaml</code></code></pre><p>ar-kube-control-plane.yaml tests the control-plane components and ar-kube-node.yaml tests the worker nodes. You can open these files (for example, cat ar-kube-control-plane.yaml) to see the details.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Ac0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Ac0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png 424w, https://substackcdn.com/image/fetch/$s_!0Ac0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png 848w, https://substackcdn.com/image/fetch/$s_!0Ac0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ac0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Ac0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png" width="1456" height="453" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:453,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:394028,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/157949972?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Ac0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png 424w, https://substackcdn.com/image/fetch/$s_!0Ac0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png 848w, https://substackcdn.com/image/fetch/$s_!0Ac0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ac0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0233add3-8c38-409d-a2d9-3f26dfcd0a88_2998x932.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4>4 &gt; Run the CIS Benchmark Tests</h4><p>Use these commands to create the Jobs in your cluster:</p><pre><code><code>kubectl create -f ar-kube-control-plane.yaml

kubectl create -f ar-kube-node.yaml</code></code></pre><p>This starts two Jobs: one for the control-plane and one for the node checks. Run kubectl get pods to see the Pods created by these Jobs. They should have a Completed status once done.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uudA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uudA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png 424w, https://substackcdn.com/image/fetch/$s_!uudA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png 848w, https://substackcdn.com/image/fetch/$s_!uudA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png 1272w, https://substackcdn.com/image/fetch/$s_!uudA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uudA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png" width="1456" height="116" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:116,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:122920,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/157949972?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uudA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png 424w, https://substackcdn.com/image/fetch/$s_!uudA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png 848w, https://substackcdn.com/image/fetch/$s_!uudA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png 1272w, https://substackcdn.com/image/fetch/$s_!uudA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125a1932-2c07-43c9-a28d-a7a3b48e6820_3086x246.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>5 &gt; Save and View the Test Results</h4><p>When the Pods are complete, get their logs:</p><pre><code><code>kubectl logs &lt;Control Plane Pod&gt; &gt; 
ar-kube-results-control-plane.log
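# Optional sketch: kubectl can also read logs through the Job itself, so you do not need the Pod name.
# This assumes the Job names in the downloaded manifests are still the upstream defaults
# (kube-bench-master and kube-bench-node); check metadata.name in your YAML files first.
#   kubectl logs job/kube-bench-master &gt; ar-kube-results-control-plane.log
#   kubectl logs job/kube-bench-node &gt; ar-kube-results-node.log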

kubectl logs &lt;Node Pod&gt; &gt; ar-kube-results-node.log</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MPjN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MPjN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png 424w, https://substackcdn.com/image/fetch/$s_!MPjN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png 848w, https://substackcdn.com/image/fetch/$s_!MPjN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png 1272w, https://substackcdn.com/image/fetch/$s_!MPjN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MPjN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png" width="1456" height="186" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:186,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://mesutoezdil.substack.com/i/157949972?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MPjN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png 424w, https://substackcdn.com/image/fetch/$s_!MPjN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png 848w, https://substackcdn.com/image/fetch/$s_!MPjN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png 1272w, https://substackcdn.com/image/fetch/$s_!MPjN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee6cb442-95fe-4be0-bf89-ec19f345308d_1922x246.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Replace &lt;Control Plane Pod&gt; and &lt;Node Pod&gt; with the real Pod names. Then:</p><pre><code><code>cat ar-kube-results-control-plane.log

cat ar-kube-results-node.log</code></code></pre><p>Look for FAIL, WARN, and PASS lines. FAIL means you need to fix something. The output is too long to include here.</p><h4>6 &gt; Example Findings</h4><p>In this section, we look at common issues you might see in your kube-bench results. Each item shows a part of the CIS Benchmark and a hint about why it matters and how to fix it. Here are a few examples:</p><p>1.1.12: &#8220;Ensure etcd data directory is owned by etcd:etcd&#8221; &#8594; Why it matters: etcd stores cluster data (like states, secrets, or config details). If other users can read or change the etcd data directory, they could read or tamper with cluster data. The fix: run the chown command below on the node where etcd is installed (the etcd user and group must already exist there). After changing ownership, you may need to restart etcd so that it picks up the new ownership. The systemctl commands apply only when etcd runs as a systemd service.</p><pre><code><code>chown etcd:etcd /var/lib/etcd</code></code></pre><pre><code><code>systemctl daemon-reload

systemctl restart etcd</code></code></pre><p>1.2.5: &#8220;Ensure the --kubelet-certificate-authority argument is set as appropriate&#8221; &#8594; Why it matters: The API server must verify kubelet certificates. If the CA (Certificate Authority) is not set, the API server cannot confirm that it is talking to a real kubelet, so an attacker could pose as one. The fix: identify the CA file (often /etc/kubernetes/pki/ca.crt), then edit the API server manifest (usually /etc/kubernetes/manifests/kube-apiserver.yaml) and add or modify this flag:</p><pre><code><code>- --kubelet-certificate-authority=/etc/kubernetes/pki/ca.crt</code></code></pre><p>After saving, the kubelet usually restarts the API server pod automatically.</p><p>4.1.1: &#8220;Ensure the kubelet service file permissions are 600&#8221; &#8594; Why it matters: If other users can write to the kubelet service file, an attacker could modify the service and gain control. The fix: the commands below restrict access so that only the file&#8217;s owner (normally root) can read or edit the kubelet service file, and then restart the kubelet. The path of the service file may differ on your distribution.</p><pre><code><code>chmod 600 /lib/systemd/system/kubelet.service
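# Optional check (a sketch, assuming GNU stat as found on most Linux nodes):
# print the file's mode and owner to confirm the change took effect.
stat -c '%a %U:%G' /lib/systemd/system/kubelet.service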

systemctl daemon-reload

systemctl restart kubelet</code></code></pre><p>4.1.9: &#8220;If the kubelet config.yaml configuration file is being used, validate permissions are set to 600&#8221; &#8594; Why it matters: Kubelet&#8217;s config.yaml can contain sensitive details, such as TLS settings or cluster server addresses. The fix: restrict the file so that only its owner (normally root) can read or change it, then restart the kubelet.</p><pre><code><code>chmod 600 /var/lib/kubelet/config.yaml

systemctl daemon-reload

systemctl restart kubelet</code></code></pre><p>These are just a few examples. Your actual report may contain many other checks with different IDs and messages. Always review each FAIL or WARN item and read the remediation suggestions in the output. That way, you can fix or improve your cluster&#8217;s security.</p><h4>7 &gt; Re-run Tests After Fixes</h4><p>Once you fix the items in the report, re-run the tests. First, delete the old Jobs:</p><pre><code><code>kubectl delete -f ar-kube-control-plane.yaml

kubectl delete -f ar-kube-node.yaml</code></code></pre><p>Then create them again:</p><pre><code><code>kubectl create -f ar-kube-control-plane.yaml

kubectl create -f ar-kube-node.yaml</code></code></pre><p>This lets you see whether the earlier FAIL items now PASS.</p><h4>8 &gt; Some Advanced Tips</h4><p>Sometimes, you don&#8217;t need to run every single CIS check. You can focus on certain items or skip checks that do not apply to your setup. The <code>kube-bench</code> tool supports this flexibility. For example, if you only want to test the control plane, you can run <code>kube-bench run --targets master</code>, which checks only master-related items. If you want to test a specific control, such as &#8220;1.2.5,&#8221; just add <code>--check 1.2.5</code>. That way, you quickly see if your fix for that one issue is correct. If there are checks you don&#8217;t need, you can skip them with <code>--skip</code>. For instance, <code>kube-bench run --targets node --skip 4.1.1,4.1.9</code> ignores checks 4.1.1 and 4.1.9. Because commands can change with different kube-bench versions, you should also check the official usage docs or run <code>kube-bench --help</code> to stay updated.</p><h4>9 &gt; Additional Tips</h4><p>i- Back up configs &#9758; Before changing any configuration, make a copy.</p><p>ii- Go step by step &#9758; Fix a few items at a time and re-check.</p><p>iii- Keep reading docs &#9758; <a href="https://kubernetes.io/docs/">Kubernetes Official Docs</a> and <a href="https://www.cisecurity.org/benchmark/kubernetes">CIS Benchmark Docs</a></p><p>iv- Automation &#9758; Consider adding kube-bench to a CI/CD pipeline.</p><h4>10 &gt; Conclusion</h4><p>Using kube-bench with the CIS Benchmark is a simple way to check and improve your k8s security. By following the recommendations, you lower the risk of attacks and make sure your cluster is set up with best practices in mind.</p><p>Special thanks to <a href="https://www.ardanlabs.com/">ArdanLabs</a> for their amazing educational videos on k8s and Golang, and for their support in this journey. 
ArdanLabs is a well-known technology company specializing in software development, training, and consulting, particularly in Golang and Kubernetes. They provide high-quality educational content, workshops, and courses to help developers and enterprises build efficient and scalable apps.</p>]]></content:encoded></item><item><title><![CDATA[K8s-Security-3]]></title><description><![CDATA[Restricting Default Access with NetworkPolicies]]></description><link>https://mesutoezdil.substack.com/p/k8s-security-3</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/k8s-security-3</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Sat, 22 Feb 2025 10:42:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4e767a7d-cb6f-49b0-8859-39fdbadb7bce_3312x1494.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>NetworkPolicies are a powerful tool for controlling traffic between Pods. By default, all Pods in a k8s cluster can communicate with each other, which can pose security risks in production environments. NetworkPolicies enable fine-grained traffic control, improving cluster security by implementing least-privilege principles.</p><p>This guide will walk you through the process of creating a default deny NetworkPolicy to block all network traffic to and from Pods in a namespace. Along the way, you will gain a deeper understanding of NetworkPolicies and how to fine-tune them for your security needs.</p><pre><code>&#127926; I have successfully executed these commands in my own environment step by step. 
If you encounter any issues, please let me know in the comments.</code></pre><h4>How to Apply This in Your Own Environment</h4><p>If you want to try this setup in your own k8s cluster, follow these steps:</p><ol><li><p>Ensure you have a Kubernetes cluster running (such as Minikube, Kind, or a managed cloud service like AKS, EKS, or GKE).</p></li><li><p>Install kubectl and configure it to connect to your cluster.</p></li><li><p>Verify that your cluster is working properly by running:</p></li></ol><pre><code>kubectl get nodes</code></pre><p>This should list all the nodes in your cluster.</p><ol start="4"><li><p>If your cluster does not have a CNI plugin that supports NetworkPolicy (e.g., Calico, Cilium), install one. For example, to install Calico (the manifest URL may change between Calico releases, so check the Calico docs if it fails):</p></li></ol><pre><code>kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml</code></pre><ol start="5"><li><p>Once your cluster is ready, proceed with the following steps.</p></li></ol><h4>Create a New Namespace</h4><p>To keep this test separate from other workloads, create a dedicated namespace:</p><pre><code>kubectl create namespace arkube-ns</code></pre><p>This keeps all the resources we create in this tutorial separate from other workloads in the cluster.</p><h4>Deploy a Simple Nginx Web Server</h4><p>Now that we have a namespace, let's deploy an Nginx web server inside the arkube-ns namespace.</p><ol><li><p>Create a file named arkube-nginx.yml with the following content:</p></li></ol><pre><code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: arkube-nginx  # Name of the deployment
  namespace: arkube-ns  # Namespace where this deployment will reside
spec:
  replicas: 1  # Number of Nginx replicas
  selector:
    matchLabels:
      app: arkube-nginx  # Selector label for identifying pods
  template:
    metadata:
      labels:
        app: arkube-nginx  # Label applied to the created pods
    spec:
      containers:
      - name: nginx  # Name of the container
        image: nginx:1.14.2  # Image version to use for Nginx
        ports:
        - containerPort: 80  # Exposing port 80 inside the container</code></pre><ol start="2"><li><p>Apply the deployment:</p></li></ol><pre><code>kubectl apply -f arkube-nginx.yml</code></pre><p>This deploys an Nginx pod, which we will use to test connectivity and NetworkPolicies.</p><h4>Get the IP Address of the Nginx Pod</h4><p>Since we need a target for network testing, retrieve the IP address of the Nginx Pod:</p><pre><code>kubectl get pods -n arkube-ns -o wide</code></pre><p>Take note of the value in the IP column, as we will use it in the next step.</p><h4>Create a Test Client Pod</h4><p>We&#8217;ll create a client Pod that continuously tries to access the Nginx server to test network connectivity.</p><ol><li><p>Create a file named arkube-client.yml with the following content:</p></li></ol><pre><code>apiVersion: v1
kind: Pod
metadata:
  name: arkube-client  # Name of the client pod
  namespace: arkube-ns  # Namespace where the client pod is deployed
  labels:
    app: arkube-client  # Label for identifying the client pod
spec:
  containers:
  - name: busybox  # Name of the container
    image: radial/busyboxplus:curl  # Lightweight image with curl support
    command: ['sh', '-c', 'while true; do curl -m 3 &lt;Nginx Pod Cluster IP address&gt;; sleep 5; done']  # Looping curl request for connectivity testing</code></pre><ol start="2"><li><p>Replace &lt;Nginx Pod Cluster IP address&gt; with the Pod IP address you noted in the previous section.</p></li><li><p>Apply the client Pod:</p></li></ol><pre><code>kubectl create -f arkube-client.yml</code></pre><h4>Verify Connectivity</h4><p>Now, confirm that the client Pod can reach the Nginx server:</p><pre><code>kubectl logs -n arkube-ns arkube-client</code></pre><p>If the connection is successful, you should see the HTML of the Nginx welcome page in the logs.</p><h4>Create a Default Deny NetworkPolicy</h4><p>Next, we&#8217;ll create a default deny NetworkPolicy to block all incoming and outgoing traffic in the arkube-ns namespace.</p><ol><li><p>Create a file named arkube-deny-all.yml with the following content:</p></li></ol><pre><code>apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: arkube-deny-all  # Name of the NetworkPolicy
  namespace: arkube-ns  # Namespace where this policy will be applied
spec:
  podSelector: {}  # Apply to all pods in the namespace
  policyTypes:
  - Ingress  # Deny incoming traffic
  - Egress  # Deny outgoing traffic</code></pre><ol start="2"><li><p>Apply the NetworkPolicy:</p></li></ol><pre><code>kubectl create -f arkube-deny-all.yml</code></pre><h4>Verify the NetworkPolicy</h4><p>Check the logs of the client Pod again. If the NetworkPolicy is working, you should see Connection timed out errors:</p><pre><code>kubectl logs -n arkube-ns arkube-client</code></pre><h4>Troubleshooting NetworkPolicy Issues</h4><p>If the client Pod can still reach the Nginx server after applying the arkube-deny-all policy, check the following:</p><ol><li><p>Client still reaches Nginx:</p><ul><li><p>Possible cause: The cluster's CNI plugin does not enforce NetworkPolicy, or it is misconfigured.</p></li><li><p>Solution: Install a NetworkPolicy-aware CNI like Calico:</p></li></ul></li></ol><pre><code>kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml</code></pre><ol start="2"><li><p>Policy not applied:</p><ul><li><p>Possible cause: The Pod may need to be restarted (policies normally take effect without one, so treat this as a last resort).</p></li><li><p>Solution: Restart the client Pod:</p></li></ul></li></ol><pre><code>kubectl delete pod arkube-client -n arkube-ns &amp;&amp; kubectl create -f arkube-client.yml</code></pre><h4>Allow Specific Traffic</h4><p>To allow only the client Pod to access the Nginx server, create a file named arkube-allow-client.yml with this policy:</p><pre><code>apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: arkube-allow-client  # Name of the NetworkPolicy
  namespace: arkube-ns  # Namespace where this policy will be applied
spec:
  podSelector:
    matchLabels:
      app: arkube-nginx  # Apply this policy to Nginx pods
  policyTypes:
  - Ingress  # Allow incoming traffic
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: arkube-client  # Allow traffic only from client pods
---  # Assumed addition (not in the original): arkube-deny-all also blocks Egress, so the client needs its own egress rule; the policy name is illustrative
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: arkube-allow-client-egress
  namespace: arkube-ns
spec:
  podSelector:
    matchLabels:
      app: arkube-client  # Apply this policy to the client pod
  policyTypes: [Egress]  # Allow outgoing traffic
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: arkube-nginx  # Allow traffic only to Nginx pods</code></pre><p>Apply the policy file:</p><pre><code>kubectl create -f arkube-allow-client.yml</code></pre><p>The client logs should show Nginx responses again within a few seconds.</p><p>Key Takeaways:</p><ul><li><p>A default deny policy ensures that no traffic is allowed unless explicitly permitted, in both directions when it covers Ingress and Egress</p></li><li><p>NetworkPolicy enforcement requires a CNI that supports NetworkPolicies</p></li><li><p>Use troubleshooting steps to diagnose misconfigurations</p></li><li><p>Always test policies with a client-server setup before applying them to production</p></li><li><p>Consider combining NetworkPolicies with RBAC for enhanced security</p></li></ul><p>Stay safe!</p><h5>Relevant Documentation</h5><ul><li><p>Kubernetes Network Policies</p></li></ul>]]></content:encoded></item><item><title><![CDATA[K8s-Security-2]]></title><description><![CDATA[Top 10 Web Hacking Techniques of 2024: What K8s Admins Need to Know in 2025]]></description><link>https://mesutoezdil.substack.com/p/k8s-security-2</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/k8s-security-2</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Mon, 10 Feb 2025 12:41:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!13wv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every year, the cybersecurity world gets together to spotlight the coolest, most mind-blowing hacks of the year. <a href="https://portswigger.net/research/top-10-web-hacking-techniques-of-2024">The 2024 Top 10 Web Hacking Techniques</a> list is out, and it&#8217;s packed with research that&#8217;s equal parts fascinating and terrifying. But here&#8217;s the thing: while these techniques focus on web apps, they have huge implications for k8s, the backbone of so many modern systems. 
Let&#8217;s break it down and see what k8s teams should be worried about in 2025. </p><p>In this article, we will explore k8s security from the perspective of the &#8220;<a href="https://portswigger.net/research/top-10-web-hacking-techniques-of-2024">Top 10 Web Hacking Techniques of 2024</a>,&#8221; published by <a href="https://portswigger.net/research/james-kettle">James Kettle on PortSwigger</a> on February 4, 2025.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!13wv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!13wv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif 424w, https://substackcdn.com/image/fetch/$s_!13wv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif 848w, https://substackcdn.com/image/fetch/$s_!13wv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif 1272w, https://substackcdn.com/image/fetch/$s_!13wv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!13wv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif" width="1333" height="829" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:829,&quot;width&quot;:1333,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:779655,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!13wv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif 424w, https://substackcdn.com/image/fetch/$s_!13wv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif 848w, https://substackcdn.com/image/fetch/$s_!13wv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif 1272w, https://substackcdn.com/image/fetch/$s_!13wv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6af15e7-0f9b-45c9-bda7-c1473c4e603d_1333x829.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3>10 &gt; Hijacking OAuth Flows via Cookie Tossing</h3><p>OAuth is everywhere&#8212;k8s included. This hack shows how attackers can mess with cookies across subdomains to hijack sessions. In a k8s setup, where many services share subdomains, it could let someone sneak into your cluster or steal sensitive data. One way to tighten cookie policies is by adding an annotation like:</p><pre><code><code>kubectl annotate ingress my-ingress nginx.ingress.kubernetes.io/proxy-cookie-domain="myapp.example.com"</code></code></pre><p>which enforces stricter domain rules. 
You can also add the HttpOnly and Secure flags by rewriting the cookie path (the first value is the path to match, the second is its replacement):</p><pre><code><code>kubectl annotate ingress my-ingress nginx.ingress.kubernetes.io/proxy-cookie-path='/ "/; HttpOnly; Secure"'</code></code></pre><p>to protect cookies from being accessed by client-side scripts or sent over non-HTTPS connections.</p><h3>9 &gt; ChatGPT Account Takeover &#8211; Wildcard Web Cache Deception</h3><p>Web cache deception isn&#8217;t new, but this twist is next-level. By exploiting how caches handle paths, attackers can sneak into places they shouldn&#8217;t be. For k8s, where caching helps with speed, this could expose secrets or lead to account takeovers. Disabling caching for sensitive paths via:</p><pre><code><code>kubectl annotate ingress my-ingress nginx.ingress.kubernetes.io/cache-control="no-store, no-cache, must-revalidate"</code></code></pre><p>helps prevent caching of critical endpoints. If you&#8217;re using a tool like Varnish, you can specify certain paths not to cache at all by setting something like:</p><pre><code><code>helm upgrade my-cache-chart stable/varnish --set varnish.no_cache="/login,/api/secret"</code></code></pre><h3>8 &gt; OAuth Non-Happy Path to ATO</h3><p>By manipulating the Referer header, attackers can trick OAuth into handing over access. In k8s, where OAuth is used for service authentication, this could let attackers impersonate services. 
You can enforce Referer validation with a snippet such as (single quotes keep the shell from expanding <code>$http_referer</code>, and the controller must allow snippet annotations):</p><pre><code><code>kubectl annotate ingress my-ingress nginx.ingress.kubernetes.io/configuration-snippet='if ($http_referer !~* "^https://trusteddomain\.com") { return 403; }'</code></code></pre><p>Also, rotating OAuth client secrets, for example by storing the new values with:</p><pre><code><code>kubectl create secret generic oauth-config --from-literal=client_id='NEW_CLIENT_ID' --from-literal=client_secret='NEW_SECRET'</code></code></pre><p>and restricting callback URLs at the identity provider reduces the risk of unauthorized redirection.</p><h3>7 &gt; CVE-2024-4367 &#8211; Arbitrary JavaScript Execution in PDF.js</h3><p>PDF.js, often used in apps running on k8s, can let attackers run malicious JavaScript if left unpatched. Updating the PDF.js container image to a patched version (4.2.67 or newer, which fixes CVE-2024-4367) with:</p><pre><code><code>helm upgrade my-app ./chart --set pdfjs.version=4.2.67</code></code></pre><p>helps close known holes. Running as a non-root user with a read-only filesystem:</p><pre><code><code>securityContext:
  runAsNonRoot: true
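  # Not in the original snippet: two commonly paired container-level settings,
  # shown as a sketch; confirm your image still starts with them applied.
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]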
  readOnlyRootFilesystem: true</code></code></pre><p>further limits potential damage.</p><h3>6 &gt; DoubleClickjacking: A New Era of UI Redressing</h3><p>Clickjacking returns, and in k8s, where admins rely on web dashboards, it can trick you into unintended actions. Adding a Content Security Policy through an annotation such as:</p><pre><code><code>kubectl annotate ingress dashboard-ingress nginx.ingress.kubernetes.io/configuration-snippet="add_header Content-Security-Policy \"frame-ancestors 'none'\";"</code></code></pre><p>helps ensure your dashboard can&#8217;t be iframed by malicious sites. You can also lock down dashboard access with RBAC by binding dashboard users to a read-only role instead of cluster-admin (the binding and group names below are placeholders):</p><pre><code><code>kubectl create clusterrolebinding dashboard-viewers --clusterrole=view --group=dashboard-viewers</code></code></pre><p>to limit who can access critical controls in the first place.</p><h3>5 &gt; Exploring the DOMPurify Library: Bypasses and Fixes</h3><p>DOMPurify is a popular way to sanitize HTML and stop XSS, but new bypass techniques keep emerging. Upgrading DOMPurify regularly, for example via:</p><pre><code><code>npm install dompurify@latest</code></code></pre><p>helps ensure you&#8217;re running the latest patched version. Combining client-side sanitization with server-side protections, like adding:</p><pre><code><code>kubectl annotate ingress my-ingress nginx.ingress.kubernetes.io/configuration-snippet="add_header X-XSS-Protection '1; mode=block';"</code></code></pre><p>provides extra layers of defense, although most modern browsers have dropped the X-XSS-Protection filter, so a strict Content-Security-Policy carries most of the weight.</p><h3>4 &gt; WorstFit: Unveiling Hidden Transformers in Windows ANSI</h3><p>Charset conversion vulnerabilities are less obvious but can allow malicious payloads to sneak in. 
In Kubernetes, where multi-platform interactions are common, forcing UTF-8 can help:</p><pre><code><code>kubectl annotate ingress my-ingress nginx.ingress.kubernetes.io/proxy-set-headers="Content-Type: application/json; charset=utf-8"</code></code></pre><p>and validating encodings at the application level (rejecting non-UTF-8 requests) cuts down on hidden character attacks.</p><h3>3 &gt; Unveiling TE.0 HTTP Request Smuggling</h3><p>HTTP request smuggling has a new variant, TE.0, which can let attackers bypass security or poison caches in k8s. Upgrading your ingress controller (after refreshing the repo, omitting <code>--version</code> selects the newest chart):</p><pre><code><code>helm repo update &amp;&amp; helm upgrade nginx-ingress ingress-nginx/ingress-nginx</code></code></pre><p>ensures you have the latest patches. Enabling strict header parsing with:</p><pre><code><code>kubectl annotate ingress my-ingress nginx.ingress.kubernetes.io/enable-strict-http-headers="true"</code></code></pre><p>prevents ambiguous <code>Transfer-Encoding</code> or <code>Content-Length</code> headers from being exploited.</p><h3>2 &gt; SQL Injection Isn't Dead: Smuggling Queries at the Protocol Level</h3><p>SQL injection keeps evolving. In k8s, where databases power many apps, parameterized queries and regular audits are essential. You can also use a policy engine like Kyverno to enforce best practices:</p><pre><code><code>apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-parameterized-queries
spec:
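  validationFailureAction: Enforce
  rules:
    - name: check-sql-queries
      match:
        resources:
          kinds:
            - Deployment
      # Illustrative only: Kyverno validates Kubernetes manifests, not application
      # code, so it cannot inspect SQL. A working rule also needs a pattern or deny
      # block next to the message below.
      validate: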
        message: "Use parameterized queries for SQL calls."</code></code></pre><p>Meanwhile, rotating database credentials and storing them as Secrets:</p><pre><code><code>kubectl create secret generic db-credentials --from-literal=username='dbUser' --from-literal=password='S3cur3Pa$$'</code></code></pre><p>further limits damage if an injection does occur.</p><h3>1 &gt; Confusion Attacks: Exploiting Hidden Semantic Ambiguity in Apache HTTP Server</h3><p>Apache HTTP Server is common in k8s, often as a reverse proxy or ingress. Confusion attacks exploit hidden ambiguities to bypass security. Keeping Apache up to date (for instance, <code>FROM httpd:2.4.62</code> in your Dockerfile; the fixes for these attacks landed in 2.4.60) and enabling ModSecurity in <code>httpd.conf</code>:</p><pre><code><code>LoadModule security2_module modules/mod_security2.so
&lt;IfModule security2_module&gt;
  SecRuleEngine On
&lt;/IfModule&gt;</code></code></pre><p>will help catch ambiguous or malformed requests before they can cause trouble.</p><h3><strong>What This Means for Kubernetes in 2025</strong> </h3><p><em>The 2024 Top 10 Web Hacking Techniques</em> list is a reminder that the threat landscape never stops evolving. For k8s teams, it means staying proactive and prepared. Patch everything, harden your configurations, and monitor for unusual behavior. Keep up with new research and adapt your defenses accordingly. K8s is powerful, but it&#8217;s also a big target&#8212;learning from these top 10 techniques will help you stay ahead of the curve in 2025 and beyond.</p><p>As important as building a strong infrastructure is, it&#8217;s crucial to remember that attackers are constantly developing more advanced techniques to infiltrate your systems. Ensuring k8s security requires staying up to date and continuously adapting.</p>]]></content:encoded></item><item><title><![CDATA[K8s-Security-1]]></title><description><![CDATA[A TALE OF TWO RBAC CONFIGURATIONS]]></description><link>https://mesutoezdil.substack.com/p/k8s-security-1</link><guid isPermaLink="false">https://mesutoezdil.substack.com/p/k8s-security-1</guid><dc:creator><![CDATA[AR-Kube (Mesut Oezdil)]]></dc:creator><pubDate>Mon, 03 Feb 2025 13:02:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4bia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>K8s is powerful, but a small misconfiguration can lead to major breaches. In real production envs, it&#8217;s common for developers or operators to accidentally grant &#8220;temporary&#8221; high-level privileges&#8212;and forget to remove them. 
</p><p>This article shows exactly that risk: (i) Vulnerable Scenario: A ServiceAccount with cluster-admin rights, (ii) Secure Scenario: A safer approach, including minimal RBAC, non-root containers, and optional network restrictions. Let me give you a little spoiler beforehand:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4bia!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4bia!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif 424w, https://substackcdn.com/image/fetch/$s_!4bia!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif 848w, 
https://substackcdn.com/image/fetch/$s_!4bia!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif 1272w, https://substackcdn.com/image/fetch/$s_!4bia!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4bia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif" width="1306" height="792" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:792,&quot;width&quot;:1306,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:552118,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4bia!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif 424w, https://substackcdn.com/image/fetch/$s_!4bia!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif 848w, 
https://substackcdn.com/image/fetch/$s_!4bia!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif 1272w, https://substackcdn.com/image/fetch/$s_!4bia!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b0af72d-8ebc-4260-9bdb-8193fd63f502_1306x792.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Each step highlights what is happening and why it matters.</p><h3><strong>PART I &#8211; VULNERABLE SCENARIO</strong></h3><p>In 
this first part, we create resources in the default namespace, intentionally exposing them to a big security hole.</p><h4>Over-Privileged ServiceAccount: <code>debug-sa.yaml</code></h4><p>A ServiceAccount is like an &#8220;identity&#8221; for pods. Giving it cluster-admin means your pods can do anything in the cluster&#8212;list secrets, modify nodes, etc.</p><pre><code>############################################
# debug-sa.yaml
# over-privileged sa with cluster-admin role
############################################
apiVersion: v1
kind: ServiceAccount
metadata:
  name: debug-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: debug-sa-cluster-admin-binding
subjects:
- kind: ServiceAccount
  name: debug-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io</code></pre><p>The ClusterRoleBinding references cluster-admin. This effectively gives root-like power to any pod using debug-sa. If an attacker gains access to the pod, they could control the entire cluster.</p><h4>&#8220;Arkube&#8221; Deployment Using <code>debug-sa</code>: <code>arkube-deployment.yaml</code></h4><p>We show a normal nginx-based app, but it&#8217;s assigned this dangerously powerful serviceAccount. Now the nginx pod can manipulate cluster resources far beyond its legitimate need.</p><pre><code>##################################################
# arkube-deployment.yaml
# basic nginx but with cluster-admin via debug-sa
##################################################
apiVersion: apps/v1
kind: Deployment
metadata:
  name: arkube
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: arkube
  template:
    metadata:
      labels:
        app: arkube
    spec:
      serviceAccountName: debug-sa   # over-privileged sa
      containers:
      - name: arkube-container
        image: nginx:alpine
        ports:
        - containerPort: 80</code></pre><p>A simple container now has the power to manage or read any resource in the cluster. Attackers can exploit any small vulnerability in the container or app to escalate privileges fully.</p><h4>A Secret with a DB Password: <code>arkube-db-secret.yaml</code></h4><p>In real life, you store sensitive data (API keys, DB credentials) in k8s Secrets. The next step will show how these can be easily exposed if the serviceAccount is too permissive.</p><pre><code>##################################################
# arkube-db-secret.yaml
# holds "supersecret" as a base64-encoded password
##################################################
apiVersion: v1
kind: Secret
metadata:
  name: arkube-db-secret
  namespace: default
type: Opaque
data:
  # echo -n 'supersecret' | base64 =&gt; c3VwZXJzZWNyZXQ=
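  # Sketch (not in the original): base64 is encoding, not encryption, so anyone
  # who can read this Secret can reverse it:
  # echo 'c3VwZXJzZWNyZXQ=' | base64 --decode =&gt; supersecret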
  password: c3VwZXJzZWNyZXQ=</code></pre><p>Because debug-sa is cluster-admin, any pod using it can read or even modify this secret.</p><h4>Malicious Pod: <code>malicious-pod.yaml</code></h4><p>We simulate an attacker&#8217;s behavior: a pod that uses debug-sa to show how it can list secrets. This can be extended to reading the actual secret values, installing cryptominers, or deleting your entire environment.</p><pre><code>##################################################
# malicious-pod.yaml
# attacker's pod that repeatedly lists secrets
##################################################
apiVersion: v1
kind: Pod
metadata:
  name: malicious-pod
  namespace: default
spec:
  serviceAccountName: debug-sa
  containers:
  - name: malicious-container
    image: bitnami/kubectl:latest
    command: ["/bin/sh"]
    args:
      - "-c"
      - |
        echo "Malicious pod running..."
        while true; do
          echo "[*] Listing secrets in default namespace:"
          kubectl get secrets -n default
          echo "Sleeping 20s..."
          sleep 20
        done</code></pre><p>If you check the logs (<code>kubectl logs malicious-pod</code>), you will see arkube-db-secret listed among the others. An attacker with the same access could read the secret's value, decode the base64-encoded password, pivot to your database, or read more critical data.</p><h4>Cleaning Up the Vulnerable Setup</h4><p>To remove these insecure resources, run:</p><ul><li><p><code>kubectl delete pod malicious-pod</code></p></li><li><p><code>kubectl delete deployment arkube</code></p></li><li><p><code>kubectl delete -f debug-sa.yaml</code></p></li><li><p><code>kubectl delete secret arkube-db-secret</code></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sq4J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sq4J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png 424w, https://substackcdn.com/image/fetch/$s_!Sq4J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png 848w, https://substackcdn.com/image/fetch/$s_!Sq4J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png 1272w, https://substackcdn.com/image/fetch/$s_!Sq4J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Sq4J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png" width="1456" height="883" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:883,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:655794,&quot;alt&quot;:&quot;If you want to try all the commands and experience the results yourself, here is a great platform: https://labs.iximiuz.com/tutorials?category=networking&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="If you want to try all the commands and experience the results yourself, here is a great platform: https://labs.iximiuz.com/tutorials?category=networking" title="If you want to try all the commands and experience the results yourself, here is a great platform: https://labs.iximiuz.com/tutorials?category=networking" srcset="https://substackcdn.com/image/fetch/$s_!Sq4J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png 424w, https://substackcdn.com/image/fetch/$s_!Sq4J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png 848w, 
https://substackcdn.com/image/fetch/$s_!Sq4J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png 1272w, https://substackcdn.com/image/fetch/$s_!Sq4J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eb1b40e-fe44-472a-9799-bde5a6b76589_2774x1682.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>PART II &#8211; SECURE SCENARIO</h3><p>Next, we fix everything by following best practices: (i) Use a dedicated namespace for your 
app, (ii) Apply minimal RBAC (a Role + RoleBinding) to your ServiceAccount, (iii) Run your container as a non-root user to limit the impact of a container compromise, (iv) (Optional) Add a NetworkPolicy to restrict traffic within the cluster.</p><h4>Create a Namespace: <code>arkube</code></h4><p>Using a separate namespace is a fundamental practice. It keeps your resources organized and narrows the scope of RBAC rules, which gives you both isolation and clarity. The &#8220;arkube&#8221; namespace is where we will deploy everything for this service.</p><ul><li><p>kubectl create namespace arkube</p></li></ul><h4>Minimal RBAC: <code>arkube-sa.yaml</code></h4><p>Now we define an ordinary ServiceAccount (arkube-sa), a Role that only lets it &#8220;get&#8221; and &#8220;list&#8221; pods, services, and secrets in the &#8220;arkube&#8221; namespace, and a RoleBinding that ties the two together.</p><pre><code>########################################################
# arkube-sa.yaml
# a minimal role for reading resources, and a roleBinding
########################################################
apiVersion: v1
kind: ServiceAccount
metadata:
  name: arkube-sa
  namespace: arkube
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arkube-role
  namespace: arkube
rules:
- apiGroups: [""]
  resources: ["pods", "services", "secrets"]
  verbs: ["get", "list"]
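# A tighter variant (a sketch, not part of the original manifest): "get" and
# "list" on secrets expose every secret value in this namespace. If the app
# reads a single secret, you could remove "secrets" from the rule above and
# add a rule pinned with resourceNames instead. The secret name below is an
# assumption; it presumes arkube-db-secret was created in the arkube namespace.
# - apiGroups: [""]
#   resources: ["secrets"]
#   resourceNames: ["arkube-db-secret"]
#   verbs: ["get"]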
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: arkube-rolebinding
  namespace: arkube
subjects:
- kind: ServiceAccount
  name: arkube-sa
  namespace: arkube
roleRef:
  kind: Role
  name: arkube-role
  apiGroup: rbac.authorization.k8s.io</code></pre><p>Key Differences from the Vulnerable Scenario: (i) No cluster-wide privileges (cluster-admin) and (ii) The ServiceAccount can only read certain resource types (pods, services, secrets) within the arkube namespace. Even if an attacker lands in your pod, they cannot use this ServiceAccount to see or alter resources outside this namespace (let alone the entire cluster). Keep in mind that the Role still allows reading every secret inside the arkube namespace, so drop secrets from the rule if the app does not need them.</p><h4>Secure Deployment: <code>arkube-deployment-secure.yaml</code></h4><p>We re-deploy &#8220;arkube&#8221; (still a basic nginx app), but we make two major improvements: (i) Use the limited arkube-sa, and (ii) Run as a non-root user to reduce the risk of container breakouts.</p><pre><code>#################################################################
# arkube-deployment-secure.yaml
# non-root user, minimal RBAC, safer environment for arkube
#################################################################
apiVersion: apps/v1
kind: Deployment
metadata:
  name: arkube
  namespace: arkube
spec:
  replicas: 1
  selector:
    matchLabels:
      app: arkube
  template:
    metadata:
      labels:
        app: arkube
    spec:
      serviceAccountName: arkube-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
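        # Optional hardening (a sketch, not in the original manifest): request
        # the container runtime's default seccomp profile for every container
        # in this pod. Uncomment the two lines below to enable it.
        # seccompProfile:
        #   type: RuntimeDefault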
      containers:
      - name: arkube-container
        image: bitnami/nginx:latest
        env:
          - name: NGINX_HTTP_PORT_NUMBER
            value: "8080"
        ports:
        - containerPort: 8080
        securityContext:
          allowPrivilegeEscalation: false</code></pre><p>Why Non-Root? By default, Nginx listens on port 80, and binding to ports below 1024 typically requires root privileges. Running as UID 1000 (runAsUser: 1000) and listening on port 8080 lets the container run without root. In addition, allowPrivilegeEscalation: false prevents processes in the container from gaining more privileges than their parent process (for example, through setuid binaries). Your application can function normally (serving on port 8080), but an attacker inside the container has far fewer escalation paths.</p><h4><strong>NetworkPolicy</strong></h4><p>If your cluster&#8217;s network plugin enforces NetworkPolicy (e.g., Calico or Cilium), you can limit how pods communicate. For instance, you may only want traffic from an Ingress controller or only allow egress to a DB namespace.</p><pre><code>##################################################
# networkPolicy-secure.yaml
# restricts inbound/outbound traffic for arkube pods
##################################################
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-limited-ingress-egress
  namespace: arkube
spec:
  podSelector:
    matchLabels:
      app: arkube
  policyTypes:
  - Ingress
  - Egress
  ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            ingress: "allowed"
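    # The rule above only matches namespaces that carry the label
    # ingress=allowed. A sketch of applying that label (the namespace name
    # "ingress-nginx" is an assumption; substitute your controller's namespace):
    #   kubectl label namespace ingress-nginx ingress=allowed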
  egress:
    - to:
      - namespaceSelector:
          matchLabels:
            db: "allowed"</code></pre><p>Even if an attacker gets into your arkube pod, they cannot open connections to namespaces that are not labeled as allowed. Note that because the policy also restricts egress, DNS lookups from the arkube pods are blocked too, unless the namespace that hosts your cluster DNS carries the db=allowed label or you add a separate egress rule for DNS.</p><h4>Testing a Malicious Pod Again: <code>malicious-test.yaml</code></h4><p>Finally, to confirm the fix, we repeat the malicious attempt in the arkube namespace, using arkube-sa. The pod tries to list secrets in the default namespace, which should now fail.</p><pre><code>##########################################################
# malicious-test.yaml
# attacker in 'arkube' tries listing secrets in 'default'
# should get "forbidden" if minimal RBAC is working
##########################################################
apiVersion: v1
kind: Pod
metadata:
  name: malicious-test
  namespace: arkube
spec:
  serviceAccountName: arkube-sa
  containers:
  - name: test-container
    image: bitnami/kubectl:latest
    command: ["/bin/sh"]
    args:
      - "-c"
      - |
        echo "Follow me on LinkedIn..."
        kubectl get secrets -n default
        echo "We expect a Forbidden error here."
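        # Optional extra check (not in the original script): ask the API
        # server whether this ServiceAccount may list secrets in "default".
        # "kubectl auth can-i" prints "yes" or "no"; "|| true" keeps the
        # script going whatever the answer is.
        kubectl auth can-i list secrets -n default || true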
        sleep 300</code></pre><p>When you run:</p><ul><li><p>kubectl apply -f malicious-test.yaml </p></li><li><p>kubectl logs -n arkube malicious-test</p></li></ul><p>You should see an error like:</p><pre><code>Error from server (Forbidden): secrets is forbidden:
User "system:serviceaccount:arkube:arkube-sa" cannot list resource "secrets" 
in API group "" in the namespace "default"</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TJa4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TJa4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png 424w, https://substackcdn.com/image/fetch/$s_!TJa4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png 848w, https://substackcdn.com/image/fetch/$s_!TJa4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png 1272w, https://substackcdn.com/image/fetch/$s_!TJa4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TJa4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png" width="1456" height="605" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:605,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:547971,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TJa4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png 424w, https://substackcdn.com/image/fetch/$s_!TJa4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png 848w, https://substackcdn.com/image/fetch/$s_!TJa4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png 1272w, https://substackcdn.com/image/fetch/$s_!TJa4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c003de4-a12e-4a2f-a7f0-cdb7841054ab_3098x1288.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is exactly what we want: the attacker&#8217;s attempt fails because of minimal RBAC.</p><p>I can already hear some readers saying, &#8220;No one makes such simple mistakes in the real world.&#8221; For them, here are a few extra (advanced) tips:</p><ul><li><p>Whenever you finish debugging, remove or rotate any ServiceAccount tokens or credentials you used. This helps prevent &#8220;temporary&#8221; elevated privileges from lingering in your environment. </p></li><li><p>Pods that do not specify a ServiceAccount use their namespace&#8217;s default ServiceAccount, and its token is mounted automatically. Consider setting automountServiceAccountToken: false on pods unless they explicitly need API access. </p></li><li><p>Use tools like Trivy or Gitleaks to detect whether any token, password, or certificate was accidentally committed to source code. </p></li><li><p>Tools like OPA Gatekeeper or Kyverno can block pod deployments that request overly broad privileges (e.g., privileged: true or hostPath mounts). 
</p></li><li><p>K8s can produce detailed audit logs about who did what, and when. Forward these logs to a SIEM system, or use a tool like Falco to detect suspicious behaviour in real time (e.g., someone spawning a shell in a container).</p></li><li><p>Depending on your K8s version and setup, Pod Security Admission (or an alternative that implements the Pod Security Standards) can enforce the privileged, baseline, or restricted profile for pods. This helps ensure containers don&#8217;t run as root unless absolutely necessary, and can limit privilege escalation.</p></li><li><p>For NetworkPolicy and more advanced resource governance, consistent labeling of namespaces (e.g., ingress=allowed or db=allowed) simplifies writing and maintaining policies. </p></li></ul><h3>Conclusion</h3><p>In the vulnerable setup, one ServiceAccount with cluster-admin rights gave any pod running under it full control of the cluster. As a result, a malicious pod could easily read secrets such as arkube-db-secret, which is extremely risky in a real environment.</p><p>In the secure setup, we created a dedicated arkube namespace, used a minimal Role/RoleBinding to limit access, ran containers as non-root to make privilege escalation harder, and (optionally) added a NetworkPolicy to control connections. 
These steps greatly reduce potential damage if someone breaks in.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Wrs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Wrs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif 424w, https://substackcdn.com/image/fetch/$s_!4Wrs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif 848w, https://substackcdn.com/image/fetch/$s_!4Wrs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif 1272w, https://substackcdn.com/image/fetch/$s_!4Wrs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Wrs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif" width="1334" height="763" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:763,&quot;width&quot;:1334,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:395800,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Wrs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif 424w, https://substackcdn.com/image/fetch/$s_!4Wrs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif 848w, https://substackcdn.com/image/fetch/$s_!4Wrs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif 1272w, https://substackcdn.com/image/fetch/$s_!4Wrs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf88b94e-a4dc-4dcf-b0ff-d19dddc32265_1334x763.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To go even further, you can store secrets in an external vault, scan container images for known issues, add checks for weak configurations in your CI/CD pipeline, and use real-time monitoring tools like Falco. By following these guidelines, you significantly lower the chance of a major breach and keep your cluster safer. </p><p>Special thanks to <a href="https://www.ardanlabs.com/">ArdanLabs</a> for their amazing educational videos on K8s and Golang, and for their support in this journey. ArdanLabs is a well-known technology company specializing in software development, training, and consulting, particularly in Go (Golang) and Kubernetes. 
They provide high-quality educational content, workshops, and courses to help developers and enterprises build efficient and scalable apps.</p>]]></content:encoded></item></channel></rss>