Most of an AI feature isn't AI

Every AI feature on the internet is described as if the AI is the difficult part. Get the LLM. Wire the LLM. Choose the LLM. The LLM is the interesting bit.

A few weeks ago I started building an AI-powered feature on DomainDash. Today, it ships — a short written paragraph at the top of your dashboard, telling you in plain English what's worth looking at today. Of the ~1,500 lines of new code that went into the build, about fifty call the LLM. The other 1,450 are unglamorous data plumbing.

The plumbing was the hard part. This post is about the plumbing.

What RAG actually is

Before we get into the boring half, a quick definition.

RAG (Retrieval Augmented Generation) is three words for "look things up in advance and paste them into the prompt." That's it. The generation is whatever the LLM does when it sees your prompt. The interesting work — the augmentation — is everything that happens before the LLM gets called.

The reason RAG exists is straightforward. LLMs are bad at facts they weren't trained on, and very good at writing prose about facts you give them. So if you can stuff the right facts into the prompt before sending it, the model does a much better job. RAG is the routing layer between your data and the LLM's writing layer.

When people say a system "uses RAG", what they usually mean is: there's a retrieval step that picks some relevant text from a corpus and bolts it onto the prompt. Sometimes the retrieval is fancy (vector embeddings, semantic similarity). Sometimes it's a grep. The interesting design questions live in that retrieval step, not in the LLM.
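To make "look things up and paste them into the prompt" concrete, here is a minimal sketch in TypeScript of the grep end of that spectrum. The `Doc` shape, the two-entry corpus, and the prompt wording are invented for illustration; nothing here is DomainDash code.

```typescript
// Hypothetical corpus shape and contents, for illustration only.
interface Doc {
  title: string;
  body: string;
}

// "Retrieval" at its plainest: keep docs whose text mentions the keyword.
function grepRetrieve(corpus: Doc[], keyword: string): Doc[] {
  const needle = keyword.toLowerCase();
  return corpus.filter((doc) =>
    `${doc.title} ${doc.body}`.toLowerCase().includes(needle),
  );
}

// "Augmentation": paste the retrieved text into the prompt ahead of the question.
function buildPrompt(question: string, retrieved: Doc[]): string {
  const context = retrieved
    .map((doc) => `## ${doc.title}\n${doc.body}`)
    .join("\n\n");
  return `Use only the context below.\n\n${context}\n\nQuestion: ${question}`;
}

const corpus: Doc[] = [
  { title: "SSL certificate renewals", body: "Certificates are checked daily." },
  { title: "Incident triage", body: "How to read an open incident." },
];

const prompt = buildPrompt(
  "Is my certificate OK?",
  grepRetrieve(corpus, "certificate"),
);
```

Everything the model will know about the question is in `prompt` by the time it is sent, which is the whole trick.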

What we feed the model

For every site in your DomainDash account, the prompt gets two things:

A snapshot of the site. Current status, recent incidents, uptime across various windows, when the SSL cert was last issued, when the domain was last renewed, and so on. Live data, queried from the database every time.
Relevant chunks of our documentation. Specifically: the bits of the public docs that explain what the signals in the snapshot mean. If the snapshot includes a near-expiry SSL cert, the model gets the docs page about SSL renewals. If there's an active incident, it gets the page about incident triage.

The model takes both, ties them together into a short narrative, and emits a paragraph plus up to three bullet points.

That's the whole feature. The model isn't doing anything clever — it's writing prose from inputs we've pre-staged. The rest of this post is about how those inputs get assembled.

Input #1: the corpus

We already have a public documentation site at docs.domaindash.io. It's a VitePress site: Markdown files with frontmatter, no exotic CMS. The corpus the LLM sees is that site, run through a build step.

Frontmatter, augmented

Each guide page in the documentation repo gets an insights: block in its frontmatter. It looks like this:

```yaml
---
title: SSL certificate renewals
description: How we check SSL expiry and what triggers a warning.
insights:
  summary: SSL certificates are checked daily and warned about before expiry.
  tags: [ssl, certificate, renewal, expiry, https]
  causes:
    - Let's Encrypt rotation failed
    - Certificate authority outage
    - Manual cert installed without renewal automation
  actions:
    - Check the ACME client logs
    - Verify DNS-01 validation isn't blocked
    - Re-issue manually if blocked
---
```

The insights: block is the only thing the build step cares about. The rest of the doc page — prose, screenshots, examples — is for humans on the web; the insights: block is for the model. The two stay coupled because they live in the same file: you can't ship a doc update without thinking about how the model should interpret the signal it relates to.

The build step

A small TypeScript script in the documentation repo (build-insights-yaml.ts) walks the guide/ tree, plucks out every insights: block, and emits an index.yaml plus one YAML file per page. The output is uploaded to an S3 bucket in eu-west-1.

The build runs as a CI step on every merge to main in the docs repo. Net effect: when the documentation team ships an updated guide, the corpus updates automatically.
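The real build-insights-yaml.ts isn't reproduced in this post, so what follows is only a rough sketch of the shape such a step might take. The function names and the `IndexEntry` fields are assumptions, and the YAML parser is injected as a parameter so the sketch has no dependencies (a real build would pass in an actual YAML library and read files from disk).

```typescript
// Hypothetical shapes: the real build script is not shown in this post.
interface Insights {
  summary: string;
  tags: string[];
}

interface IndexEntry extends Insights {
  path: string;
}

// Return the raw text between the opening and closing `---` lines, or null
// when the page has no frontmatter.
function extractFrontmatter(markdown: string): string | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(markdown);
  return match?.[1] ?? null;
}

// Build index entries from (path, markdown) pairs, using the injected parser.
function buildIndex(
  pages: Array<{ path: string; markdown: string }>,
  parse: (frontmatter: string) => unknown,
): IndexEntry[] {
  const entries: IndexEntry[] = [];
  for (const page of pages) {
    const raw = extractFrontmatter(page.markdown);
    if (raw === null) continue;
    const data = parse(raw) as { insights?: Partial<Insights> } | null;
    const insights = data?.insights;
    // Pages without a usable insights block are for humans only: skip them.
    if (!insights || typeof insights.summary !== "string" || !Array.isArray(insights.tags)) {
      continue;
    }
    entries.push({ path: page.path, summary: insights.summary, tags: insights.tags });
  }
  return entries;
}

// The demo frontmatter is written as JSON so that JSON.parse can stand in
// for a YAML parser.
const pages = [
  {
    path: "guide/ssl-renewals.md",
    markdown:
      '---\n{"title": "SSL", "insights": {"summary": "Checked daily.", "tags": ["ssl", "expiry"]}}\n---\n# SSL\n',
  },
  { path: "guide/about.md", markdown: "# About\nNo frontmatter here.\n" },
];

const index = buildIndex(pages, (text) => JSON.parse(text));
```

Pages with no frontmatter, or with frontmatter but no `insights:` block, simply don't appear in the index.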

Why YAML, not vector embeddings? Because the corpus is small. There are about forty pages. Tag-based retrieval over forty entries is cheaper than a vector lookup, deterministic (the same snapshot retrieves the same docs every time), and auditable (you can read the entire corpus in a single file). Embeddings plus a vector database is the right answer when your corpus is too big to fit in RAM. Ours fits in RAM twice over.

This is one of those design decisions where the right answer is "you don't need the thing." It's the most boring half of any RAG architecture conversation, and it's the right call any time you don't have ten thousand documents.

S3 as a Laravel disk

The platform application is Laravel. The bucket is mounted as a Laravel filesystem disk:

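```php
// config/filesystems.php, inside the 'disks' array
'insights' => [
    'driver' => 's3',
    'region' => env('INSIGHTS_S3_REGION', 'eu-west-1'),
    'bucket' => env('INSIGHTS_S3_BUCKET'),
    'root' => 'docs',      // every path resolves under the docs/ prefix
    'read-only' => true,   // writes through this disk throw
],
```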

This sits in config/filesystems.php, alongside the application's other disks. From the rest of the codebase it looks identical to any other storage location: Storage::disk('insights')->get('index.yaml') returns bytes; nothing about the call site cares that those bytes come from S3.

The read-only flag is doing useful work. The corpus is built in CI and never modified at runtime. Anything in the application code that accidentally tries to write to the disk throws instead of silently corrupting the corpus.

Redis as the hot read path

We don't want to fetch the corpus from S3 every time the model needs to see it. So InsightsIndex (the class the rest of the app talks to) loads index.yaml once and caches the parsed result in Redis:

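```php
public function entries(): array
{
    // Serve the parsed index from the cache; on a cold cache, load it from
    // the insights disk once and keep it until it is explicitly overwritten.
    $raw = Cache::rememberForever(
        $this->cacheKey(),
        fn () => $this->loadFromDisk(),
    );

    // Hydrate the cached plain arrays into IndexEntry objects.
    return array_map(IndexEntry::fromArray(...), $raw);
}
```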

S3 is the source of truth; Redis is the hot read path. A scheduled Artisan command (insights:refresh-index) re-parses the YAML from S3 and overwrites the Redis payload, and the same command runs whenever a docs deploy completes. A cold cache falls back to a lazy S3 fetch automatically, so a fresh Redis instance doesn't take the system down.
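The two paths (lazy read-through on a cold cache, explicit overwrite on refresh) can be sketched in a few lines. This is a TypeScript illustration rather than the app's PHP: a `Map` stands in for Redis, a plain function stands in for the S3 read, and the `IndexCache` name is hypothetical.

```typescript
// A Map stands in for Redis and a function stands in for the S3 read; both
// are assumptions made to keep the sketch self-contained.
class IndexCache<T> {
  private readonly store = new Map<string, T>();
  private readonly key: string;
  private readonly loadFromSource: () => T;

  constructor(key: string, loadFromSource: () => T) {
    this.key = key;
    this.loadFromSource = loadFromSource;
  }

  // Read-through: serve the cached value, loading it lazily on a cold cache.
  get(): T {
    const cached = this.store.get(this.key);
    if (cached !== undefined) return cached;
    return this.refresh();
  }

  // Refresh: always re-read the source of truth and overwrite the cache.
  refresh(): T {
    const fresh = this.loadFromSource();
    this.store.set(this.key, fresh);
    return fresh;
  }
}

let sourceReads = 0;
let published = ["ssl-renewals"];
const cache = new IndexCache<string[]>("insights:index", () => {
  sourceReads += 1;
  return published;
});

cache.get(); // cold cache: one read from the source
cache.get(); // warm cache: no further source reads
published = ["ssl-renewals", "incident-triage"]; // a docs deploy lands
const afterDeploy = cache.refresh(); // what a refresh command would do
```

Because nothing ever expires, the cached index only changes when a refresh says so, which keeps retrieval stable between docs deploys.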

Retrieval

The "R" in RAG. The class is called CorpusRetriever, and it does this:

Inspect the snapshot. Tag it by signal: open_incident, ssl_warning, performance_trend_down, domain_near_expiry, and so on.
For each corpus entry, count the overlap between its tags: and the snapshot tags.
Sort entries by overlap, descending. Take the top N.
Return those entries to be embedded in the prompt.

That's the entire retrieval algorithm. Three lines of array_intersect and a sort. No embeddings. No semantic similarity. No reranking model.
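For readers who'd like to see the steps above spelled out, here is a minimal sketch in TypeScript (the real CorpusRetriever is PHP and isn't shown here). It assumes the snapshot tags and the entries' tags share one vocabulary. Dropping zero-overlap entries and breaking ties by path are additions of this sketch, not something the steps above specify.

```typescript
// Hypothetical entry shape; the real index entries carry more fields.
interface CorpusEntry {
  path: string;
  tags: string[];
}

// Score each entry by how many of its tags also appear in the snapshot's
// tags, drop entries with no overlap, and keep the top `limit`.
function retrieve(
  entries: CorpusEntry[],
  snapshotTags: string[],
  limit: number,
): CorpusEntry[] {
  const wanted = new Set(snapshotTags);
  return entries
    .map((entry) => ({
      entry,
      overlap: entry.tags.filter((tag) => wanted.has(tag)).length,
    }))
    .filter((scored) => scored.overlap > 0)
    // Ties break on path so the same snapshot always retrieves the same docs.
    .sort((a, b) => b.overlap - a.overlap || a.entry.path.localeCompare(b.entry.path))
    .slice(0, limit)
    .map((scored) => scored.entry);
}

const entries: CorpusEntry[] = [
  { path: "guide/ssl-renewals", tags: ["ssl", "certificate", "renewal", "expiry"] },
  { path: "guide/incident-triage", tags: ["incident", "triage", "outage"] },
  { path: "guide/domain-renewals", tags: ["domain", "renewal", "expiry"] },
];

// Overlaps are 2, 0 and 1, so the SSL page ranks first and triage is dropped.
const top = retrieve(entries, ["ssl", "expiry"], 2);
```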

It works because the corpus is curated. Every entry's tags: field is chosen by a human who understands what signals trigger that doc. The retrieval is only as good as the tagging. But tagging forty doc pages is an afternoon of work, and the resulting retrieval is correct in a way that's explainable. When the model surfaces the wrong context, we can look at the tags and fix them. When a vector search surfaces the wrong context, you adjust embeddings and pray.

Input #2: time-series queries

The other half of the prompt is the snapshot: the live data about the site. This is where the time-series database earns its keep.

DomainDash stores all check results in TimescaleDB. Every uptime check, every SSL probe, every DNS query lands in a hypertable and rolls up into continuous aggregates as time passes. (More on that in the architecture post.)

The class that builds the snapshot is BuildSiteInsightSnapshot. It composes:

Open incidents. Their severity, duration, what triggered them.
Uptime baselines. 7-day from a hot column on the sites row; 30-day and 90-day from continuous-aggregate reads.
90-day incident history. Recent incidents grouped by severity bucket. So the model knows whether "current incident on example.com" is unusual or routine for this site.
SSL renewal history. When the certificate was last issued, when the previous one was. So a near-expiry cert that was issued on the same cadence as its predecessor reads as a routine Let's Encrypt rotation, not an emergency.
Domain registration history. When the domain was last renewed. Same logic: a yearly renewal at its annual cycle is plumbing, not a problem.
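As a rough picture of what those five groups might look like once assembled, here is a TypeScript sketch of a snapshot shape. The field names and every value in the example are invented; only the grouping comes from the list above.

```typescript
// Hypothetical field names: only the five groups come from the post.
interface SiteSnapshot {
  openIncidents: Array<{ severity: string; durationMinutes: number; trigger: string }>;
  uptimePercent: { last7Days: number; last30Days: number; last90Days: number };
  incidentsLast90Days: Record<string, number>; // count per severity bucket
  ssl: { expiresAt: string; issuedAt: string; previousIssuedAt: string | null };
  domain: { expiresAt: string; lastRenewedAt: string };
}

// Whole days between two ISO 8601 dates.
function daysBetween(fromIso: string, toIso: string): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.round((Date.parse(toIso) - Date.parse(fromIso)) / msPerDay);
}

// Invented example values.
const snapshot: SiteSnapshot = {
  openIncidents: [],
  uptimePercent: { last7Days: 100, last30Days: 99.98, last90Days: 99.95 },
  incidentsLast90Days: { minor: 2, major: 0 },
  ssl: {
    expiresAt: "2025-05-31",
    issuedAt: "2025-03-02",
    previousIssuedAt: "2025-01-01",
  },
  domain: { expiresAt: "2026-02-10", lastRenewedAt: "2025-02-10" },
};

// How long the previous certificate was in service before it was replaced.
const previousCertLifetime =
  snapshot.ssl.previousIssuedAt === null
    ? null
    : daysBetween(snapshot.ssl.previousIssuedAt, snapshot.ssl.issuedAt);
```

The history fields are what carry the "is this normal for this site?" signal: an expiry date alone says nothing about whether the last rotation happened on schedule.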

The 30- and 90-day reads use TimescaleDB's time_bucket() against the rollup tables:

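```php
// One row per day for the last 90 days, read from the hourly rollup.
DB::table('uptime_checks_hourly')
    ->where('site_id', $site->id)
    ->where('bucket', '>=', now()->subDays(90))
    // Re-bucket the hourly rows into days and average within each day.
    ->selectRaw("time_bucket('1 day', bucket)::date::text as date")
    ->selectRaw('avg(uptime_percent) as uptime_percent')
    ->groupBy('date')
    ->orderBy('date')
    ->get();
```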

A query like "90 days of uptime, aggregated to one bucket per day" is a single continuous-aggregate read. Without TimescaleDB you're either scanning roughly 518,000 raw rows (90 days × 1,440 minutes × 4 regions, at 1 check per minute per region) or maintaining your own daily rollup tables and watching them like a hawk.

The renewal histories are what let the model be quiet about routine signals. A naive system that just saw "SSL expires in 4 days" would set off alarm bells every 90 days for every Let's Encrypt-protected site on the platform. By also handing the model when the cert was last issued, we give it enough context to recognise the difference between "this is fine, it'll auto-rotate" and "this is genuinely concerning."

This was the single biggest improvement to the feature during the build. The retrieval was already wired up. The prompt was already grounded in docs. What turned the output from noisy-but-impressive into actually-useful was giving the model a sense of normal for each site.

The actual AI bit

After all of that, the LLM call itself is short.

SiteInsightPrompt serialises the snapshot and the retrieved docs into a single JSON object and hands it to BedrockDigestClient. The system prompt sets the tone (plain English, cautious, no jargon) and a JSON schema for the response (one narrative string, up to three bullets). The model (Claude Haiku 4.5, via Bedrock's EU cross-region inference profile) generates a paragraph in around a second.

Max output: 300 tokens. Temperature: 0.4 (we want consistent, fact-grounded prose, not creative writing). Region: eu-west-1, with automatic fail-over across the EU inference profile if a single region is degraded.
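The post doesn't reproduce SiteInsightPrompt or BedrockDigestClient, so the following is only a sketch, in TypeScript, of the two pieces either side of the SDK call: building a request body and checking the response shape. The body follows the Anthropic Messages format that Bedrock accepts for Claude models. The `{ snapshot, docs }` payload and the `narrative` / `bullets` field names are assumptions, and the actual SDK call is deliberately left out.

```typescript
// Assumed response shape: one narrative string, up to three bullets.
interface Digest {
  narrative: string;
  bullets: string[];
}

// Build the JSON body for an Anthropic Messages request on Bedrock.
function buildRequestBody(systemPrompt: string, snapshot: unknown, docs: unknown[]) {
  return {
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 300,
    temperature: 0.4,
    system: systemPrompt,
    messages: [
      {
        role: "user",
        // The payload layout is hypothetical: snapshot and docs in one object.
        content: [{ type: "text", text: JSON.stringify({ snapshot, docs }) }],
      },
    ],
  };
}

// Parse the model's text output and reject anything outside the schema.
function parseDigest(raw: string): Digest {
  const data = JSON.parse(raw) as Partial<Digest> | null;
  if (!data || typeof data.narrative !== "string") {
    throw new Error("missing narrative");
  }
  const bullets = data.bullets ?? [];
  if (
    !Array.isArray(bullets) ||
    bullets.length > 3 ||
    !bullets.every((bullet) => typeof bullet === "string")
  ) {
    throw new Error("bullets must be at most three strings");
  }
  return { narrative: data.narrative, bullets };
}

const body = buildRequestBody(
  "Write in plain English. Be cautious. No jargon.",
  { site: "example.com", openIncidents: [] },
  [{ path: "guide/ssl-renewals", summary: "Checked daily." }],
);

const digest = parseDigest('{"narrative": "All quiet today.", "bullets": ["SSL is healthy"]}');
```

Validating the response before caching it means a malformed generation fails loudly instead of ending up on someone's dashboard.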

That's it. Roughly five lines of SDK call. It's the most expensive single step in the pipeline (around a second, against milliseconds or less for every other layer in this post) but also the simplest to write.

Caching by snapshot hash

One more piece worth mentioning: we cache the generated paragraph aggressively.

DigestCacheKey constructs a key that includes a SHA-1 of the canonical snapshot JSON, plus the prompt version. The shape is:

digest:team:{team_id}:site:{site_id}:{date}:{prompt_version}:{snapshot_hash}

The implication: if the snapshot hasn't changed, the cache key doesn't change, and we serve the same paragraph for up to 36 hours. As soon as the snapshot does change (an incident resolves, a cert gets reissued, uptime drifts), the hash changes, the key changes, and we regenerate.

There is no manual cache busting anywhere in the system. State changes are the cache invalidation. Bumping the prompt version (a config value) cascades a regeneration of every cached insight on the platform. Useful when we've improved the prompt and want everyone to get the new version.
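A minimal sketch of how such a key could be derived, in TypeScript using Node's built-in `crypto` module. The post doesn't say how DigestCacheKey canonicalises the snapshot, so sorting object keys at every level is an assumption here, as are the `KeyParts` field names.

```typescript
import { createHash } from "node:crypto";

// Serialise with object keys sorted at every level, so two snapshots holding
// the same data always produce the same string. Assumes JSON-safe values.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const members = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(record[key])}`);
    return `{${members.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical names for the non-hash parts of the key.
interface KeyParts {
  teamId: number;
  siteId: number;
  date: string;
  promptVersion: string;
}

// Same snapshot and prompt version give the same key; any change to either
// gives a new key, which is what makes state changes the invalidation.
function digestCacheKey(parts: KeyParts, snapshot: unknown): string {
  const hash = createHash("sha1").update(canonicalJson(snapshot)).digest("hex");
  return `digest:team:${parts.teamId}:site:${parts.siteId}:${parts.date}:${parts.promptVersion}:${hash}`;
}

const parts: KeyParts = { teamId: 7, siteId: 42, date: "2025-01-01", promptVersion: "v3" };
const keyA = digestCacheKey(parts, { status: "up", openIncidents: 0 });
const keyB = digestCacheKey(parts, { openIncidents: 0, status: "up" }); // same data, different order
const keyC = digestCacheKey(parts, { status: "up", openIncidents: 1 }); // an incident opened
```

Here `keyA` and `keyB` are identical while `keyC` differs, so the third snapshot would miss the cache and trigger a regeneration.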

What this isn't

A few things this post doesn't describe, because we didn't build them:

There is no fine-tuned model. We use Claude Haiku 4.5 off the shelf via Bedrock.
There is no vector database. No Pinecone, no Weaviate, no FAISS, no embeddings of any kind. Tag-overlap scoring over a YAML file is the retrieval algorithm.
There is no agent loop. One prompt in, one response out. The model isn't reasoning across multiple steps or calling tools.
The LLM has no database access. It only sees the JSON we hand it. If a fact isn't in the snapshot, the model literally cannot know about it.

Every one of these is a deliberate non-choice. They're the things you reach for when the simpler approach doesn't work. The simpler approach works here.

The shape, in summary

| Layer | What it does | Tech |
| --- | --- | --- |
| Corpus authoring | Docs pages with `insights:` frontmatter | VitePress, in the docs repo |
| Corpus build | Walk `guide/`, emit YAML | TypeScript script, runs in CI |
| Corpus storage | Blob store, source of truth | S3 (eu-west-1) |
| Corpus mount | Filesystem abstraction | Laravel `Storage` disk, read-only |
| Corpus index cache | Hot read path | Redis, `Cache::rememberForever` |
| Index refresh | Pull-through on docs deploy | `insights:refresh-index` Artisan command |
| Retrieval | Tag-overlap scoring | `CorpusRetriever`, in-process PHP |
| Site snapshot | Live state + history | `BuildSiteInsightSnapshot` action |
| Historical reads | Time-series aggregates | TimescaleDB continuous aggregates + `time_bucket()` |
| Prompt | JSON payload + tone rules | `SiteInsightPrompt` |
| LLM | Single-shot inference | Claude Haiku 4.5 via Bedrock |
| Result cache | Hash-based invalidation | Redis, 36h TTL, snapshot SHA-1 in key |

Twelve layers. One of them is an LLM. Eleven of them are not.

The lesson, such as it is: when you build an AI feature, most of your time is spent on data engineering, not AI engineering. Decide what the model needs to know. Build the systems that put that knowledge in front of it. The model itself is the easy bit.
