KnowledgeForge
The model picks the wrong kind of thinking for the problem in front of it — brainstorming when it should be debugging, hedging when it should be deciding, inventing structure when it should be following one — and the whole chain degrades from there.
Swap GPT-5 for Opus 4.7 and the same problem shows up wearing different syntax.
KnowledgeForge is the layer that fixes that.
A multi-platform reasoning framework for AI agents. It classifies the type of decision in front of the model, routes the work to a specialized mode, and enforces deterministic checks before any LLM judgment is invoked.
Runs on Claude Code, ChatGPT, Gemini, and Cursor.
KF patches the model's weaknesses. It doesn't scaffold its strengths.
Deterministic first. Before invoking LLM judgment, exhaust deterministic checks. Before fixing, reproduce. Before acting, triage.
Most agent frameworks try to make the LLM do more. KF makes it do less, better. Every module exists because something specific was failing in production — hypothesis-skipping, hidden trade-offs, hallucinated groundings, runaway scope, stale knowledge surfacing as fact.
KF is a stack of modules — each one a targeted patch for a known failure mode in LLM reasoning. Six worth describing:
Every request gets sorted into one of four types — Reckoning (verifiable answer, give it directly), Evaluative (judgment against existing criteria), Predictive (future state), or Novel (no precedent, requires reasoning expansion). The classifier is the gate that decides whether the LLM should think hard or shut up and answer.
Six runtime checks that watch the agent watch itself — stuck detection, hypothesis fixation, scope drift, premature closure. When a check trips, the monitor injects an intervention rather than letting the run rot quietly.
Every claim the model makes carries a grounding score. Low-grounding claims get flagged before they propagate. Accretion to long-term memory is gated on grounding — bad reasoning doesn't get to become institutional knowledge.
Eight hard limits on what an agent can do per mode — token spend, tool calls, scope expansion, autonomy level. Two-layer safety so a single jailbroken prompt doesn't blow up a deployment.
Facts decay. KF tracks the staleness window for every piece of stored knowledge and refuses to surface it as current when it's past its half-life. Importance-weighted — high-stakes facts expire faster than low-stakes ones.
Three-tier risk classification with mode-specific capability profiles. The Critic mode can read anything but can't write. The Builder can write but can't ship. The Debugger can run scripts in sandbox but not in prod. Permissions are structural, not vibes.
Plus modules for Calibration, Salience Allocation, Memory Architecture, Knowledge Accretion, Semantic Wiki Search, Taxonomy Enforcement, Verbatim History Mining, and Entity Relationship Analysis. Fourteen modules total.
The modules feed nine specialized agents:
The orchestrator routes between them based on what the request actually needs — most don't need any mode at all, which is part of the point.
PRIVATE BETA
Currently running in production across several deployments. MIT open-source release planned once the API surface stabilizes.
Early access goes to people who can describe a specific failure mode they're trying to patch.