You know the pause. Item Definition open in one window. ISO 26262 normative text in another. A blank hazard table waiting to be filled.
The standard is clear about what you must produce. It says very little about how to produce it. That gap consumes weeks of expert engineering time on every project.
The work is necessary. It is also repetitive, structurally similar from project to project, and hard to scale with traditional processes. Generative AI is an obvious lever. It is also an obvious way to introduce silent, high-impact failure into safety-critical systems.
So the question is not whether to use it. The question is where it belongs, and where it absolutely does not. AegisSafeForge is our answer.
The honest problem
LLMs are very good at producing things that look like HARA tables. They are also very good at producing things that look correct while being subtly wrong.
We test this constantly. Give a capable model a real Item Definition and the output is immediately convincing. Clean ASIL assignments. Plausible severity reasoning. Fluent hazard descriptions. Then you inspect it closely.
An ASIL B quietly rounded up to D. A clause cited that does not say what it claims. Identical controllability ratings assigned to scenarios any trained engineer would immediately distinguish.
None of these errors are visible to a non-expert. All of them are audit failures in an ISO 26262 context.
This is not a prompting problem. It is a category problem. Safety analysis is governed by deterministic systems. Lookup tables. Derivation rules. Traceability constraints. Strict logical dependencies. LLMs are not built for that class of reasoning.
But they are useful in other parts of the workflow. They can draft hazard descriptions. They can surface plausible failure modes. They can structure a first pass at a safety artifact. So we stopped asking whether AI can replace safety engineers. We asked something more precise.
Where can AI carry real weight? Where must it be structurally prevented from making decisions?
What we built
AegisSafeForge is an AI-assisted platform for ISO 26262, ISO/SAE 21434, and IEC 61511 workflows. It generates first drafts of safety and security artifacts such as HARA, TARA, FMEA, safety goals, and requirements, with more artifact workflows being added for other industries. Then it forces every output through a workflow that cannot be bypassed.
The core principle is simple. The LLM does semantic generation. The engineer provides judgment. Deterministic logic enforces correctness. Each layer does what it is good at. Each layer is prevented from doing what it is not. In practice, this is enforced through four guardrails.
Guardrail 1: Standards-grounded retrieval
Before the model writes anything, the system retrieves relevant clauses from authoritative sources. This includes ISO 26262, ISO/SAE 21434, IEC 61511, and project documentation stored in Qdrant. Up to 6,000 characters of grounded context is injected into the prompt.
This changes the failure mode. A model relying on training memory can hallucinate or misremember clauses. A model reading the actual standard text is constrained by it. It also makes updates straightforward. When a standard changes, we embed the updated text again. All later analyses operate on current ground truth. No retraining is required.
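To make the context budget concrete, here is a minimal sketch of packing retrieved clauses into a 6,000 character window. It is an illustration, not our production code. The Clause type, its field names, and the packing policy (whole clauses in rank order, stopping before the first one that does not fit) are assumptions made for this example, and the retrieval call itself is left out.

```python
from typing import NamedTuple

MAX_CONTEXT_CHARS = 6000  # the budget mentioned above


class Clause(NamedTuple):
    """One retrieved passage. Field names are hypothetical."""
    source: str  # e.g. "ISO 26262-3" or a project document name
    ref: str     # clause or section identifier
    text: str    # the passage text as stored in the vector database


def build_grounded_context(ranked: list[Clause],
                           budget: int = MAX_CONTEXT_CHARS) -> str:
    """Pack clauses in rank order until the character budget is reached.

    Assumption: clauses are never cut mid-text. A clause that does not
    fit ends the packing, so the model never sees a truncated clause.
    """
    parts: list[str] = []
    used = 0
    for clause in ranked:
        block = f"[{clause.source} {clause.ref}]\n{clause.text}"
        # Account for the blank-line separator between blocks.
        cost = len(block) + (2 if parts else 0)
        if used + cost > budget:
            break
        parts.append(block)
        used += cost
    return "\n\n".join(parts)
```

Labelling each block with its source and reference is a deliberate choice in this sketch. It lets a reviewer trace any citation in the draft back to the exact text the model was shown.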
Guardrail 2: Deterministic safety scoring
This is the part auditors care about most. The model proposes hazard descriptions along with Severity, Exposure, and Controllability values. From its ratings we keep only S, E, and C. Any ASIL it proposes is discarded. Every time.
ASIL is recomputed strictly from the lookup table in ISO 26262-3, Table 4. There is no exception path.
Why this matters is simple. ASIL determines everything downstream. Safety goals. Verification scope. Hardware targets. Compliance burden. A single incorrect ASIL upgrade can add millions in unnecessary validation effort. A silent downgrade can weaken the entire safety case. Both may look plausible on paper. Neither survives audit scrutiny.
So the rule is absolute. S0 always results in QM. S3 plus E4 plus C3 always results in ASIL D. No configuration changes this behavior. No model output overrides it.
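The rule can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the platform's implementation. The function names and the proposal dictionary shape are hypothetical. The lookup uses the commonly cited shortcut in which the sum of the S, E, and C class numbers selects the ASIL (7 gives A, 8 gives B, 9 gives C, 10 gives D, anything lower gives QM). A production system should encode Table 4 cell by cell and verify it against the standard.

```python
# Sum of class numbers -> ASIL, for S1..S3, E1..E4, C1..C3.
# Assumption: this shortcut mirrors ISO 26262-3 Table 4; verify
# against the standard before relying on it.
_ASIL_BY_SUM = {7: "A", 8: "B", 9: "C", 10: "D"}


def compute_asil(s: int, e: int, c: int) -> str:
    """Return "QM", "A", "B", "C" or "D" for the given S, E, C classes."""
    if not (0 <= s <= 3 and 0 <= e <= 4 and 0 <= c <= 3):
        raise ValueError(f"S/E/C out of range: S{s} E{e} C{c}")
    # Class 0 on any axis means no ASIL is assigned.
    if s == 0 or e == 0 or c == 0:
        return "QM"
    return _ASIL_BY_SUM.get(s + e + c, "QM")


def accept_proposal(proposal: dict) -> dict:
    """Keep the hazard text and S/E/C, ignore any ASIL the model sent."""
    s, e, c = proposal["s"], proposal["e"], proposal["c"]
    return {
        "hazard": proposal["hazard"],
        "s": s,
        "e": e,
        "c": c,
        "asil": compute_asil(s, e, c),  # always recomputed
    }
```

With this shape, a proposal that arrives with S3, E4, C3 and a model-supplied ASIL of B leaves accept_proposal with ASIL D. The model's value is never read, so there is nothing to override.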
Guardrail 3: A workflow nobody can bypass
Every AI-generated row enters the system as AI suggested. It must move from AI suggested to edited, then to needs review, and then to approved. Only approved artifacts can be exported for compliance use.
The important part is not the state names. It is that enforcement happens at the data layer. You cannot approve without human review. You cannot export incomplete HARA sets. You cannot bypass missing S, E, or C values. You cannot generate from incomplete Item Definition data. Compliance is not a feature toggle. It is the system boundary.
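A minimal sketch of these rules as a state machine follows. It is illustrative only. The state identifiers, the HaraRow fields, and the function names are hypothetical, and only the forward path described above is modelled. In a real deployment the same checks would live in the data layer, for example as database constraints, rather than in application code that a caller could skip.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Forward-only transitions, as described above.
ALLOWED = {
    "ai_suggested": {"edited"},
    "edited": {"needs_review"},
    "needs_review": {"approved"},
    "approved": set(),
}


class WorkflowError(Exception):
    """Raised when a transition or export would break the workflow."""


@dataclass
class HaraRow:
    hazard: str
    s: Optional[int] = None
    e: Optional[int] = None
    c: Optional[int] = None
    state: str = "ai_suggested"
    reviewed_by: Optional[str] = None


def transition(row: HaraRow, new_state: str,
               reviewer: Optional[str] = None) -> None:
    if new_state not in ALLOWED[row.state]:
        raise WorkflowError(f"{row.state} -> {new_state} is not allowed")
    # Rows with missing S, E or C cannot be sent for review.
    if new_state == "needs_review" and None in (row.s, row.e, row.c):
        raise WorkflowError("S, E and C must all be set before review")
    # Approval requires a named human reviewer.
    if new_state == "approved":
        if not reviewer:
            raise WorkflowError("approval requires a human reviewer")
        row.reviewed_by = reviewer
    row.state = new_state


def export(rows: list[HaraRow]) -> list[dict]:
    """Export only if the set is non-empty and every row is approved."""
    if not rows or any(r.state != "approved" for r in rows):
        raise WorkflowError("only fully approved HARA sets can be exported")
    return [asdict(r) for r in rows]
```

The point of the transition table is that skipping a step is not a permission question. A jump from ai_suggested to approved simply does not exist as a legal move.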
Guardrail 4: Audit trail as a first-class system
Every change is written to an immutable audit log. Field edits. State transitions. User identity. Timestamps. System actions. Auditors can reconstruct the full lifecycle of every decision. Engineer A adjusts S2 to S3. Lead B approves it four hours later. That chain is permanent.
This is also what makes AI acceptable in this workflow at all. The audit trail is not documentation after the fact. It is the mechanism that proves humans were actually in control of safety decisions.
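One common way to make a log tamper-evident is a hash chain, where each entry stores the hash of the entry before it. The sketch below shows that technique. It is an assumption chosen for illustration, not a description of how our audit log is stored, and the class name, field names, and sample values are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same content always hashes the same way.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to its predecessor."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, user: str, action: str, detail: dict,
               timestamp: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {
            "user": user,
            "action": action,
            "detail": detail,
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry fails."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
# Sample values only: Engineer A raises S2 to S3, Lead B approves later.
log.append("engineer_a", "field_edit",
           {"field": "s", "old": "S2", "new": "S3"}, "2024-01-01T09:00:00Z")
log.append("lead_b", "state_transition",
           {"from": "needs_review", "to": "approved"}, "2024-01-01T13:00:00Z")
```

Calling log.verify() walks the chain from the first entry. Changing any stored field after the fact changes that entry's hash, which breaks the link to every entry that follows.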
What we are still figuring out
There are three areas where we do not yet have satisfying answers.
Should the AI explain itself? We experimented with structured rationale fields alongside generated outputs. The problem is not technical. It is behavioral. LLM-generated reasoning carries the same failure modes as the outputs it explains. In some cases, it creates false confidence rather than clarity. Several engineers reviewing early versions said the explanations made incorrect outputs feel more trustworthy, not less. We are still evaluating whether explanation helps auditability or undermines it.
SOTIF and the limits of HARA. ISO 21448 introduces a different problem class. Failures are not component faults. They are performance limitations in perception and decision making systems. Traditional HARA structure maps poorly onto this domain. We can support ISO 26262 workflows well today. We do not yet have a strong model for how AI should assist SOTIF analysis at scale. If you have worked in this space, we would value your input.
Standards drift in live programs. Standards evolve. Programs do not reset when they do. Our architecture handles updates cleanly. We embed the revised standard again, and new analyses use updated clauses. The harder problem is organizational. What happens when a standard changes mid-program, but the project is already partly approved? Different teams handle this differently. We have an approach, but it is not robust enough for every real case. We are actively learning here.
Next on this blog
In the next posts, we will go under the hood of how AegisSafeForge actually works, starting with the retrieval architecture that grounds AI proposals in standards and project context. Later, we will break down the platform layers that separate probabilistic generation, deterministic safety logic, workflow enforcement, and auditability.
More importantly, we will explain the trade-offs behind those decisions. This architecture is not optimized for simplicity or speed. It is optimized for something harder. Traceability under regulatory scrutiny. Reproducibility across standards changes. Strict control over where AI ends and deterministic safety logic begins. We will also look at what we deliberately did not build. We did not collapse everything into a single LLM pipeline. That choice would break down in real ISO 26262 workflows.
Design Partners
If you want to see the deterministic ASIL recomputation in action on one of your own Item Definitions, we are currently opening five design partner slots with 12 weeks of free access in exchange for product feedback.