Human-in-the-loop as a design choice for AI in Systems Engineering
You can trust AI — up to the level at which you can verify it
AI can speed up Systems Engineering: summarizing requirements, drafting specifications, detecting inconsistencies, analyzing changes, preparing verification plans. In construction, infrastructure, energy transition and other complex technical projects, “faster” is only valuable if it remains controllable. SE is not a text production process — it is a proof and decision‑making process. That is precisely why human‑in‑the‑loop is not optional, but an explicit design choice in your way of working.
In this article, I work towards one practical principle: I can trust AI up to the level at which I can verify, trace and audit its output — and everything above that requires explicit human oversight, with clear acceptance rules.
Why human‑in‑the‑loop is indispensable when using AI in Systems Engineering
SE requires evidence, not just plausible text
SE is about managing complexity through explicit agreements: what is the requirement, why does it exist, what is the impact of a change, and how do I demonstrably show that the system complies? Requirements engineering — with quality attributes such as unambiguity, traceability and completeness — is fundamental here. ISO/IEC/IEEE 29148 is a well‑known standard in this domain.
AI output, no matter how well worded, is not yet evidence. It is a proposal that only becomes useful once I can demonstrate which sources were used, which assumptions were made, how it fits into the baseline, and what review and acceptance have taken place. Without that step, AI speeds up text production — while SE needs an accelerator for decision‑making with evidence.
“People make mistakes too” is true — but SE is specifically designed to catch errors in a controlled way
People make mistakes, especially under time pressure. SE processes — reviews, baselines, change control, V&V, peer checks — exist precisely to systematically reduce errors and make them visible before acceptance. AI adds a new error pattern that falls outside that classical picture: errors that are produced quickly and at scale, are convincingly worded, and have no inherent source citation or reasoning trail.
Incorrect assumptions can quietly propagate into requirements, interfaces, verification claims or safety arguments. Once they are embedded in the chain, they are difficult and costly to correct. In this light, human‑in‑the‑loop is not a sign of distrust in AI — it is risk management aligned with how SE works.
Hallucinations: the problem is real — and cannot be configured away
What research shows
“Hallucination” — model output that is factually incorrect or not well grounded in source material — is not a uniform concept. Definitions and measurement methods differ by context and task. That makes it unrealistic to claim that the problem is definitively solved with a single configuration, prompt, or tool choice.
For SE, this is especially relevant because we work with context‑dependent definitions (interfaces, environmental conditions, design constraints), normative claims (“must”, “shall”, “may not”), and chains of dependencies in which a small error can have major downstream impact.
Where SE is vulnerable
In practice, the risks are highest where:
- Numbers and limits matter — tolerances, load cases, performance requirements.
- References are crucial — standards, contract requirements, system requirements.
- Cross‑document consistency counts — CONOPS, requirement set, interface control, verification plan.
The danger is not only that the output is wrong. The danger is that the team — under time pressure — too quickly labels the output as plausible and incorporates it into artefacts that later carry heavy contractual, safety‑related or operational weight. Convincing output is not proof of correctness. In SE, where evidencing is the norm, plausibility is not enough.
The pitfall: speed plus persuasiveness leads to faster propagation
Automation bias as a human phenomenon
Human factors research has long described that people using automation tools tend to over‑rely on the tool, verify less actively, and detect deviations too late. This is called automation bias or complacency — and it is not a matter of carelessness, but a cognitive phenomenon that arises as soon as tooling is perceived as reliable.
What further reinforces this with AI language models is the presentation quality. An LLM produces fluent, well‑structured text in the correct domain register. That increases perceived credibility — and thus the risk that verification steps are skipped.
How this translates to SE
In construction, infrastructure, energy transition and other complex technical projects, three situations are typically risky:
- Requirement interpretation under time pressure: AI provides a neat interpretation of a contract requirement. The team adopts it as “the intent is clear”, while a single nuance later leads to rework or claims.
- Change impact analyses: AI produces an apparently complete impact list. The team trusts its completeness and misses an interface dependency or a verification artefact that also needs updating.
- V&V artefacts: AI generates test cases that appear logical but do not align with the real acceptance criteria or are not traceable to requirements.
The essence: AI makes it easier to push ahead — and that increases the need for explicit review and acceptance rules.
What does work: verification loops and explicit sign‑off
The pattern: draft → verify → revise
The most robust pattern for AI in evidence‑sensitive processes is the verification loop. AI never produces the final product, only a reviewable proposal. That proposal then goes through an explicit verification step before it is allowed into the baseline. This aligns with existing SE practices — but must be made explicit and embedded as soon as AI tooling is used.
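To make the pattern concrete, here is a minimal Python sketch of such a loop. The `generate` and `review` callables are placeholders for an AI-assisted drafting step and a human review step; the names and fields are illustrative, not a specific tool's API.

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    text: str
    sources: list[str] = field(default_factory=list)  # citations the proposal claims to rest on
    status: str = "draft"                              # draft -> in_review -> accepted

def verification_loop(generate, review, max_rounds: int = 3) -> Draft | None:
    """Run draft -> verify -> revise until a reviewer accepts or rounds run out.

    `generate(feedback)` returns a Draft (AI-assisted); `review(draft)` is a human
    step returning (accepted, feedback). Nothing leaves the loop without sign-off.
    """
    feedback = ""
    for _ in range(max_rounds):
        draft = generate(feedback)           # AI produces a reviewable proposal
        draft.status = "in_review"
        accepted, feedback = review(draft)   # explicit human verification step
        if accepted:
            draft.status = "accepted"        # only now is it a baseline candidate
            return draft
    return None                              # no acceptance: nothing enters the baseline
```

The design choice is deliberate: the loop can only end in an accepted artefact or in nothing at all, never in an unreviewed artefact slipping through.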
Concrete controls I would secure in SE
1) Output classification — what is this, actually?
- Draft / idea: may be shared quickly, but not placed in the baseline.
- Proposal for requirement or decision: may only go to review with source and trace.
- Baseline candidate: requires explicit sign‑off by an authorized role.
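Expressed as code, this classification becomes a simple gate. The classes and fields below are an illustrative sketch, not a prescribed data model; the point is that nothing reaches the baseline without trace and sign-off.

```python
from enum import Enum

class OutputClass(Enum):
    DRAFT = "draft"                   # may be shared, never baselined
    PROPOSAL = "proposal"             # goes to review only with source and trace
    BASELINE_CANDIDATE = "candidate"  # needs sign-off by an authorized role

def may_enter_baseline(kind: OutputClass, has_trace: bool, signed_off_by: str | None) -> bool:
    """Illustrative gate: only a traced, signed-off baseline candidate passes."""
    if kind is not OutputClass.BASELINE_CANDIDATE:
        return False
    return has_trace and signed_off_by is not None
```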
2) Source and traceability requirements for AI output
- No source or trace = no acceptance as requirement, interface agreement or verification claim.
- Every normative sentence (“must”, “shall”) receives a trace link to a source: contract, standard, stakeholder requirement.
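A simple automated pre-check can support this rule before human review. The sketch below assumes requirements are available as records with an `id`, `text` and `trace` field (an illustrative shape, not a specific tool's data model) and flags normative sentences without a trace link.

```python
import re

# Normative signal words; extend to match your contract language.
NORMATIVE = re.compile(r"\b(shall|must|may not)\b", re.IGNORECASE)

def untraced_normative_requirements(requirements: list[dict]) -> list[str]:
    """Return the ids of requirements that use normative language but have no trace link."""
    flagged = []
    for req in requirements:
        if NORMATIVE.search(req["text"]) and not req.get("trace"):
            flagged.append(req["id"])   # no source or trace = no acceptance
    return flagged
```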
3) Mandatory challenge step
- Have AI (or a second prompt) explicitly search for contradictions or missing conditions.
- Have an engineer assess and sign off on that step.
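What such a challenge step could look like in code, as a hedged sketch: `call_model` and `engineer_sign_off` are placeholders for whatever LLM client and review workflow you use, and the prompt wording is only an example.

```python
CHALLENGE_PROMPT = (
    "Act as a critical reviewer. For the requirement below, list contradictions "
    "with the cited sources, missing conditions, and unstated assumptions. "
    "If you find none, say so explicitly.\n\n"
    "Requirement:\n{requirement}\n\nSources:\n{sources}"
)

def challenge_step(requirement: str, sources: str, call_model, engineer_sign_off) -> dict:
    """Second-pass challenge: the model critiques, an engineer assesses and signs off."""
    critique = call_model(CHALLENGE_PROMPT.format(requirement=requirement, sources=sources))
    accepted, reviewer = engineer_sign_off(critique)   # the human step is mandatory
    return {"critique": critique, "accepted": accepted, "reviewer": reviewer}
```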
4) Four‑eyes principle for high‑impact items
- For items with high impact — safety, compliance, major cost, interface agreements — a second reviewer must perform an independent check.
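Expressed as a rule, this could look as follows; the impact categories and thresholds are illustrative.

```python
HIGH_IMPACT = {"safety", "compliance", "major_cost", "interface"}

def four_eyes_satisfied(impact: str, author: str, reviewers: set[str]) -> bool:
    """High-impact items need an extra independent check; the author never counts as a reviewer."""
    independent = reviewers - {author}
    required = 2 if impact in HIGH_IMPACT else 1
    return len(independent) >= required
```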
5) Logging for audit
- Store prompt, context, output, sources used and who performed which acceptance — so that you can later reconstruct why something was adopted.
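A minimal logging sketch, assuming a JSON Lines file as the audit store; the field names and decision values are examples to adapt to your own quality system.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_ai_decision(path: str, prompt: str, context: str, output: str,
                    sources: list[str], accepted_by: str, decision: str) -> None:
    """Append one audit record per acceptance decision (one JSON object per line)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "context_sha256": hashlib.sha256(context.encode()).hexdigest(),  # fingerprint of the input set
        "output": output,
        "sources": sources,
        "accepted_by": accepted_by,   # who performed which acceptance
        "decision": decision,         # e.g. "accepted", "rejected", "revise"
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```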
This is human‑in‑the‑loop as process design: the human is not the last resort, but an explicit control layer with authorities and responsibilities.
Governance: from using AI to controlling AI
NIST AI RMF as a structuring framework
For governance, it is useful to structure AI risks instead of reacting ad hoc to incidents. NIST AI RMF 1.0 offers a useful framework with four functions: govern, map, measure, and manage. The core idea: AI risks are not a technical problem of the tool, but a governance issue for the organization that deploys the tool.
For SE projects, this helps position AI not as mere tooling, but as a capability with scope (where to use it and where not), risks (what can go wrong and with what impact), controls (which checks and who signs off) and monitoring (how to detect drift or quality issues).
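One way to make that positioning tangible is a capability record per AI use case. The example below is purely illustrative and loosely ordered along the four RMF functions; the entries are not prescriptive.

```python
# Illustrative capability record for one AI use case in an SE project.
ai_capability = {
    "name": "AI-assisted requirement drafting",
    "scope": {                                     # map: where it is and is not used
        "allowed": ["draft requirements", "impact-analysis suggestions"],
        "excluded": ["safety cases", "final verification claims"],
    },
    "risks": ["hallucinated references", "missed interface dependencies"],     # map / measure
    "controls": ["source-and-trace rule", "four-eyes on high impact"],         # manage
    "monitoring": ["review rejection rate", "post-baseline defect findings"],  # measure
    "owner": "SE manager",                          # govern: the accountable role
}
```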
RACI for acceptance: who is allowed to mark something as “seen and accepted”?
If you make one thing explicit, make acceptance authority explicit. Without this clarity, AI‑driven speed almost automatically leads to faster propagation.
- Responsible: who creates or updates the artefact (with AI as a helper)?
- Accountable: who accepts and baselines it (and carries the risk)?
- Consulted: who must review the content — discipline owners, interface owners.
- Informed: who must be kept informed — project management, QA, client.
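As a sketch, the acceptance authority can be captured per artefact type, so that tooling can enforce who is allowed to baseline. The roles and artefact types below are examples, not a prescribed assignment.

```python
# Illustrative RACI record per artefact type.
RACI = {
    "requirement_set": {
        "responsible": "requirements engineer (AI-assisted)",
        "accountable": "lead systems engineer",   # accepts and baselines, carries the risk
        "consulted": ["discipline owners", "interface owners"],
        "informed": ["project management", "QA", "client"],
    },
}

def acceptor(artefact_type: str) -> str:
    """Only the Accountable role may mark an artefact as 'seen and accepted'."""
    return RACI[artefact_type]["accountable"]
```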
ISO/IEC/IEEE 29148 remains fully in force as the quality bar. The standard does not change because requirements have been drafted by an AI — the verification threshold stays the same.
Conclusion
You can trust AI up to the level at which you can verify it.
This is not a cynical conclusion — it is a workable and practical principle. It means that AI can be used very effectively to arrive more quickly at analyses and concept artefacts, as long as the verification process is at least as good as, or better than, without AI.
Human‑in‑the‑loop is not a brake on innovation and not a temporary measure until AI is “good enough”. It is an explicit design choice in the SE chain — just as deliberate as a verification milestone or a baseline review. Organizations that set this up properly combine AI’s speed advantage with the evidencing that complex engineering requires.
In high‑impact projects, trust is not a feeling. It is something you build with verification, traceability and demonstrability. AI may accelerate — the human remains the owner of acceptance.
About Basewise: Basewise supports organizations in construction, infrastructure, energy transition, shipbuilding and offshore with the application of Systems Engineering. We combine domain expertise with methodical guidance — including for the responsible use of AI in SE processes.