Operational Integrity: The Operating Model Heavy Manufacturing Needs

Discipline, Reliability, Resilience, and Leadership Cadence as the Foundation for Sustainable Performance

By Chet Brandon

Heavy manufacturing does not fail all at once. It usually fails in layers.

A procedure is bypassed because the job is urgent. Equipment runs outside normal condition because production needs the tonnage. A near miss is explained away because no one was hurt. A maintenance backlog becomes normal. An alarm becomes background noise. A supervisor accepts a workaround because “that is how we have always done it.”

None of those decisions may look catastrophic in isolation. Together, they signal something larger: the erosion of operational integrity.

Operational integrity is the condition in which an organization consistently runs its assets, processes, people systems, and management routines within defined expectations, with a commitment to excellence for the benefit of all stakeholders. It is not simply safety, reliability, compliance, quality, or production performance. It is the disciplined alignment of all of them into one operating model that protects people, strengthens trust, sustains performance, and creates long-term value.

In heavy manufacturing, operational integrity should be treated as a core business system, not a side initiative.

The Executive Framework

A practical operating model for heavy manufacturing can be stated simply:

Operational Integrity = Operating Envelope + Operational Discipline + Operational Reliability + Operational Resilience + Leadership Cadence

Each element answers a different leadership question.

  • Operating envelope: Do we know the risk boundaries of the operation?
  • Operational discipline: Are we doing the work the right way, consistently?
  • Operational reliability: Will assets, processes, and controls perform as intended?
  • Operational resilience: Can people and teams maintain control when conditions change?
  • Leadership cadence: Are leaders seeing weak signals, making decisions, and following through?

This framework matters because heavy manufacturing risk rarely stays inside one functional box. A single weakness can show up as a safety event, quality defect, production loss, environmental release, maintenance failure, or regulatory exposure.

Operational integrity gives leaders a way to manage those risks as one system.

What Operational Integrity Means

Operational integrity means the operation performs as intended, within known risk limits, with reliable controls, competent people, effective supervision, and credible data.

It asks:

  • Are we operating inside the design and risk envelope?
  • Are critical controls understood, verified, and maintained?
  • Are people doing the work as planned, or has informal practice replaced the standard?
  • Are equipment failures predictable and managed, or are we living in reaction mode?
  • Are leaders seeing weak signals early enough to act?
  • Are corrective actions reducing risk, or only closing items in a database?

The test is not whether the plant had a good month. The test is whether the plant can explain why it had a good month and whether that performance is repeatable under pressure.

That distinction matters. Heavy manufacturing environments are full of energy: heat, pressure, chemicals, molten material, rotating equipment, mobile equipment, suspended loads, confined spaces, electrical systems, dust, and stored mechanical energy. When the management system weakens, the consequences are rarely theoretical.

Operational Integrity as an Overarching Operating Model

Many organizations manage safety, quality, maintenance, production, environmental compliance, and process safety as separate programs. Each has its own metrics, meetings, priorities, audits, and corrective action systems.

That structure may be administratively convenient, but it often misses how risk actually behaves.

Poor shift turnover can create a safety event, quality defect, production loss, and environmental release. Weak preventive maintenance can do the same. So can poor management of change, contractor control, procedure quality, or supervision.

Operational integrity brings these disciplines into one management frame. It treats the plant as an integrated system where reliability, discipline, compliance, safety, quality, environmental performance, and human adaptability are interdependent.

The objective is simple: run the operation the right way, every day, under normal and abnormal conditions.

The Operating Envelope

Every heavy manufacturing process has an intended operating envelope. That envelope includes equipment design limits, process parameters, staffing assumptions, maintenance requirements, permit conditions, safe work practices, quality specifications, and emergency response assumptions.

Operational integrity begins by making that envelope visible and manageable.

Leaders should know which controls are critical. Operators should know which deviations matter. Maintenance should know which assets cannot be allowed to degrade. Engineers should know which changes require formal review. Supervisors should know when a job must stop.

A weak operating envelope is often marked by ambiguity. People may know the production target but not the risk boundary. They may know what “usually works” but not what the standard requires.

In a high-hazard manufacturing environment, ambiguity is risk.

Operational Discipline: Doing the Right Things Consistently

Operational discipline is the human and organizational commitment to execute known standards consistently, especially when the work is difficult, urgent, uncomfortable, or inconvenient.

It is not blind rule-following. It is disciplined execution based on clear expectations, competent people, effective supervision, and a culture that does not normalize drift.

Operational discipline shows up when:

  • Pre-job planning is done because the job needs it, not because an auditor is present.
  • Lockout/tagout is verified, not assumed.
  • Critical procedures are followed, not treated as suggestions.
  • Deviations are escalated, not hidden.
  • Shift turnover communicates risk, not just production status.
  • Leaders challenge workarounds before they become culture.
  • Employees stop and ask when conditions change.

Operational discipline is built through repetition and leadership consistency. People pay close attention to what leaders tolerate. If leaders accept shortcuts during production pressure, that becomes the real standard. If leaders reinforce expectations when the schedule is tight, that becomes the culture.

Discipline is not the enemy of productivity. In heavy manufacturing, discipline is what makes productivity sustainable.

Operational Reliability: Equipment, Processes, and Controls That Hold

Operational reliability is the technical and system capability of assets, processes, utilities, controls, and management routines to perform as intended over time.

A reliable operation does not confuse heroic recovery with good performance. A plant that constantly survives breakdowns, expedites parts, bypasses alarms, and depends on a few experienced employees to “save the day” may be committed, but it is not reliable.

Operational reliability depends on asset integrity, preventive and predictive maintenance, spare parts strategy, engineering standards, process control, inspection programs, and disciplined management of change.

Not all assets are equal. A nuisance failure and a critical control failure should not receive the same attention. The organization must know which equipment protects life, prevents loss of containment, maintains environmental control, protects quality, or prevents major business interruption.

Reliability also includes administrative systems. A poorly maintained procedure, weak training matrix, ineffective corrective action process, or unreliable inspection routine can create as much exposure as a failing pump or valve.

Operational integrity is broader than maintenance excellence. It is not only about keeping machines running. It is about keeping controls effective.

Operational Resilience: Maintaining Control When Conditions Change

If operational discipline is the commitment to follow the standard, and operational reliability is the ability of assets and controls to perform as intended, operational resilience is the human and organizational capacity to maintain control when conditions are no longer normal.

Heavy manufacturing does not operate in a laboratory. Equipment fails. Production pressure rises. Procedures encounter conditions they did not fully anticipate. Staffing changes. Contractors enter the system. Weather disrupts routines. Supply chains affect parts availability. Abnormal situations emerge.

Resilient organizations recognize weak signals early, escalate concerns without hesitation, adapt without abandoning risk controls, recover without creating new hazards, and learn quickly.

Operational resilience is not undisciplined improvisation. It is the capability to adapt intelligently while staying anchored to the controls that matter most.

Resilience depends on several human-centered capabilities:

  • Situational awareness: People recognize changing conditions, weak signals, and abnormal risk.
  • Competence and judgment: Employees and leaders understand not only what to do, but why it matters.
  • Psychological safety with accountability: People can stop, question, escalate, and report concerns without fear, while still being held to clear standards.
  • Adaptive capacity: Teams can respond effectively when the written procedure does not fully match the real condition.
  • Learning discipline: The organization extracts lessons from near misses, deviations, recoveries, and failures.
  • Recovery capability: The plant can stabilize after disruption without creating secondary risk.

Resilience does not replace discipline; it prevents discipline from becoming brittle. It does not replace reliability; it protects the organization when reliability is challenged.

Leadership Cadence: Turning Intent Into Control

Operational integrity requires a leadership cadence strong enough to detect drift before the system fails.

Cadence is the rhythm of review, decision-making, field verification, and follow-through. It is how leaders keep the operating model alive after the kickoff meeting is over.

A strong operational integrity cadence reviews:

  • Critical risk controls.
  • Asset integrity and reliability threats.
  • High-energy events and serious near misses.
  • Management of change quality.
  • Procedure health and field execution.
  • Corrective action effectiveness.
  • Environmental and regulatory compliance stability.
  • Recurring cross-site failure patterns.
  • Human and organizational resilience during abnormal conditions.

The cadence should be practical, not bureaucratic. Leaders should not create another meeting to admire another dashboard. The purpose is to identify weak signals, make decisions, assign ownership, remove barriers, and verify that corrective actions improved control.

The discipline of the meeting matters less than the discipline of the follow-through.

How the Model Works Together

Reliable equipment supports disciplined work. Disciplined work protects reliable equipment. Resilient people and teams maintain control when reliability is stressed or the work no longer matches the plan.

When equipment is unreliable, people create workarounds. When workarounds become normal, discipline weakens. When discipline weakens, equipment is operated improperly, inspections are missed, defects are accepted, and early warning signs are ignored. When resilience is weak, the organization does not recognize drift until the event has already occurred.

The reverse is also true. When standards are clear, preventive maintenance is respected, abnormal conditions are escalated, people are competent to make good judgments, and leaders respond to weak signals, the system strengthens.

This is the heart of operational integrity: technical reliability, human discipline, organizational resilience, and leadership cadence reinforcing each other through a strong management system.

Case Study: Imperial Sugar and the Cost of Lost Operational Integrity

The 2008 Imperial Sugar refinery explosion in Port Wentworth, Georgia, is a powerful case study in the importance of operational integrity.

The event was not simply an explosion. It was the catastrophic result of multiple layers of control weakness aligning over time.

A massive combustible dust explosion and fire killed 14 workers and injured dozens more. The U.S. Chemical Safety Board concluded that the explosion was fueled by significant accumulations of combustible sugar dust throughout the packing building. The primary explosion likely began inside a sugar conveyor beneath large storage silos. That conveyor had been enclosed with steel panels, creating a confined, poorly ventilated space where sugar dust could accumulate to an explosive concentration. The initial explosion then disturbed additional dust on equipment, floors, and elevated surfaces, producing a destructive cascade of secondary explosions.

This event illustrates failure across all five elements of the model.

The operating envelope was not adequately understood or controlled. Combustible sugar dust was a known hazard, but the boundary between normal housekeeping conditions and catastrophic dust accumulation was not effectively managed.

Operational discipline was weak. Housekeeping expectations and dust control practices did not prevent hazardous accumulations. When abnormal buildup becomes normal, the organization has already started to lose control.

Operational reliability was compromised. Equipment design, dust collection, conveying systems, maintenance, and ventilation were not reliable enough to prevent dust release and accumulation. Reliability was not just about keeping equipment running. It was about ensuring equipment did not create or amplify a catastrophic hazard.

Operational resilience was insufficient. A resilient organization recognizes weak signals, learns from smaller fires and precursor events, and escalates concerns before conditions become catastrophic.

Leadership cadence did not drive effective correction. The organization needed a stronger rhythm of hazard recognition, critical control review, housekeeping verification, engineering review, and corrective action closure.

The lesson is clear. Catastrophic events are often preceded by visible signals: buildup, leakage, repeat maintenance issues, minor events, abnormal conditions, unclear ownership, and weak corrective actions. Operational integrity is the leadership system that makes those signals matter before they become history.

Data Must Reveal Control, Not Just Count Events

Heavy manufacturing organizations often track lagging indicators: injuries, environmental events, downtime, quality defects, audit findings, and cost impacts. These measures matter, but they are not enough.

Operational integrity requires data that shows whether the system is in control.

Leaders should look for trends, changes in direction, sudden deviations, recurring causes, outliers, and clustering across departments or sites. Pareto analysis should identify the top three to five recurring drivers creating the greatest risk or operational drag. Best-performing areas should be compared with worst-performing areas to identify transferable practices and missing controls. High-energy near misses, repeat failure types, unclear incident descriptions, and recurring involvement of the same equipment, tasks, or conditions should be treated as red flags, not statistical noise.
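The Pareto step described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration, assuming incident causes are available as simple labeled records; the cause names and counts are invented for the example, and a real site would pull this from its incident or CMMS database.

```python
# Hypothetical sketch: Pareto view of recurring incident drivers.
# Cause labels below are illustrative assumptions, not real data.
from collections import Counter

incidents = [
    "conveyor jam", "seal leak", "conveyor jam", "missed PM",
    "seal leak", "conveyor jam", "alarm bypass", "missed PM",
    "conveyor jam", "seal leak",
]

counts = Counter(incidents)
total = sum(counts.values())

# Rank causes by frequency and report cumulative share, surfacing
# the "top three to five recurring drivers" the review should target.
cumulative = 0
for cause, n in counts.most_common():
    cumulative += n
    print(f"{cause:15s} {n:3d}  {100 * cumulative / total:5.1f}% cumulative")
```

The point of a view like this is not the chart itself but the conversation it forces: the few causes at the top of the list are where corrective action capacity should concentrate first.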

The goal is not more charts. The goal is better judgment.

A strong operational integrity review should ask:

  • What is changing?
  • Where is performance unstable?
  • Which failures repeat?
  • Which controls are degrading?
  • Where are we lucky rather than good?
  • Which high-impact exposure can be reduced quickly with reasonable resources?
  • Where are people adapting successfully, and what can we learn from that?
  • Where are people compensating for weak systems, and how long can that continue?

Data should drive decisions, not decorate presentations.

Corrective Actions Must Strengthen the System

A weak corrective action process is one of the most common threats to operational integrity.

Too often, corrective actions are written to close the investigation rather than strengthen the operation. They rely heavily on retraining, reminders, toolbox talks, or procedure revisions. Those actions may have a place, but they are rarely sufficient by themselves.

Operational integrity requires corrective actions that improve control quality. That means applying the hierarchy of controls, balancing engineering, procedural, and behavioral actions, and prioritizing actions that are high-impact, practical, timely, and scalable.

The best corrective actions do at least one of four things:

  1. They eliminate or reduce the hazard.
  2. They make the correct action easier.
  3. They make the wrong action harder.
  4. They improve the organization’s ability to detect and respond to deviation.

Corrective actions should also be coherent. A plant should not have hundreds of disconnected improvement items competing for attention. The work should roll up into a clear improvement strategy tied to the most significant operational risks.

Closure is not the same as control. A corrective action is only effective when it changes the probability or severity of recurrence.

Where to Start

A manufacturing organization does not need to launch a large new program to strengthen operational integrity. It needs to start with the highest-consequence risks and the controls that matter most.

Five actions will create momentum.

1. Define the critical operating envelope

Identify the highest-risk processes, assets, materials, and tasks. Clarify the boundaries that must not be crossed: process limits, equipment limits, permit limits, safe work requirements, staffing assumptions, and emergency response assumptions.

2. Identify the top five operational integrity risks

Use incident history, near misses, audit findings, maintenance data, environmental events, quality losses, and leadership judgment to identify the top recurring or high-consequence exposures. Do not let volume alone drive priority. A low-frequency, high-consequence exposure deserves attention.

3. Verify critical controls in the field

Move beyond paper confirmation. Confirm whether critical controls are present, understood, maintained, and used as intended. Field verification should include operators, maintenance, engineering, EHS, and supervision.

4. Connect reliability work to risk reduction

Review the reliability strategy for assets that prevent serious injury, loss of containment, environmental release, major quality failure, or business interruption. Maintenance priority should reflect consequence, not only downtime.
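The idea that maintenance priority should reflect consequence, not only downtime, can be made concrete with a simple ranking sketch. This is a hypothetical illustration, assuming a basic consequence-times-likelihood score; the asset names and 1-to-5 scores are invented for the example and are not a standard scale.

```python
# Hypothetical sketch: risk-ranking assets so maintenance priority
# reflects consequence, not only downtime or failure frequency.
assets = [
    # (name, consequence 1-5, likelihood of failure 1-5) -- illustrative
    ("relief valve RV-101", 5, 2),
    ("packing line motor", 2, 4),
    ("dust collector fan", 4, 3),
    ("office HVAC unit", 1, 3),
]

# Score = consequence x likelihood; ties broken by consequence so that
# high-consequence assets never lose priority to nuisance failures.
ranked = sorted(assets, key=lambda a: (a[1] * a[2], a[1]), reverse=True)

for name, c, p in ranked:
    print(f"{name:22s} risk={c * p}")
```

Note that in this toy ranking the frequently failing packing motor sits below the dust collector fan: frequency alone would have inverted that order, which is exactly the distortion a consequence-weighted view corrects.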

5. Build a leadership cadence around weak signals

Create a routine leadership review focused on operating envelope deviations, critical control health, high-energy events, repeat failures, management of change quality, corrective action effectiveness, and emerging risk.

The goal is not more meetings. The goal is better control.

Author’s Note: Fully operationalizing operational integrity is beyond the intended scope of this article. The purpose here is to define the concept, explain why it matters, and provide a practical leadership framework for heavy manufacturing organizations. For readers who want to move from concept to execution, I have also prepared an Operational Integrity Implementation Guide that provides a more detailed roadmap, including implementation phases, leadership cadence, roles and responsibilities, metrics, maturity assessment, and practical templates. The guide can be downloaded here: Operational Integrity Implementation Guide_Chet Brandon.pdf

Leadership’s Role in Operational Integrity

Operational integrity cannot be delegated to EHS, maintenance, engineering, or quality. Those functions are essential, but they do not own the full operating model.

Operational integrity belongs to line leadership.

Plant managers, operations leaders, maintenance leaders, technical leaders, and frontline supervisors set the pace. They decide what gets attention. They decide whether standards are real. They decide whether weak signals are acted on or explained away.

Senior leaders must insist on visibility into risk, not just performance outcomes. A green dashboard does not always mean a healthy operation. It may simply mean the organization has not yet experienced the consequence of its drift.

The conversation should be candid and operational. The goal is not blame. The goal is control.

Culture Follows the Operating Model

Many organizations say they want a stronger safety culture, reliability culture, or compliance culture. Those are worthy goals, but culture is not built by aspiration. Culture follows the operating model.

If planning is weak, the culture becomes reactive.

If maintenance is deferred without risk review, the culture accepts degradation.

If supervisors are not trained to recognize critical controls, the culture depends on luck.

If leaders reward production while tolerating procedural drift, the culture learns the real priority.

If people are afraid to report concerns, the organization loses early warning signals.

If corrective actions do not address root causes, the culture sees investigations as paperwork.

Operational integrity creates the conditions for a stronger culture because it aligns expectations, systems, decisions, and follow-through. People trust what they see consistently reinforced.

What Good Looks Like

A mature operational integrity model has visible characteristics.

Leaders understand the highest-risk operations and critical controls.

Operators understand the operating envelope and know when to stop or escalate.

Maintenance strategies are risk-ranked and connected to safety, environmental, quality, and production consequences.

Procedures are accurate, usable, and field-verified.

Management of change is treated as a core control, not an administrative burden.

Near misses and weak signals are valued because they reveal system vulnerability.

Corrective actions are prioritized by risk reduction, not ease of closure.

Performance reviews examine stability, trend movement, and control health.

Sites learn from each other by comparing best performers with poor performers.

Teams adapt when conditions change without abandoning critical controls.

The organization acts before weak signals become major events.

That is operational integrity in practice.

Why It Matters Now

Heavy manufacturing is operating under increasing pressure: labor constraints, aging infrastructure, supply chain volatility, cost pressure, decarbonization expectations, regulatory scrutiny, and higher stakeholder expectations. These pressures do not reduce risk. They amplify it.

The answer is not more programs layered onto already busy organizations. The answer is a clearer operating model.

Operational integrity gives leaders a way to integrate safety, reliability, environmental compliance, quality, production performance, and human adaptability into one disciplined system. It helps the organization move from reactive correction to proactive control and distinguish between good luck and good management.

The standard is not perfection. Heavy manufacturing will always involve complexity, variability, and risk. The standard is disciplined control of the things that matter most.

Operational integrity is how a manufacturing organization earns the right to run. It protects people, assets, communities, customers, and business continuity. It turns values into routines, data into decisions, and leadership expectations into operational reality.

In the end, operational integrity is not a slogan. It is the daily proof that the organization can be trusted to operate with discipline, reliability, resilience, leadership cadence, and control.

Author’s Note

There is a certain irony in this moment for heavy manufacturing. We are surrounded by remarkable technology: advanced automation, predictive analytics, digital twins, artificial intelligence, process control systems, robotics, real-time monitoring, and engineering tools previous generations of manufacturing leaders could hardly have imagined.

And yet, the greatest challenge remains deeply human.

The strength of an operating entity still depends on whether people understand the mission, believe the standards matter, have the capability to execute, and are properly directed by leaders who know what to reinforce. Technology can detect, calculate, automate, and inform. But it cannot replace leadership judgment, workforce engagement, operational discipline, or the credibility created when leaders follow through on what they say matters.

That is the real work of operational integrity. It is not choosing between technology and people. It is using technology to better enable people, while recognizing that people remain the decisive force in whether the system actually works.

The future of manufacturing will be more digital, connected, and intelligent. But the organizations that perform best will still be those that motivate, enable, and direct their people with clarity, discipline, and purpose. Advanced tools may raise the ceiling of what is possible. People determine whether the organization reaches it.

Source Note: Case study information on the 2008 Imperial Sugar refinery combustible dust explosion is based on findings from the U.S. Chemical Safety and Hazard Investigation Board, Investigation Report: Sugar Dust Explosion and Fire, Imperial Sugar Company, Port Wentworth, Georgia, February 7, 2008, Report No. 2008-05-I-GA, September 2009. The CSB reported that the incident resulted in 14 worker fatalities and numerous injuries, and concluded that combustible sugar dust accumulations, enclosed conveyor conditions, inadequate dust control, equipment design, maintenance, and housekeeping contributed to the catastrophic explosion and fire. Weblink: https://www.csb.gov/imperial-sugar-company-dust-explosion-and-fire/


Cyber-Physical Risk in the Age of AI: How Safety Professionals Help Directors Make Better Operational Technology Investment Decisions – Part 4

A Board of Directors discusses cybersecurity metrics and factory operations in a conference room overlooking an industrial facility.

By Fay Feeney and Chet Brandon

Series Context

This four-part series examines how artificial intelligence is reshaping cyber risk in operational technology and what it means for industrial organizations. It brings together perspectives from safety leadership, cybersecurity, operations, and board governance to address cyber-physical risk as an enterprise issue. The series is co-authored by Chet Brandon, a global Environmental, Health & Safety (EHS) and operational risk leader, and Fay Feeney, an expert in board governance and enterprise risk oversight.

The first three articles argued that Operational Technology cyber risk has moved from technical concern to operational and resilience challenge. This final article asks a harder question: How should boards govern, resource, and make capital decisions in response?

Introduction

The next major Operational Technology (OT) capital request your board reviews may not be a modernization project at all. It may be a proposal for new robots, AI-enabled inspection systems, autonomous material-handling assets, or other connected technologies that promise more throughput, less labor strain, and sharper quality performance.

Those proposals often arrive wrapped in the language of innovation and competitiveness. But for directors, the more important question is not whether the technology is impressive. It is whether management is asking the board to approve capital expenditures that build a better business, or simply a faster way to create new categories of operational risk.

That is the boardroom challenge. OT cyber risk is no longer merely a management issue boards oversee; it is increasingly a fiduciary obligation boards must actively govern.

Operational technology now sits at the intersection of strategy, safety, cyber risk, resilience, and capital allocation. When directors approve spending on new OT-enabled assets, they are not merely approving equipment. They are approving a new operating model, a new risk profile, and a new set of assumptions about how the company will create value. Boards that understand this will demand decision-ready information.

Executive summary

This article advances a straightforward proposition: safety professionals who work close to operational technology are among management’s most underused assets in improving business decision quality. They see weak signals before they become failures. They understand how controls perform under pressure. They recognize where workarounds, maintenance drift, procedural gaps, and human-machine interaction begin to erode resilience.

When their observations are translated through operational leadership, the CISO, and enterprise risk management, they give the CEO and the board a clearer picture of what a proposed OT investment really means for the business.

For directors, the lesson is practical. OT is not a narrow engineering or technical issue. It is the equipment and the control and monitoring systems that run the physical side of the business—the plants, production lines, machines, robots, and connected assets that make, move, and deliver value in the real world.

If information technology runs data and transactions, operational technology runs physical processes and assets. As a result, OT failures, design flaws, or cyber intrusions can become safety, production, regulatory, financial, and reputational events with remarkable speed.

The broader governance implication is clear: new OT investments should come into the boardroom as enterprise risk and strategy decisions. The board should expect management to show how those investments change risk appetite, resilience, strategic capacity, and long-term value, not merely productivity or obsolescence.

Why decision quality is essential for digital investments

Boards are generally disciplined about reviewing outcomes. Yet many spend less time examining the quality of the decision process that produced them. In complex operating environments, management teams still default too easily to speed, confidence, precedent, and partial information. In manufacturing, where robotics, automation, quality, workforce safety, supply chain, continuity, and cybersecurity increasingly intersect, those habits create dangerous blind spots.

This is why decision quality is now a board issue that deserves a renewed assessment. Directors should not stop at asking, “Do we have the right recommendation?” They should also ask, “Did management frame the issue correctly? Did it examine credible alternatives? Did it test assumptions and develop fallback plans?” In OT-intensive environments, those questions are not process niceties. They are part of governing resilience and enterprise value.

Safety professionals matter here because their disciplines are grounded in structured inquiry. Hazard identification, root-cause analysis, barrier management, scenario assessment, and learning from near misses are not only safety disciplines. They are decision-quality disciplines. They help management resist optimism bias, sunk-cost thinking, and the temptation to mistake a confident proposal for a sound one.

Resetting boardroom expectations for decision-ready leadership

“As Technology, Risk, or Audit Committee members, our role is to oversee Operational Technology and ensure that OT decisions are treated as enterprise risk and strategy decisions. We expect management to bring us OT issues and investments framed through safety, cyber, and ERM lenses, so we can see how they affect the company’s risk appetite, resilience, and long-term value.”

Why safety matters in new OT investments

As manufacturers invest in a new generation of connected physical assets, safety professionals should be viewed as contributors to enterprise judgment, not simply compliance resources. Robots, cobots, autonomous mobile systems, machine-vision tools, and AI-enabled controls do more than improve throughput. They change how people interact with equipment. They alter line dependencies and operating rhythms while expanding the digital attack surface. They create new failure modes and recovery challenges. And they change the company’s future operating risk profile.

That makes safety professionals especially valuable. They are often the first to see where human-machine interaction may be misunderstood, where safeguards may be overestimated, where maintenance assumptions may be unrealistic, or where emergency fallback procedures are too theoretical to be useful.

They help management determine whether a proposed investment is ready for deployment, whether risks are understood, and whether the company has considered the impact on its enterprise risk profile.

For directors, that is an important distinction. The question is not whether new OT creates value. It often does. The question is whether management has done enough to ensure that the value case and the risk case are being considered together.

In practice, safety professionals do not usually brief the board directly. Their value depends on what happens next—how their observations move upward and are translated.

The first step is operational leadership. Plant managers, engineering leaders, maintenance and reliability teams, production executives, and quality leaders add context around throughput, asset performance, labor implications, customer commitments, and commercial performance. They connect operational evidence to business reality.

That translation becomes more powerful when joined by the CISO and ERM. The CISO contributes the cyber-physical perspective: how connectivity, software updates, vendor access, identity management, remote diagnostics, segmentation weaknesses, and monitoring gaps could affect the safe and reliable operation of robots and other connected equipment. In a connected manufacturing environment, the relevant question is rarely whether a cyber event is “IT” or “OT.” The question is whether it can disrupt physical production, compromise worker safety, or degrade the integrity of critical assets and what level of reputation risk the organization is prepared to accept.

ERM adds portfolio discipline. It helps convert site-level observations into a small number of scenarios, consequence ranges, likelihood bands, and treatment options that can be assessed alongside other capital and strategic choices. This is where management should demonstrate not merely that a project is technically feasible, but that it is a sound decision relative to the company’s risk appetite, strategic priorities, and competing uses of capital.

The CEO’s role is to bring these threads together in a board-ready form. Directors do not need a stack of technical details. They need a board paper that shows how safety, operations, cyber, finance, and enterprise risk perspectives converge on a recommendation. Where that integration is absent, the board is being asked to approve spending. Where it is present, the board is being invited to make a business decision.

Artificial intelligence is increasingly embedded in many of the OT investments now coming before boards. Robotics, machine vision, and connected control systems rely on AI, introducing more complex behavior, tighter interdependencies, and less predictable failure modes. For directors, this does not change the responsibility—it raises the standard for it. Management should demonstrate how these systems perform under both normal and degraded conditions, and how resilience is maintained when assumptions do not hold.

A simple operating model for directors

A useful way for directors to think about this flow is as a simple operating model.

First, safety professionals detect and interpret operating risk. They identify control weaknesses, unsafe interactions, maintenance drift, procedural non-conformance, resilience gaps, and weak signals in OT-dependent operations.

Second, operational leaders integrate those findings with production and asset context, linking them to continuity, quality, labor, customer commitments, and commercial performance.

Third, management, the CISO, and ERM translate the issue into enterprise risk language—documenting scenarios, consequences, alternatives, assumptions, and response options in forms suitable for executive and board review.

Finally, executives and the board receive decision-useful reporting, so the issue appears not as a narrow technical matter but as a governance question involving risk appetite, resilience, capital allocation, and strategic tradeoffs.

Once that model is in place, directors can hold management accountable for using it consistently rather than episodically.

The Artificial Intelligence Operational Technology (AIOT) Resilience Index: Turning Operational Reality into Board-Level Oversight

As OT environments become more connected, automated, and AI-enabled, boards need a practical way to evaluate whether the organization is genuinely becoming more resilient—or simply becoming more technologically complex. That requires more than isolated cybersecurity metrics or compliance reporting. Directors need a disciplined framework that translates operational risk into decision-useful information that can be monitored over time. One approach is the use of an AIOT Resilience Index: a board-level measurement framework designed to evaluate how effectively the organization is managing cyber-physical risk across operations, safety, resilience, and governance.

The purpose of the index is straightforward. It is intended to help directors and executives understand whether the organization is improving its ability to anticipate, withstand, respond to, and recover from disruptions affecting operational technology. Rather than focusing only on technical vulnerabilities, the index evaluates the operational realities that determine whether an organization can continue to operate safely and reliably under strain.

The AIOT Resilience Index combines both reactive and proactive dimensions of performance. The reactive side evaluates capabilities such as incident detection, emergency shutdown readiness, operational recovery, corrective action closure, and crisis communications. These indicators help leadership understand how effectively the organization can stabilize operations and limit consequences once an event occurs. The proactive side focuses on activities more directly within management’s control, including critical asset risk profiling, safeguard integrity, governance maturity, predictive monitoring capability, and strategic investment in modernization and resilience.

Importantly, the index is designed around factors management can actively influence rather than abstract external threat conditions. That distinction matters in the boardroom. Directors cannot govern geopolitical uncertainty or the existence of cyber threats, but they can oversee whether management is systematically reducing exposure, strengthening safeguards, improving resilience, and investing appropriately in risk reduction. The index therefore becomes less a technical scorecard and more a governance tool for evaluating operational readiness and long-term resilience.

The value of the index is not the number itself. Its value is the discipline it creates. A well-constructed index allows directors to identify trends, compare facilities or business units, evaluate whether risk reduction investments are producing measurable improvement, and determine where exposure may exceed the organization’s stated risk appetite. It also helps frame more informed discussions about capital allocation, legacy asset exposure, third-party dependency, and the operational implications of AI-enabled systems.
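To make the mechanics concrete, the weighted blending behind such an index can be sketched in a few lines. The indicator names, 0-100 scores, and weights below are hypothetical placeholders for illustration only, not the actual AIOT Resilience Index methodology:

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    score: float   # 0-100 assessed maturity/performance score
    weight: float  # relative weight within its dimension

def dimension_score(indicators):
    """Weighted average of indicator scores within one dimension."""
    total_weight = sum(i.weight for i in indicators)
    return sum(i.score * i.weight for i in indicators) / total_weight

# Reactive dimension: how well the organization stabilizes after an event.
reactive = [
    Indicator("incident_detection", 72, 0.25),
    Indicator("emergency_shutdown_readiness", 85, 0.25),
    Indicator("operational_recovery", 64, 0.20),
    Indicator("corrective_action_closure", 58, 0.15),
    Indicator("crisis_communications", 70, 0.15),
]
# Proactive dimension: risk reduction directly within management's control.
proactive = [
    Indicator("critical_asset_risk_profiling", 60, 0.25),
    Indicator("safeguard_integrity", 75, 0.25),
    Indicator("governance_maturity", 55, 0.20),
    Indicator("predictive_monitoring", 48, 0.15),
    Indicator("modernization_investment", 66, 0.15),
]

reactive_idx = dimension_score(reactive)
proactive_idx = dimension_score(proactive)
# Equal blend of the two dimensions is an assumption; a real index
# would weight them per the organization's governance priorities.
overall = 0.5 * reactive_idx + 0.5 * proactive_idx
```

The value for directors lies less in the single composite number than in tracking the two sub-indices over time and across facilities, which is why the sketch keeps them separate.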

Used properly, the AIOT Resilience Index becomes a mechanism for connecting plant-floor realities to boardroom oversight. It gives management a structured way to present operational risk in enterprise terms and gives directors a clearer basis for evaluating resilience, strategic readiness, and long-term value protection in increasingly connected industrial environments.

The illustration below shows how the AIOT Resilience Index™ can translate complex operational technology risk into a board-ready view of resilience, readiness, and governance performance. By separating reactive capabilities from proactive risk leadership, the index helps directors see not only how well the organization can respond to disruption, but how effectively management is reducing exposure before an event occurs.

Case study: approving new robots and connected equipment

Consider a board reviewing a request for a $32 million multi-year investment to deploy a new robotic assembly cell, AI-enabled machine-vision inspection system, and autonomous material-handling platform at a high-volume plant. The proposal includes collaborative robots working near people, new safety interlocks, integration with legacy line controls, expanded network connectivity, vendor remote support, and software that coordinates production flow and inspection data.

The initial management narrative is familiar: labor constraints are tightening, throughput can improve, quality escapes can be reduced, and the facility needs more automation to remain competitive. For a board, however, that framing is incomplete.

The real board question is broader. Does this investment materially improve the company’s future strategic capacity and resilience, and does the organization understand the new operating risk profile it is about to create? The board is not simply approving equipment. It is approving a new way of operating.

For that reason, the CEO should bring six elements into the boardroom.

  • Strategic context: how the new robotic system supports growth, margin improvement, workforce availability, customer commitments, and longer-term automation strategy.
  • Current operating risk: where existing manual or semi-automated processes create safety exposure, quality variation, rework, downtime, or throughput constraints.
  • Scenario-based consequences: a small number of realistic situations such as unsafe robot-human interaction, software or sensor malfunction, failure of a safety interlock, network disruption affecting production flow, or vendor access creating cyber-physical vulnerability.
  • Alternatives and tradeoffs: phased deployment, pilot testing, a more limited automation scope, or deferral, each with different cost, risk, and speed implications.
  • Portfolio fit: what other projects this request displaces and why management believes this investment deserves priority now.
  • Execution and contingency planning: operator training, maintenance readiness, cybersecurity controls, vendor dependency, fallback procedures, and the plan if the technology underperforms or deployment takes longer than expected.

This is precisely where safety professionals, the CISO, and ERM add visible value. Safety professionals can identify where workers may be exposed, where safeguarding assumptions are weak, where human-machine interaction is poorly understood, and where recovery procedures are unrealistic.

The CISO can explain how insecure remote support, software patching failures, weak segmentation, or poor access control could magnify the operational consequences of the new asset base.

ERM can place the entire picture into the context of enterprise exposure, risk appetite, and competing capital priorities. Together, they transform an innovation proposal into a strategic capital decision worthy of board judgment.

Questions directors should keep asking

The board’s role is not to second-guess engineering design. It is to insist that management present OT investments in decision-ready form. As directors upgrade their board’s AI, digital, and cybersecurity expertise and raise overall skills, they can begin now by asking a short set of practical questions:

  • How does this investment change our future operating risk profile, not just projected productivity?
  • What assumptions about workforce readiness, vendor support, and change management sit beneath the expected return?
  • Which risks are genuinely reduced, and which new risks are introduced?
  • If rollout slips or technology performs below expectations, what is the fallback position?
  • How are safety, cybersecurity, and resilience being governed together rather than as separate workstreams?
  • Where does this proposal sit relative to other opportunities and risk-reduction investments competing for capital?

These questions do more than improve oversight of one proposal. They raise the standard by which management prepares OT matters for the board.

Some Closing Thoughts

The companies that will benefit most from robotics, connected automation, and intelligent equipment will not necessarily be the ones that buy the most technology first. They will be the ones whose leaders understand that every new OT investment is also a decision about resilience, safety, cyber-physical exposure, and the company’s long-term capacity to perform under strain to deliver innovation at scale.

That is why this conversation belongs in the boardroom. Safety professionals are not merely helping management avoid incidents. Properly integrated with operational leadership, the CISO, and ERM, they help the CEO bring better business decisions to the board—decisions grounded in operating reality, tested against risk appetite, and weighed against strategic alternatives.

When directors insist on that discipline, they do more than improve oversight of OT. They improve the quality of the judgments on which the company’s future value will depend. Their investors and stakeholders will appreciate the dividends that delivers.

In the age of AI-enabled operational technology, the companies that govern this well will not simply reduce cyber risk; they will build safer, more resilient, and more valuable industrial enterprises.


Cyber-Physical Risk in the Age of AI: How Safety Professionals Identify and Manage OT Cyber Risk – Part 3

Translating Cyber-Physical Risk into Action


By Chet Brandon and Fay Feeney

Series Context

This four-part series examines how artificial intelligence is reshaping cyber risk in operational technology and what it means for industrial organizations. It brings together perspectives from safety leadership, cybersecurity, operations, and board governance to address cyber-physical risk as an enterprise issue. The series is co-authored by Chet Brandon, a global Environmental, Health & Safety (EHS) and operational risk leader, and Fay Feeney, an expert in board governance and enterprise risk oversight.


Introduction: From Awareness to Execution

In Parts 1 and 2, we established two critical realities: cyber risk in operational technology (OT) environments is physical risk, with consequences that can include serious injury, environmental harm, and major business disruption; and artificial intelligence is accelerating both the threat landscape and the tools available to manage it. But understanding the problem is not enough. The real challenge facing organizations today is execution—how to translate cyber-physical risk into structured, actionable practices that improve safety, resilience, and operational performance. This is where safety professionals play a central role. For decades, EHS leaders have managed complex, high-consequence risks in industrial environments, and the same disciplines that prevent catastrophic process safety events can be applied to cyber-physical threats—if organizations integrate them effectively.


The Tangible Outputs of EHS Risk Management

EHS professionals create value through structured outputs that guide decision-making and risk reduction. In the context of OT cyber risk, these outputs fall into two primary categories: pre-incident risk identification and prioritization, and post-incident resilience and recovery. Before an incident occurs, safety professionals help organizations identify the physical assets that could be compromised by cyber threats likely to impact operations, evaluate those risks in terms of safety, environmental, and business consequences, and prioritize vulnerabilities based on real-world impact. After an incident, they contribute to stabilizing operations, restoring systems safely, and embedding lessons learned into future prevention efforts. This structured approach ensures that cyber risk is managed with the same rigor as other high-consequence operational risks.


Identifying and Prioritizing Cyber-Physical Hazards

The first step in managing OT cyber risk is developing a clear understanding of where system vulnerabilities intersect with high-consequence outcomes. In industrial environments, not all cyber vulnerabilities carry equal weight. Safety professionals bring a critical lens by asking, “If this system is compromised, what happens physically?” This reframes the discussion from technical severity to operational consequence, including impacts on safety, environment, and production.

Effective risk identification begins with mapping critical assets and processes. This includes identifying safety-critical systems such as safety instrumented systems (SIS), emergency shutdown systems, key control loops, and monitoring functions that maintain process stability. It also requires understanding dependencies—how sensors, controllers, networks, and human interfaces interact to maintain safe operation. From there, organizations can identify where cyber vulnerabilities exist, including exposed network pathways, remote access points, legacy systems, or gaps in monitoring and control integrity.

Once critical systems and vulnerabilities are mapped, established safety methodologies can be expanded to include cyber scenarios. Process Hazard Analysis (PHA) and Hazard and Operability (HAZOP) studies can be adapted to evaluate how manipulated inputs, false sensor readings, or disabled alarms could drive deviations in process conditions. Teams can explore scenarios such as incorrect temperature signals, overridden interlocks, or delayed shutdown responses, asking how these deviations could propagate through the system. Failure Modes and Effects Analysis (FMEA) can then be used to identify specific failure modes associated with loss of system integrity—such as loss of control, incorrect system response, or delayed operator awareness—and assess their potential impact. Layers of Protection Analysis (LOPA) adds another layer by evaluating whether existing safeguards are sufficient if digital systems are compromised, particularly where multiple protections may rely on shared infrastructure.

Prioritization is where safety professionals add the greatest value. Rather than treating all vulnerabilities equally, risks are ranked based on severity of consequence, likelihood of occurrence, and effectiveness of existing controls. High-priority risks are those where a cyber event could lead to serious injury or fatality, major environmental impact, or significant business interruption—especially where safeguards may be degraded or insufficient. This risk-based approach ensures that mitigation efforts are focused on the vulnerabilities that matter most.
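The severity, likelihood, and control-effectiveness ranking described above can be sketched as a simple scoring function. The 1-5 scales, scenario names, and control-effectiveness values below are illustrative assumptions, not values prescribed by any standard:

```python
def risk_score(severity, likelihood, control_effectiveness):
    """Rank a cyber-physical scenario: consequence times likelihood,
    discounted by how well existing safeguards perform (0.0 to 1.0)."""
    return severity * likelihood * (1.0 - control_effectiveness)

# Hypothetical scenarios: (description, severity 1-5, likelihood 1-5,
# estimated effectiveness of existing controls 0.0-1.0).
scenarios = [
    ("false sensor data defeats interlock", 5, 3, 0.4),
    ("vendor remote access misused", 4, 4, 0.6),
    ("alarm suppression delays response", 4, 2, 0.7),
]

# Highest residual risk first: these are the candidates for mitigation.
ranked = sorted(scenarios, key=lambda s: risk_score(*s[1:]), reverse=True)
```

The design choice worth noting is that control effectiveness acts as a discount on inherent risk, so a severe scenario with weak safeguards outranks a more likely one that is already well protected, which mirrors the risk-based prioritization the text describes.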

The resulting risk profile becomes a powerful management tool for driving timely and measurable improvement. Rather than remaining a static assessment, it should be actively used to guide decision-making, resource allocation, and performance monitoring. High-risk scenarios identified in the profile can be translated into targeted mitigation actions, such as isolating critical systems, strengthening access controls, improving alarm validation, or enhancing manual backup capabilities. Each action should be assigned ownership, timelines, and expected outcomes.

To ensure progress, organizations should align the risk profile with key risk indicators (KRIs) and performance metrics. These may include reduction in exposure of safety-critical systems, closure rates for high-risk vulnerabilities, improvements in detection and response times, and completion of resilience testing. Regular integration of the risk profile into management reviews ensures that priorities evolve with changing threats and system conditions. By using the risk profile as a living tool, organizations move beyond identifying risk to actively reducing it in a structured and measurable way.


Partnering with Cybersecurity and Process Control Teams

Effectively managing cyber-physical risk requires close partnership between EHS professionals, cybersecurity specialists, and process control engineers. Chief Information Security Officers (CISOs) are key sponsors for these efforts, translating findings for executive leaders and providing executive oversight. No single function has complete visibility into how operational technology systems are designed, how they can fail, or how they may be targeted. Safety professionals bring deep expertise in consequence analysis and risk prioritization, but must collaborate with those who understand system architecture, network design, and control logic to fully assess exposure.

Cybersecurity teams provide insight into threat vectors, access pathways, and system vulnerabilities, while process control and engineering teams contribute a detailed understanding of control system architecture, instrumentation, interlocks, and operating limits. Together, these perspectives allow organizations to map how a malicious attack could move through systems and ultimately impact physical operations.

This collaboration is essential for identifying realistic failure modes triggered by cyber events and evaluating their severity. It also enables more effective prioritization by focusing attention on vulnerabilities with the greatest potential to impact safety, environment, and business continuity. Integrating these perspectives creates a more complete understanding of cyber-physical risk and improves both prevention and resilience strategies.


Protecting Safety-Critical Control Systems

Once high-risk scenarios are identified, the next step is ensuring that critical control systems remain reliable under all conditions—including during a cyber event. In industrial environments, systems such as safety instrumented systems (SIS), emergency shutdown systems, alarm systems, and key control loops are essential to maintaining safe operations. EHS professionals help define which systems are safety-critical, what level of independence is required, and how failures could impact operations.

A foundational step is defining and validating system independence. Safety-critical systems should be architected to operate independently from primary control systems and broader networks wherever possible. This includes physical and logical separation, dedicated controllers, and minimized shared infrastructure to reduce common-cause failure risk.

Network segmentation and controlled connectivity are essential. Critical systems should be isolated within defined security zones with tightly controlled access. Remote access must be limited, monitored, and governed through strict controls to reduce exposure.

Ensuring instrumentation and signal integrity is another key focus. Redundant instrumentation, validation logic, and cross-checking of critical measurements help detect abnormal conditions even if one source is compromised.

Alarm system reliability must be maintained through rationalization, verification, and testing. In cyber scenarios, well-designed alarm systems improve the likelihood that operators will recognize abnormal conditions.

Manual override capability provides an essential layer of protection. Systems should allow operators to intervene safely when automation is unreliable, supported by clear procedures, training, and routine hands-on practice.

Routine testing and validation of safety functions, including cyber-informed scenarios, ensures safeguards perform as expected. Finally, strong configuration management and change control prevents unauthorized or unintended modifications that could introduce vulnerabilities.

The objective is clear: even if cyber systems are compromised, critical safety functions must continue to operate as intended. At a minimum, segregating OT risk supports operational continuity: a compromise of one system need not force a shutdown of the entire operation, and the business can keep running.


Building Cyber-Aware Process Safety Programs

Cyber-physical risk cannot be managed in isolation; it must be embedded into existing safety systems. This requires evolving traditional process safety programs to explicitly include digital threat scenarios. Cyber risks should be incorporated into hazard analyses, management of change processes should evaluate digital modifications, and operational procedures should reflect potential cyber-driven abnormal conditions. Training programs must also prepare operators to recognize and respond to anomalies that may originate from compromised systems.

A practical way to do this is by integrating cyber scenarios directly into Process Hazard Analyses (PHAs) through targeted “what if” questions that connect digital failure to physical consequence. For example: What if a sensor provides false data due to cyber manipulation? What if control logic is altered or overridden? What if alarms are suppressed or delayed? What if remote access is gained through a vendor or compromised credentials? What if operators lose visibility into critical process conditions? Framing cyber risk in this way allows teams to evaluate how these events could drive process deviations, challenge existing safeguards, and ultimately impact safety, the environment, and operations—bringing cyber risk into the core of process safety decision-making.


Strengthening Operational Resilience

Even with strong preventive controls, organizations must assume that disruptions will occur. Operational resilience is defined by the ability to anticipate disruptions, maintain safe operations under abnormal conditions, and recover quickly from incidents. Safety professionals strengthen resilience through layered protections, structured response frameworks, and disciplined operational procedures.

A critical—and often underappreciated—capability safety professionals bring is the design and execution of emergency shutdown and response strategies. In high-hazard industries, EHS leaders routinely define when and how to transition systems to a safe state under rapidly evolving conditions. This includes establishing clear criteria for initiating shutdowns, ensuring that shutdown systems are independent and reliable, and developing procedures that enable operators to act decisively when system integrity is uncertain.

In cyber-physical events, this capability becomes essential. When control systems are compromised, data integrity is questionable, or system behavior becomes unpredictable, the ability to execute a safe, controlled shutdown may be the most effective way to prevent escalation. Safety professionals ensure that these actions are not improvised—they are pre-defined, tested, and supported by clear decision authority and operator training.

Beyond shutdown, EHS professionals also design incident response structures that coordinate actions across operations, cybersecurity, engineering, and leadership. This includes incident command frameworks, escalation protocols, and communication strategies that maintain situational awareness and enable timely decision-making under uncertainty.

Resilience, therefore, is not just about keeping systems running—it is about knowing when and how to safely stop them, stabilize conditions, and recover without introducing additional risk. This is a core competency of safety professionals and a critical component of managing cyber-physical threats in modern industrial environments.


Business Continuity Planning for Industrial Cyber Events

Traditional business continuity planning often focuses on IT recovery, but industrial environments require restoration of safe operational control. EHS professionals help define shutdown procedures, restart protocols, alternative operating modes, and safety measures for personnel. Effective plans include validated recovery procedures, coordination across functions, and structured communication strategies to ensure recovery is both efficient and safe.


Enabling Visibility Through Metrics and Dashboards

Managing OT cyber risk requires clear visibility. Dashboards and key risk indicators translate technical data into operational insight, helping leadership understand exposure, performance, and improvement over time. Metrics such as asset visibility, vulnerability status, detection performance, and resilience readiness provide actionable information that supports decision-making and continuous improvement.


Applying the Framework: A Practical Assessment and Action Tool

To translate these principles into consistent execution, organizations should use a structured risk assessment process. The concept is demonstrated in this OT Cyber-Physical Risk Assessment and Action Form. This tool enables cross-functional teams—EHS, cybersecurity, and engineering—to work through a standardized process for identifying hazards, evaluating consequences, assessing safeguards, and defining actions.

The form guides teams through key steps, including identifying safety-critical systems, defining cyber-driven failure modes and physical outcomes, evaluating safeguard independence and effectiveness, assessing vulnerabilities and access pathways, and prioritizing risks using severity, likelihood, and control effectiveness. It also drives accountability by requiring defined actions, owners, timelines, and expected risk reduction outcomes.

Importantly, the form supports measurable improvement by linking actions to key risk indicators and performance metrics. When used in workshops or site-level assessments, it helps organizations move beyond discussion to clear, documented actions that improve both security and operational reliability over time.
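One way to make the form's accountability fields concrete is a small record type with a helper for management review. The field names and the `overdue` helper below are an illustrative sketch, not the actual OT Cyber-Physical Risk Assessment and Action Form:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AssessmentAction:
    system: str        # safety-critical system under review (e.g., an SIS loop)
    failure_mode: str  # cyber-driven failure mode and its physical outcome
    priority: str      # e.g., "high" from severity/likelihood/control ranking
    action: str        # mitigation (segmentation, alarm validation, ...)
    owner: str         # accountable person
    due: date          # committed timeline
    linked_kri: str    # key risk indicator used to verify risk reduction
    status: str = "open"

def overdue(actions, today):
    """Support management review: flag open actions past their due date."""
    return [a for a in actions if a.status == "open" and a.due < today]

# Hypothetical example record for a workshop output.
actions = [
    AssessmentAction(
        system="SIS pressure loop",
        failure_mode="false trip signal defeats shutdown",
        priority="high",
        action="segment SIS network; add signal validation",
        owner="site reliability lead",
        due=date(2025, 3, 1),
        linked_kri="count of exposed safety-critical nodes",
    ),
]
late = overdue(actions, date(2025, 4, 1))
```

Requiring an owner, a due date, and a linked KRI in the record itself is what turns the assessment into the "living tool" described earlier: the same structure that documents the risk also drives closure tracking.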


Integrating People, Process, and Technology

Effective OT cyber risk management requires integration across people, process, and technology. Operators, engineers, cybersecurity professionals, and safety leaders must work together within structured risk management systems supported by appropriate technology, as no single function can manage this risk in isolation. Safety professionals serve as integrators—translating cybersecurity insights into operational implications, connecting them with real-world conditions, and ensuring that risk management approaches align with how work is actually performed in industrial environments.


Conclusion: Turning Risk Insight into Action

Managing cyber risk in operational technology environments requires more than awareness; it requires disciplined execution. By applying proven approaches from process safety and operational risk management, organizations can identify critical vulnerabilities, protect key systems, strengthen resilience, and improve recovery capability. This enables a shift from reacting to cyber threats to actively managing cyber-physical risk as part of core operations.


Looking Ahead to Part 4

In Part 4, we move to the boardroom—examining how directors and executives govern cyber-physical risk and ensure organizations have the capabilities needed to manage this evolving threat.


Addendum – Key Risk Assessment Tools

  • PHA – Process Hazard Analysis, a structured study to identify and evaluate hazards in a process that could harm people, property, or the environment.
  • HAZOP – Hazard and Operability Study, a systematic examination of a process to identify potential hazards and operability problems due to deviations from design intent.
  • FMEA – Failure Mode and Effects Analysis, a step‑by‑step method to identify possible failure modes in a system, assess their effects, and prioritize actions to mitigate them.
  • LOPA – Layers of Protection Analysis, a semi‑quantitative method to evaluate whether existing independent protection layers are sufficient to reduce risk for specific hazardous scenarios.

Automated Reasoning for Human Error Detection in Industrial Operations

Bridging logic, human performance, and operational resilience

Diagram showing common human error pathways including alarm overload, disorientation, procedure violation, miscommunication, fatigue, documentation error, input error, equipment failure, cognitive bias, and lack of training
Common pathways leading to human error in operational control rooms.

Introduction: The Persistent Challenge of Human Error

Despite decades of advancement in engineering controls, automation, and safety management systems, human error remains a dominant contributor to serious incidents in industrial environments. In high-hazard sectors—chemicals, metals, energy, and advanced manufacturing—the issue is not simply that people make mistakes. It is that:

  • Systems are often designed around work-as-imagined, not work-as-done
  • Weak signals of failure are present—but not recognized in time
  • Decision-making under pressure introduces variability that systems fail to anticipate

Traditional approaches—training, procedures, and supervision—have plateaued in effectiveness. What is emerging now is a new capability:

The application of automated reasoning to detect, interpret, and respond to human error potential in real time.


The Opportunity for Industrial Organizations

Industrial operations are entering a period of increasing complexity—driven by advanced technologies, workforce transitions, and rising expectations for safety and performance.

At the same time, organizations are facing a growing challenge:
the loss of deeply experienced professionals who have historically served as the primary line of defense against human error.

What is being lost is not just knowledge—but the ability to recognize when conditions are aligning for failure.

The strategic question for leaders is this: How do we preserve and scale that judgment across the organization—consistently, in real time, and at global scale?

Automated reasoning represents a critical step in that evolution, enabling organizations to move from reacting to errors to understanding and managing the conditions that create them.


What Is Automated Reasoning in This Context?

Automated reasoning is the use of formal logic, structured knowledge, and inference engines to derive conclusions from known conditions.

In the context of human error detection, it moves beyond simple monitoring to answer:

  • Given the conditions, what errors are likely?
  • Are the safeguards sufficient for this specific situation?
  • What is the most probable failure pathway right now?

This is fundamentally different from traditional analytics.

Traditional Systems       Automated Reasoning Systems
Detect anomalies          Explain why they matter
Monitor conditions        Interpret risk implications
Trigger alarms            Evaluate decision quality and context

While automated reasoning is often grouped under the broader umbrella of artificial intelligence, it represents a fundamentally different capability. Most AI—particularly machine learning—focuses on identifying patterns and making predictions based on data. Automated reasoning, by contrast, applies explicit logic and structured rules to determine cause-and-effect relationships and draw explainable conclusions. In practical terms, AI can tell you something is changing or likely to happen, while automated reasoning explains why it matters, how it could lead to failure, and what should be done about it. This distinction is critical in industrial settings, where transparency, consistency, and defensible decision-making are essential.
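The contrast can be sketched in a few lines of code: a detector flags that a reading changed, while an explicit rule layer states why the change matters. The thresholds and rule wording are illustrative assumptions.

```python
def detect_anomaly(readings, baseline, tolerance=0.1):
    """Pattern side: flags a deviation without explaining it."""
    latest = readings[-1]
    return abs(latest - baseline) / baseline > tolerance

def explain(readings, baseline, safeguard_ok):
    """Reasoning side: explicit if-then rules yield explainable conclusions."""
    conclusions = []
    latest = readings[-1]
    if abs(latest - baseline) / baseline > 0.1:
        conclusions.append("Deviation exceeds operating envelope")
        if not safeguard_ok:
            conclusions.append(
                "No independent safeguard available: credible failure pathway")
    return conclusions

readings = [100, 101, 115]
anomaly = detect_anomaly(readings, baseline=100)   # flags that something changed
why = explain(readings, baseline=100, safeguard_ok=False)  # states why it matters
```

The detector's output is a bare boolean; the rule layer's output is a chain of stated reasons that can be audited and defended, which is the property industrial settings require.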

The Core Shift: From Data to Logic-Based Insight

Most industrial AI deployments today rely heavily on pattern recognition—identifying deviations in process variables, behaviors, or outcomes.

Automated reasoning introduces a critical layer:

It applies structured logic to determine whether current conditions create a credible pathway to human error.

This includes reasoning across:

  • Task complexity
  • Environmental conditions
  • Time pressure
  • Procedure alignment vs. actual execution
  • Worker capability and experience
  • System fragility and safeguards

While AI-driven pattern recognition is powerful, it is not sufficient on its own to address human error in complex industrial systems. AI can identify anomalies and correlations, but it does not inherently understand cause-and-effect relationships or the operational significance of what it detects. In many cases, this results in signals without clarity—highlighting that something is different, but not whether it meaningfully increases risk or requires intervention. Human error, however, is rarely driven by isolated data points; it emerges from the interaction of conditions, constraints, and system design. Without a layer of structured reasoning, organizations risk reacting to noise or missing the deeper risk pathways entirely.

A more effective path forward lies in combining AI’s ability to detect patterns with automated reasoning’s ability to interpret them—creating a system that not only sees change, but understands its implications. This integrated approach will be explored further later in the article.


A Parallel from Industrial Controls: Automated Reasoning as a Supervisory Safety Channel for AI

One useful way to understand the role of automated reasoning in AI-enabled safety systems is through a concept familiar to industrial practitioners: the use of independent supervisory protection layers in control system design. In modern industrial control architectures—particularly high-integrity control systems built on programmable logic controllers—organizations often add a second channel of supervision to monitor critical functions, validate outputs, and intervene when the primary control path begins to drift, degrade, or behave unexpectedly. This is not done because the primary system is assumed to fail routinely, but because high-consequence systems require independent oversight.

The same principle applies to AI. In process operations, a primary control layer may regulate normal operation, while an independent supervisory or safety layer checks whether signals are plausible, logic outputs remain within safe bounds, conflicting states exist, or hazardous conditions may be developing despite “normal” outputs. The principle is simple but foundational: do not rely on a single decision channel where the consequences of failure are unacceptable. That same principle has direct relevance for intelligent systems.

AI systems—particularly those based on probabilistic models—can identify patterns, generate recommendations, and support decisions at remarkable speed. Yet they can also misinterpret context, produce non-causal correlations, drift from intended behavior, or in some cases generate hallucinated conclusions. This is where automated reasoning can serve a role analogous to an independent safety channel. Rather than replacing AI, it supervises it—testing whether recommendations make logical sense, whether they align with established constraints, whether they violate known risk rules, or whether outputs imply credible failure pathways before action is taken.

In this model, AI proposes, automated reasoning verifies, and humans decide. That mirrors familiar thinking in high-reliability operations. Just as a basic process control system may govern operations while an independent safety instrumented system provides protection, AI can serve as a primary analytical channel while automated reasoning acts as a supervisory validation layer. The role of automated reasoning, in this sense, is not simply optimization—it is error trapping.
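A minimal sketch of the "AI proposes, automated reasoning verifies, humans decide" pattern might look like the following. The proposal format, constraint rules, and thresholds are hypothetical, chosen only to show the supervisory-channel structure.

```python
def ai_propose(condition):
    """Stand-in for a probabilistic model's recommendation (an assumption
    for this sketch; a real system would call the model here)."""
    return {"action": "increase_feed_rate", "confidence": 0.91}

RULES = [
    # (predicate over (condition, proposal), rejection reason)
    (lambda c, p: c["pressure"] > c["pressure_limit"],
     "Pressure above safe limit: no rate increase allowed"),
    (lambda c, p: p["confidence"] < 0.8,
     "Model confidence below acceptance threshold"),
]

def reasoning_verify(condition, proposal):
    """Independent supervisory channel: test the proposal against
    explicit constraints before any action is taken."""
    violations = [reason for pred, reason in RULES if pred(condition, proposal)]
    return {"approved": not violations, "violations": violations}

condition = {"pressure": 9.5, "pressure_limit": 8.0}
proposal = ai_propose(condition)
verdict = reasoning_verify(condition, proposal)
```

In practice the human decision step sits after the verify call: an unapproved proposal is escalated to an operator rather than executed, which is the error-trapping role described above.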

This concept has significant implications for human error reduction. If AI identifies an abnormal condition, flags a procedural deviation, or predicts elevated human error likelihood, automated reasoning can independently assess whether the inference is justified, whether causal conditions are truly present, and whether the recommended intervention is appropriate. That creates protection against two risks at once: human error in operations and decision error within the AI itself.

Viewed this way, resilient intelligent systems should be designed much like resilient industrial systems—with layered protections rather than reliance on a single channel of cognition. Pattern detection, logical supervision, and human judgment each serve a different but complementary role. This is not merely AI with safeguards added; it is high-reliability design thinking applied to intelligent systems.

For decades, process safety has taught a simple lesson: independent layers of protection matter. The same lesson applies to intelligent systems. As AI becomes increasingly embedded in safety-critical decisions, automated reasoning may serve as the supervisory channel that helps keep those systems trustworthy, explainable, and resilient. And that may prove to be one of the most important bridges between traditional safety engineering and the future of AI-enabled risk management.


Capturing Context: The Critical Enabler of Effective Automated Reasoning

One of the most important—and often misunderstood—aspects of automated reasoning is how context is captured, structured, and interpreted.

Without context, even the most advanced reasoning engine becomes little more than a rules engine applying generic logic. With context, it becomes something far more powerful:

A system that understands not just what is happening—but what it means under the current conditions.


Why Context Matters in Human Error Detection

Human error does not occur in isolation. It emerges from the interaction between:

  • The task being performed
  • The conditions under which it is performed
  • The capabilities and state of the individual
  • The design and resilience of the system

The same task can be:

  • Low risk in one context
  • High risk in another

Example:
Opening a valve:

  • Routine, low risk during normal operations
  • High risk during startup, under time pressure, with similar valve configurations

👉 The difference is not the task—it is the context


How Automated Reasoning Captures Context

Automated reasoning systems structure context into four primary dimensions, each representing a different aspect of how work is actually performed.

1. Task Context (What is being done)

This dimension defines the inherent characteristics of the work itself and how prone it is to error under normal conditions.

  • Routine vs. non-routine work
  • Task complexity and step count
  • Precision required
  • Known failure modes

👉 How inherently error-prone is this task?


2. Operational Context (Under what conditions)

This dimension captures the external pressures and environmental conditions that influence how the task is executed.

  • Time pressure and production demand
  • Shift timing (night vs. day)
  • Environmental factors (heat, noise, visibility)
  • Concurrent work and distractions

👉 What external pressures are influencing performance?


3. Human Context (Who is performing the work)

This dimension focuses on the capabilities and current state of the individual or team performing the task.

  • Experience and familiarity
  • Fatigue and cognitive load
  • Interruptions and task switching
  • Supervision and crew dynamics

👉 How capable is the system—through people—right now?


4. System Context (How well the system supports the work)

This dimension evaluates how effectively the system design, procedures, and safeguards support successful execution.

  • Procedure quality and usability
  • Equipment design (e.g., similarity, ergonomics)
  • Safeguards and interlocks
  • Prior incidents or near misses

👉 Is the system enabling success—or inviting failure?
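The four dimensions above can be represented as one structured record that a reasoning engine consumes. The field names and 1-to-5 scales below are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class WorkContext:
    """The four context dimensions; field names are illustrative."""
    # Task context: what is being done
    task_complexity: int        # 1 (simple) .. 5 (highly complex)
    routine: bool
    # Operational context: under what conditions
    time_pressure: int          # 1 (none) .. 5 (severe)
    night_shift: bool
    # Human context: who is performing the work
    experience_years: float
    fatigue: int                # 1 (rested) .. 5 (exhausted)
    # System context: how well the system supports the work
    procedure_quality: int      # 1 (poor) .. 5 (excellent)
    recent_near_misses: int

ctx = WorkContext(task_complexity=3, routine=False, time_pressure=4,
                  night_shift=True, experience_years=0.5, fatigue=3,
                  procedure_quality=2, recent_near_misses=1)
```

Structuring context this way forces each dimension to be populated explicitly, so a missing input is visible rather than silently defaulted.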


From Context to Insight: The Reasoning Layer

Once context is captured, automated reasoning evaluates how these dimensions interact, rather than treating them as isolated inputs.

Example interaction:

  • Moderate complexity
  • Moderate fatigue
  • Moderate time pressure

Individually → acceptable
Combined → elevated risk of error

Context is not additive—it is interactive

This mirrors what experienced EHS leaders do intuitively—recognizing when normal conditions combine into abnormal risk.
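The non-additive nature of context can be shown with a toy comparison: a naive additive score treats three moderate factors as acceptable, while an interactive rule escalates them when they co-occur. The 1.8 multiplier and the 0.4-to-0.7 "moderate" band are illustrative assumptions.

```python
def additive_risk(factors):
    """Naive view: average the individual factor scores (each 0..1)."""
    return sum(factors.values()) / len(factors)

def interactive_risk(factors):
    """Interactive view: moderate factors compound when they co-occur."""
    base = sum(factors.values()) / len(factors)
    # Count factors that are individually 'acceptable but moderate'.
    moderate = sum(1 for v in factors.values() if 0.4 <= v <= 0.7)
    # Illustrative rule: three or more co-occurring moderate factors
    # escalate risk beyond what any single factor suggests.
    if moderate >= 3:
        base = min(1.0, base * 1.8)
    return base

factors = {"complexity": 0.5, "fatigue": 0.5, "time_pressure": 0.6}
```

With these inputs the additive score stays in the "acceptable" range while the interactive score does not, which is the judgment an experienced supervisor applies intuitively.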


Context as a Dynamic Input (Not a Static Snapshot)

Context in industrial operations is not fixed; it evolves continuously as work progresses and conditions change.

  • A task interruption occurs
  • A supervisor leaves the area
  • Process conditions drift
  • Time pressure increases

The system recalculates in real time:

“Given the updated context, what is the new error likelihood?”


Practical Sources of Context Data

In practice, context is assembled from multiple digital and operational systems across the enterprise.

  • Digital permit-to-work systems
  • DCS / SCADA process data
  • EHS platforms and incident history
  • Training and workforce systems
  • Wearables and environmental sensors
  • Production schedules and constraints

The value is not the data alone—but how it is structured for reasoning
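As a sketch of that structuring step, the function below merges raw records from hypothetical permit, process, workforce, and history systems into the four-part context shape a reasoning layer could consume. All record shapes and field names are assumptions.

```python
def assemble_context(permit, process, workforce, history):
    """Merge raw records from separate source systems into one
    structured context record. Shapes are illustrative."""
    return {
        "task": {
            "non_routine": permit.get("type") == "non-routine",
            "concurrent_work": permit.get("concurrent_permits", 0) > 0,
        },
        "operational": {
            "behind_schedule": process.get("behind_schedule", False),
            "deviation_active": process.get("deviation_active", False),
        },
        "human": {
            "experience_years": workforce.get("years_in_role", 0),
            "hours_into_shift": workforce.get("hours_on_shift", 0),
        },
        "system": {
            "recent_near_misses": history.get("near_misses_90d", 0),
            "interlock_bypassed": process.get("bypass_active", False),
        },
    }

context = assemble_context(
    permit={"type": "non-routine", "concurrent_permits": 2},
    process={"behind_schedule": True, "deviation_active": False,
             "bypass_active": False},
    workforce={"years_in_role": 1.5, "hours_on_shift": 10},
    history={"near_misses_90d": 3},
)
```

The point of the merge is not the dictionary itself but the normalization: each source system reports in its own terms, and reasoning only becomes possible once those terms are mapped onto the same context dimensions.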


The Limitation: Context vs. Human Judgment

Even with advanced data capture, automated reasoning cannot fully replicate the richness of human situational awareness and experience.

  • Subtle hesitation or uncertainty
  • Team dynamics and trust
  • Intuition developed through experience
  • The sense that “something isn’t right”

These remain distinctly human capabilities.


Strategic Implication

The effectiveness of automated reasoning is directly dependent on how accurately and completely context is captured and represented.

This shifts the leadership question from:

  • “How good is the algorithm?”

to:

  • “How well do we capture work as it is actually performed?”

How Automated Reasoning Detects Human Error Potential

1. Contextual Rule Evaluation (Beyond Static Procedures)

Automated reasoning evaluates whether procedures and expectations align with actual working conditions in real time.

  • Procedures fit current conditions
  • Workers are likely to deviate
  • System design forces adaptation

2. Weak Signal Amplification

The system connects small, seemingly insignificant signals into a coherent picture of emerging risk.

  • Minor deviations
  • Workarounds
  • Near misses

3. Error-Likely Situation Modeling

Human performance frameworks are applied to determine which types of errors are most likely under current conditions.

  • Skill-based errors
  • Rule-based errors
  • Knowledge-based errors

4. Dynamic Safeguard Evaluation

Safeguards are continuously evaluated for their presence, effectiveness, and likelihood of proper use.

  • Availability
  • Functionality
  • Likelihood of correct use

5. Decision Quality Assessment

The system evaluates the conditions under which decisions are being made to identify degradation in decision quality.

  • Cognitive overload
  • Bias patterns
  • Goal conflict
  • Degraded situational awareness

Integration with Industrial Systems

Automated reasoning does not function in isolation; it is integrated into the broader digital and operational ecosystem.

  • Process control systems (DCS/SCADA)
  • EHS management platforms
  • Permit-to-work systems
  • Wearables and monitoring tools
  • Incident and near-miss databases

This enables reasoning across both:

  • Technical system state
  • Human system state

A Practical Example: Chemical Reactor Operation

In a typical industrial scenario, automated reasoning evaluates multiple contextual inputs to identify emerging human error risk.

Inputs:

  • Inexperienced operator
  • Non-routine process conditions
  • Prior deviations
  • High time pressure

Output:

  • Elevated knowledge-based error risk
  • Likely parameter misinterpretation
  • Insufficient safeguards

Recommended actions:

  • Add second operator verification
  • Reduce ramp rate
  • Increase supervision
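The reactor walkthrough above can be encoded as a small rule set that maps the listed inputs to the listed findings and actions. The thresholds (for example, "fewer than two years of experience") are illustrative assumptions, not validated criteria.

```python
def assess(experience_years, non_routine, prior_deviations, time_pressure):
    """Map contextual inputs to findings and recommended actions.
    Rule thresholds are illustrative only."""
    findings, actions = [], []
    if experience_years < 2 and non_routine:
        findings.append("Elevated knowledge-based error risk")
        actions.append("Add second operator verification")
    if prior_deviations > 0:
        findings.append("Likely parameter misinterpretation")
        actions.append("Reduce ramp rate")
    if time_pressure >= 4:  # scale: 1 (none) .. 5 (severe)
        findings.append("Insufficient safeguards for current pace")
        actions.append("Increase supervision")
    return findings, actions

findings, actions = assess(experience_years=1, non_routine=True,
                           prior_deviations=2, time_pressure=5)
```

Even this toy version shows the key property: each recommended action is traceable to a stated condition, so the reasoning behind the output can be reviewed and challenged.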

Benefits for Industrial Organizations

Proactive Risk Identification

Automated reasoning enables organizations to anticipate failure before it occurs by identifying error-likely conditions in advance.

  • Shift from “What went wrong?”
  • To “What is likely to go wrong next?”

Consistency Across Operations

It standardizes decision-making logic while still adapting to local context and conditions.

  • Standardized logic
  • Contextual flexibility

Scalable Expertise

The knowledge and judgment of experienced professionals can be embedded and applied consistently across the organization.

  • Captures expert reasoning
  • Distributes it across sites

Stronger Governance

Automated reasoning provides transparency and defensibility in risk-based decision-making.

  • Clear logic pathways
  • Audit-ready decisions

Limitations

Automated reasoning enhances decision-making but does not replace the uniquely human aspects of safety leadership and performance. It can identify error-likely conditions, but it cannot fully influence outcomes in real-world operations.

Trust-Building

Trust is the foundation of safety performance, built through credibility and relationships over time.

  • Enables open reporting of risks and weak signals
  • Drives whether people act on guidance

Automated reasoning can recommend actions, but it cannot build trust or credibility.

Without trust, even the right answer may not be followed


Real-Time Influence

Safety often depends on influencing decisions in the moment under pressure.

  • Challenging shortcuts
  • Reinforcing the right priorities

Automated reasoning can identify risk, but it cannot persuade, adapt messaging, or manage resistance.

Being right is not enough—impact requires influence


Deep Intuition

Experience creates an intuitive sense of risk beyond formal data.

  • Recognizing subtle inconsistencies
  • Acting on incomplete signals

Automated reasoning lacks this depth of experiential judgment and the ability to act confidently in ambiguity.

Intuition bridges the gap between data and reality


Cultural Awareness

Organizational culture shapes how work is actually performed.

  • Informal norms vs. formal procedures
  • Willingness to speak up or deviate

Automated reasoning struggles to fully capture these dynamics.

Culture determines how work really gets done


Closing Insight on Limitations

Automated reasoning can identify when conditions are right for failure.

But only people can:

  • Build trust
  • Influence decisions
  • Interpret nuance
  • Shape culture

That is where safety ultimately succeeds—or fails.


The Future: Human + Reasoning Systems

The most effective safety systems will not rely on a single form of intelligence. Instead, they will combine complementary capabilities into a cohesive system that enhances both insight and action.

At its core, this model integrates three distinct but interdependent strengths:

Machine Learning → Pattern Detection

Machine learning excels at identifying patterns across large, complex datasets that are not visible through traditional analysis.

  • Detects anomalies, trends, and weak signals
  • Identifies correlations across operations, time, and conditions
  • Continuously improves as more data becomes available

Its strength lies in answering:

“What is changing, and where should we pay attention?”


Automated Reasoning → Logical Interpretation

Automated reasoning translates data and conditions into structured, explainable conclusions about risk.

  • Evaluates cause-and-effect relationships
  • Assesses whether conditions create credible failure pathways
  • Provides transparent, auditable logic for decisions

Its role is to answer:

“Given these conditions, what does it mean—and what is likely to happen next?”


Human Expertise → Contextual Judgment

Human expertise brings the ability to interpret nuance, adapt to ambiguity, and influence outcomes in real time.

  • Applies experience, intuition, and situational awareness
  • Adjusts decisions based on context not fully captured in data
  • Builds trust and drives action across the organization

This is the layer that answers:

“What should we do—and how do we ensure it actually happens?”


How These Capabilities Work Together

Individually, each capability is powerful but incomplete. Together, they create a system that is both analytically strong and operationally effective.

  • Machine learning identifies emerging signals
  • Automated reasoning explains risk pathways and implications
  • Human expertise ensures the right decisions are made and executed

Insight without interpretation creates noise.
Interpretation without action creates delay.
Action without insight creates risk.

The integration of all three closes that gap.


Strategic Implication

This combined model represents a shift from fragmented tools to integrated decision systems.

Organizations that embrace this approach will:

  • Detect risk earlier
  • Understand it more clearly
  • Act on it more effectively

Closing Insight on Human + Machine Synergies

The future of safety is not human or machine—it is human with machine.

When pattern recognition, logical reasoning, and human judgment operate together, safety performance moves from reactive to truly predictive and resilient.


Final Perspective

Automated reasoning represents a fundamental shift in how industrial organizations approach human error.

From reacting to failure → to understanding the conditions that make failure likely

When fueled by rich context, it enables organizations to:

  • Anticipate human error
  • Strengthen system resilience
  • Scale expertise across operations

And ultimately:

Create systems that actively support people in making the right decisions—especially when it matters most.

References:

Pavlus, J., 2025. Amazon takes on AI’s biggest nightmare: Hallucinations. Fast Company, 4 December. Available at: Fast Company article

Williams, J.C., 1985. HEART: A proposed method for achieving high reliability in process operation by means of human factors engineering technology. In Proceedings of a Symposium on the Achievement of Reliability in Operating Plant, Safety and Reliability Society (Vol. 16), pp. 5/1–5/15.


Cyber-Physical Risk in the Age of AI: How Safety Leaders and Boards Can Protect Operational Technology – A Four Part Series

Workers monitoring screens titled OT CYBER RISK: DUAL AI ROLES, AI DEFENSE SYSTEM, and AI ATTACK SIMULATION.
Two industrial workers monitor a high-stakes simulation showcasing the dual roles of AI in cybersecurity defense and attack.

Chet Brandon and Fay Feeney

Series Introduction

This four-part series examines how artificial intelligence is reshaping cyber risk in operational technology and what it means for industrial organizations. It brings together perspectives from safety leadership, cybersecurity, operations, and board governance to address cyber-physical risk as an enterprise issue. The series is co-authored by Chet Brandon, a global EHS and operational risk leader with deep experience in highly automated industrial environments, and Fay Feeney, an expert in board governance and enterprise risk oversight. Together, they connect plant-level realities with boardroom decision-making to provide practical strategies that strengthen resilience, protect operations, and improve risk-informed leadership.

Part 2 – AI and the Cyber Battlefield for Operational Technology

How Artificial Intelligence Is Reshaping Threats, Defense, and Governance


Introduction: AI Is Changing the Nature of Cyber Risk

Artificial intelligence is fundamentally altering the cyber risk landscape—and nowhere is this more consequential than in operational technology (OT) environments.

Historically, cyber attacks against industrial systems required deep expertise in both information technology and industrial control systems. Today, AI is lowering that barrier. It is enabling faster reconnaissance, more precise targeting, and increasingly automated attack execution. At the same time, AI is providing organizations with new tools to detect threats earlier, respond faster, and strengthen operational resilience.

This dual reality defines the modern cyber battlefield: AI is both amplifying the threat and transforming the defense. It has elevated cyber risk oversight from periodic review to dynamic, data-driven foresight.

For industrial organizations, the implications are clear. Cyber risk is no longer just a technical issue—it is an operational, safety, and strategic risk that must be actively managed across the enterprise.


How AI Is Increasing Cyber Threats to Operational Technology

AI is accelerating the scale, speed, and sophistication of cyber attacks in ways that directly impact industrial operations.

One of the most significant shifts is in automated vulnerability discovery. AI-driven tools can rapidly scan complex networks, identify exposed assets, and detect weaknesses in system configurations. In OT environments—where systems are often interconnected, legacy-based, and difficult to patch—this creates a larger and more accessible attack surface. Industry has been managing internet of things (IoT) security challenges since at least the mid-2000s, and treating them as a chronic enterprise governance issue since the early 2010s.

AI is also enabling more effective targeting of industrial systems. By analyzing system architectures, communication patterns, and process data, attackers can identify which assets are most critical to operations. This allows them to focus on high-impact targets such as distributed control systems, programmable logic controllers, and safety-related systems.

Another key development is the rise of automated attack development. AI can assist in generating exploit code, refining attack pathways, and adapting strategies in real time based on system responses. This reduces the time and expertise required to launch sophisticated attacks.

Another critical—and often underestimated—dimension is the rise of AI-enabled social engineering. Attackers are increasingly using AI to craft highly targeted phishing campaigns, impersonate trusted personnel, and exploit human behavior to gain initial access to systems. Social engineering now represents the primary entry point for most cyber incidents, with the majority of breaches involving human interaction rather than direct technical exploitation. In industrial environments—where operators, engineers, and third-party vendors interact across IT and OT systems—these attacks can provide a pathway into otherwise well-protected networks, making the human element a critical component of cyber-physical risk. This expands the attack surface beyond technology to include people and processes—reinforcing that cyber-physical risk is not purely a technical challenge, but one that requires integration of cybersecurity, operational discipline, and human performance.

These capabilities are particularly concerning when combined with the resources of nation-state actors and organized cyber groups. AI allows these actors to coordinate multi-stage campaigns that move from initial access to operational disruption more efficiently than ever before.

The result is a new category of risk: cyber attacks that are not just disruptive to data, but capable of causing real economic damage and physical consequences.


From Cyber Intrusion to Physical Impact

The defining characteristic of OT cyber risk is its ability to cross from the digital world into the physical world.

AI-enabled attacks can manipulate process conditions, interfere with control logic, or degrade system visibility in ways that directly impact how industrial systems behave. In highly automated environments, even small changes to sensor inputs, control parameters, control logic, or alarm functions can create cascading effects across interconnected processes. These disruptions often occur without immediate detection, allowing abnormal conditions to develop before operators can intervene.

In practice, this can lead to a range of high-consequence outcomes:

  • Unstable operating conditions
    Manipulation of setpoints, sensor readings, or control loops can push processes outside of safe operating limits. For example, false temperature or pressure signals may cause systems to overcompensate, resulting in oscillations, loss of control stability, or drift into unsafe process states.
  • Equipment damage
    Altered control logic or delayed shutdown responses can expose equipment to conditions beyond design tolerances. Overpressure, overheating, improper sequencing, or mechanical overstress can degrade or permanently damage critical assets such as reactors, compressors, turbines, or rotating equipment.
  • Production shutdowns
    Loss of control system integrity or uncertainty about system status often forces operators to initiate precautionary shutdowns. In some cases, automated trips or interlocks may activate unexpectedly, halting production. Restarting complex industrial systems can be time-consuming and requires careful validation to ensure safe conditions.
  • Environmental releases
    Disruptions to process control or safety systems can lead to loss of containment of hazardous materials. This may include uncontrolled emissions, leaks, or spills, particularly if detection systems or alarms are compromised or delayed.
  • Serious injury or fatality (SIF) risks
    The most critical consequence arises when cyber manipulation affects systems designed to protect people. Disabled alarms, altered interlocks, or incorrect process data can place workers in hazardous conditions without adequate warning, increasing the likelihood of severe incidents.

Unlike traditional IT incidents, these outcomes unfold in real time within physical systems and often under conditions of incomplete or misleading information. Operators may be forced to make rapid decisions without full visibility into system status, increasing the complexity and risk of response actions. This dynamic reinforces the need for integrated approaches that combine cybersecurity, process safety, and operational discipline to manage cyber-physical threats effectively.

AI increases this risk by enabling attackers to better understand how industrial processes operate—and how to disrupt them in ways that maximize impact.

For organizations, this reinforces a critical point: cybersecurity in OT environments is fundamentally about operational integrity and safety.


How AI Can Strengthen Defense in OT Environments

While AI is increasing the threat, it is also providing powerful tools for defense.

One of the most impactful applications is advanced anomaly detection. AI systems can analyze large volumes of operational data—sensor readings, control system outputs, and network activity—to identify subtle deviations from normal behavior. These deviations may indicate early-stage cyber activity or manipulation of system conditions.

This capability is particularly valuable in OT environments, where traditional signature-based detection methods are often insufficient.
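To make the idea concrete, the statistical core of such detection can be sketched in a few lines. This is a deliberately minimal illustration, assuming invented pressure readings and a simple rolling z-score; production OT anomaly detection relies on far richer models and data sources.

```python
from statistics import mean, stdev

def detect_anomalies(readings, window=20, z_threshold=3.0):
    """Flag readings that deviate sharply from the recent baseline.

    A toy stand-in for anomaly detection: each reading is compared
    against the mean and standard deviation of the preceding
    `window` readings.
    """
    anomalies = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_threshold:
            anomalies.append((i, readings[i]))
    return anomalies

# Hypothetical pressure readings: stable around 100, one injected spike
# (e.g., a manipulated or faulty sensor value).
pressure = [100.0 + 0.1 * (i % 5) for i in range(40)]
pressure[30] = 115.0
print(detect_anomalies(pressure))  # → [(30, 115.0)]
```

The same comparison-to-baseline logic applies whether the deviation comes from equipment degradation or deliberate manipulation, which is exactly why it is useful in OT settings.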

AI also enhances vulnerability management and risk prioritization. By correlating asset criticality, system exposure, and threat intelligence, AI can help organizations identify which vulnerabilities pose the greatest operational risk. This enables more focused and effective mitigation efforts.
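The correlation step can be sketched with a simple multiplicative score. The field names and 1–5 weights below are invented for illustration and do not correspond to any real scoring standard.

```python
# Hypothetical vulnerability records; fields and weights are assumptions.
vulns = [
    {"id": "V-1", "asset_criticality": 5, "exposure": 2, "threat_activity": 3},
    {"id": "V-2", "asset_criticality": 3, "exposure": 5, "threat_activity": 5},
    {"id": "V-3", "asset_criticality": 4, "exposure": 1, "threat_activity": 1},
]

def operational_risk(v):
    # Simple multiplicative correlation of the three factors (each 1-5).
    return v["asset_criticality"] * v["exposure"] * v["threat_activity"]

ranked = sorted(vulns, key=operational_risk, reverse=True)
for v in ranked:
    print(v["id"], operational_risk(v))
```

Even this toy version captures the key insight: a moderately critical asset with high exposure and active threats (V-2) can outrank a highly critical but well-isolated one (V-1).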

In addition, AI supports continuous monitoring of system integrity, helping organizations detect changes in control logic, unauthorized access attempts, or abnormal communication patterns.

These capabilities shift cybersecurity from a reactive posture to a more predictive and proactive model.


The First Documented AI-Enabled Attack on OT: A Warning Signal for Industry

A recent Dark Reading article, “World’s First AI-Driven Cyberattack Couldn’t Breach OT Systems,” reinforces the dual nature of AI in OT cyber risk. It described an AI-driven cyberattack attempt in which adversaries reportedly used AI to support targeting and exploitation, yet still failed to breach the OT environment. That distinction matters. AI may increase the speed, sophistication, and adaptability of attacks, but strong OT fundamentals—segmentation, access control, asset visibility, monitoring, and response discipline—can still prevent a digital intrusion from becoming a physical operational event.


AI and the Future of Incident Response and Recovery

AI is also transforming how organizations respond to and recover from cyber incidents.

In the event of a disruption, AI can assist in rapid analysis of system data, helping teams identify the scope of an intrusion, isolate affected systems, and determine appropriate response actions. This reduces the time required to stabilize operations. As recent incidents such as the Hasbro cyberattack have shown, segmenting risk in this way allows an organization to take only the impacted operations offline, so the business can work around the affected portion of the system.

AI can also support scenario modeling and simulation. By using digital models of industrial systems, organizations can simulate how cyber incidents might unfold and test response strategies in advance. This strengthens both emergency preparedness and business continuity planning.

During recovery, AI can help guide the safe restoration of operations by analyzing system conditions, verifying configurations, and identifying potential risks associated with restart activities.

In complex industrial environments, where recovery must be carefully managed to avoid additional hazards, this capability is especially valuable.


Enabling Better Decision-Making at the Board Level

AI is not only transforming operations—it is also enhancing how organizations govern cyber risk.

Boards of directors are increasingly expected to oversee cyber-physical risk as a core enterprise issue. To do this effectively, they need clear, actionable insights that connect technical risk indicators to business outcomes.

AI can support this by aggregating data from across the organization—OT systems, cybersecurity platforms, operational metrics, and risk assessments—and translating it into decision-ready information.

AI-enabled dashboards can provide visibility into:

  • asset criticality and exposure
  • vulnerability trends
  • incident detection performance
  • resilience testing outcomes
  • potential business impact scenarios

AI can also support scenario analysis, helping boards understand how different types of cyber incidents could affect operations, safety, and financial performance.

This allows directors to make more informed decisions about risk appetite, resource allocation, and strategic priorities.

Importantly, AI enhances governance—it does not replace it. Effective oversight still depends on informed judgment, director expertise, strong leadership, and alignment between operational realities and strategic decision-making.

We will delve deeper into board-level actions to control cyber risk in operational technology in Part 4 of this series.


Integrating AI into a Cyber-Physical Risk Strategy

The true value of AI emerges when it is integrated into a broader risk management framework.

The introduction of AI into operations is happening faster than most organizations can change. Even so, today they can focus on:

  • Aligning AI tools with operational risk priorities
    Establish risk-based use cases by mapping AI applications to high-consequence operational scenarios (e.g., SIF exposure, critical asset failure, business interruption), ensuring AI investments target the most impactful risks.
  • Integrating cybersecurity with safety and process risk management
    Embed cyber threat scenarios into existing safety frameworks such as PHA, HAZOP, and LOPA, and create cross-functional teams that jointly assess cyber-physical risks and define coordinated mitigation strategies.
  • Ensuring data quality and system visibility
    Develop a unified data architecture that integrates OT, IT, and safety system data, and implement data governance practices that ensure accuracy, completeness, and real-time visibility into critical operational conditions.
  • Establishing governance structures for AI use
    Define clear accountability for AI deployment, validation, and monitoring through formal governance processes aligned with enterprise risk management and board oversight expectations.
  • Maintaining human oversight and decision-making authority
    Implement human-in-the-loop controls for AI-driven insights, ensuring that critical operational and safety decisions are reviewed and validated by qualified personnel before execution.
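The human-in-the-loop principle in the last item can be sketched as a simple gating function. This is an illustrative assumption of how such a control might be structured, not a real control-system API; the field names and statuses are invented.

```python
def apply_ai_recommendation(recommendation, approver=None):
    """Human-in-the-loop gate: AI output stays advisory until a qualified
    person approves it. The 'safety_critical' flag and status values are
    hypothetical, shown only to illustrate the control pattern."""
    if recommendation.get("safety_critical") and approver is None:
        # Safety-critical actions are held pending human review.
        return {"status": "held", "reason": "awaiting qualified human review"}
    return {"status": "approved", "approved_by": approver or "auto (non-critical)"}

# A safety-critical setpoint change is held without a reviewer...
print(apply_ai_recommendation({"action": "raise setpoint", "safety_critical": True}))
# ...and released once a qualified operator signs off.
print(apply_ai_recommendation({"action": "raise setpoint", "safety_critical": True},
                              approver="shift supervisor"))
```

The design point is that the default path is the safe one: absent an explicit, qualified approval, a safety-critical recommendation never executes.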

Safety professionals, cybersecurity experts, and operational leaders must work together to ensure that AI-driven insights are translated into practical actions that enhance system safety, strengthen resilience, and reduce operational risk.


Conclusion: Navigating the AI-Driven Risk Landscape

Artificial intelligence is redefining both sides of the cyber risk equation. It is enabling more sophisticated attacks on operational technology, while also providing new capabilities to defend, detect, and recover from those threats.

For industrial organizations, the challenge is not simply to adopt AI, but to apply it effectively within the context of operational risk, safety, and governance.

Those that succeed will be organizations that:

  • understand the physical consequences of cyber threats
  • leverage AI to enhance visibility and decision-making
  • integrate safety, cybersecurity, and operational resilience
  • align plant-level insights with board-level oversight

In the next part of this series, we move from strategy to execution—examining how safety professionals and EHS leaders can operationalize these concepts through structured risk management practices, metrics, and systems that support effective decision-making.

Looking Ahead to Part 3

AI is reshaping both cyber threats and defenses—but technology alone is not enough. The real value comes from how organizations integrate these capabilities into their risk management systems.

In Part 3, we move to execution—showing how safety professionals translate cyber-physical risk into practical frameworks, metrics, and actions that strengthen protection and resilience.

References:

Nelson, N. (2026, May 7). World’s first AI-driven cyberattack couldn’t breach OT systems. Dark Reading. https://www.darkreading.com/ics-ot-security/worlds-first-ai-driven-cyberattack-couldnt-breach-ot-systems

Posted in AI, enterprise risk management, Technical Skills

Cyber-Physical Risk in the Age of AI: How Safety Leaders and Boards Can Protect Operational Technology

Chet Brandon and Fay Feeney

Series Introduction: Managing Cyber-Physical Risk in the Age of AI

Industrial organizations are entering a new era of risk—one defined by the convergence of artificial intelligence, cybersecurity, and operational technology. Systems that once operated in relative isolation are now highly connected, data-driven, and increasingly automated. While this transformation is unlocking significant gains in efficiency, productivity, and decision-making, it is also exposing critical operations to a new class of cyber threats with the potential to cause real economic and physical harm.

Unlike traditional cybersecurity risks, attacks on operational technology do not stop at data loss or system downtime. They can disrupt physical processes, damage equipment, trigger environmental releases, and create conditions that lead to serious injury or fatality. As AI accelerates both the scale and sophistication of cyber threats, the challenge facing organizations is no longer simply one of protecting information—it is about protecting operations, people, and enterprise value.

This four-part series explores how organizations can respond to this evolving threat landscape by integrating insights from safety, cybersecurity, operations, and board governance. It reflects a critical reality: no single function can manage cyber-physical risk alone. Success requires alignment from the plant floor to the boardroom.

  • Part 1 examines how cyber risk in operational technology environments becomes safety risk, and why safety professionals play a central role in managing these threats.
  • Part 2 explores the dual role of artificial intelligence—both as a driver of more sophisticated cyber attacks and as a powerful tool for defense, resilience, and governance.
  • Part 3 provides a practical operational playbook, outlining how EHS professionals identify, prioritize, and manage OT cyber risk, including the metrics and systems that support effective decision-making.
  • Part 4 brings the discussion into the boardroom, focusing on governance, oversight, and the role directors play in managing cyber-physical risk as a core enterprise issue.

At its core, this series is about connection—connecting digital risk to physical consequences, connecting operational insight to strategic decision-making, and connecting the expertise of safety professionals with the oversight responsibilities of corporate boards.

The organizations that succeed in this environment will be those that recognize cyber risk for what it has become: a core operational and strategic risk that demands integrated thinking, disciplined execution, and leadership across the enterprise.


AI, Cybersecurity, and Operational Technology: Why Safety Professionals Hold the Key to Industrial Resilience

Introduction: AI, Cyber Conflict, and the New Threat to Operational Technology

The rapid advancement of digital technologies, including artificial intelligence, is transforming many aspects of the global economy. It is also reshaping the landscape of cyber conflict. AI-enabled tools, and soon quantum computing, are dramatically lowering the barrier to entry for sophisticated cyber operations, allowing attackers to identify vulnerabilities, automate reconnaissance, generate exploits, and coordinate attacks at a speed and scale previously impossible. As a result, cyber threats are evolving from isolated acts of digital intrusion into increasingly coordinated efforts capable of causing real economic disruption and physical damage.

This shift is particularly concerning for operational technology (OT)—the automated control systems that operate industrial facilities, energy infrastructure, transportation networks, and other critical components of modern economies. Unlike traditional information technology systems, OT environments control physical processes. Distributed control systems, industrial control networks, robotics, and safety instrumented systems regulate everything from chemical reactions and electrical generation to manufacturing lines and pipeline operations.

When these legacy systems are compromised, the consequences extend far beyond lost data or financial fraud. Cyber intrusions into OT environments can result in production shutdowns, equipment damage, environmental releases, and threats to worker safety.

Economically motivated hackers, operating alongside nation-state actors, increasingly recognize the strategic value of targeting operational technology. Disrupting industrial operations can weaken economic stability, undermine public confidence, and create cascading supply chain effects across entire industries.

The addition of AI-driven cyber capabilities amplifies this risk by enabling adversaries to more effectively identify vulnerabilities within complex industrial systems and develop targeted attacks against critical infrastructure and automated manufacturing environments.


Two Perspectives on Operational Technology Risk

Operational technology cyber risk is often discussed through the lens of cybersecurity specialists and IT professionals. Yet the most consequential impacts of these attacks occur not in data centers, but in industrial facilities where automated systems control physical processes. Understanding and managing this risk requires perspectives that span both operational realities and enterprise governance.

Chet Brandon brings the perspective of a safety and operational risk leader with more than three decades of experience in highly automated heavy industries—including chemicals, metals, aerospace, and advanced manufacturing. In these environments, distributed control systems, safety instrumented systems, robotics, and other forms of operational technology are central to maintaining safe and reliable operations.

Fay Feeney brings a complementary perspective grounded in decades of experience advising corporate boards and insurance organizations on enterprise risk, governance, and emerging technology threats. Her work focuses on how boards evaluate complex risk landscapes, allocate resources, and oversee organizational resilience in the face of rapidly evolving threats—including cyber risk.

Together, these perspectives provide a more complete view of the challenge organizations face today. Cyber attacks targeting operational technology sit at the convergence of safety risk, business interruption, and enterprise governance.


When Cyber Risk Becomes Safety Risk

Operational Technology systems form the operational backbone of modern industry. When these systems are compromised, cyber risk quickly becomes operational risk—and in many cases, a direct threat to worker safety.

An attacker who gains access to operational control systems can manipulate process conditions, disable alarms, or interfere with automated shutdown protections. These disruptions can create pathways to serious injury or fatality (SIF) events, including:

  • loss of containment of hazardous materials
  • unexpected equipment startup
  • overpressure events
  • runaway chemical reactions

These are not hypothetical scenarios. They are the same failure modes safety professionals work every day to prevent. More strategic opportunities exist to give leaders options to intelligently mitigate and transfer this risk.


Why Safety Professionals Are Critical to OT Risk Management

Safety professionals in high-hazard industries have long been responsible for managing low-probability, high-consequence risks—events that can result in serious injuries, environmental harm, major equipment damage, or extended operational disruption. Their work relies on structured methodologies designed to understand how complex industrial systems behave under abnormal conditions, how failures can propagate, and how safeguards prevent catastrophic outcomes.

Disciplines such as Process Hazard Analysis (PHA) and Hazard and Operability Studies (HAZOP) systematically examine how deviations in parameters like pressure, temperature, or flow can create unsafe conditions. These tools encourage teams to evaluate “what if” scenarios and identify the controls needed to maintain safe performance. In operational technology environments, the same approaches can be used to assess how cyber manipulation of control systems, sensor inputs, or alarm functions might lead to unsafe process states.
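The guide-word logic behind HAZOP can be illustrated with a toy worksheet generator. The guide words and parameters follow standard HAZOP practice, but the cyber causes listed are illustrative assumptions, not findings from an actual study.

```python
# Toy HAZOP-style worksheet: systematically pair guide words with process
# parameters to generate deviations, then attach candidate causes,
# including cyber manipulation of controls or sensors.
guide_words = ["no", "more", "less"]
parameters = ["flow", "pressure", "temperature"]
cyber_causes = {
    ("no", "flow"): "spoofed pump-stop command",
    ("more", "pressure"): "setpoint manipulated via compromised HMI",
    ("less", "temperature"): "falsified sensor reading masking cooling loss",
}

worksheet = []
for gw in guide_words:
    for p in parameters:
        cause = cyber_causes.get((gw, p), "traditional equipment or human failure")
        worksheet.append((f"{gw} {p}", cause))

for deviation, cause in worksheet:
    print(f"{deviation:>16}: {cause}")
```

The value of the exercise is the systematic coverage: every guide-word and parameter pairing is considered, so cyber-induced deviations are evaluated alongside conventional failure causes rather than in a separate silo.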

Failure Mode and Effects Analysis (FMEA) provides a structured way to identify potential failure points in equipment, instrumentation, and control logic, helping organizations prioritize vulnerabilities that present the greatest operational risk. Layers of Protection Analysis (LOPA) further evaluates whether safeguards—such as safety instrumented systems, operator actions, or emergency shutdown procedures—provide sufficient risk reduction, even when digital controls are compromised.
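The LOPA arithmetic referenced above reduces to simple multiplication: the mitigated event frequency is the initiating event frequency times the probability of failure on demand (PFD) of each independent protection layer. A sketch with invented numbers shows why a compromised digital layer matters:

```python
# Illustrative LOPA arithmetic; all frequencies and PFDs are invented
# for the example, not taken from any real study.
initiating_frequency = 0.1        # events/year (e.g., control loop failure)
layer_pfds = {
    "operator response to alarm": 0.1,
    "safety instrumented function": 0.01,
    "relief device": 0.01,
}

mitigated = initiating_frequency
for pfd in layer_pfds.values():
    mitigated *= pfd
# 0.1 x 0.1 x 0.01 x 0.01 = 1e-6 events/year

target = 1e-5  # hypothetical tolerable frequency for this consequence
print("Meets target" if mitigated <= target else "Additional layer needed")

# If a cyber intrusion disables the safety instrumented function,
# recompute without that layer: the frequency rises to 1e-4,
# above the tolerable target.
degraded = initiating_frequency
for name, pfd in layer_pfds.items():
    if name != "safety instrumented function":
        degraded *= pfd
print("Meets target" if degraded <= target else "Additional layer needed")
```

Re-running the calculation with a layer removed is exactly how LOPA can be used to ask whether the remaining safeguards still provide sufficient risk reduction when digital controls are compromised.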

Beyond these tools, safety professionals bring a systems perspective that connects cybersecurity concerns with real operational consequences. They understand how technical, human, and organizational factors influence resilience, and how cyber intrusions might manipulate industrial processes. As industrial systems become more connected, safety leaders serve as a critical bridge between cybersecurity experts and operational teams, helping integrate cyber risk management into established operational risk frameworks and strengthening the organization’s ability to anticipate, withstand, and recover from cyber-physical disruptions.


The Business Interruption Dimension

Cyber disruption of operational technology can trigger large-scale business interruption. In continuous manufacturing industries—such as chemicals, metals, energy generation, and advanced manufacturing—even short outages can have cascading consequences across supply chains. Production losses can quickly reach tens or hundreds of millions of dollars.

Understanding these consequences requires the operational insight that safety and operations professionals bring to risk discussions. Without this expertise, the systemic nature of the risk may go unrecognized, leading to widespread damage and losses.


The Growing Importance of Operational Resilience

Safety professionals play a central role in designing resilient industrial systems by focusing on anticipating failures, limiting escalation, and enabling safe recovery when disruptions occur. Their systems-based approach—developed through decades of managing high-hazard operations—is directly applicable to cyber-physical risk.

  • Layered safety protections are fundamental to resilience. By establishing multiple independent barriers such as engineered safeguards, safety instrumented systems, alarm strategies, and trained operator responses, safety professionals reduce reliance on any single control. In cyber-physical events, redundancy, manual override capability, and physical isolation measures can help maintain safe conditions even if automated systems are compromised.
  • Emergency response frameworks further strengthen resilience by providing structured processes to stabilize operations during crises. Incident command systems, clear escalation protocols, and coordinated response plans enable organizations to shift to manual control, implement protective shutdowns, and safeguard personnel when digital system integrity is uncertain.
  • Management of change (MOC) processes also play a critical role. As industrial environments become more digitally integrated, safety professionals help evaluate how software updates, network changes, or remote connectivity may introduce new operational vulnerabilities. This proactive discipline supports more reliable system performance and safer recovery from disruptions.
  • Finally, incident investigation and business continuity communication practices help organizations learn and adapt. Rigorous root cause analysis of cyber-physical events can reveal weaknesses in safeguards, procedures, or training, while structured communication ensures accurate information reaches employees, leaders, regulators, and customers during disruptions.

Taken together, these principles demonstrate how safety leadership contributes to an organization’s ability not only to prevent incidents but also to withstand and recover from them. By applying proven approaches from process safety and operational risk management to emerging cyber threats, safety professionals help create industrial systems that are more robust, adaptive, and capable of maintaining safe performance in an increasingly connected and uncertain risk environment.

These risks are driving a growing focus on operational resilience across industrial organizations. In highly automated environments where digital systems control physical processes, resilience is the capability of an organization to continue operating safely even when systems are disrupted or operating conditions become uncertain. It reflects not only the strength of technology and infrastructure, but also the effectiveness of organizational planning, training, and decision-making during abnormal events.

Resilience begins with the ability to anticipate disruptions. This requires organizations to understand where vulnerabilities exist within their operational technology environment and how cyber threats could affect process conditions, safety systems, or plant operations. Scenario analysis, risk assessments, and proactive monitoring of system performance help identify potential failure pathways before they escalate into operational crises.

The second element is the ability to maintain safe operations under abnormal conditions. Industrial systems must be capable of continuing to function safely even when automated controls are degraded or compromised. This often requires layered protections, trained operators who can recognize abnormal situations, and procedures that allow plants to transition to safer operating modes when system reliability is uncertain.

Finally, resilience depends on the ability to recover quickly from incidents. When disruptions occur, organizations must be able to isolate affected systems, stabilize operations, and restore normal production safely and efficiently. Effective recovery requires clear response procedures, coordinated decision-making, and the capability to restart complex industrial processes without creating additional safety risks.

Ultimately, operational resilience has become one of the defining capabilities of modern industrial organizations. As automation, connectivity, and AI-driven technologies continue to reshape how facilities operate, the margin for error narrows while the potential consequences of disruption grow. In this environment, resilience is not simply a technical attribute—it is a leadership outcome shaped by disciplined risk management, informed decision-making, and the ability to translate complex threats into practical operational safeguards. Safety professionals play a pivotal role in this effort by ensuring that industrial systems are designed to withstand shocks, adapt under pressure, and recover without compromising the protection of people, the environment, or business continuity. Organizations that embed these resilience principles into their operations will be better positioned not only to manage cyber-physical risk, but also to sustain performance and trust in an increasingly uncertain industrial future.


Looking Ahead

Cyber threats targeting operational technology represent one of the most significant emerging risks facing industrial organizations. Understanding and addressing this risk requires collaboration across disciplines—from plant operations to cybersecurity to board governance.

In the next article in this series, we examine how artificial intelligence is changing the nature of cyber threats—and how it can also help organizations defend against them.

Posted in AI, Artificial Intelligence, Design for Safety

Building Sustainable Leadership: Why It Matters More Than Ever

More than a decade ago, I co-authored a paper exploring an idea that, at the time, felt important—but perhaps a bit ahead of its moment: sustainable leadership. Looking at today’s EHS, sustainability, and operational risk landscape, it’s clear that the core premise has not only held up—it has become essential.

Organizations are operating in environments defined by complexity, speed, and interdependence. Traditional leadership models that emphasize short-term performance, individual heroics, or rigid control structures simply cannot keep pace. What’s needed instead is a leadership system that can perform today and renew itself for tomorrow.

That is the essence of sustainable leadership.


From Transactional to Transformational—and Beyond

Leadership theory has evolved significantly over the last century. Early models focused on innate traits (“Great Man” theories), then shifted toward observable behaviors, situational fit, and contingency approaches. These frameworks helped explain how leaders operate—but they often stopped short of addressing long-term organizational viability.

The emergence of transformational leadership marked an important shift. Transformational leaders elevate purpose, values, and motivation, inspiring others to contribute at their highest level. Compared to transactional leadership—which trades rewards for compliance—transformational leadership focuses on meaning, ethics, and long-term impact.

Yet even transformational leadership, by itself, is no longer sufficient.

Today’s organizations require leadership that is:

  • Agile, adapting rapidly to changing conditions
  • Distributed, rather than concentrated at the top
  • Ethical and values-driven, under constant stakeholder scrutiny
  • Resilient, capable of regenerating leadership capacity over time

This is where sustainable leadership comes in.


What Do We Mean by Sustainable Leadership?

The concept of sustainability is often associated with environmental stewardship, but its original definition is broader:

Meeting the needs of the present without compromising the ability of future generations to meet their own needs.

Applied to leadership, this means building systems that:

  • Deliver strong business performance today
  • Develop leaders continuously—not episodically
  • Protect people, culture, and organizational knowledge
  • Balance short-term results with long-term capability

In practical terms, sustainable leadership ensures that leadership excellence is not dependent on a single individual, but embedded into the organization itself.


The Seven Tenets of Sustainable Leadership

Borrowing and adapting concepts from educational leadership research, sustainable leadership can be understood through seven reinforcing tenets (Hargreaves & Fink, 2006). Together, they provide a practical framework for EHS and business leaders alike.

1. Depth

Sustainable leadership promotes success that is deep and shared, not achieved at the expense of others. Performance gains should benefit employees, customers, communities, and the environment—not just one stakeholder group.

2. Length

Leadership must endure across generations. This means developing leaders who can carry forward core values while adapting strategies to new realities. Jim Collins described this as clock-building, not time-telling—creating systems that work repeatedly, not just once.

3. Breadth

In complex organizations, leadership cannot be centralized. Sustainable leadership is distributed leadership, empowering people closest to the work to make informed decisions and lead change.

4. Justice

Perceived fairness matters. Transparent decision-making, ethical conduct, and inclusive processes build trust—without which leadership credibility erodes quickly. Sustainable leadership is socially just, not self-serving.

5. Diversity

Diverse teams bring broader perspectives and greater resilience. Just as biological ecosystems thrive on diversity, organizations perform better when they value differences in background, thinking, and experience.

6. Resourcefulness

Sustainable leaders develop people and systems rather than depleting them. This includes managing pace—avoiding burnout, excessive turnover, and the erosion of institutional knowledge.

7. Conservation

Respect the past while building the future. Sustainable leadership preserves the organization’s core ideology while allowing practices, structures, and tools to evolve. The goal is continuity with renewal.


Why This Matters for EHS and Sustainability Leaders

EHS professionals sit at a unique intersection of people, operations, and risk. The challenges we face—process safety, human performance, climate risk, regulatory complexity, workforce transition—cannot be solved through compliance alone.

They require leaders who:

  • Think systemically
  • Build trust across functions
  • Develop successors intentionally
  • Align safety, sustainability, and business performance

Sustainable leadership provides a framework for doing exactly that.

It also addresses a growing reality: many organizations are facing leadership gaps created by retirements, downsizing, and years of underinvestment in development. Succession planning without leadership development is not enough. Sustainable leadership integrates both.


A Call to Action

Leadership today is less about control and more about creating conditions for others to succeed. That requires humility, discipline, and a long-term mindset.

For EHS and sustainability leaders, the opportunity is clear:

  • Embed leadership development into everyday work
  • Model ethical, values-driven decision-making
  • Distribute authority while maintaining accountability
  • Build systems that outlast individual roles

Sustainable leadership is not a program. It is a way of thinking about leadership as a renewable resource—one that must be intentionally cultivated.

The organizations that get this right won’t just perform better. They’ll endure.


A Historical Foundation — and a Still‑Relevant Framework

This work began as a response to the leadership realities of the early 2010s, but it was never intended to be time‑bound. When this framework was presented at the American Society of Safety Professionals (ASSP) 2012 Management Systems Symposium, most organizations were still grounded in compliance‑centric management systems, lagging indicators, and leader‑centric decision authority.

What made the concept of sustainable leadership distinctive then—and enduring now—is that it was designed as a leadership system, not a personality model or a program of the month. The intent was to describe how leadership must function over time, across generations of leaders, amid changing technologies, workforce expectations, and risk profiles.

More than a decade later, the pressures have intensified rather than diminished: accelerated digitalization, AI‑enabled decision support, geopolitical volatility, climate risk, demographic shifts, and growing scrutiny of corporate ethics. These forces have not invalidated the framework; they have validated it.

Sustainable leadership remains relevant precisely because it addresses how organizations renew leadership capacity while maintaining performance, rather than optimizing one at the expense of the other.


As part of the original 2012 symposium presentation, we introduced not only the conceptual model of sustainable leadership, but also a Sustainable Leadership Readiness Checklist—a practical tool designed to help leaders assess how well their behaviors, systems, and culture aligned with long-term organizational resilience. In today’s context—defined by digital transformation, AI-enabled decision support, supply chain volatility, climate risk, and heightened expectations for ethical leadership—the relevance of that tool has only increased.


The Sustainable Leadership Readiness Checklist — Modernized

One of the original goals of this work was to avoid leaving leaders with theory alone. The Sustainable Leadership Readiness Checklist, introduced alongside the 2012 presentation, was designed as a practical diagnostic to translate leadership intent into observable behaviors and systems.

The checklist has stood the test of time because it focuses on conditions rather than tools. While technologies, organizational structures, and management fads change, the underlying leadership conditions required for long‑term performance remain remarkably stable.

How the Checklist Works

The tool uses a 1–5 scoring scale, where:

  • 5 = Consistently demonstrated and embedded into formal and informal systems
  • 4 = Regularly demonstrated, with minor gaps
  • 3 = Inconsistently applied or highly leader‑dependent
  • 2 = Sporadically demonstrated
  • 1 = Rarely demonstrated or absent

An average score of 3.5 or higher suggests a leadership system reasonably prepared to sustain performance across leadership transitions.
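
The scoring logic above is simple enough to sketch directly. The tenet names come from the checklist itself; the example scores are illustrative only (they roughly track the range midpoints discussed later in this article), not data from any real assessment.

```python
# A minimal sketch of the readiness-checklist scoring logic: average the
# seven 1-5 tenet scores and compare against the 3.5 sustainability threshold.

TENETS = ["Depth", "Length", "Breadth", "Justice",
          "Resourcefulness", "Diversity", "Conservation"]

def readiness(scores: dict[str, float]) -> tuple[float, bool]:
    """Return (average score, ready?), where an average of 3.5 or higher
    suggests a leadership system prepared to survive leadership transitions."""
    missing = [t for t in TENETS if t not in scores]
    if missing:
        raise ValueError(f"missing tenet scores: {missing}")
    avg = sum(scores[t] for t in TENETS) / len(TENETS)
    return round(avg, 2), avg >= 3.5

# Illustrative self-assessment (hypothetical values, not measured data):
example = {"Depth": 3.25, "Length": 2.75, "Breadth": 3.0, "Justice": 3.5,
           "Resourcefulness": 3.25, "Diversity": 3.25, "Conservation": 2.75}

avg, ready = readiness(example)
print(avg, ready)  # 3.11 False -- just below the sustainability threshold
```

Note that a profile like this one can look respectable tenet by tenet while still averaging below 3.5—which is exactly the "hovering just below true sustainability" pattern described later.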

What has changed since 2012 is not the scoring logic, but the context in which leaders must demonstrate these behaviors. Below, each tenet is reframed through a modern EHS, sustainability, and operational‑risk lens.


Depth – Building Meaningful, Shared Success

Depth reflects whether leadership practices create durable value for people and the organization as a whole.

Key indicators include:

  • Promoting cross-functional and team-based success, rather than siloed optimization
  • Actively advocating learning for all levels of the organization
  • Embedding continuous improvement and learning into daily work—not just after incidents

In EHS terms, depth shows up when learning from near-misses is valued as much as lagging metrics, and when improvement efforts strengthen both operational performance and workforce capability.


Length (Endurance) – Leadership That Outlasts Individuals

Length focuses on leadership continuity and resilience.

Indicators include:

  • Demonstrated resilience in the face of adversity
  • Intentional succession planning and talent development
  • Building professional cultures anchored in shared values
  • Establishing high-trust environments where issues surface early
  • Consistently developing high-performing teams, not just high-performing individuals

Organizations that score poorly here often experience performance swings tied directly to leadership turnover.


Breadth – Distributed and Connected Leadership

Breadth assesses whether leadership capacity is spread across the organization.

Indicators include:

  • Practicing distributed leadership, especially at the point of risk
  • Building and maintaining strong internal and external relationships
  • Actively sharing knowledge and resources across boundaries
  • Developing others through coaching, mentoring, and stretch assignments

In high-risk operations, breadth is evident when frontline leaders feel authorized—and expected—to act on safety and operational concerns without waiting for permission.


Justice – Fairness, Ethics, and Accountability

Justice evaluates the ethical foundation of leadership.

Indicators include:

  • Transparency in decision-making and communications
  • Consistent alignment with ethics and stated values
  • Personal and organizational integrity under pressure
  • Clear accountability, applied fairly and predictably

Perceived injustice erodes trust faster than almost any other leadership failure—and no amount of technical excellence can compensate for it.


Resourcefulness – Developing, Not Depleting, Capacity

Resourcefulness reflects how leaders use—and renew—organizational resources.

Indicators include:

  • Open, adaptive communication that invites modification and challenge
  • Strong emphasis on teamwork over individual heroics
  • Encouragement of collective action, new ideas, and diverse perspectives

For EHS leaders, this often means resisting the temptation to solve every problem personally and instead building problem-solving capability throughout the system.


Diversity – Resilience Through Difference

Diversity goes beyond representation; it addresses cognitive and experiential variety.

Indicators include:

  • Awareness of global and cultural dynamics
  • Alignment of policies and procedures with inclusive values
  • Processes that promote interaction of differing ideas and viewpoints
  • Leadership behaviors that are flexible and adaptable, not rigid

Diverse leadership systems are better equipped to anticipate emerging risks and adapt to unfamiliar challenges.


Conservation – Preserving What Matters While Evolving

Conservation evaluates how well leaders balance continuity and change.

Indicators include:

  • Building learning and innovation networks inside and outside the organization
  • Thinking and acting in systems-oriented ways
  • Demonstrating skill in constructive conflict resolution

This tenet ensures that transformation does not come at the cost of identity, values, or hard-earned institutional knowledge.


How I Would Score Most Organizations Today

Based on three decades of experience across manufacturing, process industries, and global operations, and observing how organizations have responded to recent disruptions, here is a candid assessment:

  • Depth: Moderate (3–3.5) — Many organizations promote teamwork and continuous improvement rhetorically, but still reward individual performance and short‑term results disproportionately.
  • Length (Endurance): Weak to Moderate (2.5–3) — Succession planning exists on paper, but leadership continuity often depends on a few individuals rather than robust systems.
  • Breadth: Uneven (3) — Distributed leadership is encouraged during crises, but decision authority frequently recentralizes once conditions stabilize.
  • Justice: Variable (3–4) — Ethics and transparency are widely stated values, yet consistency under pressure remains a differentiator between organizations.
  • Resourcefulness: Moderate (3–3.5) — Many leaders talk about developing people, but workload, pace, and constant reorganization quietly deplete human capacity.
  • Diversity: Improving but fragile (3–3.5) — Progress has been made, but inclusion of diverse perspectives in decision‑making still lags representation.
  • Conservation: Underdeveloped (2.5–3) — Organizations are good at change initiatives, less skilled at preserving institutional knowledge and core identity through change.

The overall pattern is clear: most organizations hover just below true sustainability, capable of performing today but vulnerable to leadership transitions, burnout, or strategic whiplash.


Using the Checklist as a Leadership Development Tool

The readiness checklist was never intended as a scorecard alone. Its real value lies in the conversations it enables:

  • Where are we strong—and why?
  • Where are we vulnerable if key leaders leave?
  • Which tenets are under the most strain in today’s operating environment?

Used periodically, the tool helps leaders track progress, identify systemic gaps, and align leadership development efforts with long-term business and EHS objectives.


What Sustainable Leadership Must Add in the Age of AI

Since this framework was first introduced, leadership has entered a new era. Artificial intelligence, advanced analytics, digital twins, and automation now shape how decisions are made, risks are identified, and work is performed. These tools are powerful—but they are not neutral.

Sustainable leadership must now explicitly include human guardrails for AI-enabled systems:

  • Judgment over automation: AI can inform decisions, but leaders remain accountable for outcomes.
  • Ethical design and use: Values must be embedded into how digital tools are trained, deployed, and governed.
  • Human performance integration: Technology should augment cognitive capacity, not replace critical thinking.
  • Transparency and explainability: Leaders must understand—and be able to explain—how recommendations are generated.

In this sense, sustainable leadership becomes the operating system that ensures advanced tools strengthen rather than erode trust, safety, and organizational resilience.


Executive Snapshot: The Sustainable Leadership Readiness Diagnostic

For senior leaders and boards, the checklist can be simplified into a high-level diagnostic:

  • Depth – Are we building success for people, performance, and stakeholders simultaneously?
  • Length – Would leadership continuity survive an unexpected transition?
  • Breadth – Where does real decision authority live during normal operations?
  • Justice – Do people trust our processes when outcomes are unpopular?
  • Resourcefulness – Are we developing capacity or consuming it?
  • Diversity – Are different perspectives shaping decisions, or just present in the room?
  • Conservation – What would we lose if our most experienced leaders left tomorrow?

If these questions create discomfort, the diagnostic is doing its job.


A Real-World EHS Example: Sustaining Safety in a High-Hazard Operation

Dow Chemical provides a well-recognized example of how sustainable leadership can support long-term safety performance in a high-hazard industry. Operating globally across petrochemicals, specialty chemicals, and advanced materials, Dow manages inherently dangerous processes—reactive chemistry, high-pressure systems, flammable and toxic substances, and complex contractor interfaces—at enormous scale.

Over multiple decades, Dow has maintained a reputation for strong process safety performance while navigating mergers, divestitures, economic cycles, regulatory change, and continuous leadership turnover. This consistency has not come from the absence of risk, but from the presence of a durable leadership system aligned with the seven tenets of sustainable leadership.

Depth is evident in Dow’s long-standing integration of safety into operational excellence. Safety is treated as a core business value rather than a compliance obligation. Incident investigations emphasize learning and system improvement, and safety accountability is shared across operations, engineering, maintenance, and leadership—not isolated within the EHS function.

Length (Endurance) is reflected in Dow’s disciplined leadership development and succession practices. Core safety principles—such as rigorous management of change, process hazard analysis, and respect for operating discipline—have remained stable across generations of leaders, even as technologies and organizational structures have evolved.

Breadth appears in the way safety leadership is distributed throughout the organization. Operators are empowered to stop work, engineers retain lifecycle accountability for hazard controls, and leaders at all levels are expected to challenge unsafe conditions. Decision authority does not bottleneck at the top; it resides close to the hazard.

Justice underpins trust within Dow’s operating culture. Transparency following incidents, consistent ethical expectations, and fair accountability enable weak signals to surface early. Employees and contractors understand that raising concerns is not only permitted but expected.

Resourcefulness shows up in Dow’s attention to capability and pace. Investments in training, competence assurance, and leadership development are treated as risk controls. Fatigue, skill degradation, and organizational churn are recognized as contributors to serious incidents and managed accordingly.

Diversity strengthens Dow’s approach to risk. Cross-functional, cross-cultural, and cross-generational teams are routinely engaged in hazard reviews and operational decision-making, broadening perspectives and improving resilience in a global operating environment.

Conservation ensures that institutional knowledge is preserved. Hard-earned lessons from incidents, near misses, and industry learning are embedded into standards, engineering practices, and training systems. Change initiatives modernize operations without discarding the principles that historically kept people and communities safe.

The result is not a reliance on heroic individuals or perfect rules, but a leadership system capable of sustaining safety performance over decades—even as leaders, markets, and technologies change.

A note on Dow’s current challenge to maintain sustainable leadership: While Dow has long been held up as an example of sustained safety and operational discipline resulting from sustainable leadership in action, recent developments present a real-time test. In early 2026 the company announced a major restructuring plan under its “Transform to Outperform” initiative, including the elimination of approximately 4,500 jobs globally—about 13% of its workforce—as it pivots toward automation, AI, and cost efficiency amid weak demand and profitability pressures.

These strategic shifts, driven by structural market conditions rather than safety failures, create a leadership endurance challenge: calibrating short-term organizational transformation with the long-term commitments to justice, resourcefulness, and conservation of safety culture that have defined its historical performance.


A Call to Action for EHS Leaders

If you lead EHS, process safety, or sustainability in a high-risk organization, here is the uncomfortable but necessary question:

Would your safety performance remain strong if your top three leaders left tomorrow?

If the honest answer is “it depends,” then leadership—not technology, not procedures—is your most critical risk exposure.

Sustainable leadership demands intentional design. It requires moving beyond individual capability toward leadership systems that:

  • Develop successors before they are needed
  • Distribute authority to the point of risk
  • Preserve institutional knowledge while embracing innovation
  • Protect people from chronic overload and decision fatigue

The organizations that sustain safety over decades do not wait for disruption to test their leadership systems. They test them deliberately—through development, transparency, and disciplined succession—long before failure makes the test unavoidable.


Final Thoughts

If leadership is a critical risk control, is it being tested and audited with the same rigor as your highest‑hazard processes?

Sustainable leadership is not a nostalgic concept from an earlier era of management systems. It is a forward-looking framework designed to help organizations operate effectively at the edge of complexity.

For EHS and sustainability leaders, the challenge is not adopting new tools or frameworks—it is ensuring leadership itself remains a renewable resource.

The seven tenets—Depth, Length, Breadth, Justice, Resourcefulness, Diversity, and Conservation—provide a durable compass. When embedded intentionally, they allow organizations to perform today while remaining capable tomorrow.

That is the real measure of leadership sustainability.


One-Page Executive Diagnostic: Sustainable Leadership at a Glance

For executive teams and boards, sustainable leadership can be assessed quickly using this high-level diagnostic. It is not a compliance checklist—it is a leadership risk scan.

Depth
Do safety, operational excellence, and people development reinforce each other—or compete for attention?

Length
Is leadership continuity designed into the system, or dependent on individual tenure and goodwill?

Breadth
Where does real decision authority live during normal operations—not just during incidents?

Justice
Do people trust leadership processes when decisions are difficult, unpopular, or costly?

Resourcefulness
Are leaders building capacity over time, or consuming it through pace, churn, and constant reprioritization?

Diversity
Are diverse perspectives actively shaping risk decisions, or merely represented in meetings?

Conservation
What critical knowledge, standards, or values would be lost if experienced leaders exited suddenly?

A leadership system that scores weak in multiple areas may still perform today—but it is unlikely to perform reliably tomorrow.

If leadership is a critical risk control, is it being tested and audited with the same rigor as your highest‑hazard processes?


This article is adapted and expanded from a paper and presentation delivered at the American Society of Safety Professionals (ASSP) 2012 Management Systems Symposium, updated to reflect current organizational, technological, and workforce realities.

References:

Hargreaves, A., & Fink, D. (2006). Sustainable Leadership. Jossey-Bass.


You’ve Been Given the Assignment: Why Modern EHS Leadership Requires a New Operating Model

You’ve just been given the assignment:

As Vice President of Global Environmental, Health, and Safety of a Fortune 500 manufacturing company, you are expected to transform a globally distributed function into a proactive, resilient, and business-aligned capability. Your mandate is explicit: drive a cultural shift toward proactive safety, embed EHS excellence into the company’s operating DNA, modernize global standards, leverage advanced data analytics, redesign performance metrics, and align EHS efforts with long-term business objectives. You are also expected to integrate external benchmarks, foster cross-functional collaboration, and elevate performance across every region and function.

This is not a program refresh.
It is not a compliance initiative.
It is an operating-model transformation.

And it exists because the systems that once kept organizations safe are increasingly fragile in ways we don’t always see.


Why This Assignment Exists Now

Organizations today operate in an environment defined by rising asset complexity, accelerating automation, thinning workforce experience, tighter margins, and near-zero tolerance for catastrophic risk. At the same time, regulators, investors, and boards expect not just compliance, but demonstrable control of operational risk.

The challenge is that many EHS management systems appear strong. Procedures are in place. Audits are clean. Injury rates are low. Yet incidents—often severe ones—continue to occur in organizations that believed they were well controlled.

This is the hallmark of system fragility: systems that look stable under normal conditions but fail abruptly under stress. The assignment you’ve been given exists because traditional EHS models, while necessary, were not designed for today’s pace, complexity, and uncertainty.


The First Insight: This Is Not an EHS Problem

One of the earliest realizations in this role is that you cannot transform EHS by “fixing EHS.”

Most legacy EHS management systems were designed for predictability. They assume hazards can be fully anticipated, work can be standardized, and compliance equals control. In reality, modern operations rely heavily on human adaptation—adjusting to degraded equipment, time pressure, staffing gaps, and conflicting priorities.

Fragility emerges when EHS systems:

  • Rely on procedures that describe work as imagined, not work as done
  • Treat adaptation as deviation rather than necessity
  • Depend on lagging indicators that mask accumulating risk
  • Use static risk assessments in dynamic operating environments

Under these conditions, the system absorbs stress quietly—until it can’t. Transformation stalls when leaders mistake the absence of incidents for the presence of control.


Reframing the Mission: From Compliance to Managing System Fragility

True transformation begins by reframing the EHS mission in language the business understands:

Protect people, safeguard operations, and preserve enterprise value by keeping the organization within safe operating boundaries.

This reframing is critical because fragility is not eliminated by more rules—it is reduced by understanding system limits, monitoring drift, and strengthening controls before failure occurs. When EHS is positioned as a capability that manages system health and risk exposure, it aligns naturally with operations, engineering, finance, and strategy.

Safety becomes inseparable from operational reliability and asset integrity. EHS evolves from a reporting function into a risk intelligence function.


What Changes—and Why It Matters

At its core, the transformation is a shift in how organizations think about control:

Traditional EHS Systems

  • Reactive and event-driven
  • Focused on lagging indicators
  • Built on procedural compliance
  • Optimized locally

Modern, Resilient EHS Systems

  • Anticipatory and risk-based
  • Focused on leading indicators and weak signals
  • Designed around control effectiveness and system health
  • Oriented toward enterprise-level risk

In many organizations, sites with excellent injury rates carry the highest latent risk due to aging assets, deferred maintenance, or fragile controls. Traditional EHS systems rarely surface this reality. Modern EHS must.


Building a Unified Global Safety Culture That Reduces Fragility

One of the most visible expectations of the role is building a unified global safety culture. The common mistake is equating unity with uniformity.

Fragility increases when global standards force identical solutions onto different operating realities. High-performing organizations instead unify around common principles: how risk is evaluated, how escalation occurs, how leaders respond to bad news, and how learning is captured.

A unified culture exists when leaders everywhere ask the same questions about system health and control effectiveness—even when local conditions differ.

Theory to Practice: Building a Unified Global Safety Culture

Building this kind of unified culture requires deliberate action, not messaging. Leaders must define a small set of non-negotiable global principles—how risk is evaluated, what constitutes unacceptable exposure, when and how escalation occurs, and how leaders are expected to respond when controls fail. These principles must be reinforced through leadership routines: common risk review questions used at every site, standardized escalation thresholds tied to severity potential rather than injury outcomes, and consistent expectations for learning reviews that focus on system weaknesses instead of individual error. Global standards should specify intent and critical controls while allowing local teams to determine how those controls are implemented. Leadership development, performance evaluation, and recognition systems must reinforce transparency and early risk identification, making it clear that surfacing fragility is a leadership responsibility—not a failure.


Modernizing Standards: Designing for Real Work and Real Variability

Legacy global standards often describe ideal conditions and perfect execution. They become brittle when reality deviates.

Modern standards acknowledge that variability is normal and adaptation is inevitable. They are designed around:

  • Critical controls, not exhaustive rules
  • Intent and boundaries, not perfection
  • Support for human performance under pressure

By shifting from rulebooks to decision-support frameworks, standards reduce fragility by helping people make better decisions when conditions are imperfect—which is most of the time.

Theory to Practice: Modernizing Global Standards

Modernizing standards in practice requires rethinking both their content and how they are used. Leaders must identify and explicitly define critical controls—the small number of safeguards whose failure would result in serious harm—and ensure standards clearly describe their purpose, performance expectations, and degradation signals. Standards should define decision boundaries, clarifying what must never be compromised, what requires escalation, and where informed local judgment is expected. Field validation is essential; standards must be tested against real work through frontline engagement and learning teams to ensure they reflect actual operating conditions. Finally, standards must be embedded into daily work through planning processes, digital workflows, and leadership conversations, transforming them from compliance artifacts into tools that support safe adaptation.


Leveraging Advanced Analytics: Making Fragility Visible

Fragility persists when leaders cannot see it.

Advanced analytics is transformative not because it produces better reports, but because it exposes where systems are weakening. Leading organizations use data to monitor control effectiveness, detect weak signals, and identify patterns of drift across sites and processes.

This allows leaders to intervene while risk is still manageable. When EHS analytics can answer questions like “Where is risk accumulating faster than our controls?”, the function moves from hindsight to foresight.

Theory to Practice: Using Analytics to Reduce Fragility

Translating analytics into reduced fragility requires redefining EHS data strategy around exposure, control effectiveness, and system health rather than incident counts. This begins with identifying indicators that signal weakening controls—such as repeated temporary fixes, permit deviations, deferred maintenance, or workload saturation—and integrating data across EHS, operations, and maintenance systems. Analytics should highlight trends and variability, not rank sites by outcomes. Most importantly, organizations must institutionalize leadership routines where data is reviewed alongside operational context, enabling proactive intervention before systems drift outside safe operating boundaries.
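
One way to make "weakening controls" concrete is a simple drift check: compare the recent rate of each weak signal against its longer-run baseline. This is a hedged sketch, not a prescribed method—the signal names, counts, and threshold ratio are all hypothetical placeholders for whatever an organization's EHS and maintenance systems actually record.

```python
# Crude drift detection over weekly weak-signal counts: flag any indicator
# whose recent average exceeds its historical baseline by a chosen ratio.

from statistics import mean

def drift_flags(weekly_counts: dict[str, list[int]],
                recent_weeks: int = 4, ratio: float = 1.5) -> list[str]:
    """Return the names of indicators whose last `recent_weeks` average
    is more than `ratio` times their earlier baseline average."""
    flagged = []
    for name, counts in weekly_counts.items():
        baseline = counts[:-recent_weeks]
        recent = counts[-recent_weeks:]
        if baseline and mean(recent) > ratio * mean(baseline):
            flagged.append(name)
    return flagged

# Hypothetical ten weeks of counts for three of the signals named above:
signals = {
    "temporary_fixes":      [2, 1, 2, 2, 1, 2, 5, 6, 4, 5],  # rising
    "permit_deviations":    [1, 1, 0, 1, 1, 1, 1, 0, 1, 1],  # stable
    "deferred_maintenance": [3, 3, 4, 3, 3, 4, 7, 8, 9, 8],  # rising
}

print(drift_flags(signals))  # ['temporary_fixes', 'deferred_maintenance']
```

The point is not the arithmetic—it is that a review routine built on trends like these surfaces fragility while it is still cheap to correct, rather than ranking sites by last quarter's injury rate.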


Redesigning Metrics: From Reassurance to Governance

Metrics shape behavior—and fragile systems are often reinforced by reassuring metrics.

Low injury rates and clean audits can coexist with high exposure. Transformational EHS leaders redesign metrics to reflect:

  • Risk exposure and severity potential
  • Control reliability and degradation
  • Learning velocity and transparency

These are not scorecards; they are governance tools. They inform capital allocation, operational priorities, and leadership focus. They help boards and executives understand whether the system is becoming stronger—or more fragile.

Theory to Practice: Redesigning EHS Metrics

Redesigning metrics requires intentional trade-offs. Organizations must reduce the prominence of lagging indicators and introduce measures that track risk exposure, quality of control verification, time-to-escalation for high-risk conditions, and the effectiveness of corrective actions. Metrics should be designed to prompt inquiry rather than judgment, encouraging leaders to ask where systems are weakening rather than who is underperforming. When metrics reward learning, transparency, and early intervention, they become stabilizing forces rather than sources of distortion.
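
As one concrete instance of the measures mentioned above, consider time-to-escalation for high-severity-potential conditions. The sketch below is illustrative only; the record fields are hypothetical stand-ins for whatever a reporting system actually captures.

```python
# Median hours between identification and escalation for conditions
# tagged with high severity potential -- a governance-oriented leading
# measure, in contrast to lagging injury counts.

from datetime import datetime
from statistics import median

def escalation_hours(records: list[dict]) -> float:
    """Median elapsed hours from identification to escalation,
    restricted to records with high severity potential."""
    deltas = [
        (r["escalated"] - r["identified"]).total_seconds() / 3600
        for r in records
        if r["severity_potential"] == "high" and r.get("escalated")
    ]
    return median(deltas)

# Hypothetical condition reports:
records = [
    {"severity_potential": "high",
     "identified": datetime(2024, 3, 1, 8, 0),
     "escalated": datetime(2024, 3, 1, 14, 0)},   # 6 hours
    {"severity_potential": "high",
     "identified": datetime(2024, 3, 2, 9, 0),
     "escalated": datetime(2024, 3, 3, 9, 0)},    # 24 hours
    {"severity_potential": "low",
     "identified": datetime(2024, 3, 2, 9, 0),
     "escalated": datetime(2024, 3, 2, 10, 0)},   # excluded: low severity
]

print(escalation_hours(records))  # 15.0
```

A rising median here prompts inquiry into where escalation is stalling; it does not name an underperformer, which is precisely the behavior the redesigned metrics are meant to encourage.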


Leadership and Culture: Where Fragility Is Either Reinforced or Reduced

The most difficult part of the assignment is cultural, and it begins with leadership.

Fragility thrives when bad news is suppressed, deviations are punished, and leaders reward the appearance of control over insight. Resilient organizations do the opposite. Leaders signal that early warning is valued, that learning outweighs blame, and that system weaknesses are leadership problems—not worker failures.

Accountability shifts from who failed to how the system allowed failure to develop. Ownership replaces enforcement.

Theory to Practice: Leading for Resilience

Reducing fragility depends on how leaders behave when risk is surfaced. Leaders must be trained and evaluated on their ability to respond constructively to weak signals—rewarding early escalation, probing for system contributors, and resisting the urge to default to individual accountability. This requires consistent leadership routines: asking the same risk-focused questions at every level, participating in learning reviews, and visibly prioritizing control reliability. Over time, these behaviors create trust and ensure that risk is addressed before it manifests as harm.


The Real Outcome of the Assignment

The assignment has already been given. The only question is whether organizations are willing to change how EHS is led.

If the transformation is successful, the result is not simply fewer incidents or better audit scores.

It is an organization that understands its own limits, detects drift early, and adapts without losing control. Leaders make better decisions under uncertainty. Operations become more reliable. EHS is recognized not as a compliance function, but as a discipline that actively reduces fragility and protects enterprise value.

Ultimately, this assignment asks a deeper question:

Will EHS remain a function that reports on safety—or will it become a leadership capability that strengthens the systems the business depends on?

In today’s operating reality, only one of those models is sufficient.

Selected References: Foundations for Modern, Resilient EHS Leadership

1. Work-as-Done vs. Work-as-Imagined

(Safety-II, Resilience Engineering, System Fragility)
Supports article sections on fragility, real work, standards design, and system drift.

  • Hollnagel, E. Safety-I and Safety-II: The Past and Future of Safety Management.
    Foundational framework for shifting EHS from rule compliance to managing system performance under variability.
  • Dekker, S. Drift into Failure (2nd ed.).
    Explains how organizations gradually migrate toward risk despite procedures and controls.
  • Woods, D. et al. Resilience Engineering: Concepts and Precepts.
    Establishes adaptive capacity and brittleness as core properties of complex systems.

2. Human & Organizational Performance (HOP 2.0)

(System Learning, Weak Signals, Predictive EHS)
Supports sections on leadership behavior, learning, and early risk detection.

  • Conklin, T. The 5 Principles of Human Performance.
    Widely adopted operational model reframing incidents as system outcomes rather than human failure.
  • Dekker, Hollnagel, Woods. Human Factors and Safety Science: A Decade of Progress.
    Connects human performance, system design, and modern operational complexity.
  • Conklin et al. Pre-Accident Investigation Framework.
    Practical methodology for event-free learning and identifying latent system weaknesses.

3. Adaptive & Dynamic Risk Management

(From Static Assessments to Live Risk Awareness)
Supports sections on analytics, leading indicators, and managing drift.

  • Hollnagel, E. Resilience Engineering in Practice.
    Practical guidance for continuous monitoring of system performance and control effectiveness.
  • NASA – Dynamic Risk Assessment and Control (DRAC) methodologies.
    Applied models for real-time risk evaluation in high-consequence environments.
  • NATO STO / Military Adaptive Risk Doctrine (post-2019).
    Influential in shaping continuous risk sensing and decision-making under uncertainty.

4. Agile & Lean Portfolio Management for EHS

(Operating-Model Transformation, Not Programs)
Supports sections on governance, prioritization, and transformation execution.

  • Scaled Agile Framework (SAFe) – Lean Portfolio Management.
    Increasingly used to manage EHS initiatives as value streams aligned with enterprise priorities.
  • McKinsey & Company. Agile at Scale (Operations & Risk applications).
    Practical guidance for integrating EHS into enterprise transformation efforts.
  • LNS Research. EHS 4.0 / Industrial Transformation.
    Strong applied linkage between digital operations, EHS governance, and analytics.

5. High-Reliability Operating Systems (HRO 2.0)

(From Culture to Integrated Control)
Supports sections on leadership, escalation, and enterprise risk governance.

  • Weick, K. & Sutcliffe, K. Managing the Unexpected (updated editions).
    Foundational HRO principles informing leadership behavior and risk sensitivity.
  • INPO / DOE High-Reliability Models (post-COVID updates).
    Applied in nuclear, energy, and chemical sectors with integrated operations and EHS oversight.
  • MIT Sloan Management Review. Digital Operations & Reliability research.
    Connects HRO principles with real-time analytics and operational control centers.

6. Risk-Based Prioritization & Value-at-Risk (VaR) Models

(Board-Relevant EHS Governance)
Supports sections on metrics, governance, and enterprise value protection.

  • COSO. Enterprise Risk Management (2017–2023 updates).
    Framework for translating operational risk into strategic and financial impact.
  • McKinsey & Company. Risk as a Strategic Capability.
    Widely used to connect operational risk to EBITDA and enterprise value.
  • CCPS / API RP 754. Risk-based and severity-weighted process safety metrics.
    Practical tools for exposure-based prioritization beyond injury rates.

7. Digital Learning & Just-in-Time Competence

(Reducing the Gap Between Knowing and Doing)
Supports sections on standards, human performance, and control reliability.

  • Ericsson, A. et al. Peak.
    Applied research underpinning microlearning, field-based coaching, and skill sustainment.
  • PwC / Accenture. Digital workforce enablement (AR, AI task guidance).
    Practical deployment models in manufacturing, energy, and infrastructure sectors.
  • ILO / EU-OSHA. Digitalization of Occupational Safety and Health (post-2020).
    Applied guidance on AI-supported learning and competence in modern work systems.

Seeing Risk Before It Hurts: An Example of How Predictive Analytics Are Redefining Safety

[Image: A modern manufacturing floor with workers in PPE around heavy machinery, overlaid with a heat-map visualization — some work areas glow amber and red while others remain green, as AI-style lines connect people, machines, and sensors to suggest predictive risk monitoring in the background.]

This article describes a predictive injury prevention concept currently under development, not a finished or commercially available system. The work reflects an active effort to design, test, and refine an approach that could move occupational safety from reactive analysis toward real-time risk anticipation. The next step for this concept is a pilot phase, to be pursued through collaboration with a qualified technology partner capable of helping translate theory and design into a working implementation.

For most of its history, occupational safety has depended on learning from what has already gone wrong. Injuries occur, investigations follow, and controls are strengthened in hopes of preventing recurrence. While this approach has delivered meaningful progress, it leaves a persistent gap: the period of time when risk is forming but no one is yet hurt. Advances in predictive analytics and artificial intelligence now make it possible—at least in concept—to close that gap by identifying emerging risk conditions and intervening earlier than traditional systems allow.

More than a decade ago, I argued that the EHS profession needed to prepare for a fundamental shift in how risk would be identified and controlled—one driven by emerging digital and analytical capabilities that were only beginning to take shape. My vision in 2014 was that EHS could, and should, move beyond static indicators and retrospective analysis toward systems capable of continuously sensing conditions, integrating diverse data streams, and seeing risk before it hurts. While the term “AI” was not yet common in professional safety conversations, the intent was clear: use advanced analytics to proactively manage risk as a dynamic system rather than react to its failures. Today, the convergence of computer vision, machine learning, and causal modeling makes it possible to actively pursue that vision, translating early foresight into a concrete design effort aimed at redefining how safety risk is recognized, understood, and acted upon in real time.

The following example outlines how such a predictive approach could function in a manufacturing environment. It is intentionally presented as a design framework rather than a finished solution, with the goal of encouraging discussion, critique, and collaboration across the EHS and technology communities. The emphasis is on how modern data integration and causal analytics might be applied to injury prevention, and what new capabilities could emerge if these tools are implemented thoughtfully and responsibly.


The Limits of Reactive Safety Systems

Traditional injury prevention systems are inherently retrospective. Even many "leading indicators" describe conditions that did exist at some point rather than confirming what does exist right now. Audits, observations, and lagging metrics provide valuable insight, but they are episodic and often disconnected from the moment-to-moment realities of work.

As manufacturing systems become more complex, tightly coupled, and sensitive to production pressure, risk increasingly emerges dynamically. Unsafe behaviors, degraded equipment condition, environmental stressors, and organizational demands can align quickly, creating exposure that may not be visible through conventional reporting cycles.


A Shift Toward Real-Time Risk Awareness

Recent advances in artificial intelligence enable a fundamentally different approach. Instead of relying solely on periodic reviews, safety systems can now maintain continuous awareness of operating conditions. Rather than asking what went wrong, they can ask what is happening right now—and what combination of factors makes an injury more likely in this moment.

The predictive injury prevention concept described here is built around that shift. Its purpose is not to replace existing EHS processes, but to augment them with a real-time layer of risk intelligence that operates continuously alongside traditional systems.


Integrating Disparate Data Streams

One of the greatest challenges in advancing predictive safety is not the lack of data, but the fragmentation of it. In most manufacturing organizations, information relevant to injury risk exists across multiple systems that were never designed to work together. Video feeds, EHS management systems, wearable devices, maintenance platforms, employee feedback and reporting systems, and production databases each capture a partial view of reality, often using different structures, time scales, and levels of data quality.

The predictive injury prevention concept addresses this challenge by using artificial intelligence not just as an analytical engine, but as an integration and data-preparation layer. Before any causal modeling occurs, AI is used to harmonize these disparate inputs into a form that can meaningfully support structural equation analysis.

From Raw Signals to Comparable Inputs

The first role of AI in the system is signal normalization. Video-based AI generates high-frequency observations—counts or rates of unsafe behaviors and conditions detected in specific zones. EHS systems produce lower-frequency, event-driven data such as hazard reports, near misses, and corrective action updates. Operational systems generate continuous performance data tied to production cycles, shifts, or equipment states.

Machine learning algorithms are used to align these inputs onto a common analytical timeline and spatial context. This includes time-window aggregation (for example, rolling 5–15 minute intervals), zone-level mapping, and shift-based normalization. The goal is to ensure that data describing behavior, system condition, and operational pressure are comparable and synchronized, rather than evaluated in isolation.
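As an illustration of the alignment step, the sketch below buckets timestamped signals from different sources into fixed time windows per zone. Everything here is hypothetical — the event format, zone name, and window size are illustrative, not the system's actual schema.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # hypothetical 10-minute aggregation window


def aggregate_by_window(events, window=WINDOW_SECONDS):
    """Bucket (timestamp_sec, zone, signal) events into shared time windows.

    High-frequency video detections and low-frequency EHS reports end up on
    one timeline, keyed by (window_start, zone), so they can be compared.
    """
    buckets = defaultdict(lambda: defaultdict(int))
    for ts, zone, signal in events:
        window_start = ts - (ts % window)
        buckets[(window_start, zone)][signal] += 1
    return {key: dict(counts) for key, counts in buckets.items()}


# Mixed-frequency inputs: several video detections plus one hazard report.
events = [
    (30, "press-line", "ppe_violation"),
    (95, "press-line", "ppe_violation"),
    (120, "press-line", "hazard_report"),
    (700, "press-line", "ppe_violation"),  # falls in the next window
]
windows = aggregate_by_window(events)
```

In this toy run, the first three events share the 0–600 s window while the fourth lands in the next one — exactly the kind of synchronization the model needs before behavior, condition, and pressure signals can be compared.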

Data Quality, Noise Reduction, and Contextual Weighting

Raw data—particularly from video analytics—can be noisy. AI plays a critical role in filtering false positives, de-duplicating repeated observations, and weighting signals based on confidence and relevance. For example, repeated detections of the same behavior by the same individual in a short period are treated differently than multiple independent detections across a work group.

Natural language processing is applied to free-text fields in hazard reports, near-miss narratives, and employee concerns. These narratives are classified, clustered, and scored for relevance to specific risk drivers, allowing qualitative inputs to be translated into structured indicators without losing nuance.
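To make the narrative-classification idea concrete, here is a deliberately minimal stand-in: keyword tagging of a free-text report against candidate risk drivers. A real deployment would use a trained NLP classifier; the driver names and keyword lists below are invented for illustration.

```python
# Hypothetical driver-to-keyword mapping; a production system would learn
# these associations from labeled narratives rather than hard-code them.
RISK_KEYWORDS = {
    "unsafe_acts": {"ppe", "lifting", "bypass", "line of fire"},
    "system_condition": {"leak", "vibration", "breakdown", "maintenance"},
}


def score_narrative(text):
    """Tag a free-text report with the risk driver(s) its wording suggests.

    This is a toy substring match standing in for the NLP step described
    in the text (classification, clustering, relevance scoring).
    """
    words = text.lower()
    return sorted(driver for driver, keywords in RISK_KEYWORDS.items()
                  if any(kw in words for kw in keywords))


tags = score_narrative("Operator bypassed the guard; pump vibration rising")
```

Even this crude version shows the payoff: one sentence from a near-miss narrative yields structured indicators for two different latent drivers.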

Operational data are similarly contextualized. Production rates are evaluated relative to historical baselines rather than absolute values, distinguishing normal high output from abnormal stress. Maintenance indicators are adjusted for asset criticality and operating mode. In this way, AI ensures that the data feeding the model reflect meaningful deviations, not background variation.

Constructing Latent Variables for Structural Equation Modeling

Once data are cleaned, aligned, and contextualized, AI assists in feature construction—the process of grouping related observable indicators into candidate latent variables suitable for structural equation modeling. This step is critical, as SEM depends on theoretically sound groupings that reflect real-world risk mechanisms.

For example, AI-driven clustering and correlation analysis may confirm that PPE violations, line-of-fire exposure, and unsafe lifting consistently co-occur under similar conditions, supporting their use as indicators of a latent “Unsafe Acts” construct. Similarly, delayed preventive maintenance, rising vibration levels, and increased breakdown frequency may form a coherent “System Condition” construct.
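A minimal sketch of that clustering step: group indicator time series that co-vary strongly into candidate constructs. The greedy threshold grouping and the example series are assumptions for illustration — and, per the text, the output is only a candidate grouping that a practitioner must still judge.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)


def candidate_constructs(indicators, threshold=0.8):
    """Greedily group indicators whose window-level counts co-vary.

    Each group is a *candidate* latent construct for SEM; human review
    decides whether the grouping is theoretically defensible.
    """
    groups = []
    for name in indicators:
        for group in groups:
            if all(pearson(indicators[name], indicators[member]) >= threshold
                   for member in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Illustrative window-level counts: the first three rise and fall together,
# suggesting an "Unsafe Acts" construct; the maintenance signal does not.
series = {
    "ppe_violation":    [1, 3, 5, 2, 4],
    "line_of_fire":     [2, 4, 6, 3, 5],
    "unsafe_lifting":   [0, 2, 4, 1, 3],
    "pm_backlog_hours": [9, 1, 0, 8, 2],
}
groups = candidate_constructs(series)
```

The three behavioral indicators cluster together while the maintenance backlog stands apart — the kind of evidence that supports, but does not replace, the professional judgment described above.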

Importantly, this process is guided by safety theory and professional judgment, not automated pattern recognition alone. AI accelerates discovery and validation, but human oversight ensures that constructs remain interpretable, defensible, and aligned with how work is actually performed.

Preparing Data for Causal Analysis

Before structural equation modeling is executed, AI-driven preprocessing ensures that the data meet the assumptions required for stable causal analysis. This includes handling missing data, standardizing variables, identifying outliers that represent true signals rather than errors, and testing for temporal stability.

The system also evaluates whether relationships between variables remain consistent over time or vary under different operating conditions. Where appropriate, models are adapted to account for site-specific or process-specific differences, allowing the causal structure to remain valid without forcing uniformity where it does not exist.

Creating a Living Risk Model

This integration process is not a one-time exercise. As new data are collected and conditions change, the AI layer continuously re-evaluates indicator performance, latent construct validity, and model fit. When new patterns emerge—such as a shift in how production stress influences behavior—the system flags these changes for review and model refinement.

The result is a living risk model: one that evolves with the operation, improves with experience, and maintains alignment between data, theory, and practice. This model also operates in real time, continuously integrating all data as they are received.

By using AI to integrate and prepare data for structural equation modeling, the system transforms disconnected signals into a coherent representation of risk. This foundation is what enables predictive analytics to move beyond correlation, providing reliable, explainable insight into how and why injury risk is forming in real time.


Understanding Risk Through Causal Modeling

Most safety analytics struggle with the same fundamental limitation: they treat risk factors as independent signals. An unsafe behavior is counted, a maintenance backlog is tracked, a production rate is monitored—each measured, trended, and reviewed largely on its own. While this provides visibility, it does not explain how these factors interact to create injury risk, nor does it help leaders understand which combinations of conditions matter most in a given moment.

Predictive injury prevention requires a different analytical approach—one that is explicitly designed to model cause-and-effect relationships in complex systems. This is where structural equation modeling (SEM) becomes a critical enabling technology.

SEM allows multiple observable signals to be grouped into broader, underlying risk drivers—often referred to as latent variables. These latent variables represent conditions that cannot be measured directly but are inferred from patterns in real-world data. For example, repeated PPE violations, frequent line-of-fire exposure, and unsafe lifting behaviors may collectively indicate an underlying behavioral risk state. Similarly, missed preventive maintenance, increasing breakdown frequency, and abnormal vibration levels may indicate system degradation that increases exposure even when work practices appear unchanged.

The power of SEM lies in its ability to model how these latent risk drivers influence one another and contribute—individually and in combination—to overall injury risk. Rather than assuming that all unsafe acts carry equal weight at all times, the model estimates how strongly each driver contributes to risk under current conditions, and how those contributions change as the system evolves.

In the predictive injury prevention concept, this modeling approach enables the calculation of an instantaneous risk level for a specific operating area. That risk level is not simply the sum of recent events. It reflects the structure of the system: how production pressure amplifies behavioral risk, how degraded equipment condition increases the consequence of minor errors, and how effective (or ineffective) corrective action processes dampen or accelerate exposure.

An Example Structural Equation Model

To make this more concrete, consider a simplified example of how risk might be modeled using SEM in a manufacturing environment.

First, observable data are grouped into latent drivers:

  • Unsafe Acts (UA):
    Indicated by PPE violations, line-of-fire exposure, unsafe lifting, and bypassed guards.
  • System Condition (SC):
    Indicated by preventive maintenance compliance, breakdown frequency, and equipment reliability measures.
  • Operational Stress (OS):
    Indicated by production rate deviation, unplanned changeovers, and yield instability.
  • Safety Response Capability (SRC):
    Indicated by hazard reporting rate, corrective action timeliness, and near-miss follow-up quality.

These latent variables are then related to overall injury risk through a structural equation such as:

Instantaneous Risk (IR) =
  0.45 × Unsafe Acts
  + 0.30 × Operational Stress
  − 0.25 × System Condition
  − 0.20 × Safety Response Capability
In this example, unsafe acts have the strongest direct influence on risk, but their effect is moderated by operational stress and system condition. High production pressure increases the impact of unsafe behaviors, while strong maintenance performance and timely corrective actions reduce overall risk, even when behaviors are not perfect.

The model can also include interaction effects, such as:

IR = … + 0.10 × (Unsafe Acts × Operational Stress)

This term reflects a reality familiar to most practitioners: the same behavior that is tolerated under stable conditions becomes far more dangerous when the system is under stress.

Importantly, these coefficients are not assumed—they are estimated from actual site data and recalibrated as conditions change. As new hazards are reported, corrective actions are completed, or production stabilizes, the relationships update, allowing the model to distinguish between short-term noise and meaningful shifts in risk.
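The structural equation from the example can be expressed directly in code. The coefficients below are the illustrative ones from the text — in practice they would be estimated by the SEM fit from site data, not hard-coded — and the latent scores are assumed to be standardized (z-scored) values.

```python
# Illustrative coefficients from the worked example; a fitted SEM would
# estimate and recalibrate these from actual site data.
COEFFS = {"UA": 0.45, "OS": 0.30, "SC": -0.25, "SRC": -0.20}
INTERACTION_UA_OS = 0.10  # Unsafe Acts × Operational Stress term


def instantaneous_risk(scores):
    """Combine standardized latent scores into one risk value.

    Positive UA/OS raise risk; positive SC/SRC (healthy equipment,
    responsive corrective-action processes) lower it. The interaction
    term makes unsafe acts count for more when the system is stressed.
    """
    ir = sum(COEFFS[k] * scores[k] for k in COEFFS)
    ir += INTERACTION_UA_OS * scores["UA"] * scores["OS"]
    return ir


# Same behavioral score in two contexts: stable vs. stressed operation.
calm = instantaneous_risk({"UA": 0.5, "OS": 0.0, "SC": 1.0, "SRC": 1.0})
stressed = instantaneous_risk({"UA": 0.5, "OS": 2.0, "SC": -1.0, "SRC": 0.0})
```

With identical behavior (UA = 0.5), the calm scenario nets out below zero while the stressed one rises well above it — a numeric version of the point that context, not behavior alone, drives risk.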

Why This Matters in Practice

This causal structure is what allows the system to move beyond alerting and into guidance. When a monitored area transitions from green to yellow or red, the system does not simply state that risk is high. It can explain why—for example, that rising production pressure combined with delayed maintenance has amplified the impact of recurring unsafe lifting behaviors—and point to corrective actions most likely to reduce risk quickly.

Just as importantly, the model can confirm when interventions work. If a targeted maintenance action or workflow adjustment reduces the contribution of a specific risk driver, the overall risk score reflects that improvement in near real time. Over time, this creates a learning loop in which the organization gains insight not only into where risk exists, but into which controls are most effective under which conditions.

By modeling safety as a dynamic system rather than a static checklist, structural equation modeling provides the analytical backbone that makes predictive injury prevention both credible and actionable. It allows EHS professionals and operations leaders to see risk forming, understand its drivers, and intervene with precision—before someone gets hurt.


From Analytics to Action

To be usable at the front line, the system translates complex analytics into a simple visual signal: green, yellow, or red. A green state indicates stable conditions with effective controls. Yellow signals elevated risk requiring timely attention and local correction. Red indicates a critical condition where immediate intervention is warranted.

Behind each color is a clear explanation of the dominant risk drivers and a prioritized set of suggested corrective actions. This allows supervisors and EHS professionals to respond quickly and decisively, without sorting through dashboards or debating which metric matters most.
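The translation from score to signal might look like the sketch below. The thresholds are hypothetical placeholders — a real deployment would calibrate them against site history — and the driver names match the worked example earlier.

```python
def classify(ir, yellow=0.5, red=1.0):
    """Map a risk score to the green/yellow/red signal described above.

    Thresholds are illustrative; calibration against site data would set
    the real boundaries.
    """
    if ir >= red:
        return "red"
    if ir >= yellow:
        return "yellow"
    return "green"


def dominant_driver(scores, coeffs):
    """Name the latent driver contributing most to the current score.

    This is what lets the signal come with an explanation — "why red" —
    rather than a bare color.
    """
    contributions = {k: coeffs[k] * scores[k] for k in coeffs}
    return max(contributions, key=contributions.get)


state = classify(1.175)  # the stressed scenario from the earlier example
driver = dominant_driver(
    {"UA": 1.2, "OS": 0.8, "SC": -0.5, "SRC": 0.2},
    {"UA": 0.45, "OS": 0.30, "SC": -0.25, "SRC": -0.20},
)
```

Pairing the color with the dominant driver is the design choice that keeps supervisors out of dashboards: the alert itself says what to fix first.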


Explainability and Learning by Design

A key design principle of the system is explainability. The model is intended to support professional judgment, not replace it. Users can see which factors are driving risk, how those factors have changed over time, and whether previous interventions successfully reduced exposure.

As new data are collected, the system recalibrates. When corrective actions reduce risk, the model learns from that success. When new patterns emerge, it adapts. Over time, this creates a feedback loop that strengthens both predictive accuracy and organizational learning.


Supporting the Human Side of Safety

Equally important is how the system interacts with people. The intent is not surveillance, but early recognition and prevention. Feedback mechanisms allow users to validate or challenge AI observations, improving trust and model accuracy simultaneously.

When elevated risk is detected, the system can also trigger targeted coaching prompts or short, task-specific learning reminders. In this way, technology reinforces safe behavior at the moment it matters most, supporting—not replacing—the conversations that are central to effective safety leadership.


The Potential Benefits of Predictive Injury Prevention

When implemented effectively, the benefits of this approach would be significant. Organizations would gain continuous risk awareness instead of periodic snapshots, enabling earlier and more precise intervention. Injuries could be reduced by addressing exposure before harm occurs rather than after.

Safety, maintenance, and operations gain a shared, data-driven view of system health, improving coordination and reducing friction between priorities. Leaders gain predictive insight instead of retrospective explanation, allowing resources to be focused where they will have the greatest impact. Most importantly, employees benefit from safer, more stable work environments where risks are recognized and controlled before someone gets hurt.

Predictive analytics do not replace the fundamentals of EHS practice—they strengthen them. By combining modern analytical tools with established safety principles, this approach offers a practical path toward fewer surprises, faster learning, and more reliable protection of people in complex manufacturing systems.


A Discussion About the Future of OSH with AI+Humans

I recently had the privilege of joining ISHN Magazine for a thought-provoking conversation on how AI is reshaping the work of Environmental, Health, and Safety professionals. Dave Johnson—longtime leader and past editor of ISHN—reached out after reading my article on building a “digital twin” of myself, and asked if I’d explore the implications of AI for the future of safety and work on his podcast.

For me, this wasn’t just another interview; it felt like a full-circle moment. As a young safety professional, I studied ISHN Magazine to absorb the wisdom of leaders who had spent decades in the field. Those pages were my classroom, my compass, and my early window into what excellence looked like. Now, decades into my own career, sitting across from Dave and talking about the frontier of AI, I couldn’t help but reflect on how far our profession—and the world around it—has come.

What strikes me most today is the paradox of experience: the more years I accumulate, the more I realize how much remains undiscovered. Every week still brings a new lesson, a new insight, a new perspective. And with AI entering the EHS landscape, that learning curve isn’t just continuing—it’s accelerating. We’re standing at the threshold of an era where human expertise and machine intelligence don’t compete; they amplify one another. The velocity of knowledge is about to shift from incremental to exponential.

AI won’t replace the human essence of what we do—but it will expose us to patterns we’ve never seen, risks we’ve never quantified, and possibilities we’ve never imagined. It challenges us not just to adapt, but to reinvent the way we think, decide, and lead. That’s where the real opportunity lies.

With that spirit in mind, Dave and I dove into a candid conversation about the present and future of our profession—where it’s headed, what might disrupt it next, and how we can shape a safer, smarter world of work.

Stay tuned. This journey is only beginning…

Click here to listen to the podcast: https://www.ishn.com/media/podcasts/5177-all-things-safety/play/140-an-ehs-pro-clones-himself-with-ai
