If you are familiar with traditional enterprise or IT cyber security frameworks, you have likely encountered logging and event management capabilities delivered through a technology called SIEM.
NIST, COBIT, ISO, and even PCI refer to it as a necessary capability at some level. However, within the Operational Technology (OT) environment, we need to answer the following questions about Security Information and Event Management (SIEM) technology:
- What is a SIEM’s function and purpose (regardless of IT or OT)?
- What is the difference between an IT and an OT SIEM?
- What factors drive the need for an OT SIEM?
- What is a reference architecture for an OT SIEM?
What is a SIEM?
Security Information and Event Management is a system that aggregates, parses, and analyzes various sources of cyber information (security and otherwise) for storage, alerting, response, and reporting. Analysts, automated systems, and security teams receive and act upon alerts, alarms, events, and baselines to identify cyber risks and respond to them.
A SIEM typically has several key functions:
Figure 1: Example SIEM concept flow
The SIEM provides a centralized system that receives data, unifies and parses it, organizes it for short-term or long-term use, and raises an alarm when a predefined trigger or threshold is reached.
These triggers or thresholds may be inherited from the generating application or system, derived from machine learning, statistics, or heuristics, or defined as use cases by humans or frameworks.
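The Python sketch below illustrates the receive, parse, and trigger flow. It assumes a simplified RFC 3164-style Syslog line and uses a hypothetical rule (`ThresholdRule`) that counts matching events within a time window. Production SIEMs support many more formats and much richer correlation logic.

```python
import re
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Simplified RFC 3164-style line: "<PRI>Mmm dd hh:mm:ss host tag: message".
# This pattern is an illustrative assumption, not a complete Syslog parser.
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)$"
)


@dataclass
class Event:
    ts: datetime
    host: str
    tag: str
    msg: str
    severity: int  # Syslog severity = PRI % 8 (0 = emergency, 7 = debug)


def parse(line: str, year: int = 2024) -> Optional[Event]:
    """Unify a raw line into an Event; return None if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    # RFC 3164 timestamps carry no year, so the caller supplies one (assumption).
    ts = datetime.strptime(f"{year} {' '.join(m['ts'].split())}", "%Y %b %d %H:%M:%S")
    return Event(ts, m["host"], m["tag"], m["msg"], int(m["pri"]) % 8)


class ThresholdRule:
    """Hypothetical rule: fire when `threshold` matching events from one host fall within `window`."""

    def __init__(self, keyword: str, threshold: int, window: timedelta):
        self.keyword = keyword
        self.threshold = threshold
        self.window = window
        self.seen = defaultdict(deque)  # host -> timestamps of matching events

    def feed(self, ev: Event) -> bool:
        if self.keyword not in ev.msg:
            return False
        q = self.seen[ev.host]
        q.append(ev.ts)
        # Drop matches that have aged out of the sliding window.
        while q and ev.ts - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold


rule = ThresholdRule("authentication failure", threshold=3, window=timedelta(minutes=1))
lines = [
    "<38>Mar  1 10:00:01 hmi-01 sshd: authentication failure for user op",
    "<38>Mar  1 10:00:20 hmi-01 sshd: authentication failure for user op",
    "<38>Mar  1 10:00:45 hmi-01 sshd: authentication failure for user op",
]
alerts = [ev.host for ev in map(parse, lines) if ev and rule.feed(ev)]
print(alerts)  # ['hmi-01']
```

The rule fires only on the third failure. That is the point at which three matching events from the same host fall inside the one-minute window.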
Ultimately, a SIEM’s purpose is to receive messages (often in Syslog or Windows Event Log format), make them available for cyber security functions, and alert upon them so security teams can effectively execute defined procedures and processes to manage the threat. This is best illustrated with an example:
Imagine you have a small or medium-sized business with a convergent infrastructure. Windows systems are used for accounts payable, as well as for processing and distributing the shop floor’s orders and related tasks. Both functions are critical, but one is IT-related and the other is OT-related.
Now imagine that the person at the helm of the accounts payable computer opens a phishing email and an attacker drops malware onto that system. Fortunately, the system’s anti-virus detects it and generates an alert.
This is a simple example, but with commodity malware it pays for an organization managing its resources to have its systems forward logs to a secure system for analytics, work dispatch, and training. In this case, the malware was caught (i.e., no massive ransomware attack), but the accounts payable person may need phishing awareness training or a visit from their manager.
In summary, a SIEM provides value by looking for behaviors and alarms in data generated from multiple cyber security investments/technologies and creating an exception for a resource to take action upon.
What is the difference between an IT and an OT SIEM?
There is a good deal of debate around the need for an OT security operations center (SOC) to monitor, tune, and use the SIEM. A separate question is the value of an OT SIEM and how it would differ from an IT SIEM.
4 SIEM Differences Between IT and OT
- Data: Data in an OT SIEM should include process information. This adds context to alerts, reduces false positives caused by operating changes, and helps identify potential process issues.
- Analysis: IT typically deals with cyber threats that affect Confidentiality-Integrity-Availability, while OT works with Safety-Reliability-Productivity. As you can imagine based on the convergence and inter-connected nature of today’s networks, threats and systems/infrastructure overlap in many regards. Engineers and site operators must quickly tie asset information together with an event to triage a situation and execute a process (e.g., to press the big red shutdown button vs. just re-image a system).
- Visibility: In IT, SIEM functions are typically monitored in a central data warehouse housed in a corporate data center, and visibility is limited to the security operations center. In OT, however, robust analysis and incident response require local OT technicians to have access to the data and analysis to identify true root causes.
- ROI: The IT SIEM is a pure security tool, generating value from reduced risk of cyber-attack. An OT SIEM acts as an operating tool. It improves uptime by catching anomalous patterns that involve no malicious actor but come from defective or problematic devices, before those patterns cause unplanned outages. This includes use cases such as:
- Predictive maintenance and resource monitoring
- Suddenly missing (offline) systems
- Transient asset spotting (potentially even rogue devices)
- Security alarms for traditional cyber threats or unauthorized accesses
- Unexpected system access or erroneous system behaviors
- Process failures, shutdown alerts, or manual alarm silencing
- Regulatory and compliance requirements
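Two of the use cases above, suddenly missing systems and transient assets, can be sketched as a comparison between an asset baseline and the last time each asset was observed. The function name, the `offline_after` timeout, and the asset names are hypothetical. In practice, the observations would come from the logs and network data the OT SIEM already aggregates.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def asset_exceptions(baseline: set[str], last_seen: dict[str, datetime],
                     now: datetime, offline_after: timedelta) -> dict[str, list[str]]:
    """Compare observed assets with the expected baseline (hypothetical helper)."""
    # Expected assets that were never seen, or not seen recently, may be offline.
    missing = sorted(a for a in baseline
                     if a not in last_seen or now - last_seen[a] > offline_after)
    # Observed assets absent from the baseline may be transient or rogue devices.
    transient = sorted(a for a in last_seen if a not in baseline)
    return {"missing": missing, "transient": transient}


now = datetime(2024, 3, 1, 12, 0)
baseline = {"plc-01", "plc-02", "hmi-01"}
last_seen = {
    "plc-01": now - timedelta(minutes=2),
    "hmi-01": now - timedelta(hours=3),
    "laptop-x": now - timedelta(minutes=5),
}
print(asset_exceptions(baseline, last_seen, now, offline_after=timedelta(minutes=30)))
# {'missing': ['hmi-01', 'plc-02'], 'transient': ['laptop-x']}
```

Whether a missing asset reflects a failure, planned maintenance, or something malicious is exactly the judgment that process context and local OT staff supply.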
Regardless of origin, SIEM use cases and cyber threats such as commodity malware overlap. However, the impacts and events on either side of the spectrum (IT vs. OT) diverge the farther you move in either direction.
Figure 2: The nature of the difference
IT frequently faces threats from malware, phishing, data disclosures/breaches, and a variety of threats delivered straight from the Internet. In OT environments, threats include the compromise of specialized process control equipment, safety systems, and production lines. In both IT and OT, teams have varying skill sets and different priorities based on the type of work performed and the events generated.
In IT, when an alert states that user X is doing Y, or malware alert Z goes off, the handling for those situations is reasonably well understood. In OT, however, proprietary vendors and technologies spanning decades produce an overwhelming number of alarms and alerts for teams focused on keeping a facility operational (and safe).
What factors drive the need for an OT SIEM?
There is often a need for both IT and OT SIEM within one environment. In fact, in almost all industrial cyber attacks, the actor pivoted from IT into OT by first gaining a foothold on the IT side and traversing protections existing between the two environments.
A single view to oversee asset management, reporting, and SIEM functionality is required for effective cyber risk reduction across IT and OT.
The question that remains is how best to achieve that integrated view. When does it make sense to have an OT SIEM that provides specific data aggregation, analysis, incident response and reporting for OT and then forwards critical alerts and information into the enterprise SOC?
The right approach and strategy for these questions are specific to each organization. For some organizations, a single SIEM with no specific OT functionality makes sense. For others, a robust OT SIEM will be critical.
4 factors driving the need for an OT SIEM
- Complexity of OT process: Companies in the power sector, oil refining, water treatment, etc. operate complex physical processes that require deep experience in industrial control systems operations. To identify and analyze risks and responses, OT personnel need an OT SIEM that provides detailed information only they will fully understand. The more complex the process, the more value an OT SIEM delivers.
- Criticality of OT process: For many industrial organizations, OT processes are their lifeblood. Downtime, whether from a malicious attack or unintentional device disruption, costs a lot of money. As a result, monitoring adds significant value when it covers process variability, control device behavior anomalies that indicate potential failure, and new devices that may cause disruption. An OT SIEM provides this information.
- Network access/segmentation of OT infrastructure: The more separated the OT network is from IT, the more valuable the OT SIEM becomes. As dependence on local operations personnel to take action increases, so does the value of an OT SIEM.
- Compliance and regulations: In some industries, such as the North American power industry, cyber security regulations like NERC CIP require detailed OT data. This data may not belong in an IT SIEM, because it serves compliance more than the security analytics a SOC typically uses.
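Monitoring for process variability, as described under criticality, often starts with a simple statistical baseline. The sketch below flags a reading whose z-score against a rolling window exceeds a limit. The class name, window size, minimum history, and 3-sigma limit are illustrative assumptions, not recommended settings.

```python
from collections import deque
from statistics import mean, pstdev


class ZScoreMonitor:
    """Flag readings far from a rolling baseline (illustrative thresholds)."""

    def __init__(self, window: int = 20, limit: float = 3.0, min_history: int = 5):
        self.values = deque(maxlen=window)
        self.limit = limit
        self.min_history = min_history

    def update(self, x: float) -> bool:
        anomalous = False
        if len(self.values) >= self.min_history:
            mu, sigma = mean(self.values), pstdev(self.values)
            anomalous = sigma > 0 and abs(x - mu) / sigma > self.limit
        # Every reading joins the baseline; a real system might exclude anomalies.
        self.values.append(x)
        return anomalous


monitor = ZScoreMonitor()
readings = [50.1, 49.9, 50.0, 50.2, 49.8, 50.1, 57.5]  # e.g. a pressure or temperature tag
print([monitor.update(r) for r in readings])
# [False, False, False, False, False, False, True]
```

A flagged reading like this may have no malicious cause at all. That is the operational, uptime-oriented value the criticality factor describes.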
Challenges with issuing work orders from enterprise IT SIEM
The answer is not as straightforward as it may seem on the surface. The complexity stems from the amount of legacy equipment, process control, and regulation in the environment. After all, who would want to pollute their enterprise environment with X compliance bureaucracy and overhead? Probably few.
Figure 3: Example IT SIEM being used for OT
In Figure 3, this arrangement looks as though it might work. The problem is that most enterprise solutions lack access to many important sources. Instead, they receive an alert, determine where to assign it, and send it on as best a traditional IT analyst can.
With minimal information or context, the alert is “thrown over the fence to OT.” Assume a ticketing or work system links the two domains and acts as the IT/OT convergent glue. The work then lands on the OT individual(s) or team, who attempt to triage an often terse one-line message such as:
<date> Cryptographic Certificate Expired UseCase Triggered on Asset ABC – Remediate, HIGH priority.
If the OT receiver is lucky, guidance is in place with appropriate procedures for the environment. Unfortunately, this isn’t enough information for even IT to make sense of, and the priority and remediation are a challenge due to operational constraints in OT.
In other words, unless the alert comes with adequate context and supporting information, relying on the SIEM alone demands complete asset visibility and adequate expertise in the asset or deployed environment.
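One way to avoid the "thrown over the fence" problem is to attach asset context to an alert before it reaches OT. The sketch below assumes a hypothetical inventory schema (`AssetRecord`) and alert dictionary. The field names and values are illustrative only.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class AssetRecord:
    # Hypothetical inventory fields an OT responder might need for triage.
    asset_id: str
    role: str
    zone: str
    criticality: str
    owner: str
    maintenance_window: str


def enrich(alert: dict, inventory: Dict[str, AssetRecord]) -> dict:
    """Return a copy of the alert with asset context attached."""
    enriched = dict(alert)
    asset = inventory.get(alert["asset_id"])
    enriched["asset"] = asdict(asset) if asset else None
    # An alert on an unknown asset is itself a visibility gap worth flagging.
    enriched["unknown_asset"] = asset is None
    return enriched


inventory = {
    "ABC": AssetRecord("ABC", "historian", "Level 3 (site operations)", "high",
                       "Plant 2 controls team", "Sundays 02:00-04:00"),
}
alert = {"use_case": "Cryptographic Certificate Expired", "asset_id": "ABC", "priority": "HIGH"}
print(enrich(alert, inventory))
```

The alert is copied rather than modified in place, so the original record forwarded from the IT SIEM stays unchanged.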
Take a common occurrence in IT/enterprise land: out-of-date SSL/TLS certificates. In the enterprise domain, any alert, report, or alarm stating that a system has expired certificates sets off a flurry of events such as:
- An event or vulnerability report is received and ingested by the IT SIEM
- An analyst within their SOC investigates and issues a ticket
- The ticket may be assigned, and a new certificate is issued without a second thought
Again, this is a very simple example, but in OT a certificate warning is not necessarily a direct cyber security threat. In addition, the following conditions need to be understood before re-issuing a certificate:
- Is the device, and/or facility facing a direct and impactful risk due to the certificate expiration? If no, and other controls exist, other work may have a higher priority.
- Will revoking and installing a certificate result in downtime or a loss of connectivity? Is this allowed? If the expiration has little impact, the certificate may be left alone, or its replacement may be scheduled for a downtime window or a period of low risk.
- Is the warning happening on a device that has multiple levels of remediating controls? (e.g., the device is isolated, segmented, and monitored).
- Does the expired certificate sit on an asset with additional implications, a need for tribal knowledge, or other stipulations? (e.g., mutual authentication, which amplifies any change: N devices affected * N changes).
- Does a certificate expiry really mean a lapse in achieved security? If it is not compromised or revoked, then it might be okay for the moment (assuming it is within the organization’s compliance and risk thresholds).
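The questions above can be encoded as a triage helper, so that an expired-certificate alert arrives with a suggested action rather than a blanket "remediate". The decision order and the 30-day horizon below are a hypothetical policy for illustration. Each organization would set its own against its compliance and risk thresholds. `ssl.cert_time_to_seconds` is the Python standard-library parser for certificate `notAfter` strings.

```python
import ssl


def triage_cert(not_after: str, *, now: float, compromised: bool,
                causes_downtime: bool, mutual_auth_peers: int,
                compensating_controls: bool) -> str:
    """Suggest an action for a certificate alert (hypothetical policy)."""
    days_left = (ssl.cert_time_to_seconds(not_after) - now) / 86400
    if compromised:
        # A compromised key is a real lapse in security: escalate regardless.
        return "escalate now"
    if days_left > 30:
        return "no action"
    if causes_downtime or mutual_auth_peers > 1:
        # Replacement must be coordinated around outages or across peer devices.
        return "schedule in maintenance window"
    if compensating_controls:
        # Isolated, segmented, and monitored: record the risk and defer.
        return "log in risk register and defer"
    return "replace now"


now = ssl.cert_time_to_seconds("Mar  1 00:00:00 2024 GMT")
print(triage_cert("Feb  1 00:00:00 2024 GMT", now=now, compromised=False,
                  causes_downtime=True, mutual_auth_peers=4,
                  compensating_controls=True))
# schedule in maintenance window
```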
There are additional concerns, but these are the top reasons why an OT SIEM is important. It must be staffed by people who know their environments rather than by teams in a completely different division (although for a convergent infrastructure, more eyes on it is not a bad idea). An alert does not necessarily mean an issue that needs immediate changes. Handling it also requires visibility and presence from the right individuals in the OT environment.
What is a reference architecture for an OT SIEM?
As with any theoretical concept, how does one get the most value out of it, or determine whether it works in the real world rather than only on paper? The idea is that:
- IT is left to its own devices and technologies as befits it
- There is a shared ticket queue between the two worlds so multiple eyes track events, for posterity or mutual interest
- A shared risk register and change control board is present (hint: a real-world issue is IT owning an edge router, making changes, and OT losing connectivity)
- OT investigates and tracks relevant events with complete visibility on asset info, logs, etc. but in a way that is safe for OT teams
- OT is in control of the actual application of the changes, but also passes alarms bi-directionally (IT to OT, and OT to IT where relevant)
- OT leverages IT technologies, or even IT methodologies, for cyber security where applicable, but applies its own finesse
- OT creates its own monitoring use cases beyond those in traditional operational infrastructure (e.g., historians or HMIs), covering diagnostics, networking, system resources, etc.
- OT efficiently actions fixes and remediates vulnerabilities or events as they arise
- Information flows easily to IT and OT so risks are tracked effectively and aligned with business risks or motivations
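In the clearing-house role, OT keeps full detail locally and passes only critical alarms to IT. That can be sketched as a severity filter that re-emits events as RFC 5424 Syslog lines for the IT SIEM. The `ot-siem` app name, the `local0` facility, the severity cut-off, and the event fields are assumptions. The sketch does not cover the transport (UDP, TCP, or TLS) or the IT-to-OT direction.

```python
from datetime import datetime, timezone
from typing import Iterable, List

LOCAL0 = 16  # Syslog facility code for local0 (assumed choice)


def to_rfc5424(event: dict, facility: int = LOCAL0) -> str:
    """Format an event as an RFC 5424 line: <PRI>VERSION TS HOST APP PROCID MSGID SD MSG."""
    pri = facility * 8 + event["severity"]
    ts = event["time"].astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    # "-" is the RFC 5424 NILVALUE, used here for PROCID and STRUCTURED-DATA.
    return f"<{pri}>1 {ts} {event['host']} ot-siem - {event['msgid']} - {event['msg']}"


def forward_to_it(events: Iterable[dict], max_severity: int = 3) -> List[str]:
    """Keep severities 0-3 (emergency to error); lower numbers are more severe."""
    return [to_rfc5424(e) for e in events if e["severity"] <= max_severity]


events = [
    {"time": datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc), "host": "plc-01",
     "severity": 2, "msgid": "FWCHG", "msg": "PLC firmware changed outside maintenance window"},
    {"time": datetime(2024, 3, 1, 12, 5, tzinfo=timezone.utc), "host": "hmi-01",
     "severity": 6, "msgid": "LOGIN", "msg": "Operator login"},
]
for line in forward_to_it(events):
    print(line)
# <130>1 2024-03-01T12:00:00Z plc-01 ot-siem - FWCHG - PLC firmware changed outside maintenance window
```

The informational login event stays in the OT SIEM for local analysis. Only the critical firmware change reaches the enterprise view.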
Figure 4: Example separated IT & OT SIEM architecture
OT SIEM vs. IT SIEM
An OT SIEM differs from IT by aggregating, analyzing and visualizing a different set of data with a different set of lenses. The result is a set of security and reliability insights that are not available from traditional IT SIEMs.
There is no absolute answer that every industrial company needs one, but several factors drive increased value from a separate OT SIEM. An OT SIEM acts as a clearing house for the most critical alerts and events to forward to the IT SIEM for an enterprise-wide view that is critical to IT/OT converged security.
An OT SIEM must be tied into the tools used within an OT environment. Otherwise, in an action-focused environment, remediations may be missed, or worse, irrelevant alarms cannot be tuned out and vital security events may go unnoticed. It’s about getting the most out of your investments and multiplying their risk reduction and effectiveness.