How IBM uses AIops and observability to proactively address incidents

As the COVID-19 pandemic accelerated digital transformation and demonstrated the need for improved IT infrastructure, businesses quickly moved to the cloud and implemented stronger cloud computing strategies. The result? Multicloud services continue to grow across the enterprise as organizations increasingly recognize the challenges of leaving their business data entirely to vendors. Gartner predicted that nearly 75% of medium and large enterprises would use a multicloud and/or hybrid strategy by 2021, and it’s already happening.

In fact, a 2018 survey of 1,106 business and technology leaders by the IBM Institute for Business Value found that 85% of companies used multicloud systems for information management. Many infrastructure and operations (I&O) organizations are now “adjusting their strategies to leverage cloud capabilities for a future of integrated solutions, leading to AI, IoT, and edge computing,” according to Gartner. So multicloud is here to stay.

However, multicloud services also present some challenges. Flexera’s 2020 Multi-Cloud Challenges Survey found “multi-cloud management topped the list”, along with issues with security, cloud cost management, and lack of resources / expertise.

Dinesh Nirmal, general manager of IBM Automation, discussed how AIops and observability work together, the business benefits of combining them, and IBM’s recent updates to its IBM Cloud Pak for Watson AIops software.

Analysis paralysis from too much data

IBM’s response to the need to better manage multicloud environments is to enable the interplay between AIops and observability. While this seems straightforward, there’s a big problem with the overabundance of data in today’s enterprise ecosystem. There are so many data sources that enterprise leaders are swimming in massive data lakes. More critical is the reality that it’s often difficult to convert this data into actionable insights.

That’s where observability, actionable observability, or application resource management comes in.

“A [major] pillar in IT is all about incident avoidance and incident resolution. That’s where AI plays a huge role, as all this data comes through observability, helping you to correlate it using AI. This can help to look for anomalies within alerts, events, and logs to say, ‘We’re seeing some anomalies and, based on the past behavior, it looks like it could lead to this problem.’”
Dinesh Nirmal, general manager of IBM Automation

Organizations need to be able to observe and know their entire IT infrastructure, whether it’s hybrid cloud, multicloud, or behind the firewall, to support application performance management (APM). IBM uses actionable observability to bring in data from across all the APM vendors to ensure applications are always running successfully.

Combining AIops and observability

AIops, a term first coined by Gartner, is the application of big data and machine learning (ML) to automate IT processes and operations at the speed businesses require today. When this is combined with observability, a system can be thoroughly analyzed and its data pipeline can be seen as a whole. A Forrester study (commissioned by IBM) found that combining AIops and observability can reduce customer-facing outages by up to 50% and mean time to recovery (MTTR) by up to 95% for enterprises.
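
To make the MTTR figure concrete, here is a minimal Python sketch of how MTTR is commonly computed from incident records and how a percentage reduction follows from it. The incident timestamps are hypothetical, chosen only so the arithmetic lands on the 95% upper bound; they are not data from the Forrester study, and the function names are illustrative.

```python
from datetime import datetime


def mttr_minutes(incidents):
    """Mean time to recovery: average of (resolved - detected), in minutes."""
    durations = [
        (resolved - detected).total_seconds() / 60
        for detected, resolved in incidents
    ]
    return sum(durations) / len(durations)


def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) * 100 / before


# Hypothetical (detected, resolved) pairs before and after a tooling change.
incidents_before = [
    (datetime(2022, 1, 3, 9, 0), datetime(2022, 1, 3, 11, 0)),    # 120 minutes
    (datetime(2022, 1, 9, 14, 0), datetime(2022, 1, 9, 15, 20)),  # 80 minutes
]
incidents_after = [
    (datetime(2022, 3, 2, 9, 0), datetime(2022, 3, 2, 9, 6)),     # 6 minutes
    (datetime(2022, 3, 8, 14, 0), datetime(2022, 3, 8, 14, 4)),   # 4 minutes
]

mttr_before = mttr_minutes(incidents_before)  # 100 minutes
mttr_after = mttr_minutes(incidents_after)    # 5 minutes
reduction = percent_reduction(mttr_before, mttr_after)
print(f"MTTR fell from {mttr_before:.0f} to {mttr_after:.0f} minutes ({reduction:.0f}% lower)")
```

In this made-up example, a drop from 100 minutes to 5 minutes is what a 95% MTTR reduction would look like in practice.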

Managing multicloud is difficult. Steve Hershkowitz, chief revenue officer at Virtana, notes in an article that the attractive features of the cloud are the same ones that “make it exceedingly complex to manage on an ongoing basis.”

One of the key things multicloud computing rests on is operational control, the ability of organizations to monitor their entire IT systems, but this often isn’t easy to do. With the sophistication of multicloud environments, there is an increasing need to improve observability in IT systems for better analytics and optimal performance. More than ever, organizations need effective AIops to uncomplicate their cloud environments so that they can effectively design, build, and manage applications in the cloud.

However, another issue often arises: While data is the fuel for AIops, several challenges with the AIops data pipeline can lead to ineffective AIops. That’s where observability comes in, helping to solve issues like AI bias along the AIops data pipeline. Unifying AIops and observability enables enterprises to understand why problems happen, see other similarly related problems, discover the best ways to fix the problems, and provide insights on how to stop the problems from happening in the first place.

AI for incident detection and management

The IBM approach to combining AIops and observability for providing actionable insights is embedded in a new version of its IBM Cloud Pak for Watson AIops software, which the company recently announced to help enterprises proactively resolve incidents by providing a new “stories and alerts” dashboard.

The solution is an end-to-end approach that requires cross-field integration. To get the full scope, the version was developed with Instana (which IBM acquired in 2020 for observability data) and at present can onboard data from Turbonomic (which IBM acquired in 2021 for applications’ resource utilization). The full-stack application allows IT managers and site reliability engineers (SREs) to obtain a comprehensive view of how their IT environments are performing.

It combines the monitoring and event data from different sources, including Instana and Turbonomic, to learn the normal behavior and the baseline characteristics of applications. The software uses AI to quickly detect abnormal behavior in production applications, then uses automation to take corrective action, resolve detected issues and reduce manual processes.
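
The general idea of learning a baseline and flagging departures from it can be sketched in a few lines of Python. This is a simplified illustration using a z-score threshold, not IBM’s actual method; the latency values, function names and threshold are all assumptions made for the example.

```python
from statistics import mean, stdev


def learn_baseline(samples):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def find_anomalies(baseline, observations, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations away from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as abnormal.
        return [(i, v) for i, v in enumerate(observations) if v != mu]
    return [
        (i, v)
        for i, v in enumerate(observations)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical response times (ms) collected while the application was healthy.
normal_latency_ms = [100, 102, 98, 101, 99, 103, 97, 100]
baseline = learn_baseline(normal_latency_ms)

# Hypothetical live readings; the third one is a clear spike.
live_latency_ms = [101, 99, 250, 100]
anomalies = find_anomalies(baseline, live_latency_ms)
print(anomalies)  # [(2, 250)]
```

A production system would correlate many signals (alerts, events, logs) rather than a single metric, and would hand each flagged reading to an automated remediation step; the sketch stops at detection.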

The Forrester study showed that organizations that deployed IBM Cloud Pak for Watson AIops eliminated 80% of the time spent remediating false-positive incidents. The deployment also increased visibility into application performance, reducing the time to resolve issues by 75%.

While Nirmal agreed there are other players like ServiceNow in the space, he said IBM has a huge advantage because of its longtime customers and because the company has the skills and knowledge to build the right AI models using data it’s been working with for decades.

All-around automation using AI

Nirmal also weighed in on training the AI models for decision-making: “Predictability is driven by the data you feed into the AI, the more trusted, clean data that you can give, the better accuracy you get.” In addition, organizations need to make sure they have good data to train their models, he explained. “Not only that, even after you train it, you must continuously retrain it because the data is changing every day, every minute, every hour.”

IBM notes the combination of observability and AIops can have a major impact on an organization’s bottom line. The company claims this impact is why organizations like T-Mobile, Electrolux, Carhartt and Taiwan’s National Center for High-performance Computing are rapidly adopting its solutions.

Gartner notes one of the most compelling technology trends for 2022 is hyper-automation, which is often a result of AIops and observability. Nirmal pointed out that outages bring productivity, optimization, and other problems, so one of the most prevalent themes in the IT industry is all-around automation using AI.

Unifying AIops and observability helps get value from multicloud 

Virtana’s State of Multicloud Report 2022, which surveyed 360 CIOs and IT leaders in the US and UK, notes multicloud challenges will continue to grow as adoption increases. Rather than take a reactive approach, enterprise IT leaders should be more proactive. As more enterprises migrate to a multicloud approach, it’s key to prioritize managing those multicloud environments effectively.

Nirmal believes enterprise IT decision-makers want to get benefits from multicloud in three critical areas: optimization, productivity, and product costs. However, automation is what gives them the benefit across those three pillars. Unifying AIops and observability, he said, is an effective way to ensure enterprises are meaningfully automating processes, quickly detecting incidents along production pipelines and getting the best value from multicloud solutions.