Artificial Intelligence (AI) is becoming a ubiquitous, commonly applied data processing method. AI for IT Operations (AIOps) is a type of AI-powered solution for infrastructure and operations (I&O) teams. AIOps supports the success of organizational initiatives that use AI to improve customer experiences, marketing and sales effectiveness, business operations, and more.
Data is essential for Artificial Intelligence (AI). In fact, data is what fuels analytics and Machine Learning (ML), which are the foundational data processing methods of AI-powered solutions. The data that AIOps requires must come from IT-related sources such as application logs, server logs, network logs and telemetry, and the IT help desk.
Network data encompasses many details about how end-users and IoT devices experience connectivity, applications, and services. This inherently broad view of experiences makes network data especially important for AIOps. Network data is also a good starting point for an AIOps program. It is also essential that all data, including network data, be consistent, complete, and accurate, which means you will need a reliable mechanism to connect sources of data to your AIOps solutions. The entire AIOps-ready product portfolio offered by cPacket meets these requirements.
Here is a succinct definition of how several foundational AI technologies turbocharge IT operations:
“Increases IT operational accuracy, agility, automation and overall efficiency by consolidating alerts and providing actionable insights using Analytics and Machine Learning applied to IT infrastructure data.”
Focus – by consolidating events, issues, and help desk tickets into a single problem and alert. This is very helpful because it eliminates alert noise and the time spent triaging many alerts to understand what is going on. It also reduces the stress of being overwhelmed by a barrage of alerts.
Actionable Insights with Context – drives heightened actions and automation. It gives humans the ability to act decisively with greater accuracy, agility, and confidence. It enables automation to adapt to situations, make decisions in real-time, and do more. Actionable insights intelligently assist and automate by:
- Detecting Unusual Behavior (aka Anomaly Detection) – unusual behavior is always something you want to know about and act on quickly. For example, network throughput that suddenly drops to an unacceptable level is unusual and needs attention. East-west traffic that should not be happening is also unusual and worth investigating because it could be malicious activity.
- Detecting Patterns and Trends – either gives you peace-of-mind when things are on a good trajectory or, conversely, enables proactivity.
- Predicting Likely Events – another way to act proactively to ensure operational and business continuity as well as ideal experiences.
- Determining Root Causes – a huge time-saver because when you are told why something happened you can go straight to remediation, skipping what could be a lengthy troubleshooting investigation and averting or minimizing the adverse consequences.
- Providing Context – complements root cause determination (“the why”) with what happened, what is happening, and what is likely to happen. The added context includes where and when. Combining what, where, when, and why constitutes Situational Intelligence.
- Providing Corrective Recommendations (aka Prescriptive Recommendations) – that also save time as well as reduce risk and stress. Specific actions replace guesswork with best practices so you can act quickly and confidently when situations arise.
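To make the anomaly detection item above concrete, here is a minimal sketch using a trailing-window z-score. The function name, the window size, the threshold, and the throughput readings are all illustrative assumptions; production AIOps tools typically use more sophisticated models.

```python
from statistics import mean, stdev

def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical throughput readings in Mbps (not real measurements).
throughput_mbps = [940, 935, 942, 938, 941, 939, 120, 937, 940]
print(find_anomalies(throughput_mbps))
```

In this made-up series, only the sudden drop to 120 Mbps is flagged. Note that the outlier then inflates the window's standard deviation for the next few samples, which is one reason real systems often use more robust statistics.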
The AIOps category is an evolution of automating tasks for IT, which is something that has been around for a long time. Automation is becoming more advanced and “intelligent” by applying AI technologies, specifically big data, correlation, advanced analytics, and machine learning. As a solution category, AIOps is tracked and measured by leading technology market analysts.
The overall market, which includes AIOps platforms and the supporting ecosystem (the latter includes cPacket Networks), is large and growing. Analyst estimates for compound annual growth rate (CAGR) are over 25%; some are reporting over 30%. So, if you have not yet adopted AIOps, it is likely that an AIOps initiative is in your future.
As an IT/I&O professional, you have many things to constantly juggle – designing, installing, provisioning, maintaining, securing, and assuring all your infrastructure while also providing technical support.
Service and experience expectations are constantly increasing. While meeting them and ensuring that your IT infrastructure has the capacity for increasing demand and evolving use cases, you must perform ongoing capacity planning and refreshes of aging equipment. You are also increasingly being called upon to work on projects that contribute to business value and competitive advantages, whether your organization is digital native or undergoing digital transformation.
All these roles and responsibilities are a lot to juggle, especially if your budgets and resources are insufficient and not growing at the same rate as your workload. It is no surprise that, like many IT/I&O teams, you and the entire I&O team are overwhelmed.
AIOps Lends a Helping Hand
Going back to Gartner’s definition, AIOps leverages not just big data, but big and fast data from as many sources as possible – giving you insights and context to understand your network and other infrastructure, and hence take the best possible actions in any situation. With deeper understanding of what’s happening and what to do, you gain intelligent assistance and intelligent automation that combine to give you a helping hand that empowers you to act with greater accuracy and agility, reducing guesswork and stress, giving you time back in your day, and reducing alerts when you are on-call.
When you are more efficient and effective, the entire I&O organization, including the Chief Information Officer (CIO) and Chief Information Security Officer (CISO), is too.
While we may not like to admit it, with the constant rush to keep things running 24 hours a day, remedies are too often based on guesswork versus best practices. This is an area where AI can understand the situation and quickly recommend the ideal resolution based on proven and highly effective best practices. This is intelligent assistance!
Insights with context, derived from correlating and analyzing all possible data sources, are free of blind spots and infrastructure bias. They provide the following information, which assists your troubleshooting and remediation with rich situational intelligence.
| What is happening | Where |
| What is likely to happen | Why |
| What to do (e.g., recommendations for how to resolve and prevent) | |
Taking intelligent assistance to the next level, advanced analytics and machine learning are leveraged to predict unplanned and undesired events that are likely to occur. Such predictions empower you to proactively avert such events and their consequences.
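As a simple illustration of predicting a likely event, the sketch below fits a straight-line trend to utilization samples and estimates when the trend would reach a limit. This is only one basic technique (ordinary least squares); the function names, the sample values, and the 90% limit are assumptions made for the example.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys sampled at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def periods_until(ys, limit):
    """Estimate how many sampling periods after the last sample the fitted
    trend reaches `limit`. Returns None if the trend is flat or falling."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    x_cross = (limit - intercept) / slope
    return max(0.0, x_cross - (len(ys) - 1))

# Hypothetical weekly link utilization, in percent.
weekly_utilization_pct = [50, 55, 60, 65, 70]
print(periods_until(weekly_utilization_pct, limit=90))
```

With these made-up numbers the trend rises five points per week, so the estimate gives roughly four weeks of lead time to add capacity before the limit is reached.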
The situational intelligence, plus corrective recommendations, focuses you on the most important experience/performance problem or network security threat to address right now. Having this focus eliminates alert noise and fatigue, and greatly aids your efficiency to maximize secure digital experiences.
It is not possible to keep up with every problem and unplanned event that occurs along with all the corresponding alerts. This is a case where automation, especially intelligent automation that makes on-the-fly decisions based on data and analytics, provides great value.
Insights with context from correlated and analyzed data also make it possible for real-time decisions to facilitate automated optimizations and corrections; in other words, autonomous operation of your network. Automation can also baseline performance to compare key performance metrics after changes have been made to determine whether the result is an improvement. The before-and-after assessment makes it possible to decide whether to keep or roll back the changes. This process of correcting, assessing, and keeping or reverting is called “closed-loop” automation.
Real-time decision-making and closed-loop remediation using baselining are truly intelligent automation!
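The correct, assess, and keep-or-revert cycle can be sketched in a few lines. This is a minimal illustration, not a product workflow: `closed_loop_change` and the callables passed to it are hypothetical names, and the "network" here is just an in-memory dictionary standing in for a real measurement and configuration system.

```python
def closed_loop_change(measure, apply_change, roll_back, min_improvement=0.0):
    """Apply a change, compare the metric against its baseline, and keep
    the change only if it improved. `measure` returns a metric where
    lower is better (for example, latency in milliseconds)."""
    baseline = measure()          # before
    apply_change()
    after = measure()             # after
    if baseline - after > min_improvement:
        return "kept", baseline, after
    roll_back()                   # no improvement: revert
    return "rolled_back", baseline, after

# Hypothetical stand-in for a real system: a dict holding one metric.
state = {"latency_ms": 40.0}

def measure():
    return state["latency_ms"]

def helpful_change():
    state["latency_ms"] = 25.0

def harmful_change():
    state["latency_ms"] = 55.0

def restore():
    state["latency_ms"] = 40.0

print(closed_loop_change(measure, helpful_change, restore))
restore()  # reset the simulated system for the second demonstration
print(closed_loop_change(measure, harmful_change, restore))
```

In practice the "after" measurement would be taken over a settling period rather than immediately, and the baseline would be a statistical summary rather than a single reading.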
Focus Versus Fatigue
Ongoing problems and failures of the infrastructure components create alert noise and alert fatigue that can be overwhelming and cause extended workdays. When faced with multiple alerts, you need to triage them to decide which problem to work on, which requires assessing the severity and impact of each alert. This is a key benefit of AIOps: it provides focus by assessing the situation and presenting you with the most important problem to address at any given moment. This reduces your overload so you can work efficiently with less stress.
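A rough sketch of consolidation and triage follows. It assumes, purely for illustration, that alerts about the same resource belong to the same problem and that priority is ranked by severity and then by users affected; the `Alert` fields and sample alerts are hypothetical, and real AIOps correlation is considerably richer.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Alert:
    source: str          # where the alert came from
    resource: str        # affected component
    severity: int        # 1 (low) to 5 (critical)
    users_affected: int

def consolidate(alerts):
    """Group alerts by resource into single problems, then rank the
    problems so the most severe, most impactful one comes first."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.resource].append(alert)
    problems = [
        {
            "resource": resource,
            "alert_count": len(members),
            "severity": max(a.severity for a in members),
            "users_affected": max(a.users_affected for a in members),
        }
        for resource, members in groups.items()
    ]
    problems.sort(key=lambda p: (p["severity"], p["users_affected"]), reverse=True)
    return problems

# Hypothetical alerts from several sources.
alerts = [
    Alert("help_desk", "core-switch-1", 3, 40),
    Alert("snmp", "core-switch-1", 5, 120),
    Alert("app_log", "db-02", 4, 15),
    Alert("syslog", "core-switch-1", 2, 120),
]
for problem in consolidate(alerts):
    print(problem)
```

Four alerts collapse into two problems, and the one to work on first is at the top of the list.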
The Trouble with Trouble Tickets
IT most often becomes aware of a problem when a trouble ticket is submitted to the help desk. Tickets identify a problem but not its cause. They also often blame the network because it is the doorway to what end-users are trying to reach. Invariably, help desk tickets initiate a laborious investigation and troubleshooting effort to identify the root cause. This common workflow is fraught with inefficiency and frustration for all stakeholders, which is why IT service organizations and providers of help desk software employ many methods to avert or “deflect” trouble tickets. In contrast, AIOps does a much better and more efficient job of detecting problems as well as diagnosing them, troubleshooting them, and identifying the root cause, which greatly speeds up remediation. Some remedies can even be automated, which truly gives you time back in your day.
Lowering Mean Time to Resolution (MTTR)
One of the key performance indicators of IT’s performance is MTTR, and lower is better. Lowering the number of trouble tickets directly lowers MTTR and reduces your overall support burden. A related, less quantitative metric is mean time to innocence (MTTI). MTTI arises because the network is often blamed when people and processes experience problems, putting an undue burden on the NetOps team. AIOps identifies root causes and therefore reduces the blame game and more accurately and quickly redistributes remediation work to the appropriate team: AppOps, CloudOps, or SecOps.
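For reference, MTTR is simply the average of the time between a ticket being opened and being resolved. The sketch below shows the arithmetic; the function name and the ticket timestamps are made up for the example.

```python
from datetime import datetime

def mean_time_to_resolution_hours(tickets):
    """Average (resolved - opened) duration in hours.
    `tickets` is a list of (opened, resolved) datetime pairs."""
    if not tickets:
        raise ValueError("no tickets to average")
    durations = [
        (resolved - opened).total_seconds() / 3600
        for opened, resolved in tickets
    ]
    return sum(durations) / len(durations)

# Hypothetical tickets that took 2, 4, and 6 hours to resolve.
tickets = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 11, 0)),
    (datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 14, 0)),
    (datetime(2024, 1, 9, 8, 0), datetime(2024, 1, 9, 14, 0)),
]
print(mean_time_to_resolution_hours(tickets))
```

Tracking this figure before and after an AIOps rollout is one straightforward way to quantify its effect.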
How AI and AIOps Work
IT systems generate continuous log and telemetry data that tell a story about what happened and is happening throughout the infrastructure. AIOps ingests that data, reconstructs the stories, and provides actionable insights for humans and automated processes. Humans can now act with greater accuracy, agility, and confidence. Automation can now do more.
Decades ago, business operations were made more efficient using computers, databases, and client applications. People would describe this as being “computerized,” which had a wow factor. Now, with computerization being commonplace, AI is the new wow factor. At a fundamental level, however, Analytics and Machine Learning are just modern methods of data processing. The difference is that data generated from computerization is leveraged to turbocharge the outcomes and benefits that “AI-powered” application software can provide.
The following very simple and high-level model shows how AI and AIOps work. In the simplest sense, Analytics and Machine Learning convert data to desired outcomes. For IT Operations, AIOps solutions convert data to the situational intelligence, intelligent assistance, and intelligent automation previously discussed. Depending on the problem being solved, one or several types of Analytics and/or Machine Learning models will be used.
Data flows from its sources through an extract, transform, and load (ETL) pipeline to normalize it so that it can be processed by analytics and machine learning algorithms to surface actionable insights that provide intelligent assistance and automation.
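The normalization step of such a pipeline can be sketched as follows. The two record formats, field names, and values are invented for illustration; the point is only that records arriving with different schemas and units are transformed to one common schema before analysis.

```python
from datetime import datetime, timezone

def extract():
    """Return raw records from two hypothetical sources with different schemas."""
    return [
        {"src": "net_probe", "epoch": 1704067210, "device": "web-01", "latency_s": 0.2},
        {"src": "server_log", "ts": "2024-01-01T00:00:05+00:00", "host": "Web-01", "latency_ms": 120},
    ]

def transform(record):
    """Normalize one record: UTC timestamp, lowercase host, latency in ms."""
    if record["src"] == "server_log":
        time_utc = datetime.fromisoformat(record["ts"]).astimezone(timezone.utc)
        host, latency_ms = record["host"], float(record["latency_ms"])
    elif record["src"] == "net_probe":
        time_utc = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
        host, latency_ms = record["device"], record["latency_s"] * 1000.0
    else:
        raise ValueError(f"unknown source: {record['src']}")
    return {"time_utc": time_utc, "host": host.lower(),
            "latency_ms": latency_ms, "source": record["src"]}

def load(rows, store):
    """Append normalized rows to the store in time order."""
    store.extend(sorted(rows, key=lambda r: r["time_utc"]))
    return store

def run_pipeline():
    return load([transform(r) for r in extract()], [])

for row in run_pipeline():
    print(row["time_utc"].isoformat(), row["host"], row["latency_ms"], row["source"])
```

After the transform, both records describe the same host in the same units and can be correlated on a single timeline.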
Data Quality Matters!
Acquiring data is much easier said than done. There are many challenges and factors to address, including data engineering, governance, and provenance. Keeping this topic at a high level, you will need to ensure that the data you are using has these characteristics:
- Consistency and reliability (i.e., “trustworthy”)
- Completeness (i.e., lossless)
- Precision and resolution to meet your needs
Data quality matters… a lot! So, your data acquisition implementation is as essential as the data itself. As with all data processing solutions, the garbage-in-garbage-out principle applies. If you want your AIOps to deliver, you need to ensure that it receives data with the characteristics listed above.
Blind spots and infrastructure bias are problems that result from too few sources of data and data that is inconsistent and/or incomplete. This is often the case for data that comes from only the infrastructure versus a reliable, unbiased, and objective external source such as Network Test Access Points (TAP) and Network Packet Brokers (NPB) (such as the cTap and cVu products from cPacket, respectively).
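A few of these characteristics can be checked mechanically before data reaches analytics. The sketch below is a minimal illustration, assuming samples are dictionaries with a `ts` field in epoch seconds; the function name, field names, and sample values are hypothetical.

```python
def quality_report(samples, expected_interval_s, required_fields):
    """Check samples for missing fields (completeness), gaps larger than
    the expected sampling interval (loss/resolution), and out-of-order
    timestamps (consistency)."""
    incomplete = [
        i for i, s in enumerate(samples)
        if any(s.get(field) is None for field in required_fields)
    ]
    pairs = list(zip(samples, samples[1:]))
    gaps = [(a["ts"], b["ts"]) for a, b in pairs
            if b["ts"] - a["ts"] > expected_interval_s]
    out_of_order = any(b["ts"] < a["ts"] for a, b in pairs)
    return {
        "incomplete": incomplete,
        "gaps": gaps,
        "out_of_order": out_of_order,
        "trustworthy": not (incomplete or gaps or out_of_order),
    }

# Hypothetical interface counters expected every 10 seconds.
samples = [
    {"ts": 0, "rx_bytes": 100},
    {"ts": 10, "rx_bytes": None},   # missing value
    {"ts": 20, "rx_bytes": 180},
    {"ts": 50, "rx_bytes": 260},    # two samples lost before this one
]
print(quality_report(samples, expected_interval_s=10, required_fields=["rx_bytes"]))
```

A report like this can gate the pipeline, so that incomplete or lossy data is flagged rather than silently fed to models.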
The Essentiality of Network Data for AIOps
Data is the fuel for analytics and machine learning, and hence for creating AI-powered outcomes. Data from all possible IT domains, as shown in the diagram above, is essential for AIOps. Of these, network data (i.e., data from network packets) stands out; it can be argued that it is the most important data source for AIOps. This is because it encompasses a lot of information about the status, performance, and security of IT infrastructure and the experiences that end-users and IoT devices have with it when using data, apps, and services, and when interconnecting with each other. In other words, network data is a proxy for experiences, performance, and security because it provides an understanding of:
- Problems and their frequency of occurrence
- End-user and IoT traffic flows, and traffic patterns/trends
- Cyber threats and malicious and suspicious activity
- Application performance and corresponding user/customer experiences
Network-Centric AIOps is a Good Starting Point
What we have learned so far is that the network is often initially blamed for poor experiences, and network data includes information about status, performance, and security of IT infrastructure and people’s experiences with it when using data, apps and services. If you have not yet begun an AIOps initiative, a good starting point is a “crawl-walk-run” plan using only network data at the outset, then adding additional sources in subsequent phases. This approach will quickly unburden NetOps from the MTTI syndrome. This approach will also provide immediate benefit throughout the IT/I&O department.
Another advantage of starting with network data is that it can simplify your initiative. Starting with only network data limits the scope and inherent difficulties of data acquisition, data engineering, and data pipelining, so your “crawl-walk-run” implementation has a higher likelihood of early success. Another method for early success is to acquire network data by tapping into the network packets using independent network data capture solutions from a single vendor, which can help eliminate challenges with data consistency, provenance, wrangling, pipelining, etc. Starting with a source of data that you can trust is a winning strategy.
AIOps is an evolution of automation for IT Operations that uses analytics and machine learning to help you be more efficient amid ever-increasing complexity and workload. AIOps provides value by focusing you on the most important issue to address at any given moment, reducing alert noise and fatigue as well as work-related stress. AIOps also heightens automation of IT processes. This combination of intelligent assistance and automation gives you greater efficiency and time back in your day. All of this is made possible by leveraging data, especially network data. These benefits also support great experiences, productivity, profitability, business continuity, competitive advantage, customer satisfaction (and hence loyalty), and job satisfaction, especially for help desk and I&O personnel.
Network data is a proxy for much of what happens and is experienced with IT systems. Starting with network data broadly supports all the IT/I&O domains: the help desk, AppOps, DevOps, SecOps, SRE, and of course NetOps. Starting an AIOps initiative with a network-centric approach is a way to ensure early and ongoing success, especially because the risk and complexity of data acquisition are alleviated using solutions such as Network Packet Brokers and Network TAPs that provide data that is consistent, reliable, complete, accurate, and precise.
Now that you understand that high-quality network data you can trust is essential, we encourage you to evaluate cPacket’s AIOps-ready product portfolio. Our network monitoring and data acquisition solutions process, store, analyze and route your network data for AIOps use. The consistently simple and tightly integrated product portfolio includes: cTap® Series Network Test Access Points, cVu® Series Network Packet Broker+, cStor® Series Packet Capture devices, cProbe™ Series Flow Export devices, and cClear® Series Data Correlation and Analysis engines.
About The Author
Ron Stein – Director of Product Marketing at cPacket Networks. Ron has marketed AIOps and performance assurance solutions for networking at Aruba Networks. His domain expertise includes AI, ML, Advanced Analytics, cloud, IoT, and RFID, with industry experience that spans healthcare, financial services, utilities, public safety, smart cities, and IT service management.