Operations Insights Customised Alarm

Karthic
2 min read · Sep 12, 2023


Oracle Cloud Infrastructure (OCI) Operations Insights enables administrators to uncover performance issues, forecast consumption, and plan capacity using machine-learning based analytics on historical and SQL data.

We will use Operations Insights as the example in this post, but the same workflow can be applied to any service to customise its alarm notifications.

When an alarm fires for the Operations Insights metric NumsqlsNeedingAttention, the notification message does not include the SQL identifier. We will use the function below to add it.

Create an OCI function with the sample Python code below using Cloud Shell. One of my colleagues noticed that the function fails when the application uses the ARM platform, so please choose x86 when creating the application.

import io
from datetime import datetime,timedelta
import oci
import json

def opsi_sqlstatistics(compartment_id, resource_type, start_time):
    """Return the SQL identifiers reported by Operations Insights since start_time."""
    try:
        # Resource principals let the function authenticate without stored credentials
        signer = oci.auth.signers.get_resource_principals_signer()
        opsi_client = oci.opsi.OperationsInsightsClient({}, signer=signer)
        summarize_sql_statistics = opsi_client.summarize_sql_statistics(
            compartment_id=compartment_id,
            database_type=[resource_type],
            time_interval_start=datetime.strptime(
                start_time,
                "%Y-%m-%dT%H:%M:%S.%fZ")
        )
        # Each item is an SDK model; keep only its sql_identifier attribute
        return [item.sql_identifier for item in summarize_sql_statistics.data.items]
    except Exception as opsi_exception:
        print(opsi_exception)
        # Return an empty list so the caller can still build a message
        return []


def ons_publish(compartment_id=None, **kwargs):
    """Publish the affected SQL identifiers to the ONS topic passed in kwargs."""
    # Look back two hours (UTC). timespec keeps the microseconds even when they
    # are zero, so the string always matches the "%f" that opsi_sqlstatistics parses.
    start_time = (datetime.utcnow() - timedelta(hours=2)).isoformat(timespec='microseconds') + 'Z'
    try:
        signer = oci.auth.signers.get_resource_principals_signer()
        ons_client = oci.ons.NotificationDataPlaneClient({}, signer=signer)
        resource_type = kwargs.get('resource_type')
        topic_id = kwargs.get('topic_id')
        resource_display_name = kwargs.get('resource_display_name')

        sql_list = opsi_sqlstatistics(compartment_id, resource_type, start_time)
        body = f'SQL Identifier having issues are {sql_list}'
        print("Publishing message to ONS topic")
        ons_client.publish_message(
            topic_id=topic_id,
            message_details=oci.ons.models.MessageDetails(
                body=body,
                title=f"SQL Degradation for {resource_display_name}"))
    except Exception as ons_exception:
        print(ons_exception)

def handler(ctx, data: io.BytesIO = None):
    cfg = dict(ctx.Config())

    # fetch details from function config
    compartment_id = cfg['compartment_id']
    topic_id = cfg['topic_id']

    try:
        body = json.loads(data.getvalue())
        alarm_type = body.get("type")

        # Only act when the alarm transitions from OK to FIRING
        if alarm_type == "OK_TO_FIRING":
            dimensions = body["alarmMetaData"][0]["dimensions"][0]
            resource_display_name = dimensions["resourceDisplayName"]
            resource_type = dimensions["resourceType"]

            ons_publish(compartment_id=compartment_id,
                        resource_display_name=resource_display_name,
                        topic_id=topic_id, resource_type=resource_type)
        else:
            print("Alarm type is not OK_TO_FIRING")
    except Exception as ex:
        # ValueError from json.loads is already covered by Exception
        print(ex)
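
To try the payload-parsing step without deploying anything, the sketch below runs a hand-written sample message through the same lookups the handler performs. The helper name extract_alarm_details and all sample values are made up for illustration, and only the fields the handler reads are included; a real alarm message carries more.

```python
import io
import json

# Hypothetical alarm message: values are invented, and only the fields
# that the handler reads are included.
SAMPLE_ALARM = {
    "type": "OK_TO_FIRING",
    "alarmMetaData": [
        {
            "dimensions": [
                {
                    "resourceDisplayName": "demo-db",
                    "resourceType": "ADW-S",
                }
            ]
        }
    ],
}


def extract_alarm_details(data: io.BytesIO):
    """Return (display_name, resource_type) for OK_TO_FIRING alarms, else None."""
    body = json.loads(data.getvalue())
    if body.get("type") != "OK_TO_FIRING":
        return None
    dimensions = body["alarmMetaData"][0]["dimensions"][0]
    return dimensions["resourceDisplayName"], dimensions["resourceType"]


payload = io.BytesIO(json.dumps(SAMPLE_ALARM).encode("utf-8"))
print(extract_alarm_details(payload))
```

Keeping the parsing in a small function like this makes it easy to check against a message copied from the function logs before wiring in the API calls.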

Create a notification topic with the function created above as a subscription.

Create another topic with an email (or any other preferred) subscription to receive the message. Note down this topic's OCID and set it as the value of topic_id in the function configuration; set compartment_id there as well.
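
If either key is missing, the handler fails with a bare KeyError that is easy to miss in the logs. A minimal sketch of a clearer check follows; read_function_config is a hypothetical helper, not part of the function above, and the OCIDs are placeholders.

```python
REQUIRED_KEYS = ("compartment_id", "topic_id")


def read_function_config(cfg: dict) -> dict:
    """Return the required settings, or raise if any is missing or blank."""
    missing = [key for key in REQUIRED_KEYS if not cfg.get(key)]
    if missing:
        raise KeyError(f"Missing function config keys: {', '.join(missing)}")
    return {key: cfg[key] for key in REQUIRED_KEYS}


# Placeholder OCIDs for illustration only
settings = read_function_config({
    "compartment_id": "ocid1.compartment.oc1..example",
    "topic_id": "ocid1.onstopic.oc1..example",
})
print(settings["topic_id"])
```

Inside the handler this could be called as read_function_config(dict(ctx.Config())).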

Operations Insights has many metrics available. We will focus on the metric NumsqlsNeedingAttention for this example.

Create an alarm that fires when the metric NumsqlsNeedingAttention is greater than zero, and set the first topic (the one with the function subscription) as its destination.
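
Alarm rules are written as Monitoring Query Language (MQL) expressions of the form metric[interval].statistic() operator value. The sketch below only builds such a string; build_alarm_query is a hypothetical helper, and the 5-minute interval and max() statistic are assumptions to adjust for your own alarm.

```python
def build_alarm_query(metric: str, interval: str = "5m",
                      statistic: str = "max", threshold: int = 0) -> str:
    """Build an MQL expression: metric[interval].statistic() > threshold."""
    return f"{metric}[{interval}].{statistic}() > {threshold}"


print(build_alarm_query("NumsqlsNeedingAttention"))
# NumsqlsNeedingAttention[5m].max() > 0
```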

When the alarm is triggered, it invokes the configured function. You can enable function logs for troubleshooting if needed.
The function calls the SummarizeSqlStatistics API of Operations Insights and filters the JSON response so that only the sql_identifier values are sent as the message to the ONS topic.
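
The filtering step can be tried on its own with plain dictionaries standing in for the response items. This is a sketch on invented sample data: unique_sql_identifiers is a hypothetical helper, and the de-duplication is an optional extra in case the same identifier appears more than once.

```python
def unique_sql_identifiers(items):
    """Collect sql_identifier values in first-seen order, skipping duplicates."""
    seen = set()
    identifiers = []
    for item in items:
        sql_id = item.get("sql_identifier")
        if sql_id and sql_id not in seen:
            seen.add(sql_id)
            identifiers.append(sql_id)
    return identifiers


# Invented sample data standing in for the API response items
sample_items = [
    {"sql_identifier": "abc123", "category": ["DEGRADING"]},
    {"sql_identifier": "def456", "category": ["VARIANT"]},
    {"sql_identifier": "abc123", "category": ["INEFFICIENT"]},
]

sql_list = unique_sql_identifiers(sample_items)
print(f"SQL Identifier having issues are {sql_list}")
```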

You can use this method to invoke any API and customise the message sent to an ONS topic as needed.
