Cost anomaly detection in Oracle Cloud

7 min read · May 27, 2024

In Oracle Cloud, the Cost Management service shows how much is being consumed and what it costs. A new FinOps Overview page is also available.

In this blog we will look at a script that checks whether consumption is consistent by comparing yesterday’s cost with the previous day’s cost at daily granularity.


import oci
from datetime import datetime, timedelta

config = oci.config.from_file('<config_file_path>')

usage_api_client = oci.usage_api.UsageapiClient(config)
tenant_id = input('Enter your tenant OCID:  ')
topic_ocid = input('Enter your topic OCID: ')


def date_formatter(day):
    return day.isoformat('T', 'milliseconds') + 'Z'


# Use UTC midnight: date_formatter appends 'Z', so the timestamps must be in UTC
today = datetime.utcnow().replace(microsecond=0, second=0, minute=0, hour=0)
yesterday = today - timedelta(days=1)
day_before_yesterday = today - timedelta(days=2)


def ons_publish(**kwargs):
    try:
        ons_client = oci.ons.NotificationDataPlaneClient(config)

        percent_increase = kwargs.get('increase_percent')
        alarm_body = kwargs.get('increased_services')

        print("Publishing message to ONS topic")
        ons_client.publish_message(
            topic_id=topic_ocid,
            message_details=oci.ons.models.MessageDetails(
                body=alarm_body,
                title=f"Cost increased by {percent_increase}%"))
    except Exception as ons_exception:
        print(ons_exception)


def usage_report(start_time, end_time, granularity):
    request_summarized_usages_response = usage_api_client.request_summarized_usages(
        request_summarized_usages_details=oci.usage_api.models.RequestSummarizedUsagesDetails(
            tenant_id=tenant_id,
            time_usage_started=date_formatter(start_time),
            time_usage_ended=date_formatter(end_time),
            granularity=granularity,
            group_by=["service", "compartmentName"],
            compartment_depth=1,
            is_aggregate_by_time=False,
            query_type="COST"))
    usage_data = request_summarized_usages_response.data.items

    total_usage = 0
    service_cost = []
    for cost in usage_data:
        if cost.computed_amount is not None and cost.computed_amount > 0:
            service_cost.append(
                {"Service": cost.service, "Cost": cost.computed_amount, "Compartment": cost.compartment_name})
            total_usage = total_usage + cost.computed_amount

    # max() raises ValueError on an empty list, so only report when there is data
    if service_cost:
        max_service_cost = max(service_cost, key=lambda x: x['Cost'])
        print(f'Max_Service={max_service_cost["Service"]}:{max_service_cost["Cost"]}')
    return int(total_usage), service_cost


old_total, old_service_cost = usage_report(day_before_yesterday, yesterday, "DAILY")
print('#****************************************************************')
curr_total, curr_service_cost = usage_report(yesterday, today, "DAILY")
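# Change relative to the previous day's total (the baseline).
# With no previous-day cost there is no baseline, so fall back to 0 instead of dividing by zero.
difference = (curr_total - old_total) / old_total if old_total else 0.0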
if difference < 0:
    print(f"################################################################")
    print(f"Current total is less than previous day total by {round(abs(difference * 100), 2)}%")
    print(f"################################################################")

else:
    print(f"################################################################")
    print(f"Current total is greater than previous day total by {round(abs(difference * 100), 2)}%")
    difference = f"{round(abs(difference * 100), 2)}"
    print(f"################################################################")
    # Services with increased cost
    increased_services = []
    for curr_cost in curr_service_cost:
        for old_cost in old_service_cost:
            if curr_cost['Service'] == old_cost['Service'] and curr_cost['Compartment'] == old_cost[
                'Compartment'] and int(curr_cost['Cost']) > int(old_cost['Cost']):
                increased_services.append(curr_cost)
                break

    if increased_services:
        body = "Services with Increased Cost\n"
        for service in increased_services:
            print(f"Service: {service['Service']}, Cost: {service['Cost']}, Compartment: {service['Compartment']}")
            body = body + f"{service['Service']},{service['Cost']},{service['Compartment']}\n"
        ons_publish(increase_percent=difference, increased_services=body)
    else:
        print("No services with increased cost found.")

The following explanation of the code was generated with Oracle GenAI.

This Python script appears to be an Oracle Cloud Infrastructure (OCI) automation tool that fetches cost usage data for a specific tenant and calculates the percentage change in total cost from the previous day. Additionally, it identifies services with increased costs and publishes messages to an Oracle Notification Service (ONS) topic.

Let’s break down the code step by step:

Imports and Configuration:

The code begins by importing the necessary modules from the ‘oci’ library, which is the Oracle Cloud Infrastructure SDK for Python. It also imports datetime and timedelta for time-related operations.
The ‘config’ object is created using the ‘oci.config.from_file’ function, which loads the OCI configuration from a file. This configuration includes details like API keys, region, etc.

Creating Clients:

The ‘usage_api_client’ is created using the provided configuration. This client is used to interact with the Oracle Usage API, which fetches cost and usage data.
The ‘ons_client’ is created from the same configuration, inside the ‘ons_publish’ function. This client is used to publish messages to an ONS topic later in the script.

Input Parameters:

The script takes two inputs from the user: ‘tenant_id’ and ‘topic_ocid’. The tenant ID represents the specific tenant for which cost data will be fetched, and ‘topic_ocid’ is the OCID of the ONS topic to which messages will be published.

Date Formatter Function:

The ‘date_formatter’ function is defined to format dates in a specific ISO format required by the Oracle Usage API. It ensures that the dates are in the right format before making API calls.
The ‘today’, ‘yesterday’, and ‘day_before_yesterday’ variables store the current date and two previous dates, respectively. These are used to fetch cost data for the last two days.
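
To make the format concrete, here is a small standalone sketch of the same helper applied to a fixed date. The sample date is arbitrary, and the sketch assumes the naive datetimes are meant to represent UTC, since the helper appends a literal 'Z'.

```python
from datetime import datetime, timedelta


def date_formatter(day):
    # Naive datetime -> ISO 8601 with millisecond precision and a literal 'Z' suffix
    return day.isoformat('T', 'milliseconds') + 'Z'


# Arbitrary sample timestamp, truncated to midnight the same way the script does
today = datetime(2024, 5, 27, 14, 35, 10).replace(microsecond=0, second=0, minute=0, hour=0)
yesterday = today - timedelta(days=1)

formatted_today = date_formatter(today)
formatted_yesterday = date_formatter(yesterday)
print(formatted_yesterday, formatted_today)
# prints: 2024-05-26T00:00:00.000Z 2024-05-27T00:00:00.000Z
```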

ONS Publish Function:

The ‘ons_publish’ function is defined to handle the publication of messages to the ONS topic. It takes keyword arguments ‘increase_percent’ and ‘increased_services’; the first goes into the message title and the second becomes the message body.
Inside this function, a new message is created and published to the specified ONS topic using the ‘publish_message’ method of the ‘ons_client’.

Usage Report Function:

The ‘usage_report’ function fetches cost usage data from the Oracle Usage API. It takes three parameters: ‘start_time’, ‘end_time’, and ‘granularity’, which define the date range and the granularity of the usage data to be fetched.
The function sends a request to the Usage API using the ‘request_summarized_usages’ method of the ‘usage_api_client’. It specifies the tenant ID, start and end times, granularity, and other parameters in the request.
The response data is stored in ‘usage_data’, which is a list of cost usage entries.
The function then processes this data, calculates the total usage, and returns it along with a list of individual service costs.
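
The aggregation step can be tried without calling the API. The sketch below assumes made-up rows built with SimpleNamespace as stand-ins for the entries in ‘usage_data’; the helper name ‘summarize’ is hypothetical and only mirrors the loop inside ‘usage_report’.

```python
from types import SimpleNamespace


def summarize(usage_items):
    """Sum positive costs and collect one record per returned row."""
    total_usage = 0
    service_cost = []
    for cost in usage_items:
        # Rows whose amount is None or not positive are skipped
        if cost.computed_amount is not None and cost.computed_amount > 0:
            service_cost.append({"Service": cost.service,
                                 "Cost": cost.computed_amount,
                                 "Compartment": cost.compartment_name})
            total_usage += cost.computed_amount
    # The script truncates the total to a whole number
    return int(total_usage), service_cost


# Made-up rows standing in for the API response items
sample_items = [
    SimpleNamespace(service="COMPUTE", computed_amount=12.5, compartment_name="dev"),
    SimpleNamespace(service="OBJECT_STORAGE", computed_amount=None, compartment_name="dev"),
    SimpleNamespace(service="DATABASE", computed_amount=30.25, compartment_name="prod"),
]

total, rows = summarize(sample_items)
print(total, len(rows))
# prints: 42 2
```

Note that the total is truncated with int(), so fractional amounts are dropped from the day-to-day comparison.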

Main Logic:

The script calls the ‘usage_report’ function twice: once for the period from ‘day_before_yesterday’ to ‘yesterday’ and once for ‘yesterday’ to ‘today’. It stores the totals and service costs for both periods.
It then calculates the percentage difference in the total cost between the two days.
If the current day’s total cost is higher than the previous day, it identifies the services whose costs have increased by comparing the ‘old_service_cost’ and ‘curr_service_cost’ lists.
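
As a rough illustration of this comparison, the sketch below works on plain dictionaries. It is not the script's exact code: it measures the change relative to the previous day's total, and it replaces the nested loops with a dictionary lookup keyed by service and compartment. The helper names and sample figures are made up.

```python
def percent_change(old_total, curr_total):
    """Change of curr_total relative to old_total, in percent (None if no baseline)."""
    if old_total == 0:
        return None
    return round((curr_total - old_total) / old_total * 100, 2)


def find_increases(old_rows, curr_rows):
    """Return current rows whose cost exceeds the matching previous-day row."""
    old_by_key = {(r["Service"], r["Compartment"]): r["Cost"] for r in old_rows}
    increased = []
    for row in curr_rows:
        key = (row["Service"], row["Compartment"])
        if key in old_by_key and row["Cost"] > old_by_key[key]:
            increased.append(row)
    return increased


# Made-up sample data for two consecutive days
old_rows = [
    {"Service": "COMPUTE", "Cost": 10.0, "Compartment": "dev"},
    {"Service": "DATABASE", "Cost": 30.0, "Compartment": "prod"},
]
curr_rows = [
    {"Service": "COMPUTE", "Cost": 15.0, "Compartment": "dev"},
    {"Service": "DATABASE", "Cost": 30.0, "Compartment": "prod"},
    {"Service": "NETWORK", "Cost": 5.0, "Compartment": "dev"},
]

change = percent_change(40, 50)
increased = find_increases(old_rows, curr_rows)
print(change, [r["Service"] for r in increased])
# prints: 25.0 ['COMPUTE']
```

As in the script, a service that appears only on the more recent day (NETWORK here) is not reported, because there is no previous-day row to compare it with.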

Publishing to ONS:

If there are any services with increased costs, the script publishes a message to the specified ONS topic using the ‘ons_publish’ function. The message title carries the percentage increase, and the body lists the services with higher costs.
If no services have increased costs, it prints a corresponding message.
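
The text that ends up in the notification can be previewed without an ONS client. This is a minimal sketch; ‘build_message’ is a hypothetical helper that only reproduces the title and body strings the script assembles, and the sample row is made up.

```python
def build_message(percent_increase, increased_services):
    """Return the (title, body) strings used for the notification."""
    title = f"Cost increased by {percent_increase}%"
    lines = ["Services with Increased Cost"]
    for service in increased_services:
        # One comma-separated line per service: name, cost, compartment
        lines.append(f"{service['Service']},{service['Cost']},{service['Compartment']}")
    body = "\n".join(lines) + "\n"
    return title, body


# Made-up sample row
sample = [{"Service": "COMPUTE", "Cost": 15.0, "Compartment": "dev"}]
title, body = build_message("25.0", sample)
print(title)
print(body)
```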

In summary, this script automates the process of fetching Oracle Cloud cost usage data, calculates the day-to-day changes, and alerts about services with increased costs via ONS notifications. It’s a useful tool for monitoring and analyzing cloud spending.

OCI Function for automation

Let’s create a serverless function to automate this, with a few modifications to the code. There are several ways to invoke the function.

import oci,io
from datetime import datetime, timedelta, timezone


def date_formatter(day):
    return day.isoformat('T', 'milliseconds') + 'Z'


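# Use UTC midnight: date_formatter appends 'Z', so the timestamps must be in UTC
today = datetime.now(timezone.utc).replace(microsecond=0, second=0, minute=0, hour=0, tzinfo=None)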
yesterday = today - timedelta(days=1)
day_before_yesterday = today - timedelta(days=2)


def ons_publish(**kwargs):
    try:
        signer = oci.auth.signers.get_resource_principals_signer()
        ons_client = oci.ons.NotificationDataPlaneClient(config={},signer=signer)

        topic_ocid = kwargs.get('topic_id')
        percent_increase = kwargs.get('increase_percent')
        alarm_body = kwargs.get('increased_services')

        print("Publishing message to ONS topic")
        ons_client.publish_message(
            topic_id=topic_ocid,
            message_details=oci.ons.models.MessageDetails(
                body=alarm_body,
                title=f"Cost increased by {percent_increase}%"))
    except Exception as ons_exception:
        print(ons_exception)


def usage_report(start_time, end_time, granularity):
    signer = oci.auth.signers.get_resource_principals_signer()
    usage_api_client = oci.usage_api.UsageapiClient(config={},signer=signer)

    request_summarized_usages_response = usage_api_client.request_summarized_usages(
        request_summarized_usages_details=oci.usage_api.models.RequestSummarizedUsagesDetails(
            tenant_id=signer.tenancy_id,
            time_usage_started=date_formatter(start_time),
            time_usage_ended=date_formatter(end_time),
            granularity=granularity,
            group_by=["service", "compartmentName"],
            compartment_depth=1,
            is_aggregate_by_time=False,
            query_type="COST"))
    usage_data = request_summarized_usages_response.data.items

    total_usage = 0
    service_cost = []
    for cost in usage_data:
        if cost.computed_amount is not None and cost.computed_amount > 0:
            service_cost.append(
                {"Service": cost.service, "Cost": cost.computed_amount, "Compartment": cost.compartment_name})
            total_usage = total_usage + cost.computed_amount

    # max() raises ValueError on an empty list, so only report when there is data
    if service_cost:
        max_service_cost = max(service_cost, key=lambda x: x['Cost'])
        print(f'Max_Service={max_service_cost["Service"]}:{max_service_cost["Cost"]}')
    return int(total_usage), service_cost


def cost_anomaly(topic_id):
    old_total, old_service_cost = usage_report(day_before_yesterday, yesterday, "DAILY")
    print('#****************************************************************')
    curr_total, curr_service_cost = usage_report(yesterday, today, "DAILY")
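    # Change relative to the previous day's total (the baseline).
    # With no previous-day cost there is no baseline, so fall back to 0 instead of dividing by zero.
    difference = (curr_total - old_total) / old_total if old_total else 0.0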
    if difference < 0:
        print(f"################################################################")
        print(f"Current total is less than previous day total by {round(abs(difference * 100), 2)}%")
        print(f"################################################################")

    else:
        print(f"################################################################")
        print(f"Current total is greater than previous day total by {round(abs(difference * 100), 2)}%")
        difference = f"{round(abs(difference * 100), 2)}"
        print(f"################################################################")
        # Services with increased cost
        increased_services = []
        for curr_cost in curr_service_cost:
            for old_cost in old_service_cost:
                if curr_cost['Service'] == old_cost['Service'] and curr_cost['Compartment'] == old_cost[
                    'Compartment'] and int(curr_cost['Cost']) > int(old_cost['Cost']):
                    increased_services.append(curr_cost)
                    break

        if increased_services:
            body = "Services with Increased Cost\n"
            for service in increased_services:
                print(f"Service: {service['Service']}, Cost: {service['Cost']}, Compartment: {service['Compartment']}")
                body = body + f"{service['Service']},{service['Cost']},{service['Compartment']}\n"
            ons_publish(increase_percent=difference, increased_services=body, topic_id=topic_id)
        else:
            print("No services with increased cost found.")

#This will be called when the function is invoked
def handler(ctx,data: io.BytesIO=None):
    cfg = ctx.Config()
    topic_id = cfg.get('topic_id')
    cost_anomaly(topic_id)
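
Before deploying, the handler wiring can be checked locally. The sketch below is only an assumption-laden stand-in: ‘FakeContext’ is a hypothetical class that mimics nothing but the Config() call used above, ‘cost_anomaly’ is replaced by a placeholder that records its argument, and the OCID is a dummy value.

```python
import io


class FakeContext:
    """Hypothetical stand-in for the function context; only Config() is mimicked."""

    def __init__(self, config):
        self._config = config

    def Config(self):
        return self._config


calls = []


def cost_anomaly(topic_id):
    # Placeholder for the real cost_anomaly; records the topic it was given
    calls.append(topic_id)


def handler(ctx, data: io.BytesIO = None):
    cfg = ctx.Config()
    topic_id = cfg.get('topic_id')
    cost_anomaly(topic_id)


# Dummy OCID, for illustration only
handler(FakeContext({"topic_id": "ocid1.onstopic.oc1..example"}))
print(calls)
```

The ‘topic_id’ key has to be present in the function's configuration; otherwise cfg.get returns None and the publish call has no topic to send to.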

NOTE: Functions can now be scheduled directly through Resource Scheduler.

You can ignore the steps below; they were used before the function scheduling feature was available.

Resource Scheduler is a new service that can start and stop resources at a defined time. I scheduled an instance to start at around 05:00 UTC, and that start triggers an event for the compute instance.

An event rule was created to capture this event and trigger the function. The event type is Instance-Action Begin.

The attribute resourceName is set to the compute instance name, and instanceActionType is set to start.

I also created a stop schedule that stops the instance after 10 minutes, to save the cost of keeping it running.

If you already have a scheduling tool, you don’t need any of this and can invoke the Python code directly.


Written by Karthic
