Log Spike Detection and Alerting in OCI Logging Analytics

Karthic
Aug 15, 2024

When doing root cause analysis with logs, the first indication of a problem is often an unexpected log spike.
Logs can grow for multiple reasons:

  1. Black Friday or another event that increases your application traffic
  2. Issues in the application or platform
  3. Debug mode that was enabled and never switched off

As the histogram visualization shows, the volume of log records for Linux Audit Logs is consistent. If there is an abnormal spike, we want to be notified.

We can use a detection rule to accomplish this.

In Log Explorer, run a query that counts log records per log source in 30-minute spans (the 30-minute span is just an example).

If you have too many log sources, you can restrict the query to the ones that are important.

Save this query as a saved search from the Actions menu at the top right.

From the same Actions menu, create a detection rule.

Give the detection rule a name.

Specify the metric namespace and metric name. They do not have to pre-exist: once the required IAM policy is applied, they are created automatically in OCI Monitoring.

Set the Frequency to Run Indefinitely and the Interval to 30 minutes.

Create the required IAM policies as described in the pop-up.

In Metrics Explorer, you will see the metric values arriving every 30 minutes.

Analyse the pattern for a day or so to understand the log volume when the application is working fine, then create an alarm that fires when the value rises abnormally above that normal range.
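One way to turn a day of observations into a threshold is to take the mean of the healthy-period counts and add a few standard deviations. This is only one possible heuristic, not something OCI does for you. The function names, the factor `k`, and the sample numbers below are assumptions for illustration.

```python
from statistics import mean, pstdev


def spike_threshold(baseline_counts, k=3.0):
    """Return mean + k * standard deviation of the baseline counts.

    baseline_counts: per-interval log record counts from a healthy period.
    k: how many standard deviations above the mean counts as abnormal
       (3.0 is an arbitrary starting point; tune it for your workload).
    """
    return mean(baseline_counts) + k * pstdev(baseline_counts)


def find_spikes(counts, threshold):
    """Return (index, count) pairs for intervals above the threshold."""
    return [(i, c) for i, c in enumerate(counts) if c > threshold]


# Made-up baseline: counts per 30-minute interval while everything was healthy.
baseline = [4000, 4100, 3900, 4000]
threshold = spike_threshold(baseline)

# Made-up new observations; only the second interval is far above normal.
spikes = find_spikes([4050, 9000, 4100], threshold)
```

A value derived this way can then be rounded and used as the static threshold in the alarm definition.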

For example, I set an alarm on Linux Audit Logs that triggers when the mean value is more than 4500, just to test that the alarm fires.
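In the alarm's advanced mode, a condition like this is written as a Monitoring Query Language (MQL) expression of the form `metric[interval].statistic() > threshold`. The small helper below sketches how such an expression could be assembled. The helper itself and the metric name `LogRecordCount` are hypothetical; use the metric name you entered in the detection rule.

```python
def alarm_query(metric_name, interval="30m", statistic="mean", threshold=4500):
    """Build an MQL-style alarm expression as a string.

    metric_name: the metric name chosen in the detection rule
                 (the value used below is a hypothetical placeholder).
    interval:    aggregation window, matching the 30-minute rule interval.
    statistic:   aggregation applied to each window, e.g. "mean".
    threshold:   value above which the alarm should fire.
    """
    return f"{metric_name}[{interval}].{statistic}() > {threshold}"


# Hypothetical metric name, for illustration only.
query = alarm_query("LogRecordCount")
print(query)
```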

You will be notified when the number of log records for a log source, or a group of log sources, exceeds the threshold.
