Automation to enable Logging Analytics for instances

Karthic
4 min read · Sep 11, 2024

OCI (Oracle Cloud Infrastructure) has an observability service called Logging Analytics that helps you analyze logs.

To enable logging for OCI compute instances that run a Linux image supporting Oracle Cloud Agent, you first enable the management agent and then deploy the Logging Analytics plugin.

Manual Steps:

You can do both steps from the Agents page in the OCI console.

These steps are fine if you have only a few instances and don't need any automation to enable logging.

But if you use autoscaling for your instances, based on either a schedule or metrics, you will want logging to be enabled automatically for the new instances as well.

Automation Steps:

The first thing to do is make sure the management agent is enabled as part of the instance configuration. This ensures the management agent is enabled on the new instances created by autoscaling. If you are new to OCI instance autoscaling, you can refer to the OCI autoscaling documentation for more info.
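As a rough illustration of what "enabled as part of the instance configuration" amounts to, the sketch below builds the agent-config fragment of a launch payload as a plain dictionary. The field names (agentConfig, pluginsConfig, name, desiredState) and the plugin name "Management Agent" are assumptions based on the public launch-instance API shape, so check them against the API reference before relying on them. The helper name build_agent_config is hypothetical.

```python
import json


def build_agent_config(enabled_plugins):
    """Build an agentConfig fragment that marks the named plugins as ENABLED.

    Assumption: field names mirror the launch-instance agent configuration
    in the OCI REST API; verify them before use.
    """
    return {
        "agentConfig": {
            "areAllPluginsDisabled": False,
            "pluginsConfig": [
                {"name": name, "desiredState": "ENABLED"}
                for name in enabled_plugins
            ],
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be compared with an existing configuration.
    print(json.dumps(build_agent_config(["Management Agent"]), indent=2))
```

Keeping the fragment as data makes it easy to diff against the instance configuration you already have in the console.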

The next step is to enable the Logging Analytics plugin for these agents. I have created a Python script that does that.

The script below works in Cloud Shell. Replace the <compartment_ocid> placeholder with the agent compartment OCID. You can change the signer method to run it either locally or via OCI Functions.

import oci

# Cloud Shell exposes a delegation token that the SDK can use to sign requests.
with open('/etc/oci/delegation_token', 'r') as token_file:
    delegation_token = token_file.read()
signer = oci.auth.signers.InstancePrincipalsDelegationTokenSigner(
    delegation_token=delegation_token)

compartment_id = "<compartment_ocid>"
management_agent_client = oci.management_agent.ManagementAgentClient(config={}, signer=signer)


def list_active_agents(plugin):
    # Return the active, available Linux agents in the compartment
    # that match any of the given plugin names.
    response = management_agent_client.list_management_agents(
        compartment_id=compartment_id,
        lifecycle_state="ACTIVE",
        availability_status="ACTIVE",
        platform_type=["LINUX"],
        plugin_name=plugin,
        is_customer_deployed=False,
        install_type="AGENT",
        compartment_id_in_subtree=False,
        access_level="ACCESSIBLE")
    return response.data


# Agents that already have the Logging Analytics ("logan") plugin.
la_agent_list = []
for la_agent in list_active_agents(["logan"]):
    la_agent_list.append(la_agent.id)


# All candidate agents, whatever plugin they have ("None" = no plugin).
all_agent_list = []
for all_agent in list_active_agents(["dbaas", "jm", "jms", "appmgmt", "opsiHost", "osmh", "logan", "None"]):
    all_agent_list.append(all_agent.id)


# Agents that still need the plugin.
deploy_list = set(all_agent_list) - set(la_agent_list)


# Look up the plugin ID of the Logging Analytics plugin for Linux.
list_management_agent_plugins_response = management_agent_client.list_management_agent_plugins(
    compartment_id=compartment_id,
    lifecycle_state="ACTIVE",
    display_name="Logging Analytics",
    platform_type=["LINUX"])

# Assumes the lookup returns at least one plugin.
la_plugin_id = list_management_agent_plugins_response.data[0].id

if len(deploy_list) > 0:
    deploy_plugins_response = management_agent_client.deploy_plugins(
        deploy_plugins_details=oci.management_agent.models.DeployPluginsDetails(
            plugin_ids=[la_plugin_id],
            agent_compartment_id=compartment_id,
            agent_ids=list(deploy_list)))
    print(deploy_plugins_response.status)
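The heart of the script is a set difference: every candidate agent minus the agents that already report the logan plugin. A minimal sketch of just that step, using made-up agent IDs instead of real OCIDs (the helper name agents_missing_plugin is hypothetical):

```python
def agents_missing_plugin(all_agent_ids, agents_with_plugin_ids):
    """Return the sorted IDs of agents that do not have the plugin yet."""
    return sorted(set(all_agent_ids) - set(agents_with_plugin_ids))


# Placeholder IDs for illustration only; real values are agent OCIDs.
all_agents = ["agent-a", "agent-b", "agent-c"]
with_logan = ["agent-b"]

to_deploy = agents_missing_plugin(all_agents, with_logan)
print(to_deploy)  # ['agent-a', 'agent-c']
```

Because the result is empty when every agent already has the plugin, the script can skip the deploy call entirely in that case.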

Below is the function code. You can pass compartment_id and enable_logging via the function configuration. Grant the function's dynamic group the IAM permissions that the function needs to work.

When enable_logging is set to ALL, the function deploys the Logging Analytics plugin to all active agents in that compartment. If it is set to any other value, the function deploys the plugin only to the active agents that have no plugin deployed.

import oci
import io
import time


def list_active_agents(plugin, compartment_ocid, management_agent_client):
    # Return the active, available Linux agents in the compartment
    # that match any of the given plugin names.
    response = management_agent_client.list_management_agents(
        compartment_id=compartment_ocid,
        lifecycle_state="ACTIVE",
        availability_status="ACTIVE",
        platform_type=["LINUX"],
        plugin_name=plugin,
        is_customer_deployed=False,
        install_type="AGENT",
        compartment_id_in_subtree=False,
        access_level="ACCESSIBLE")
    return response.data


def handler(ctx, data: io.BytesIO = None):
    try:
        # Both values come from the function configuration.
        cfg = dict(ctx.Config())
        compartment_id = cfg['compartment_id']
        enable_logging = cfg['enable_logging']
        print("Sleeping for 60 seconds to give time for agents to become active", flush=True)
        time.sleep(60)
        print(f"Function config for enable logging: {enable_logging}", flush=True)
        signer = oci.auth.signers.get_resource_principals_signer()
        management_agent_client = oci.management_agent.ManagementAgentClient({}, signer=signer)

        # Agents that already have the Logging Analytics ("logan") plugin.
        la_agent_list = []
        for la_agent in list_active_agents(["logan"], compartment_id, management_agent_client):
            la_agent_list.append(la_agent.id)

        all_agent_list = []
        if enable_logging.upper() == "ALL":
            # Consider every active agent, whatever plugin it has.
            for all_agent in list_active_agents(["dbaas", "jm", "jms", "appmgmt", "opsiHost", "osmh", "logan", "None"],
                                                compartment_id, management_agent_client):
                all_agent_list.append(all_agent.id)
        else:
            # Consider only agents that have no plugin deployed.
            for all_agent in list_active_agents(["None"],
                                                compartment_id, management_agent_client):
                all_agent_list.append(all_agent.id)

        deploy_list = set(all_agent_list) - set(la_agent_list)
        print(f"LA plugin Deploy list of agents:{deploy_list}", flush=True)

        # Look up the plugin ID of the Logging Analytics plugin for Linux.
        list_management_agent_plugins_response = management_agent_client.list_management_agent_plugins(
            compartment_id=compartment_id,
            lifecycle_state="ACTIVE",
            display_name="Logging Analytics",
            platform_type=["LINUX"])

        # Assumes the lookup returns at least one plugin.
        la_plugin_id = list_management_agent_plugins_response.data[0].id

        if len(deploy_list) > 0:
            deploy_plugins_response = management_agent_client.deploy_plugins(
                deploy_plugins_details=oci.management_agent.models.DeployPluginsDetails(
                    plugin_ids=[la_plugin_id],
                    agent_compartment_id=compartment_id,
                    agent_ids=list(deploy_list)))
            print(deploy_plugins_response.status)
    except Exception as ex:
        print(str(ex), flush=True)
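The effect of the enable_logging setting can be seen in isolation with a small sketch of how the candidate plugin filter is chosen. The plugin names are the ones used in the function above; the helper name candidate_plugin_filter is hypothetical.

```python
ALL_PLUGIN_NAMES = ["dbaas", "jm", "jms", "appmgmt", "opsiHost", "osmh", "logan", "None"]


def candidate_plugin_filter(enable_logging):
    """Return the plugin-name filter used to pick candidate agents.

    "ALL" (case-insensitive) selects agents with any plugin;
    any other value selects only agents that have no plugin.
    """
    if enable_logging.upper() == "ALL":
        return list(ALL_PLUGIN_NAMES)
    return ["None"]


print(candidate_plugin_filter("all"))   # the full plugin list
print(candidate_plugin_filter("NEW"))   # ['None']
```

In both modes, agents that already have the logan plugin are removed afterwards by the set difference, so they are never redeployed.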

You can trigger the function on the new instance creation event for full automation. The compute instance event type that signals the end of an instance launch is com.oraclecloud.computeapi.launchinstance.end.
I have added a delay of 60 seconds in the function to give the agents some time to become active. Set the function timeout to a higher value, such as 200 seconds.
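If the same function could also be invoked by other rules, one option is to check the event type before doing any work. The sketch below assumes the invocation body is the event as JSON with a top-level eventType field, which is how OCI Events payloads are usually shaped; the helper name is_launch_end_event is hypothetical.

```python
import io
import json

LAUNCH_END = "com.oraclecloud.computeapi.launchinstance.end"


def is_launch_end_event(data):
    """Return True if the payload is an instance launch-end event.

    Assumption: `data` is the io.BytesIO body passed to the handler and
    contains the event as JSON with a top-level "eventType" key.
    """
    try:
        event = json.loads(data.getvalue())
    except (ValueError, AttributeError):
        # Empty body, invalid JSON, or no body at all.
        return False
    return isinstance(event, dict) and event.get("eventType") == LAUNCH_END


sample = io.BytesIO(json.dumps({"eventType": LAUNCH_END}).encode("utf-8"))
print(is_launch_end_event(sample))  # True
```

A handler could return early when this check fails, which also avoids the 60-second wait for unrelated invocations.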

NOTE: It typically takes around 5 minutes for an agent to go into SILENT mode after its instance is terminated, so running the script in quick succession is not advised. This should not be a problem for instance autoscaling, as there should be a minimum cooldown period of 300 seconds.

Once the logging plugin is enabled, you can start associating the logs. To automate this, you can enable auto-association for the log source you are interested in, for example Linux Syslog Logs.

If the management agent runs as a plugin of Oracle Cloud Agent and the logs are not readable by the agent user (oracle-cloud-agent), you need to grant the required permissions, as mentioned in the doc, so that the agent user can read those files.

You can run these commands as part of the cloud-init automation if you know beforehand which logs need the permission.

For example, you can give the oracle-cloud-agent user read permission on /var/log/messages on an Oracle Linux instance.
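A minimal sketch of how such a cloud-init script could be generated, assuming a setfacl command of the form `setfacl -m u:oracle-cloud-agent:r <file>` is what the doc prescribes and that the script is passed base64-encoded as instance user data. The helper name build_user_data is hypothetical; confirm the exact command against the Logging Analytics documentation.

```python
import base64
import shlex

AGENT_USER = "oracle-cloud-agent"


def build_user_data(log_paths):
    """Return a base64-encoded shell script granting read access on each log file.

    Assumption: a read ACL entry for the agent user is sufficient for the
    listed files; directories may need additional permissions.
    """
    lines = ["#!/bin/bash"]
    for path in log_paths:
        # shlex.quote guards against spaces or shell metacharacters in paths.
        lines.append(f"setfacl -m u:{AGENT_USER}:r {shlex.quote(path)}")
    script = "\n".join(lines) + "\n"
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


encoded = build_user_data(["/var/log/messages"])
print(base64.b64decode(encoded).decode("utf-8"))
```

The encoded string is what would go into the user data field of the instance configuration, so that every instance created by autoscaling applies the permission at first boot.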

NOTE: As always, test the script in a non-production environment to make sure it works as expected. If you find any bugs, please let me know so I can update the code.
