
The resources from the previous blog post have been created and are up and running, but they are, sadly, not yet error-proof. In this post, we will take a deeper look into error handling for APIM and Logic Apps and into logging in Azure.
We will start by implementing error handling in our Logic App. That way, when we send out custom error codes, we can catch them in the API Management service and return meaningful exceptions. While there has already been a post about exception handling, I will be taking a different, more general approach.
The code behind a Logic App is a JSON file that lists the connections, actions and triggers.
If we take a look at one of the actions (the Parse_JSON action is shown below), we can see that it expects an input with parameters (based on the schema we generated earlier) and that it has a runAfter parameter, which determines when the action will run. The action listed in runAfter also takes a parameter, “Succeeded” here. This means that the Parse_JSON action executes after a successful ReturnGuid action.
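A trimmed sketch of what that might look like in Code view; the content and schema values are placeholders here, with the schema we generated earlier going into the schema property:

    "Parse_JSON": {
        "type": "ParseJson",
        "inputs": {
            "content": "@triggerBody()",
            "schema": {}
        },
        "runAfter": {
            "ReturnGuid": [
                "Succeeded"
            ]
        }
    }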
We can now catch exceptions by adding a new action that runs after another action has “Failed” or even been “Skipped”.
To be able to catch every exception for every block, I will put my actions in one Scope (in the designer). Preferably, larger flows should be broken up into multiple scopes. Add a new scope, put your actions in there and return to the Code view.
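In Code view, a scope is itself an action of type Scope whose actions property holds the nested actions. A rough sketch, with the nested action definitions collapsed:

    "Scope": {
        "type": "Scope",
        "actions": {
            "ReturnGuid": {},
            "Parse_JSON": {}
        },
        "runAfter": {}
    }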
Now that we have our Scope, we will set up an action that filters the failed action out of the scope's results and a response that contains the info from that failed action.
Other guides recommend copying the error and status code from the output, but if an unexpected error occurs in an action that is not an HTTP call, nobody will notice, as the Logic App will send out a generic error.
Below is the code used. @result('Scope') returns an array with the results of all the actions inside the named scope.
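A sketch of how the two actions might look in the definition; the names Filter_Array and Response match the ones used below, the 500 status code is my own choice, and the response body expression is abbreviated here (the full string follows):

    "Filter_Array": {
        "type": "Query",
        "inputs": {
            "from": "@result('Scope')",
            "where": "@equals(item()['status'], 'Failed')"
        },
        "runAfter": {
            "Scope": [
                "Failed",
                "Skipped"
            ]
        }
    },
    "Response": {
        "type": "Response",
        "inputs": {
            "statusCode": 500,
            "body": "@concat('Error occurred at ', ...)"
        },
        "runAfter": {
            "Filter_Array": [
                "Succeeded"
            ]
        }
    }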
The body string is as follows:
"@concat('Error occurred at ', body('Filter_Array')[0]['startTime'], '\r\n', body('Filter_Array')[0]['code'], '\r\n', 'Tracking ID: ', body('Filter_Array')[0]['clientTrackingId'])"
It provides the body with the error code, start datetime and a tracking ID.
The client will not get that much information, but this is more robust and allows us to track down the error if the client provides us with the error message.
Error handling in the APIM
When the API Management service catches an error in its policies, a couple of things happen. First, APIM stops applying any remaining inbound, outbound or backend policies to the message.
Secondly, it jumps straight to a new policy, the “on-error” policy. For the policies allowed in “on-error”, read this article.
Thirdly, a LastError object is created to store the received error; it can be accessed via context.LastError. Head over here to read up on the different properties.
There are several interesting things that can be done here, but the most interesting are the log-to-eventhub policy and the send-one-way-request policy. The log-to-eventhub policy allows us to log an event to an Event Hub; more on that later.
The send-one-way-request policy allows us to fire off a new request without waiting for the response. In the on-error policy, this can be used to send an alert to the REST API of your choosing. Microsoft has a good example of sending a one-way request to a Slack chat room when the status code is 500 or higher. Note that the on-error policy does not catch calls that failed in the backend; it only applies when something went wrong in the APIM service itself.
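A minimal sketch of such an on-error section, assuming a hypothetical Slack incoming-webhook URL and a message built from the LastError properties (not the exact policy from Microsoft's example):

    <on-error>
        <send-one-way-request mode="new">
            <!-- Hypothetical webhook URL: replace with your own -->
            <set-url>https://hooks.slack.com/services/XXXX/XXXX/XXXXXXXX</set-url>
            <set-method>POST</set-method>
            <set-body>@{
                // Build a simple Slack message from the error APIM caught
                return new JObject(
                    new JProperty("username", "APIM Alert"),
                    new JProperty("text", "Error in section " + context.LastError.Source + ": " + context.LastError.Message)
                ).ToString();
            }</set-body>
        </send-one-way-request>
    </on-error>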
Logging
By default, Azure offers three different kinds of logging.
Archiving to a Storage account, which stores your data so that it can be audited or manually inspected. No processing happens with the data.
Streaming to Event Hubs. Event Hubs acts as an event ingestor: it takes in telemetry data from different sources and distributes it according to a publish-subscribe model. Event Hubs are the first step into Big Data and are a bit out of scope for these posts. A good intro can be found here, in which you build a real-time data dashboard using Event Hubs, Power BI and Stream Analytics.
And last but not least, Log Analytics. Log Analytics provides monitoring services (in a portal separate from the Azure portal) for cloud and on-premises resources to maintain availability and performance. The collected data can be analyzed through query searches, visualizations, alerts and solutions.
For this series of blog posts, we will use Log Analytics, as it is more focused on monitoring, alerting and availability, whereas Event Hubs caters more to applications and IoT and Storage simply stores your data.
Collecting your data
To start collecting your data, go to your resources and select Diagnostics Logs.
From there, turn on Diagnostics Logs and check Send to Log Analytics. You will be asked to configure Log Analytics by creating a new Workspace. Give it a fancy new name and fill in the rest of the options. Back on the previous screen, check the logs available under LOG and 1 minute under METRIC. Repeat this for all the resources you want to monitor. This guide will focus on the diagnostic logs of the APIM service, the Function and the Logic App.
Now that your resources have been linked to a workspace, head over there and click the OMS Portal button.
You’ll be greeted by your homepage, which contains some info on the Workspace. From here on out, you can choose to add a new visualization in the View Designer, start a Query, view your Dashboard, access the Marketplace for Solutions and view Usage.
Go to queries and take a look at the logs Log Analytics has gathered. The following list shows a breakdown of the data:
- APIM
  - AzureMetrics
    - Other
    - Successful
    - Failed
    - Unauthorized
    - Total
  - AzureDiagnostics
    - Logging of actual calls made
- Functions
  - AzureMetrics
    - AverageMemoryWorkingSet
    - BytesReceived
    - BytesSent
    - FunctionExecutionCount
    - FunctionExecutionUnits
    - Http5xx
    - MemoryWorkingSet
- Logic Apps
  - AzureMetrics
    - ActionsCompleted
    - ActionsFailed
    - ActionsSkipped
    - ActionsStarted
    - ActionsSucceeded
    - BillableActionExecutions
    - BillableTriggerExecutions
    - RunFailurePercentage
    - RunThrottledEvents
    - RunsCancelled
    - RunsCompleted
    - RunsFailed
    - RunsStarted
    - RunsSucceeded
    - TotalBillableExecution
    - TriggerThrottledEvents
    - TriggersCompleted
    - TriggersFailed
    - TriggersFired
    - TriggersSkipped
    - TriggersStarted
    - TriggersSucceeded
Every single one of these is logged every minute, with the exception of the APIM AzureDiagnostics entries, which are logged whenever a call is made. Sifting through all of this is an impossible task unless you are actively searching for a specific moment, and that is not what Log Analytics is made for.
Based on the data we have, we are going to create a view that shows a breakdown of all calls made to the APIM service with AzureMetrics, a top region where our APIM is called from and an alert when too many unauthorized requests have been received.
The query language consists of two parts, divided by the first | sign. The first part filters the data in Log Analytics, while the second part describes the operations to be performed on that data. For more info on the possibilities, go here.
Go to the View Designer, click + Tile and add a donut.
Give it a good name and description. For the query, you can use the following:
Resource={{Name of your APIM Service}} && Type=AzureMetrics && MetricName!=TotalRequests && TimeGenerated:[NOW-7DAYS..NOW] | measure sum(Total) as Count by MetricName
The first part takes all the metric logs from your APIM service for the last 7 days, excluding the TotalRequests metric; the second part sums the Total field, grouped by metric name.
You should end up with something like the above. For more customization, take the Donut MultiQuery and query only for a specific MetricName. Click save and you will be taken back to your homepage where the View is now visible!
For the alert, go to Query Search and start searching for the count of all unauthorized requests.
Resource=TESTIFITWORKS && Type=AzureMetrics && MetricName=UnauthorizedRequests | measure count(Total) as UnAuthorized by MetricName
After searching, click the alert button above.
You will be taken to the following screen where you will be warned that the current query needs an aggregate and interval. Add “interval 1Minute” at the end of your query and complete the rest of the form.
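The complete query would then look something like this:
Resource=TESTIFITWORKS && Type=AzureMetrics && MetricName=UnauthorizedRequests | measure count(Total) as UnAuthorized by MetricName interval 1Minute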
Tips
- TimeGenerated does not have to be included in your queries. The portal applies it automatically for every query and allows you to pick a date range.
- Joins are limited. They do not currently support queries that include the IN keyword, the Measure command or the Extend command if it targets a field from the right query.
- Log Analytics is not real-time. Keep in mind, while setting up alerts, that there may be a delay. The example used here notified me 45 minutes after the final breach.
- In case an error is caught in the APIM service, you can send a one-way request to the Log Analytics HTTP Data Collector API.
- OMS can be used far more extensively, for example to monitor your on-premises systems as well.
- I used the JSONView addon
Conclusion
Our hybrid integration story is now a little bit more error-proof thanks to error handling in APIM and Logic Apps. For an alternative approach to error handling in Logic Apps, Steven Van Eycken also wrote a great post on it.
Our resources are now logging to a centralized system where we can set up views and alerts that allow us to monitor the health of our infrastructure. It also lets us learn more about who is accessing our resources, what they are accessing and how. And if combined with on-premises resources, it could even be used to act proactively.
Happy bug hunting!
Jochim