
Codit Blog

Posted on Saturday, February 27, 2016 12:00 AM

by Luis Delgado

Micro-services architectures are gaining popularity as a software architectural pattern. There are many aspects to think about when considering a micro-services architecture, and scalability is one of them. Let's contrast how micro-services scale with the common alternatives.

Vertical scalability

With vertical scalability, you scale the capacity of your application by increasing hardware capacity. This is why it is named "vertical": you add more CPU, more memory, more disk IOPS, but effectively, the architecture of your app and infrastructure does not change. This is a viable scalability pattern, but it has an obvious hard stop: there is only so much hardware you can add.

Horizontal scalability

With horizontal scalability, instead of scaling up by adding more hardware capacity, you architect your application so that it scales out by adding more instances of it. This can be accomplished by adding more VMs with the application installed, more application instances inside a cloud service, or more containers... you get the idea. You don't need beefy, expensive hardware for horizontal scalability; you can get by with many small machines. This scalability pattern usually requires adjustments in the application architecture. For example, since a client request may be served by any machine (out of many), the application typically has to be stateless, or if state is needed, it must be stored somewhere else.
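The stateless pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the in-memory dict stands in for a shared external store such as Redis, and all names are my own.

```python
# Sketch: a stateless request handler that keeps session state in an
# external store, so any instance behind the load balancer can serve
# any request. The dict stands in for a shared store such as Redis.

EXTERNAL_STORE = {}  # lives outside the application instances in a real setup

def handle_request(session_id: str, item: str) -> int:
    """Add an item to the user's cart and return the cart size.

    The instance itself holds no state between calls, so the request
    may be served by any machine out of many.
    """
    cart = EXTERNAL_STORE.setdefault(session_id, [])
    cart.append(item)
    return len(cart)

# Two requests, possibly served by different instances, see the same
# state because it lives outside the process.
handle_request("session-42", "book")
handle_request("session-42", "pen")
```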

Scalability with micro-services

While the concept of scaling horizontally seems appealing, remember that every instance you "clone" to scale horizontally runs a complete copy of your application. This might be undesirable, as different parts of your application might have different scalability needs. Typically, the load of an application is not evenly distributed among all the services it provides. For example, a careful analysis of telemetry data might show that the bottleneck in your application is the authentication service, while all other services in your app are performing well. If you scale horizontally, you will be scaling out the authentication service... along with everything else that does not need to be scaled out. This is a waste of resources.

A micro-services architecture takes an application and splits it into independent, working, functional units called "services". Don't be misled by the word "micro" in "micro-services": the split does not need to be microscopic. You can split the services within your application in any way you want. Typically, the more atomic the services are, the more value you will get from this architecture, but that need not be the case every time.

With that out of the way, let's go back to our example. With a micro-services architecture, your app will run different services as independent units, each with its own runtime, codebase, processing thread(s), etc. Since the bottleneck in our app is the authentication routine, you can scale out that service only and leave the rest alone. With micro-services, you make more effective use of the horizontal scalability pattern.
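The idea of scaling only the bottleneck can be made concrete with a small capacity calculation. The load figures and per-instance capacity below are made up for illustration; the point is simply that each service gets its own replica count.

```python
# Sketch: with micro-services, each service scales independently.
# Replica counts are derived per service from its own observed load,
# so only the bottleneck (here: authentication) gets extra instances.
import math

def replicas_needed(requests_per_sec: float, capacity_per_instance: float) -> int:
    """Smallest number of instances that can absorb the load (min. 1)."""
    return max(1, math.ceil(requests_per_sec / capacity_per_instance))

# Illustrative telemetry: authentication is the hot spot.
observed_load = {"authentication": 900.0, "catalog": 80.0, "checkout": 40.0}
capacity = 100.0  # assumed requests/sec one instance can handle

plan = {svc: replicas_needed(rps, capacity) for svc, rps in observed_load.items()}
# authentication scales out to 9 instances; the others stay at 1
```

With a monolith, the same calculation would force 9 full copies of the entire application just to satisfy the authentication load.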

When considering a micro-services architecture, there are many more factors to analyse beyond scalability. But mixing the micro-services architecture with horizontal scalability typically gives you better capacity elasticity than using a monolithic architecture.

Categories: Architecture
Tags: Scalability
written by: Luis Delgado

Posted on Tuesday, December 3, 2013 4:00 PM

by Glenn Colpaert

This blog post talks about a problem I experienced with Microsoft Clustering on VMware, and why the BizTalkMsgBoxDb entered 'Recovery Pending' mode.

For an environment I recently installed at a customer, we experienced some weird behavior with the BizTalkMsgBoxDb.
After some time (or whenever a failover happened), the BizTalkMsgBoxDb entered 'Recovery Pending' mode.

This had an impact on the entire environment, making the BizTalk environment unavailable.

The Problem

After some investigation on my side, I passed this problem to the infrastructure engineer to investigate the root cause of this problem.

The infrastructure team discovered that this was a problem related to the shared disk setup on the cluster environment.

The environment at the customer was set up in the following manner:

One virtual cluster with two virtual servers (each on a different VMware ESX version), with a shared cluster disk (VMware VMFS).

The actual problem with this setup is that it is NOT supported by Microsoft and does not work properly. It will work for some time, until a failover happens or the passive node checks the status of the cluster disk. When either of those two actions happened on the cluster environment, the BizTalkMsgBoxDb entered 'Recovery Pending' mode.

For the above configuration, VMware KB 1037959 states:

Supported in Cluster in a Box (CIB) configurations only. For more information, see the Considerations for Shared Storage Clustering section in this article.

So the above configuration would only work if the cluster of virtual machines is set up on a single host with the same ESX/ESXi version and connected to the same storage.
If we applied that configuration (Cluster in a Box), it would only protect us against failures at the operating system and application level, but not against hardware failures, so this option was out of the question.

The Solution

The solution to the problem described above was rather simple: move from an unsupported environment to a supported environment.

This was done by replacing the VMFS virtual disk with an RDM (Raw Device Mapping) disk.

RDM is a mapping file in a separate VMFS volume that acts as a proxy for a raw physical storage device. The RDM allows a virtual machine to directly access and use the storage device. The RDM contains metadata for managing and redirecting disk access to the physical device.

The file gives you some of the advantages of direct access to a physical device while keeping some advantages of a virtual disk in VMFS. As a result, it merges VMFS manageability with raw device access.

More information on supported configurations for Microsoft Clustering on VMware vSphere can be found here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1037959

Cheers,

Glenn Colpaert

Categories: BizTalk
written by: Glenn Colpaert

Posted on Tuesday, August 22, 2017 4:40 PM

by Tom Kerkhove

In this second article on Azure Event Grid, we'll have a look at what I'd like to see being added to Azure Event Grid.

With a nice foundation of Event Publishers & Handlers, we can expect a vast number of new ones to be added and made available out of the box in the future.

Currently Azure Event Grid only supports events in JSON, but in the future they will support other data formats such as binary.

Let's have a look at what I'd like to see being added.

High-Level Event Grid Monitoring

I'm curious to see what the operations & monitoring story will be for Azure Event Grid.

In the following sections I will refer to other Azure services/technologies that provide similar features, but I'd love to see Azure Event Grid expose those capabilities out of the box instead of integrating with, for example, Azure Application Insights.

This would give us one centralized monitoring experience for everything related to Azure Event Grid, instead of having it in another service (read: another dependency), since Event Grid runs as an infrastructure service supporting other services.

High-Level Component Overview

Personally, I think it would be great to have a dashboard that shows me all the Event Publishers, Topics, Subscriptions & Handlers that are connected with each other.

My vision on this is comparable with the monitoring experience that Azure Data Factory provides:

While this is only the pipeline overview, it clearly indicates how each data set, service and pipeline are connected with each other. If you go to the Monitoring dashboard, it also gives you an overview of all processed data slices.

It would be nice to have a similar experience for Azure Event Grid where every failed event is listed, so we can view the body of the event. This would also enable us to troubleshoot why the event failed and whether it's related to the content of the event or not. That said, since Azure Event Grid is a high-volume service, I'm not counting on this one. However, it would be nice to have, at least as a premium feature.

Another interesting feature to have would be a real-time sense of the throughput of all the events in the grid, something similar to Netflix's Vizceral (GitHub).

Performance metrics per Event Handler

Next to the high-level component overview it would be great to have some integrated performance gauges.

These gauges would give us insight into the processing performance of Event Handlers, allowing us to pinpoint scalability problems.

This is comparable to what the Azure Application Insights "Application Map" offers, where you can see the number of requests, the success rate and the failures:

Adding metadata to events

Introducing a metadata node in the event payload would be great as well. This would allow us to specify additional context information about the event that is not business-specific information.

By doing this, we can add telemetry information, such as correlation ids, allowing us to correlate all our telemetry across all the Event Publishers, Topics, Handlers and downstream systems.
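The kind of envelope I have in mind could look like the sketch below. To be clear, the "metadata" node and its field names are my own wish, not part of the Event Grid event schema.

```python
# Sketch: an event envelope with a "metadata" node next to the business
# payload, carrying telemetry context such as a correlation id.
# The metadata node is a proposed addition, not the Event Grid schema.
import uuid
from datetime import datetime, timezone
from typing import Optional

def make_event(subject: str, data: dict,
               correlation_id: Optional[str] = None) -> dict:
    return {
        "id": str(uuid.uuid4()),
        "subject": subject,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "data": data,                      # business payload stays untouched
        "metadata": {
            # correlates telemetry across publishers, topics and handlers
            "correlationId": correlation_id or str(uuid.uuid4()),
        },
    }

event = make_event("orders/123", {"total": 19.99}, correlation_id="corr-abc")
```

Downstream systems would simply propagate `metadata.correlationId` into their own telemetry, so one id ties the whole route together.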

Taking it a step further, it would be nice to use "Application Insights Analytics" (aka Kusto). This would allow us to search for these events and correlate them with the route they took through Azure Event Grid.

Integration with Azure Data Factory

The Event Handler I'm looking forward to most is Azure Data Factory. As of today, Azure Data Factory only supports a slicing model where it triggers your pipeline every hour, day, week, etc., which is not the best fit for several scenarios.

It would be good if we could use Azure Event Grid to forward events for newly uploaded blobs to the Data Factory Handler to trigger your pipeline. This would not only make the data processing flow feel more natural, but could also improve performance, since we divide the processing into smaller pieces instead of running one big pipeline.

Summary

While Azure Event Grid is still in preview, it's always good to think about ways it can be improved and how we will operate this service. We've talked about a few features I'd like to see being added, which are mainly focused on monitoring the whole infrastructure and how we can correlate this back to our other telemetry.

My biggest wish is having a high-level overview of the Event Grid components and how they are connected (which Azure Functions also lacks).

My second request would be an out-of-the-box centralized monitoring experience that does not force us to use Azure Application Insights. Otherwise we would fully depend on Application Insights, which adds an unnecessary dependency; it is also not that cheap, certainly not with the telemetry this service will generate.

Does this mean that I don't want to have integration with Azure Application Insights? No! Just not as the built-in way to operate Azure Event Grid.

This is of course early thinking, my vision on this can change once I use this more.

Thanks for reading,

Tom Kerkhove.

Posted on Monday, August 21, 2017 10:47 AM

by Tom Kerkhove

Azure Event Grid is here - In this first article we'll have a look at what it is, dive into the details and discuss certain new scenarios.

Last week Microsoft announced Azure Event Grid (Preview), an event-driven service that allows you to stitch together all your components and design event-driven architectures.

Next to the built-in support for several Azure services, you can also provide your own custom topics and custom webhooks that fit your needs.

By using a combination of filters and multicasting, you can create a flexible event routing mechanism that fits your needs by, for example, sending event A to one handler while event B is multicasted to multiple handlers. Read more about this here.
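The filter-and-multicast idea can be sketched as a small routing table. This is a conceptual illustration only; the subscription shape and handler names are invented, not the Event Grid API.

```python
# Sketch: subscriptions match events on a subject prefix, and one event
# may fan out to several handlers (multicast). Names are illustrative.

subscriptions = [
    {"prefix": "blobs/",  "handler": "thumbnail-function"},
    {"prefix": "blobs/",  "handler": "audit-logger"},       # multicast: same filter
    {"prefix": "orders/", "handler": "billing-logic-app"},
]

def route(event_subject: str) -> list:
    """Return every handler whose subject filter matches the event."""
    return [s["handler"] for s in subscriptions
            if event_subject.startswith(s["prefix"])]

route("blobs/images/cat.png")   # fans out to both blob handlers
route("orders/123/created")     # goes to the billing Logic App only
```

Because routing lives in the subscriptions rather than in the publishers, adding a handler is just adding a row, with no publisher changes.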

Azure resources can act as Event Publishers where they send a variety of events to Event Grid. By using Event Subscriptions you can then subscribe to those events and send them to an Event Handler.

The main scenarios for Azure Event Grid are serverless architectures, automation for IT/operations and integration:

  • Serverless Architectures - Trigger a Logic App when a new blob is uploaded
  • Operations - Listen & react on what happens in your subscription by subscribing to Azure Subscription changes
  • Integration - Extend existing workflows by triggering a Logic App once there is a new record in your database
  • Custom - Create your own by using application topics (aka custom topics)

The pricing for Azure Event Grid is fairly simple: you pay $0.60 per million operations, and you get the first 100k operations per month for free. Operations are defined as event ingress, advanced match, delivery attempt, and management calls. Currently you only pay $0.30 per million operations since the service is in public preview; more information is on the pricing page.
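A quick calculation makes the model tangible, using the prices listed above (general availability rate $0.60 per million, preview rate $0.30, first 100k free each month):

```python
# Sketch: monthly Event Grid cost from the prices quoted above.
# First 100k operations per month are free; the rest bill per million.

def monthly_cost(operations: int, price_per_million: float = 0.60,
                 free_operations: int = 100_000) -> float:
    billable = max(0, operations - free_operations)
    return billable / 1_000_000 * price_per_million

monthly_cost(100_000)          # 0.0 -> entirely within the free grant
monthly_cost(5_100_000)        # 3.0 -> 5M billable operations at $0.60/M
monthly_cost(5_100_000, 0.30)  # 1.5 -> same volume at the preview rate
```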

Basically you can see Azure Event Grid as an extension service that allows you to integrate Azure Services with each other more closely while you also have the flexibility to plug in your own custom topics.

Let's have a closer look at what it has to offer.

Diving into Azure Event Grid

Event Handling at Scale

Azure Event Grid is designed as a highly scalable eventing backplane that comes with some serious performance targets:

  • Guaranteed sub-second end-to-end latency (99th percentile)
  • 99.99% availability
  • 10 million events per second, per region
  • 100 million subscriptions per region
  • 50 ms publisher latency for batches of 1M

These are very big numbers, which also indirectly impact the way we design our custom event handlers: they will need to be scalable, protect themselves from being overwhelmed, and come with a throttling mechanism.

But then again, designing for the cloud typically means that each component should be highly scalable & resilient so this should not be an exception.

Durable Message Delivery

Every event will be pushed to the required Event Handler based on the configured routing. For this, Azure Event Grid provides durable message delivery with at-least-once semantics.

By using retries with exponential backoff, Event Grid keeps on sending events to the Event Handler until it acknowledges the request with either an HTTP 200 OK or HTTP 202 Accepted.

The Event Handler needs to be capable of processing the event in less than one minute, otherwise Event Grid will consider it failed and retry it. This means that all Event Handlers should be idempotent, to avoid creating invalid state in your system.
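What idempotency means in practice can be sketched as follows; this is my own minimal illustration, where a set of seen event ids stands in for a durable deduplication store.

```python
# Sketch: an idempotent handler, as at-least-once delivery requires.
# Processing the same event id twice must not change state twice.

processed_ids = set()            # stands in for a durable dedupe store
account_balance = {"value": 0}   # state the events mutate

def handle_event(event: dict) -> bool:
    """Apply the event exactly once; redeliveries are acknowledged but ignored."""
    if event["id"] in processed_ids:
        return False  # duplicate delivery, already applied
    account_balance["value"] += event["data"]["amount"]
    processed_ids.add(event["id"])
    return True

evt = {"id": "evt-1", "data": {"amount": 10}}
handle_event(evt)   # applied, balance becomes 10
handle_event(evt)   # Event Grid retried the delivery; safely ignored
```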

However, if your Event Handler is unable to process the event in time and Event Grid has been retrying for up to 24 hours (2 hours in public preview), it will expire the event and stop retrying.
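The delivery loop described above can be sketched like this. Note that the doubling schedule and its starting delay are my assumption for illustration, not the documented retry schedule, and elapsed time is simulated rather than slept.

```python
# Sketch: retry with exponential backoff until the handler acknowledges
# with HTTP 200/202, giving up once the retry window is exhausted
# (24h, or 2h in public preview). Backoff schedule is assumed, not
# the documented one; no real sleeping happens in this sketch.

def deliver(handler, max_window_s: float = 24 * 3600) -> bool:
    delay, elapsed = 1.0, 0.0
    while elapsed <= max_window_s:
        status = handler()
        if status in (200, 202):   # acknowledged, delivery done
            return True
        elapsed += delay           # simulated wait before the next attempt
        delay *= 2                 # exponential backoff
    return False                   # event expired, no more retries

attempts = iter([500, 503, 200])   # handler fails twice, then succeeds
deliver(lambda: next(attempts))    # acknowledged on the third attempt
```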

In summary, Event Grid guarantees at-least-once delivery for all your events, but your Event Handler is still responsible for processing each event in time. This also means it should be able to preserve performance when dealing with load spikes.

It is also interesting to see what really happens to expired events. Do they just go away, or will there be a fallback event stream to which they are forwarded for later processing? In general, I think expiration of events will work for most scenarios, but for mission-critical event-driven flows a fallback event stream would be a valuable asset.

You can read more on durable message delivery here.

How about security?

Azure Event Grid offers a variety of security controls on all levels:

  • Managing security on the Event Grid resource itself is done with Role-based Access Control (RBAC). It allows you to grant granular control to the correct people. It's a good practice to use the least-privilege principle, but that is applicable to all Azure resources. More information here.
  • Webhook Validation - Each newly registered webhook needs to be validated by Azure Event Grid first. This is to prove that you have ownership over the endpoint. The service will send a validation token to the webhook, which the webhook implementer needs to send back as a validation. It's important to note that only HTTPS webhooks are supported. More information here.
  • Event Subscriptions use Role-based Access Control (RBAC) on the Event Grid resource: the person creating a new subscription needs the Microsoft.EventGrid/EventSubscriptions/Write permission.
  • Publishers need to use SAS Tokens or key authentication when they want to publish an event to a topic. SAS tokens allow you to scope the access you grant to a certain resource in Event Grid for a certain amount of time. This is similar to the approach Azure Storage & Azure Service Bus use.

The current security model looks fine to me, although it would be nice to have a concept of SAS tokens with a stored access policy, similar to Azure Storage. This would allow us to issue tokens for a certain entity while still having the capability to revoke access when we need to, i.e. when a token was compromised.

An alternative to SAS stored access policies would be the ability to create multiple authorization rules, similar to Azure Service Bus, where we can use the key approach for authentication while still having more granular control over who uses which key, and being able to revoke a key for one publisher only instead of revoking it for all publishers.

You can read more on security & authentication here.
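The webhook validation handshake mentioned above can be sketched as a tiny request handler. The field names follow the documented schema as I understand it (a SubscriptionValidationEvent carrying a validation code, echoed back in a validationResponse); treat this as an illustration rather than a reference implementation.

```python
# Sketch: the Event Grid webhook validation handshake. Event Grid posts
# a SubscriptionValidationEvent with a validation code, and the endpoint
# proves ownership by echoing the code back.

VALIDATION_EVENT = "Microsoft.EventGrid.SubscriptionValidationEvent"

def handle_post(body: list) -> dict:
    """Handle a POST from Event Grid (a JSON array of events)."""
    event = body[0]
    if event.get("eventType") == VALIDATION_EVENT:
        # echo the code back so Event Grid activates the subscription
        return {"validationResponse": event["data"]["validationCode"]}
    # any other event type: accept it for normal processing
    return {"status": "event accepted"}

sample = [{"eventType": VALIDATION_EVENT,
           "data": {"validationCode": "512d38b6-c7b8-40c8-89fe-f46f9e9622b6"}}]
handle_post(sample)   # returns the validationResponse payload
```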

Imagine the possibilities

Integration with other Azure services

As of today there are only a few Azure services that integrate with Azure Event Grid but there are a lot of them coming.

Here are a couple of them that I would love to use:

  • Use API Management as a public-facing endpoint where all events are transformed and sent over to Azure Event Grid. This would allow us to use API Management as a webhook proxy between the 3rd party and Azure Event Grid. More on this later in the post
  • Streamlined event processing for Application Insights custom events where it acts as an Event Publisher. By doing this we can push them to our data store so that we can use it in our Power BI reporting, instead of having to export all telemetry and setting up a processing pipeline for that, as described here
  • Real-time auditing & change notifications for Azure Key Vault
    • Publish events when a new version of a Key or Secret was added to notify dependent processes about this so they can fetch the latest version
    • Real-time auditing by subscribing to changes on the access policies
  • Sending events when alerts in Azure Monitor are triggered would be very useful. In the past I've written about how using webhooks for processing alerts, instead of emails, is more interesting, as you can trigger an automation workflow such as Logic Apps. If an alert sent an event to Azure Event Grid, we could take it even a step further and create dedicated handlers per alert or alert group. You can already achieve this today with Logic Apps & Service Bus Topics, but with Event Grid this comes out of the box and makes it easier to create certain routings
  • Trigger an Azure Data Factory pipeline when an event occurs, i.e. when a blob is added to an Azure Storage container
  • Send an event when Azure Traffic Manager detects a probe that is unhealthy

New way of handling webhook events?

When we want to allow 3rd parties to send notifications to a webhook, we need to provide a public endpoint which they can call. Typically, these endpoints just take the event and queue it for later processing, allowing the 3rd party to move on while we handle the event at our own pace.

The "problem" here is that we still need to host API middleware somewhere, be it an Azure Function, Web App, API App, etc., that just handles this message. Even if you use Azure API Management, you still need the middleware running behind the API Management proxy, since you can't push directly to a topic.

Wouldn't it be nice if we can get rid of that host and let API Management push the requests directly to Azure Event Grid so that it can fan-out all the external notifications to the required processors?

That said, this assumes that you don't do any validation or other business logic before the webhook middleware pushes to the topic for processing. If you need this capability, you will have to stick with hosting your own middleware I'm afraid.

Unified integration between APIs

Currently, when you are using webhooks inside your infrastructure, Event Publishers often call webhooks directly, creating a spaghetti infrastructure. This is not manageable, since each Event Publisher needs to have the routing logic inside its own component.

By using Azure Event Grid, we can route all events through it and use it as an event broker, or routing hub if you will, thus decoupling the Event Publishers from the corresponding Event Handlers.

By doing this we can easily change the way we route events to new Event Handlers by simply changing the routing, not the routing logic in the Event Publishers.

Depending on the monitoring Azure Event Grid will provide, it could also offer a more generic approach to monitoring all the event handling, instead of relying on the monitoring of each individual component. More on this in my next blog.

You can of course also use Azure Service Bus Topics instead; it all depends on the load you are expecting. As always, pick the technology that best fits the scenario.

Summary

Azure Event Grid is a unique service that has been added to Microsoft Azure and brings a lot to the table. It promises big performance targets and will enable new scenarios, certainly in the serverless landscape.

I'm curious to see how the service will evolve and what publishers & handlers will be coming soon. Personally, I think it's a big announcement, and I will give some more thought to how we can use it when building platforms on Microsoft Azure.

Want to learn more yourself? Here's a good Cloud Cover episode that will give you a high-level overview of Azure Event Grid or read about the concepts of Event Grid.

What features would you like to see being added to the service? In what scenarios do you see Event Grid as a good fit? Feel free to mention them in the comments!

Thanks for reading,

Tom Kerkhove.