Codit Wiki

Loading information... Please wait.

Codit Blog

Posted on Tuesday, August 22, 2017 4:40 PM

Tom Kerkhove by Tom Kerkhove

In this second article on Azure Event Grid, we'll have a look at what I'd like to see being added to Azure Event Grid.

With a nice foundation of Event Publishers & Handlers, we can expect a vast amount of new ones being added that will be available out of the box in the future.

Currently Azure Event Grid only supports events in JSON, but in the future they will support other data formats such as binary.

Let's have a look at what I'd like to see being added.

High-Level Event Grid Monitoring

I'm curious to see what the operations & monitoring story will be for Azure Event Grid.

In the following sections I will refer to other Azure services/technologies that provide similar features but I'd love to see Azure Event Grid expose those capabilities out of the box instead of integrating with Azure Application Insights for example.

This would allow us to have one centralized monitoring experience for everything related to Azure Event Grid instead of having this in another service (aka "Dependency"), since this service is running as an infrastructure service supporting other services.

High-Level Component Overview

Personally, I think it would be great to have a dashboard that shows me all the Event Publishers, Topics, Subscriptions & Handlers that are connected with each other.

My vision on this is comparable with the monitoring experience that Azure Data Factory provides:

While this is only the pipeline overview, it clearly indicates how each data set, service and pipeline are connected which each other. If you go to the Monitoring-dasbhoard it also provides you an overview of all processed data slices.

It would be nice to have a similar experience for Azure Event Grid where every failed event is listed so we can view the body of the event. This also enables us to troubleshoot why the event failed and if it's related to the content of the event or not. That said, since Azure Event Grid is a high-volume service I'm not hoping for this one. However, it would be nice to have as a premium feature at least.

Another interesting feature to have is a real-time sense of the throughput of all the events in the grid, something similar to Netflix' Vizceral (GitHub).

Performance metrics per Event Handler

Next to the high-level component overview it would be great to have some integrated performance gauges.

These gauges allow us to gain insights on the processing performance of Event Handlers allowing us to pinpoint scalability problems.

This can be comparable to what Azure Application Insights "Application Map" offers where you can see the amount of requests, success rate and failures:

Adding metadata to events

Introducing a metadata-node in the event payload would be great as well. This would allow us to specify additional context information about the event, while it's not business specific information.

By doing this; we can add telemetry information, such as correlation ids, allowing us to correlate all our telemetry across all the Event Publishers, Topics, Handlers and downstream systems.

Taking it a step further, it would be nice to use "Application Insights Analytics" (aka Kusto). This provides us to search for these events and correlate the route it took in Azure Event Grid.

Integration with Azure Data Factory

Thé Event Handler I'm looking forward to is Azure Data Factory. As of today, Azure Data Factory only supports a slicing model where it will trigger your pipeline every 1 hour, day, week, etc while in several scenarios this is not the best use-case.

It would be good if we could use Azure Event Grid to forward events for newly uploaded blobs and send that to the Data Factory Handler to trigger your pipeline. This can not only make the data processing flow feel more natural and also the performance could increase while we divide the processing in smaller pieces instead of running one big pipeline.


While Azure Event Grid is still in preview, it's always good to think about ways it can be improved and how we will operate this service. We've talked about a few features I'd like to see being added which are mainly focused to monitoring the whole infrastructure and how we can correlate this back to our other telemetry.

My biggest wish is having a high-level overview of the Event Grid components and how they are connected (which Azure Functions also lacks).

My second request would be to have an out of the box centralized monitoring experience and not being forced to use Azure Application Insights. This would mean that we are fully depending on Application Insights which adds an unnecessary dependency; which is also not that cheap, certainly not with the telemetry this service will generate.

Does this mean that I don't want to have integration with Azure Application Insights? No! Just not as the built-in way to operate Azure Event Grid.

This is of course early thinking, my vision on this can change once I use this more.

Thanks for reading,

Tom Kerkhove.

Tip: On Tuesday 19 December, Codit is organizing an Azure Event Grid webinar. Register here.

Posted on Friday, January 5, 2018 9:04 AM

Tom Kerkhove by Tom Kerkhove

Things change, and so does the cloud. New services are being added and integration between services is being improved, but services also become deprecated. We need to embrace change and design for it.

Our industry has shifted quite a lot in the past recent years where we moved from spinning up our own servers on-premises to run our software by hosting more and more in the cloud.

This brings a lot of benefits where agility is one of them. By moving away from yearly releases to monthly or weekly releases product teams can get new features and services faster out of the door to receive feedback more easily. This is very good to quickly evolve your product and see how your consumers are using it allows you to adapt or release bug fixes more quickly.

This is exactly what Microsoft Azure and other cloud platforms are doing. Every blink of an eye they release new features! Keeping up with all latest and greatest is sometimes like drinking from a water hose, you can manage to do it but not for long! Some might say that things are even going too fast but that's a topic on its own.

The key learning of the journey I've seen so far is that things change, and you'd better be prepared.

Introduction of new services

Over time, ecosystems can expand by the addition of new services that can change the way you think about the systems that you are building or fill in the gaps that you now need to work around.

Azure Event Grid is one the newest services in Microsoft Azure that leverage unique capability - Support for sending notifications in event-driven architectures. This ties in with the recent "Serverless" trend where everything needs to be event-driven and only care about the logic that needs to run, not how it's running. Event Grid was the last piece of the puzzle to go fully event-driven which can make us question our current approach to existing systems.

Better integration between services

Another aspect of change is that services are easier to integrate with each other over time. This allows you to achieve certain aspects without having to do the heavy lifting yourself.

An example of this is Azure Logic Apps & Azure Table Storage. If you wanted to use these in the past, you had to build & deploy your own custom Table Storage API App for that because it was not there out-of-the-box. Later on, they added a connector to the connector portfolio that allows you have the same experience without having to do anything and allowed you to switch verify easily.

Azure AD Managed Service Identity (MSI) is another good example which makes authentication with Azure AD very easy, simplifying authentication with Azure Key Vault. No need to worry about storing authentication information on your compute nodes anymore, MSI will handle it for you! And while this makes it easier for you, it's also more secure since you don't have the additional risk of storing the information somewhere, it's now being handled by the ecosystem and not your problem anymore! It's not about completely removing security risks, it's about limiting them.

You've got to move it, move it.

But then comes the day that one of the services on which you depend is no longer being invested in any more or even worse, being deprecated. Next thing you know, you need to migrate to another (newer) service or, if you're very lucky, there is no migration path.

This is not a walk in the park because it comes with a lot of important questions:

  • Does it have the same feature set?
    • If not, do we need to migrate it to multiple services or look at using an offering from another vendor/community?
  • What is the new pricing story? Will it be more expensive?
  • What is the current status of the newer service? Is it stable enough (yet)?
  • How about the protocols that are being used for both the old services as for the new alternatives?
    • Does it support the same protocols or are they proprietary?
    • Can we benefit from using open standards instead?
    • Does it bring any (new) vendor lock-ins?
  • Do I have to revise my ALM story or does it follow a similar approach?
  • And many more

Unfortunately, 2017 was the year where Azure Access Control Service (ACS) was officially being deprecated and existing customers have until November 7, 2018, before the service is being shut down. This might sound like a long time before it goes away but it takes a certain amount of effort to migrate off of a service onto a new one because you need to evaluate alternatives, plan for the migration, implement changes, re-test everything and push it to the masses so it's fair to say that it takes a certain amount of time.

ACS, in particular, is an interesting case because they provide a decent migration guide and the blog post gives you guidance as well, but that does not mean that you're off the hook. While you can migrate to Azure AD or Azure AD B2C, these alternatives do not support all authentication protocols that ACS did. Luckily there are also communities that have (OSS) technology available such as IdentityServer but that's not a guarantee that it has the same capabilities than what you are migrating from.

Is ACS an exception? Certainly not. Remember Azure Remote App? Gone. Azure Power BI Embedded? Deprecated and you should migrate to Power BI.

This is far from a rant and building systems are not hard, it's maintaining them. And at a certain point in time, you need to make hard decisions which unfortunately sometimes impacts customers.

More information on Power BI Embedded can be found here as well.

Deprecated? No, but you'd better use our vNext

Next to deprecation, some services are being improved by launching a brand new major version of the service itself that has an improved version of its precursor, a service upgrade if you will. This means that the service is still around, but that it has changed so dramatically that you will need to migrate as well.

Azure Data Factory is a good example of this, which I've written about recently, where you can still use Azure Data Factory v1 but v2 has arrived and will be the way forward. This means that you can still use the service that you like but have to migrate since there are potentially a few breaking changes because the infrastructure supporting it has changed.

You can see a service upgrade a bit like a light version of a deprecated service - Your current version is going away, but you can still stick around and use the new version. If you're lucky, you don't need to migrate or use one of the provided migration tools to do this for you. However, you still need to make sure that everything still works and that you make the switch, but you get new features for doing that.

Embracing Change

There are a variety of ways that can impact the architecture of your application and we need to design for change because we will need it.

Another interesting aspect from the ACS lifecycle is that, if you've been around for a while, you might have noticed that the service didn't get any investments in the last couple of years, but neither did Azure Cloud Services. Do we need to panic? No. But it's safe to say that Cloud Services are going away in the future as well and it's always good to look around and see if there are any alternatives. Do we need to switch as soon as possible? No.

Are only old services going away? No. Thanks to Agile it is very easy to deliver an MVP and see what the feedback is, but if nobody likes it or no business need gets fulfilled it is probably going away. A good example for this is Azure BizTalk Services that was around for a year but was killed particularly fast because nobody really liked it and Azure Logic Apps is the successor of that, which people like more.

It is crucial to find a balance between cutting-edge technology & battle-tested services. Every service brings something to the table, but is it really what you need or do you just want to use a new shiny technology/service? Compare all candidates and see what benefits & trade-offs they have and use the right service for the job.

I'm proud to say that I'm still using Azure Cloud Services, despite the lack of investment by Microsoft. Why? Because it gives me what I need and there is no alternative that is similar to what Cloud Services gives for our scenario. However, this does not mean that we will use it forever and we keep an eye open for the development of other services.

When new technologies or services arise, it's always good to have a look and see what the power of it is via a small spike or POC but be cautious before you integrate it in your application. But is it worth to switch (already)? Here are a few questions you could ask yourself:

  • What does it bring over the current solution?
  • What is the performance of it?
  • What is the risk/impact of it?
  • What is the monitoring story around it?
  • What is the security story around it?
  • Can we do automated deployments?

Embrace change. Make sure that you can easily change things in your architecture without your customers knowing about them.

How? Well it always depends on your application. To give you one example - Make sure that your public API infrastructure is decoupled from your internal infrastructure and that you use DNS for everything. Azure API Management is a perfect fit for this because it decouples the consumers from the backend part giving you control over things like advanced routing, easy security, etc regardless of your physical backend. If you decide to decompose your API that is hosted on a Web App into multiple microservices running in Kubernetes or Azure Functions, you can very easily do that behind the scenes while your customers are still calling the same operations on your API Proxy.

Certainly, do this if you are working with webhooks that are being called by 3rd parties. You can ask consumers to call a new operation, which you should avoid, but with webhook registrations, you cannot. One year ago we decided that all webhooks should be routed through Azure API Management so that we benefit from the routing aspect, but also allow us to still secure our physical API since webhooks not always support security as it should.


This article is far from a rant, but more to create awareness that things are moving fast and we need to find a balance between cutting-edge technology & battle-tested services.

Use a change-aware mindset when designing your architecture, because you will need it. Think about the things that you depend on, but also be aware that you can only do this to a certain degree.

In my example above I talked about using Azure API Management as a customer-facing endpoint for your API infrastructure. Great! But what if that one goes away? Then you'll have to migrate everything because you can only be as cautious as you can to a certain degree because in the end, you'll need to depend on something.

Thanks for reading,


Posted on Monday, August 21, 2017 10:47 AM

Tom Kerkhove by Tom Kerkhove

Azure Event Grid is here - In this first article we'll have a look at what it is, dive into the details and discuss certain new scenarios.

Last week Microsoft announced Azure Event Grid (Preview), an event-driven service that allows you to stitch together all your components and design event-driven architectures.

Next to the built-in support for several Azure services you can also provide your own custom topics and custom webhooks that fix your needs.

By using a combination of filters and multicasting, you can create a flexible event routing mechanism that fits your needs by for example sending event A to one handler, while event B is being multicasted to multiple handlers. Read more about this here.

Azure resources can act as Event Publishers where they send a variety of events to Event Grid. By using Event Subscriptions you can then subscribe to those events and send them to an Event Handler.

The main scenarios for Azure Event Grid are serverless architectures, automation for IT/operations and integration:

  • Serverless Architectures - Trigger a Logic App when a new blob is uploaded
  • Operations - Listen & react on what happens in your subscription by subscribing to Azure Subscription changes
  • Integration - Extend existing workflows by triggering a Logic App once there is a new record in your database
  • Custom - Create your own by using application topics (aka custom topics)

The pricing for Azure Event Grid is fairly simple - You pay $0.60 per million operations and you get the first 100k operations per month for free. Operations are defined as event ingress, advanced match, delivery attempt, and management calls. Currently you only pay $0.30 since it's in public preview, more information on the pricing page.

Basically you can see Azure Event Grid as an extension service that allows you to integrate Azure Services with each other more closely while you also have the flexibility to plug in your own custom topics.

Let's have a closer look at what it has to offer.

Diving into Azure Event Grid

Event Handling at Scale

Azure Event Grid is designed as an high scalable eventing backplane which comes with some serious performance targets:

  • Guaranteed sub-second end-to-end latency (99th percentile)
  • 99.99% availability
  • 10 million events per second, per region
  • 100 million subscriptions per region
  • 50 ms publisher latency for batches of 1M

These are very big numbers which also indirectly have impact on the way we design our custom event handlers. They will need to be scalable and protect themselves from being overwhelmed and should come with a throttling mechanism.

But then again, designing for the cloud typically means that each component should be highly scalable & resilient so this should not be an exception.

Durable Message Delivery

Every event will be pushed to the required Event Handler based on the configured routing. For this, Azure Event Grid provides durable message delivery with an at-least-once delivery.

By using retries with exponential backoff, Event Grid keeps on sending events to the Event Handler until it acknowledges the request with either an HTTP 200 OK or HTTP 202 Accepted.

The Event Handler needs to be capable of processing the event in less than one minute, otherwise Event Grid will consider it as failed and retry it. This means that all Event Handlers should be idempotent to avoid creating invalid state in your system.

However, if your Event Handler is unable to process the event in time and Event Grid has been retrying for up to 24h, 2h in public preview, it will expire the event and stop retrying.

In summary, Event Grid guarantees an at-least-once delivery for all your events but you as an Event Handler are still in charge of being capable of processing the event in time. This also means that it should be able to preserve performance when they are dealing with load spikes.

It is also interesting to see what really happens with the expired events. Do they really just go away or will there be a fallback event stream to which they are forwarded for later processing? In general, I think expiration of events will work but in certain scenarios I see a case where having the fallback event stream is a valuable asset for mission critical event-driven flows.

You can read more on durable message delivery here.

How about security?

Azure Event Grid offers a variety of security controls on all levels:

  • Managing security on the Event Grid resource itself is done with Role-based Access Control (RBAC). It allows you to define granular control to the correct people. It's a good practice to use the least-priviledge principle, but that is applicable to all Azure resources. More information here.
  • Webhook Validation - Each newly registered webhook needs to be validated by Azure Event Grid first. This is to prove that you have ownership over the endpoint. The service will send a validation token to the webhook, which the webhook implementer needs to send back as a validation. It's important to note that only HTTPS webhooks are supported. More information here.
  • Event Subscription uses Role-based Access Control (RBAC) on the Event Grid resource where the person creating a new subscription needs to have the Microsoft.EventGrid/EventSubscriptions/Write permissions.
  • Publishers need to use SAS Tokens or key authentication when they want to publish an event to a topic. SAS tokens allow you to scope the access you grant to a certain resource in Event Grid for a certain amount of time. This is similar to the approach Azure Storage & Azure Service Bus use.

The current security model looks fine to me, although it would be nice if there would be a concept of SAS tokens with a stored access policysimilar to Azure Storage. This would allow us to issue tokens for a certain entity, while still having the capability to revoke access in case we need this, i.e. when a token was compromised.

An alternative to SAS stored access policies would be to be able to create multiple authorization rulessimilar to Azure Service Bus, where we can use the key approach for authentication while still having the capability to have more granular control over whom uses what key and being able to revoke it for one publisher only, instead of revoking it for all publishers.

You can read more on security & authentication here.

Imagine the possibilities

Integration with other Azure services

As of today there are only a few Azure services that integrate with Azure Event Grid but there are a lot of them coming.

Here are a couple of them that I would love to use:

  • Use API Management as a public-facing endpoint where all events are transformed and sent over to Azure Event Grid. This would allow us to use API Management as a webhook proxy between the 3rd party and Azure Event Grid. More on this later in the post
  • Streamlined event processing for Application Insights custom events where it acts as an Event Publisher. By doing this we can push them to our data store so that we can use it in our Power BI reporting, instead of having to export all telemetry and setting up a processing pipeline for that, as described here
  • Real-time auditing & change notifications for Azure Key Vault
    • Publish events when a new version of a Key or Secret was added to notify dependent processes about this so they can fetch the latest version
    • Real-time auditing by subscribing to changes on the access policies
  • Sending events when alerts in Azure Monitor are triggered would be very useful. In the past I've written about how using webhooks for processing alerts instead of emails are more interesting as you can trigger an automation workflow such as Logic Apps. If an alert would send an event to Azure Event Grid we can take it even a step further and create dedicated handlers per alert or alert group. You can already achieve this with Logic Apps & Service Bus Topics as of today but with Event Grid this comes out of the box and makes it more easy to create certain routings
  • Trigger an Azure Data Factory when an event occurs, i.e. when a blob was added to an Azure Storage container
  • Send an event when Azure Traffic Manager detects a probe that is unhealthy

New way of handling webhook events?

When we want to provide 3rd parties to send notifications to a webhook we need to provide a public endpoint which they can call. Typically, these just take the event and queue them for later processing allowing the 3rd party to move on as we handle the event at our own pace.

The "problem" here is that we still need to host an API middleware somewhere; be it an Azure Function, Web App, Api App, etc; that just handles this message. Even if you use Azure API Management, you still need to have the middleware running behind the API Management proxy since you can't push directly to a topic.

Wouldn't it be nice if we can get rid of that host and let API Management push the requests directly to Azure Event Grid so that it can fan-out all the external notifications to the required processors?

That said, this assumes that you don't do any validation or other business logic before the webhook middleware pushes to the topic for processing. If you need this capability, you will have to stick with hosting your own middleware I'm afraid.

Unified integration between APIs

Currently when you are using webhooks inside your infrastructure the Event Publishers are often calling webhooks directly creating a spaghetti infrastructure. This is not manageable since each Event Publisher needs to have the routing logic inside their own component.

By using Azure Event Grid we can route all the events through Azure Event Grid and use it as an event broker, or routing hub if you will, and thus decoupling Event Publisher from the corresponding Event Handlers.

By doing this we can easily change the way we route events to new Event Handlers by simply changing the routing, not the routing logic in the Event Publishers.

Depending on the monitoring Azure Event Grid will provide, it can also provide a more generic approach in how we monitor all the event handling instead of using the monitoring on each component. More on this in my next blog.

Depending on the load, you can of course also use Azure Service Bus Topics but all depends on the load you are expecting. As always, it depends on the scenario; to pick which technology is best for the scenario.


Azure Event Grid is a unique service that has been added to Microsoft Azure and brings a lot to the table. It promises big performance targets and will enable new scenarios, certainly in the serverless landscape.

I'm curious to see how the service will evolve and what publishers & handlers will be coming soon. Personally, I think it's a big announcement and will give it some more thinking on how we can use it when building platforms on Microsoft Azure.

Want to learn more yourself? Here's a good Cloud Cover episode that will give you a high-level overview of Azure Event Grid or read about the concepts of Event Grid. Tip: Follow our Azure Event Grid webinar on Tuesday 19 December to learn the ins and outs!

What features would you like to see being added to the service? In what scenarios do you see Event Grid as a good fit? Feel free to mention them in the comments!

Thanks for reading,

Tom Kerkhove.

Posted on Sunday, December 10, 2017 1:08 PM

Tom Kerkhove by Tom Kerkhove

Microsoft announced Azure Data Factory v2 at Ignite bringing that enables more data integration scenarios and brings SSIS into the cloud.

Azure Data Factory is one of those services in Azure that is really great but that doesn't get the attention that it deserves.

It is a hybrid data integration service in Azure that allows you to create, manage & operate data pipelines in Azure. Basically, it is a serverless orchestrator that allows you to create data pipelines to either move, transform, load data; a fully managed Extract, Transform, Load (ETL) & Extract, Load, Transform (ELT) service if you will.

I've been using Data Factory a lot in the past year and it makes it very easy to create & manage data flows in the cloud. It comes with a wonderful monitoring experience which could be an example for other services like Azure Functions & Azure Event Grid where this would be beneficial.

However, Azure Data Factory was not perfect.

The drawbacks of Azure Data Factory

There were a couple of drawbacks & missing features when using the service:

  • Only Supports Data Slicing - The only way to schedule your data pipeline was to run every x minutes, hours or days and process the data that was in that time slice. You couldn't trigger it on demand or whatsoever.
  • No Granular Scheduling Control - No granular control on when the pipeline should be triggered in terms of calendar scheduling ie. only run the pipeline during the weekend.
  • Limited Operational Experience - Besides the Monitor-portal, the monitoring experience was very limited. It only supported sending email notifications that were triggered under certain criteria while it did not provide built-in metrics nor integration with Azure Monitor.
  • JSON All The Things - The authoring experience was limited to writing everything in JSON. However, there was also support for Visual Studio, but even there it was only to edit JSON files.
  • Learning Curve - The learning curve for new people was pretty steep. This is primarily because it was using mainly JSON and I think having a code-free experience here would make things a lot easier.

Last but not least, the most frightening factor was radio silence. And for a good reason...

Enter Azure Data Factory 2.0.

Azure Data Factory 2.0

During Ignite, Microsoft announced Azure Data Factory 2.0 that is now in public preview.

Azure Data Factory 2.0 takes data integration to the next level and comes with a variety of triggers, integration with SSIS on-prem and in Azure, integration with Azure Monitor, control flow branching and much more!

Let's have a look at a couple of new features.

Introduction of Integration Runtime

A new addition is the concept of an Integration Runtime (IR). It represents a compute infrastructure component that will be used by an Azure Data Factory pipeline will use to offer integration capabilities as close as possible to the data you need to integrate with.

Every integration runtime provides you the capability to move data, execute SSIS packages and dispatch & monitor activities and come in three different types - Hosted in AzureSelf-Hosted (either in the cloud or on-premises) or Azure-SSIS.

Here is an overview of how you can mix and match them. 

Basically, the Azure Data Factory instance itself is only in charge of storing the metadata that describes how your data pipelines will look like while at execution time it will orchestrate the processing to the Integration Runtime in specific regions to handle the effective execution.

This allows you to more easily work across regions while the execution is as close as possible.

As far as I can see, the self-hosted Integration Runtime also enables you to integrate with data that is behind a firewall without having to install an agent like you had to do in the past since everything is happening over HTTP.

Another big advantage here is that you can now run SSIS packages as part of your integration pipelines allowing you to re-use existing business intelligence that was already there, but now with the power of the cloud.

You can read more about the various Integration Runtimes in this article.

New pipeline triggers

Triggers, triggers, triggers! I think this is what excited me the most because because Data Factory only supported building data pipelines for scenarios where data slicing was used.

If you had scenarios where this was not the case, then there was no (decent) Data Factory pipeline that could help you.

The first interesting trigger: On-demand execution via a manual trigger. This can be done via .NET, PowerShell, REST or Python and it can be useful when you want to trigger a data pipeline at the end of a certain process, regardless of what the time is.

A second trigger is the scheduler trigger that allows you to define a very granular schedule for when the pipeline should be triggered. This can range from every hour to every workday at 9 AM. This allows you to still have the simple data-slicing model if you prefer that, or define more advanced scheduling if that fits your needs.

For example, we had to run pipelines only during the workweek. With v1, this is not possible and we have pipeline failures every Saturday & Sunday. With Scheduler Triggers we can change this approach and define that it should only be triggered during the week.

Another great addition is that you can now pass parameters to use in your pipeline. This can be whatever information you need, just pass it when you trigger it.

In the future, you will also be able to trigger a pipeline when a new file has arrived. However, by using the manual trigger, you could already set this up with an Azure Event Grid & Logic App as far as I see.

Last but not least - One pipeline can now also have multiple triggers. So, in theory, you could have a scheduler trigger but also trigger it manually via a REST endpoint.

It's certainly good stuff and you can find a full overview of all supported triggers here.

Data Movement, Data Transformation & Control Flow Activities

In 2.0 the concept of Activities has been seperated into three new concepts: Data MovementData Transformation Activities & Control Flow Activities.

Control Flow Activities allows you to create more reactive pipelines in that sense that you can now react on the outcome of the previous activity. This allows you to execute an activity, but only if the previous one had a specific state. This can be success, error or skipped.

This is a great addition because it allows you to compensate or rollback certain steps when the previous one failed or notify people in case it's required.

Control Flow Activities also provide you with more advanced flow controls such as For Each, Wait, If/Else, Execute other pipelines and more!

Here's a visual summary:

This tutorial gives you a nice run-through of the new control flow activities.

Authoring Experience

In the past, one of the biggest pains was authoring pipelines. Everything was in JSON and there was no real alternative besides the rest API.

In v2 however, you can use the tool that gets your job done by choosing from a variety of technologies going from .NET & Python, to pure REST or script it with PowerShell!

You can also use ARM templates that have embedded JSON files to automatically deploy your data factory.

But what I like the most is the sneak peek of the visual tooling that Mike Flasko gave at Ignite 2017:

It enables you to author pipelines by simply dragging & dropping activities in the way your business process is modeled. This abstracts away the JSON structure behind it, allowing people to jump more easily on the Data Factory band wagon.

By having this visual experience it also gives you a clear overview of how all the services tie together and are also a form of documentation to a certain degree. If a new person joins the team he can easily see the big picture.

However, this is not available yet and is only coming later next year.

Mapping data with Data Flow

One feature that is not there yet, but is coming early 2018, is the Data Flow activity that allows you to define data mappings to transform your datasets in your pipeline.

This feature is already in v1 but the great thing is that for this one you will also be able to use the code-free authoring experience where it will help you create those mappings and visualize what they will look like.

We currently use this in v1 and I have to say that it is very nice, but not easy to get there if you need to do this in JSON. This visualizer will certainly help here!

Improved Monitoring experience

As of October, the visual monitoring experience was added to the public preview which is very similar to the v1 tooling.

For starters, it lists all your pipelines and all their run history allowing you to get an overview of the health of your pipelines:

If you're interested in one particular run, you can drill deeper and see the status of each activity. Next to that, if one has failed you can get more information on what went wrong:

Next to that, you can also filter on certain attributes so that you can see only the pipelines that you're interested in.

Another great aspect is that Azure Data Factory v2 now integrates with Azure Monitor and now comes with built-in metrics such as run, activity and trigger outcomes. This allows you to configure Azure Alerts based on those and can integrate with your overall alert handling instead of only supporting email notifications. This is a very big plus for me personally!

Diagnostic logs can now also be stored in Azure Storage, send to Azure Event Hubs & analyzed in Operations Management Suite (OMS) Log Analytics!

Read more about the integration with Azure Monitor & OMS here.

Taking security to the next level

One of the most important things in software is security. In the past, every linked service had its passwords linked to it and Azure Data Factory handled this for you.

In v2, however, this approach has changed.

For starters - When you provision a new Azure Data Factory, it will automatically register a new managed Azure AD Application in the default Azure AD subscription.

This enables you not only to copy data from/to Azure Data Lake Store, it also enables you to integrate with Azure Key Vault.

By creating an Azure Key Vault linked service, you can store the credentials of all your other linked services in a vault. This gives you full control of managing the authentication keys for the external services and giving you the capability to have automatic key rolling without breaking your data factories.

Authentication with Azure Key Vault is fully managed by Data Factory based on the Azure AD Application that was created for you. The only thing you need to do is grant your AD Application access on the vault and create a linked service in your pipeline.

More information about handling credentials in Data Factory can be found in this article or read more about data movement security here.

Migration Path to v2

As of today you can already start creating new pipelines for Azure Data Factory v2 or migrate your v1 pipelines over to v2. However, this is currently a manual process and not all features from v1 are currently available such as the Data Flow.

In 2018 they will provide a tool that can migrate your v1 pipelines to v2 for you so if it's not urgent I'd suggest to sit back and wait for it to land.

Making Data Factory more robust over time

While I'm a fan of the recent changes to Azure Data Factory, I think it can be improved by adding the following features to make the total experience more robust:

  • The concept of pipeline versioning where all my pipeline definitions, regardless of how they are created, have a version stamped on it that is being displayed in the Azure/Monitor portal. That way, we can easily see if issues are related to a new version that was deployed or if something else is going on.
  • As far as I know, correlation ids are not supported yet in Azure Data Factory and would be a great addition to improve the overall operational experience even more. It would allow you to provide end-to-end monitoring which can be interesting if you're chaining multiple pipelines, or integrate with other processes outside Data Factory. In the monitoring portal, you can currently see the parameters but would be nice if you could filter on a specific correlation id and see all the related pipelines & activities for that.
  • While they are still working on the code-free authoring portal, I think they should provide the same experience in Visual Studio. It would allow us to have best of both words - A visualizer to author a pipeline, jump to the code behind for more advanced things and integrate it with source control without having to leave Visual Studio.
  • Integration with Azure Data Catalog would be really great because then we can explore our internal data catalog to see if we have any valuable data sources and connect to them without having to leave the authoring experience.

But we have to be reasonable here - Azure Data Factory v2 was only recently launched into public preview so these might be on their radar already and only come later.


The industry is moving away from using one-data-store-to-rule-them-all and is shifting to a Polyglot Persistence approach where we store the data in the data store that is best suited. With this shift comes a need to build integration pipelines that can automate the integration of all these data stores and orchestrate all of this.

Azure Data Factory was a very good start, but as I mentioned it was lacking on a couple of fronts.

With Azure Data Factory 2.0 it feels like it has matured into an enterprise-ready service that allows us to achieve this enterprise-grade data integration between all our data stores, processing, and visualization thanks to the integration of SSIS, more advanced triggers, more advanced control flow and the introduction of Integration Runtimes.

Data integration is more important than ever and Azure Data Factory 2.0 is here to help you. It was definitely worth the radio silence and I am looking forward to migrating our current data pipelines to Azure Data Factory 2.0 which allows us to simplify things.

Want to learn more about it? I recommend watching the "New capabilities for data integration in the cloud" session from Ignite.

Thanks for reading,

Tom Kerkhove.