
Codit Blog

Posted on Wednesday, November 15, 2017 5:51 PM

by Tom Kerkhove

Azure Key Vault is hard to get started with, but only because you need to understand & implement the authentication with Azure AD. That's why Azure AD Managed Service Identity (MSI) now makes this a lot easier for you. There is no longer any reason not to use Azure Key Vault.

As you might know, I'm a big fan of Azure Key Vault - It allows me to securely store secrets and cryptographic keys while still having granular control over who has access and what they can do.

Another benefit is that since all my secrets are centralized, it is easy to provide automatic rolling of authentication keys by simply updating the secrets during the process. If an application gets compromised or somebody has bad intentions, we can simply revoke their access and the secrets they have will no longer work.

If you want to learn more, you can read this article.

However, Azure Key Vault depends heavily on Azure AD for handling the authentication & authorization.

This means that in order to use Azure Key Vault, you not only need to understand how you use it, you also need to understand how AD works and what the authentication scheme is - And it ain't easy.

It is also hard to justify using Azure Key Vault as a secure store for all your secrets because instead of storing some of your secrets in Azure Key Vault, you now need to store your AD authentication information somewhere. This can be an authentication key or, preferably, a certificate that is installed on your compute node.

Some actually see this as making the exposure bigger, which is true to a certain degree, because you are now basically storing the keys to the kingdom.

To conclude - Azure Key Vault itself is super easy to use, but the Azure AD part is not.

Introducing Azure AD Managed Service Identity

Azure AD Managed Service Identity (MSI) is a free turnkey solution that simplifies AD authentication by using your Azure resource that is hosting your application as an authentication proxy, if you will.

When you enable MSI, it creates an Azure AD Application for you behind the scenes that is used as a "proxy application" representing your specific Azure resource.

Once your application authenticates against the local authentication endpoint, it authenticates with Azure AD via its proxy application.

This means that instead of creating an Azure AD Application and granting it access to your resource, in our case Key Vault, you will instead only grant the proxy application access.

The best thing is - This is all abstracted away for you, which makes things very easy. You, as a developer, just need to turn on MSI, grant the application access and you're good to go.

This turnkey solution makes it super easy for developers to authenticate with Azure AD without knowing the details.

As Rahul explains in his post, you can use the AzureServiceTokenProvider from the Microsoft.Azure.Services.AppAuthentication NuGet package and let the magic do the authentication for you.
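The provider itself is a .NET library, but to make the flow concrete, here is a hedged Python sketch of what happens under the hood: the code asks the local MSI endpoint that App Service injects into your process for a token scoped to Key Vault. The helper names are mine, and the endpoint shape follows the 2017 preview; this only works when running inside Azure with MSI enabled.

```python
import json
import os
import urllib.request


def build_msi_request(resource, endpoint, secret):
    # Shape of the App Service MSI token request (api-version 2017-09-01):
    # GET {MSI_ENDPOINT}?resource=...&api-version=... with a "Secret" header.
    url = f"{endpoint}?resource={resource}&api-version=2017-09-01"
    return url, {"Secret": secret}


def get_keyvault_token():
    # MSI_ENDPOINT / MSI_SECRET are injected by App Service when MSI is on,
    # so this call only succeeds when running inside the Azure resource.
    url, headers = build_msi_request(
        "https://vault.azure.net",
        os.environ["MSI_ENDPOINT"],
        os.environ["MSI_SECRET"],
    )
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

The returned bearer token is then handed to Key Vault with each request - which is exactly the plumbing the AzureServiceTokenProvider hides from you.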

It would be even better if this were built into the KeyVaultClient in the future so that it's easier to discover and can be turned on without any hassle.

Big step forward, but we're not there yet

While this is currently only in public preview, it's a big step forward for making authentication with AD dead simple but we're not there yet.

  • AD Application Naming - One of the downsides is that it creates a new AD Application for you, with the same name as your Azure resource. This means that you are not able to pick an existing application or give it a descriptive name. This can be a blocker if you're using naming conventions.
  • Support for limited resources - Currently MSI is only supported for Azure VMs, App Services & Functions. There are more services to come but if you're hoping for Azure Cloud Services, this is not going to happen unfortunately. A full overview is available in the documentation.
  • Native support in Key Vault client - As mentioned before, it would be great if the Azure Key Vault SDK supported MSI out of the box, without us doing anything from a coding perspective or needing to be aware of the Microsoft.Azure.Services.AppAuthentication package.
  • Feature Availability - It's still in preview, if you even care about that


With the introduction of Managed Service Identity there is no reason anymore not to use Azure Key Vault for your application. It makes things a lot easier, and you should aim to move all your secrets to Azure Key Vault.

It is great to see this evolution and have an easy way to do the authentication without making it complicated.

But Azure Key Vault is not the only AD-integrated service that works well with MSI; other services like Azure Data Lake & SQL support this as well. You can get a full overview here.

I am very thrilled about Azure AD Managed Service Identity and will certainly use this, but there are some points for improvement.

Thanks for reading,


Categories: Azure, Technology
Tags: Key Vault
written by: Tom Kerkhove

Posted on Wednesday, July 22, 2015 1:36 PM

by Tom Kerkhove

Security is more important than ever, and no day goes by without a company being hacked or a breach being detected in some third-party plugin.

We - as developers & IT Pros - are responsible for building hardened applications and for storing sensitive data securely, as if it were our own.

In this blog post I'll talk about Azure Key Vault and how it can help you store keys and secrets such as connection strings in the cloud.

Security is and will always be very important. In the past years we've seen how Snowden revealed the activities of the NSA, how a big company like Sony can be hacked, how governments spy on each other, etc. Next to that we also have new technologies and concepts like the Internet-of-Things, which introduce new concerns and problems to tackle.

These events create more awareness concerning privacy, security & data ownership while end users are still using passwords like '123456' according to CNet, good luck with that.

The applications that we, as developers/IT Pros, build are responsible for protecting our users' information as much as required, whatever it takes. Alas, building secure applications is not easy and requires planning & implementation from the start - It's not something that you just add at the end of development. Unfortunately, some applications still have to deal with threats such as SQL injection, as Troy Hunt mentions on DotNetRocks, or even storing passwords as plain text; luckily we have Have-I-Been-Pwned to notify us of these kinds of breaches.

Have I Been Pwned?

There are additional aspects we need to secure in our solution, e.g. where will we store the configuration values - in our web.config? What about our API keys & connection strings? And while considering where to store them, how do we protect them from humans such as operators? Can we shield the information from them? When you need to add support for encryption or signing, there is the additional burden of storing those keys.

It would be easy if all these sensitive secrets were stored in one central secure place.
This is just the start; hopefully these are questions you've asked yourself before.

This is exactly where Azure Key Vault comes in and helps us with some of these concerns, let's have a look how!

Introducing Azure Key Vault

Azure Key Vault is a service that enables you to store & manage cryptographic keys and secrets in one central secure vault. All the sensitive data is stored on physical hardware security modules (HSM) - FIPS 140-2 Level 2 certified - inside the datacenter where the data will be encrypted by VMs or directly on the HSM, more on this later.

A vault owner can create a Key Vault, gaining full access & control over the vault. In a later release the vault owner will have an audit trail available to see who accessed their secrets & keys. They are now in full control of the key lifecycle as well: they can roll a new version of the key, back it up, etc.

A vault consumer can perform actions on the assets inside the Key Vault once the vault owner grants them access, depending on the permissions granted - we will discuss this more in a bit.

This enables us to give our customers full control over their sensitive data - They can decide what their key lifecycle looks like and who has access to it. Based on the audit logs they are aware of what the consumers are doing and whether they are still trustworthy.

On the other hand, developers are now no longer responsible for storing sensitive data such as API tokens, certificates & encryption keys. Operators will also no longer be able to see sensitive data in the database, web.config, etc.

Feature Overview

Let's dig a little deeper into the features it provides and their constraints.

Before we do so, it's important to know that all keys & secrets are versioned, allowing you to retrieve the latest version or stick to a specific one. A new version is created when you, for example, change the value of a secret.


Secrets

A secret is a sequence of bytes limited to 10 kB to which you can assign any value; this can be a certificate, a string or whatever you want.

The consumer can save or read back values based on the name of the secret, if they have the required permissions. It basically is a Key-Value store that encrypts your data and stores it in the HSM.
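To make that Key-Value behaviour concrete, here is a small sketch of how the REST URL for reading a secret is shaped - omitting the version gives you the latest value, appending one pins you to it. The `2015-06-01` api-version and all names are illustrative assumptions, not taken from this post.

```python
def secret_url(vault, name, version=None, api_version="2015-06-01"):
    """Build the Key Vault REST URL for reading a secret.

    Without a version you get the latest value; with a version you pin
    to that exact revision of the secret.
    """
    base = f"https://{vault}.vault.azure.net/secrets/{name}"
    if version:
        base += f"/{version}"
    return f"{base}?api-version={api_version}"
```

A GET on that URL (with a bearer token from Azure AD) returns the secret's value as plain text, which is exactly the trust-boundary point discussed below.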

It's important to know that consumers receive the value of the secret as plain text. This means that they can do anything with these values without the vault owner knowing what they are doing - the trust boundary ends once the data is sent back and the audit log has been updated.

On the other hand, if the type of data you are sharing allows rolling new versions, the consumer will have to come back every x minutes/hours/days to fetch the latest value. You make them dependent on the vault and reduce the chance of losing control. This is something you need to consider as well: how are they storing your secrets? In a cache? A database? How are those secured? Rolling secrets is a good practice you should consider.


Keys

A key is a cryptographic RSA 2048 key that consumers can use for typical key operations such as encrypt, decrypt, sign, verify, etc. Key Vault handles all these operations for the consumers, because they can't read back the value of the key.

All keys are encrypted and stored in physical HSMs but come in two flavors:

  • Software Keys use Azure VMs to handle operations on the keys. They are pretty cheap but less secure. These keys are typically used for dev/test scenarios.
  • HSM Keys are performing key operations directly on the HSM and thus more secure. However, these keys are more expensive and require you to use a Premium-tier vault.

A key has higher latency than a secret; if you need to use a key frequently, it is recommended to store it as a secret.

Audit Logs (Coming soon)

In the near future Azure Key Vault will also provide audit logs of who accessed your vault and how often. These logs allow you to act on what is happening, e.g. revoking access for someone who doesn't need it anymore or is very suspicious.

Bring-Your-Own-Key (BYOK)

Key Vault also allows you to transfer keys from your on-premises HSM up to Azure or back to your datacenter by using a secure HSM-to-HSM transfer. As an example, you can create keys on-premises and once your application goes into production in Azure you can transfer and use that key in Key Vault.


© Microsoft

If you want to know more about bring-your-own-key, I recommend this article.


Authentication & Authorization

Azure Key Vault leverages enterprise-grade authentication & authorization by integrating with Azure Active Directory, where you grant a person or application in your directory access to the vault with a specific set of permissions. However, be aware that these permissions are granted at the vault level.

Here is a nice overview of how the authentication process works -

Key Vault Authentication Flow

© Microsoft

When you provision a Key Vault, you need to set the Access Control Lists (ACLs); this can be done with a simple PowerShell script.

Set-AzureKeyVaultAccessPolicy -VaultName 'Codito' -ServicePrincipalName $azureAdClientId -PermissionsToKeys encrypt -PermissionsToSecrets get

The consumer can then authenticate with Azure Active Directory using their Account Id & Secret or their Account Id & Certificate. You then take the granted token and pass it to Key Vault along with the operation you want to perform.

If you want to revoke access or simply restrict a consumer, you can run the same script with fewer or no permissions.

This means that you can re-use your existing Active Directory - unfortunately, it is also a requirement in order to use Key Vault.


Let's have a look at some of the scenarios where you should use Azure Key Vault. As I mentioned before, it's not a silver bullet, but it helps you store sensitive data as securely as possible.

Internal vault

The first scenario is a simple one - some applications have to use or communicate with third-party systems or services.

Here are two examples:

  • Your application needs a connection string to know where the database is located and how it should authenticate with it.
  • An external service, such as Twilio, where you need to identify yourself with a token or password in order to gain access.

Where do you store these things? In a database? Nope, because you don't know where that is either. A common location is the web.config or app.config, however this is insecure - an operator can steal this data and sell it so other people can send text messages in your name.

You could use Azure Key Vault as an internal vault containing this data for you. When you then need to authenticate with Twilio you can ask your vault for your API token and use it. Ideally you would cache it and let it expire after x minutes, get it, cache it, you get the picture.
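That cache-and-expire pattern can be sketched in a few lines. This is a minimal illustration, with a hypothetical `fetch` callable standing in for the actual vault call:

```python
import time


class SecretCache:
    """Tiny TTL cache: only go back to the vault once a cached value expires."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # callable: secret name -> secret value
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing
        self._entries = {}           # name -> (value, expires_at)

    def get(self, name):
        entry = self._entries.get(name)
        now = self._clock()
        if entry is None or now >= entry[1]:
            # Missing or expired: fetch from the vault and re-cache.
            value = self._fetch(name)
            self._entries[name] = (value, now + self._ttl)
            return value
        return entry[0]
```

Expiring the cache entry is what makes rolled secrets (a new version in the vault) get picked up automatically on the next read.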

Sharing sensitive data with a third party

Another scenario is where a third party grants you access to their assets, in this example a database.

As mentioned in the previous section there are a lot of ways to store a connection string, but this means that the third party needs to trust you with that information and they have no clue how you store it. Here we are just storing it in the app settings in plain text.
Basic Scenario without Key Vault

© Microsoft

However, the customer could give you the same information by creating & sharing it as part of their Key Vault. This makes certain that the data is stored in a secure manner and they have an audit trail of how you interact with the service. If they don't like what you are doing, they can still revoke your access.
Sharing sensitive data with third party scenario

© Microsoft

An important thing to note is that when you, as the consumer, get the value of the secret, you get it as plain text and the customer has to trust you with it. You can still save it in a file or cache it or whatsoever. On the other hand, the customer is more confident about how the secret data is stored and they have full control over it. If it were a rollable key, they could implement an automatic roll system, as we will see in a minute.

Multi-tenancy scenario

Key Vault can be used in a multi-tenancy scenario as well, where we use the first two scenarios to build a trustworthy relationship with the customer. They can share their sensitive data, here an Azure Storage key, by allowing us to retrieve it from their Azure Key Vault. We, as a service, store the Azure Active Directory authentication for consuming their vault in an internal vault.

Multi-Tenancy scenario

I'll walk you through the process of how it could work:

  1. The customer provisions a new Azure Key Vault
  2. They create an Azure Active Directory entity for us and set the ACLs on the Key Vault
  3. Codito signs up for our service, giving us the AD Id & secret and the names of the secrets we need.
  4. We store the authentication data in our internal vault
  5. The service stores the names of the customer's secrets in a datastore, here Azure SQL Database
  6. Our service authenticates with Azure AD to gain an access token
  7. We request the value of the secrets by passing the access token
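Steps 6 and 7 boil down to two plain HTTP requests, which can be sketched as request builders. The helper names and the api-versions are illustrative assumptions, not an official SDK:

```python
def token_request(tenant_id, client_id, client_secret):
    """OAuth2 client-credentials request against Azure AD (step 6)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "https://vault.azure.net",  # token scoped to Key Vault
    }
    return url, body


def secret_request(vault, secret_name, access_token):
    """GET for the customer's secret, authorized with the bearer token (step 7)."""
    url = f"https://{vault}.vault.azure.net/secrets/{secret_name}?api-version=2015-06-01"
    headers = {"Authorization": f"Bearer {access_token}"}
    return url, headers
```

POSTing the first request returns an access token; passing it as a bearer header on the second returns the secret's value - and leaves an entry in the customer's audit log.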

Automatically roll keys

Last but not least, there is the scenario where you want to automatically re-roll your keys without breaking your running applications. Dushyant Gill actually wrote a very nice article on how you can automatically roll your Azure Storage key without breaking any applications.
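The general pattern behind that article can be sketched with stand-in clients: regenerate the storage key your apps are not currently using, then publish it as a new secret version so consumers pick it up on their next read. The fakes below are purely illustrative; a real implementation would use the Azure Storage management API and the Key Vault SDK.

```python
class FakeStorage:
    """Stand-in for an Azure Storage management client (illustrative only)."""

    def __init__(self):
        self._generation = 0

    def regenerate_key(self, account, key_name):
        self._generation += 1
        return f"{account}/{key_name}/gen{self._generation}"


class FakeVault:
    """Stand-in for a Key Vault client; set_secret creates a new version."""

    def __init__(self):
        self._versions = {}

    def set_secret(self, name, value):
        self._versions.setdefault(name, []).append(value)

    def get_secret(self, name):
        return self._versions[name][-1]


def roll_storage_key(storage, vault, account, secret_name, key_name):
    """Regenerate one storage key and publish it as a new secret version.

    Applications that always read the secret from the vault (and let their
    cache expire) keep working without any redeployment.
    """
    new_key = storage.regenerate_key(account, key_name)
    vault.set_secret(secret_name, new_key)
    return new_key
```

Alternating between the primary and secondary storage key on each roll ensures the key that apps are still using remains valid until they have refreshed.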

Storing the vault authentication secrets

While these are only some of the possible scenarios, they share a common issue - how do you store the authentication data for your Key Vault?

Well, that's a hard one. As mentioned before, you have two types of authentication - with a password key or a certificate. Personally, using a certificate seems like the way to go: it's easier to store securely than a password key and easier to shield from people, since they have to know where to look.

Although this does not get rid of the exposure entirely, it limits the exposure and stores most of the data in a more secure way.

Integration with Azure services

You can use Azure Key Vault to store your keys and use them in other Azure services.

  • SQL Server Encryption in Azure VM (Preview) - When using SQL Server Enterprise you can use Azure Key Vault as a SQL Server connector as an extensible key management provider. This allows you to use a key from Key Vault for Transparent Data Encryption (TDE), Column Level Encryption (CLE) & Backup encryption. This is also a feature you can use on-premises as well. More information here.

  • Azure Storage client side encryption (Preview) - You can now encrypt data before uploading to Azure Storage or decrypt while downloading. The SDK allows you to use keys from your existing Key Vault so you can manage them as you want. More information here.

  • VM Encryption with CloudLink - CloudLink allows you to encrypt and decrypt VMs while using Key Vault as a key repository. More information here

And there is even more, a full list can be found here.

Vault Management & Tooling

Management of your vault, such as provisioning a new one or setting the ACLs, can be done with PowerShell scripts or using the Azure CLI for Linux & Mac. Here is a PowerShell script that outlines some of the Key Vault cmdlets you can use.

If you go to the portal you can provision a new Key Vault by clicking New > Management > Key Vault > oh wait, it's not ready yet!

Portal - Provision a Azure Key Vault

Fair enough - in the end it's a secondary service focused on enterprises, and scripting such things is a good practice anyway.

From a consumer's perspective, you can use the REST API, the .NET libraries on NuGet or the preview SDK for Node.js, with more in the works.

Vault Isolation

A Key Vault is dedicated to one specific region, and thus you will not be able to access its data from within a different region. All the secrets and keys are stored in physical HSMs in that specific region; the data will never leave that geographic region.

Certain countries have laws demanding that data never leaves the region; the same goes for compliance. When you deploy your application across regions, this means you will have a Key Vault per region with the same structure of keys and secrets. The keys will all be different, but it can happen that a secret contains the same value across regions, such as a Twilio API key.

Thinking about disaster recovery

This limitation can cause some headaches when you are planning for disaster recovery. If your deployments in one region go down you still want to offer an alternative to your customers.

A possibility to cope with this is to set up manual synchronization for the secrets that are not region-specific. As an example, if we have a Twilio API key and an Azure Storage account key in our vault, we would only want to synchronize the API key, so we only have to update one "master" vault.
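Such a manual synchronization could be sketched as follows, using a minimal dict-backed stand-in for a vault client (purely illustrative; a real version would call the Key Vault API per region):

```python
class DictVault(dict):
    """Minimal stand-in for a per-region Key Vault (illustrative only)."""

    def get_secret(self, name):
        return self[name]

    def set_secret(self, name, value):
        self[name] = value


def sync_shared_secrets(master, replicas, shared_names):
    """Push region-agnostic secrets (e.g. a Twilio key) from the master
    vault to each regional vault. Region-specific secrets, such as a
    storage account key, are deliberately left alone."""
    for name in shared_names:
        value = master.get_secret(name)
        for replica in replicas:
            replica.set_secret(name, value)
```

Running this after every update of the master vault keeps the shared secrets consistent across regions without ever copying the region-bound ones.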

Vault Replication

Unfortunately, if you are heavily using keys there is no option for DR.

If you are limited to one region this will not be applicable for your scenario.

Thinking about pricing

So who pays for everything? It's pretty simple.

If you're the owner of the vault, then you pay for everything, while vault consumers don't have to pay for anything.

This means that if you have a chatty consumer, the cost of the vault increases without you having control over it. Luckily, prices are defined per 10,000 operations and are really low.

At the time of writing, you will be charged €0.0224 per 10,000 operations on a software key or secret while for HSM keys you also have to pay €0.7447 for each key and version of a key in your vault.
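To make that arithmetic concrete, here is a small sketch using the rates quoted above (the HSM key charge is assumed to be per month; check the pricing page for current numbers):

```python
def monthly_cost(operations, hsm_key_versions=0,
                 per_10k=0.0224, per_hsm_key=0.7447):
    """Estimate a month's Key Vault bill in EUR, using the quoted rates:
    a per-10,000-operations charge plus a flat charge per HSM key version."""
    return (operations / 10_000) * per_10k + hsm_key_versions * per_hsm_key
```

For example, a million secret operations would cost roughly EUR 2.24, and five HSM key versions would add roughly EUR 3.72 on top of that.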

If you want a complete overview, here's the pricing overview.

Azure Key Vault is now generally available!

As of the 24th of June, Key Vault is generally available, meaning that you can use it in production environments; it is backed by a 99.9% SLA and an Azure Support Plan.

You can read the announcement here.


Azure Key Vault has a lot to offer and helps developers store sensitive data as securely as possible, while the data owner has full control and proof of who is using their data.

However, Azure Key Vault is not a silver bullet - it was only built for secrets & keys - but it helps us a lot. In my opinion, every new project running in Azure should use Key Vault for optimal security around this kind of sensitive data and set up automatic rolling of authentication keys where possible.

There was also an interesting session at Ignite I recommend if you want to know more about Key Vault.

The question is not if you will be hacked, but when.

Thanks for reading,


Categories: Azure
Tags: Key Vault
written by: Tom Kerkhove

Posted on Sunday, December 10, 2017 1:08 PM

by Tom Kerkhove

Microsoft announced Azure Data Factory v2 at Ignite, which enables more data integration scenarios and brings SSIS into the cloud.

Azure Data Factory is one of those services in Azure that is really great but that doesn't get the attention that it deserves.

It is a hybrid data integration service in Azure that allows you to create, manage & operate data pipelines in Azure. Basically, it is a serverless orchestrator that allows you to create data pipelines to either move, transform, load data; a fully managed Extract, Transform, Load (ETL) & Extract, Load, Transform (ELT) service if you will.

I've been using Data Factory a lot in the past year and it makes it very easy to create & manage data flows in the cloud. It comes with a wonderful monitoring experience which could be an example for other services like Azure Functions & Azure Event Grid where this would be beneficial.

However, Azure Data Factory was not perfect.

The drawbacks of Azure Data Factory

There were a couple of drawbacks & missing features when using the service:

  • Only Supports Data Slicing - The only way to schedule your data pipeline was to run it every x minutes, hours or days and process the data in that time slice. You couldn't trigger it on demand.
  • No Granular Scheduling Control - No granular control over when the pipeline should be triggered in terms of calendar scheduling, e.g. only run the pipeline during the weekend.
  • Limited Operational Experience - Besides the Monitor-portal, the monitoring experience was very limited. It only supported sending email notifications that were triggered under certain criteria while it did not provide built-in metrics nor integration with Azure Monitor.
  • JSON All The Things - The authoring experience was limited to writing everything in JSON. There was also support for Visual Studio, but even there it was only for editing JSON files.
  • Learning Curve - The learning curve for new people was pretty steep. This is primarily because it was using mainly JSON and I think having a code-free experience here would make things a lot easier.

Last but not least, the most frightening factor was radio silence. And for a good reason...

Enter Azure Data Factory 2.0.

Azure Data Factory 2.0

During Ignite, Microsoft announced Azure Data Factory 2.0 that is now in public preview.

Azure Data Factory 2.0 takes data integration to the next level and comes with a variety of triggers, integration with SSIS on-prem and in Azure, integration with Azure Monitor, control flow branching and much more!

Let's have a look at a couple of new features.

Introduction of Integration Runtime

A new addition is the concept of an Integration Runtime (IR). It represents a compute infrastructure component that an Azure Data Factory pipeline will use to offer integration capabilities as close as possible to the data you need to integrate with.

Every integration runtime provides the capability to move data, execute SSIS packages and dispatch & monitor activities, and they come in three different types: Azure-hosted, Self-Hosted (either in the cloud or on-premises) or Azure-SSIS.

Here is an overview of how you can mix and match them. 

Basically, the Azure Data Factory instance itself is only in charge of storing the metadata that describes what your data pipelines look like, while at execution time it orchestrates the processing to the Integration Runtimes in specific regions to handle the effective execution.

This allows you to more easily work across regions while the execution happens as close to the data as possible.

As far as I can see, the self-hosted Integration Runtime also enables you to integrate with data that is behind a firewall without having to install an agent like you had to do in the past since everything is happening over HTTP.

Another big advantage here is that you can now run SSIS packages as part of your integration pipelines allowing you to re-use existing business intelligence that was already there, but now with the power of the cloud.

You can read more about the various Integration Runtimes in this article.

New pipeline triggers

Triggers, triggers, triggers! I think this is what excites me the most, because Data Factory only supported building data pipelines for scenarios where data slicing was used.

If you had scenarios where this was not the case, then there was no (decent) Data Factory pipeline that could help you.

The first interesting trigger: On-demand execution via a manual trigger. This can be done via .NET, PowerShell, REST or Python and it can be useful when you want to trigger a data pipeline at the end of a certain process, regardless of what the time is.
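Taking the REST option as an example, a run is started by POSTing to the factory's `createRun` endpoint on Azure Resource Manager (pipeline parameters go in the JSON body). A sketch of building that URL - the api-version reflects the public preview and may change:

```python
def create_run_url(subscription, resource_group, factory, pipeline,
                   api_version="2017-09-01-preview"):
    """ARM endpoint for triggering a Data Factory v2 pipeline run on demand.

    The request is a POST with a bearer token; any pipeline parameters are
    passed as a JSON object in the request body.
    """
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/pipelines/{pipeline}/createRun"
        f"?api-version={api_version}"
    )
```

Calling this endpoint at the end of an upstream process is how you get "run now" behaviour regardless of the clock.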

A second trigger is the scheduler trigger that allows you to define a very granular schedule for when the pipeline should be triggered. This can range from every hour to every workday at 9 AM. This allows you to still have the simple data-slicing model if you prefer that, or define more advanced scheduling if that fits your needs.

For example, we had to run pipelines only during the workweek. With v1, this was not possible and we had pipeline failures every Saturday & Sunday. With scheduler triggers we can change this approach and define that it should only be triggered during the week.
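Hedging on the preview JSON schema, a scheduler trigger for that workweek scenario could look roughly like this (the trigger and pipeline names are made up):

```json
{
  "name": "WorkweekTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Week",
        "interval": 1,
        "startTime": "2017-12-01T09:00:00Z",
        "timeZone": "UTC",
        "schedule": {
          "hours": [9],
          "minutes": [0],
          "weekDays": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
        }
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "type": "PipelineReference",
          "referenceName": "MyPipeline"
        }
      }
    ]
  }
}
```

The `weekDays` list is what was missing from v1's data-slicing model: Saturday & Sunday simply never fire.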

Another great addition is that you can now pass parameters to use in your pipeline. This can be whatever information you need, just pass it when you trigger it.

In the future, you will also be able to trigger a pipeline when a new file has arrived. However, by using the manual trigger, you could already set this up with an Azure Event Grid & Logic App as far as I see.

Last but not least - One pipeline can now also have multiple triggers. So, in theory, you could have a scheduler trigger but also trigger it manually via a REST endpoint.

It's certainly good stuff and you can find a full overview of all supported triggers here.

Data Movement, Data Transformation & Control Flow Activities

In 2.0, the concept of Activities has been separated into three new concepts: Data Movement, Data Transformation Activities & Control Flow Activities.

Control Flow Activities allow you to create more reactive pipelines, in the sense that you can now react to the outcome of the previous activity. This allows you to execute an activity only if the previous one ended in a specific state, which can be succeeded, failed or skipped.

This is a great addition because it allows you to compensate or rollback certain steps when the previous one failed or notify people in case it's required.

Control Flow Activities also provide you with more advanced flow controls such as For Each, Wait, If/Else, Execute other pipelines and more!
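As a rough sketch of the JSON behind such a dependency (activity names are made up; the schema follows the v2 public preview), a notification activity that only runs when the copy step fails could look like:

```json
{
  "name": "NotifyOnFailure",
  "type": "WebActivity",
  "dependsOn": [
    {
      "activity": "CopyCustomerData",
      "dependencyConditions": ["Failed"]
    }
  ],
  "typeProperties": {
    "url": "https://example.com/alert",
    "method": "POST"
  }
}
```

Swapping `"Failed"` for `"Succeeded"` or `"Skipped"` gives you the other branches of the same compensation pattern.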

Here's a visual summary:

This tutorial gives you a nice run-through of the new control flow activities.

Authoring Experience

In the past, one of the biggest pains was authoring pipelines. Everything was in JSON and there was no real alternative besides the rest API.

In v2 however, you can use the tool that gets your job done by choosing from a variety of technologies going from .NET & Python, to pure REST or script it with PowerShell!

You can also use ARM templates that have embedded JSON files to automatically deploy your data factory.

But what I like the most is the sneak peek of the visual tooling that Mike Flasko gave at Ignite 2017:

It enables you to author pipelines by simply dragging & dropping activities in the way your business process is modeled. This abstracts away the JSON structure behind it, allowing people to jump on the Data Factory bandwagon more easily.

This visual experience also gives you a clear overview of how all the services tie together and is a form of documentation to a certain degree. If a new person joins the team, they can easily see the big picture.

However, this is not available yet and is only coming later next year.

Mapping data with Data Flow

One feature that is not there yet, but is coming early 2018, is the Data Flow activity that allows you to define data mappings to transform your datasets in your pipeline.

This feature is already in v1 but the great thing is that for this one you will also be able to use the code-free authoring experience where it will help you create those mappings and visualize what they will look like.

We currently use this in v1 and I have to say that it is very nice, but not easy to get there if you need to do this in JSON. This visualizer will certainly help here!

Improved Monitoring experience

As of October, the visual monitoring experience was added to the public preview which is very similar to the v1 tooling.

For starters, it lists all your pipelines and all their run history allowing you to get an overview of the health of your pipelines:

If you're interested in one particular run, you can drill deeper and see the status of each activity. Next to that, if one has failed you can get more information on what went wrong:

Next to that, you can also filter on certain attributes so that you can see only the pipelines that you're interested in.

Another great aspect is that Azure Data Factory v2 integrates with Azure Monitor and comes with built-in metrics such as run, activity and trigger outcomes. This allows you to configure Azure Alerts based on those metrics and integrate them with your overall alert handling, instead of only supporting email notifications. This is a very big plus for me personally!

Diagnostic logs can now also be stored in Azure Storage, sent to Azure Event Hubs & analyzed in Operations Management Suite (OMS) Log Analytics!

Read more about the integration with Azure Monitor & OMS here.

Taking security to the next level

One of the most important things in software is security. In the past, every linked service had its passwords linked to it and Azure Data Factory handled this for you.

In v2, however, this approach has changed.

For starters - When you provision a new Azure Data Factory, it will automatically register a new managed Azure AD Application in the default Azure AD subscription.

This not only enables you to copy data from/to Azure Data Lake Store, it also enables you to integrate with Azure Key Vault.

By creating an Azure Key Vault linked service, you can store the credentials of all your other linked services in a vault. This gives you full control over managing the authentication keys for the external services and the capability to roll keys automatically without breaking your data factories.

Authentication with Azure Key Vault is fully managed by Data Factory, based on the Azure AD Application that was created for you. The only thing you need to do is grant your AD Application access to the vault and create a linked service in your pipeline.
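To make this concrete, here is a minimal sketch of what the JSON definitions look like (names such as `AzureKeyVaultLinkedService` and `MySecretName` are placeholders, as is the vault URL). First, the Key Vault linked service itself:

```json
{
  "name": "AzureKeyVaultLinkedService",
  "properties": {
    "type": "AzureKeyVault",
    "typeProperties": {
      "baseUrl": "https://<your-vault-name>.vault.azure.net"
    }
  }
}
```

Any other linked service can then pull its credentials from the vault by using an `AzureKeyVaultSecret` reference instead of an inline value:

```json
"password": {
  "type": "AzureKeyVaultSecret",
  "store": {
    "referenceName": "AzureKeyVaultLinkedService",
    "type": "LinkedServiceReference"
  },
  "secretName": "MySecretName"
}
```

Rolling a key then becomes a matter of updating the secret in the vault; the linked service picks up the new value on its next run.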

More information about handling credentials in Data Factory can be found in this article, or you can read more about data movement security here.

Migration Path to v2

As of today you can already start creating new pipelines for Azure Data Factory v2 or migrate your v1 pipelines over to v2. However, migration is currently a manual process, and not all features from v1 are available yet, such as Data Flow.

In 2018 Microsoft will provide a tool that migrates your v1 pipelines to v2 for you, so if it's not urgent I'd suggest sitting back and waiting for it to land.

Making Data Factory more robust over time

While I'm a fan of the recent changes to Azure Data Factory, I think the overall experience can be made more robust by adding the following features:

  • The concept of pipeline versioning, where every pipeline definition, regardless of how it was created, has a version stamped on it that is displayed in the Azure & monitoring portals. That way, we can easily see whether issues are related to a newly deployed version or whether something else is going on.
  • As far as I know, correlation ids are not supported in Azure Data Factory yet; they would be a great addition to improve the overall operational experience even more. They would enable end-to-end monitoring, which is interesting if you're chaining multiple pipelines or integrating with other processes outside Data Factory. In the monitoring portal you can currently see the parameters, but it would be nice if you could filter on a specific correlation id and see all the related pipelines & activities.
  • While they are still working on the code-free authoring portal, I think they should provide the same experience in Visual Studio. It would give us the best of both worlds: a visualizer to author a pipeline, the option to jump to the code behind for more advanced scenarios, and integration with source control, all without having to leave Visual Studio.
  • Integration with Azure Data Catalog would be really great, because then we could explore our internal data catalog to see if we have any valuable data sources and connect to them without leaving the authoring experience.

But we have to be reasonable here - Azure Data Factory v2 was only recently launched into public preview, so these features might already be on the radar and simply come later.

Conclusion

The industry is moving away from one-data-store-to-rule-them-all and is shifting to a Polyglot Persistence approach, where we store the data in the data store that is best suited for it. With this shift comes the need to build pipelines that integrate all these data stores and orchestrate everything.

Azure Data Factory was a very good start, but as I mentioned it was lacking on a couple of fronts.

With Azure Data Factory 2.0 it feels like the service has matured into an enterprise-ready offering that enables enterprise-grade data integration across all our data stores, processing and visualization, thanks to the integration of SSIS, more advanced triggers, more advanced control flow and the introduction of Integration Runtimes.

Data integration is more important than ever, and Azure Data Factory 2.0 is here to help you. It was definitely worth the radio silence, and I am looking forward to migrating our current data pipelines to Azure Data Factory 2.0, which will allow us to simplify things.

Want to learn more about it? I recommend watching the "New capabilities for data integration in the cloud" session from Ignite.

Thanks for reading,

Tom Kerkhove.