Codit Blog

Posted on Thursday, April 27, 2017 5:22 PM

Toon Vanhoutte by Toon Vanhoutte

This blog post dives into the details of how you can achieve batching with Logic Apps. Batching is still a highly demanded feature for a middleware layer. It's mostly introduced to reduce the performance impact on the target system or for functional purposes. Let's have a closer look.

Scenario

For this blog post, I decided to try to batch the following XML message. As Logic Apps supports JSON natively, we can assume that a similar setup will work quite easily for JSON messages. Note that the XML snippet below contains an XML declaration, so pure string appending won't work. Namespaces are also included.
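
For illustration, a message of that shape (hypothetical root element and namespace) could look like this:

    <?xml version="1.0" encoding="utf-8"?>
    <ns0:Order xmlns:ns0="http://example.com/orders">
      <ns0:Reference>12345</ns0:Reference>
    </ns0:Order>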

Requirements

I came up with the following requirements for my batching solution:

  • External message store: in integration I like to avoid long-running workflow instances at all times. Therefore I prefer messages to be stored somewhere out of the process, waiting to be batched, instead of keeping them active in a singleton workflow instance (e.g. a BizTalk sequential convoy).

  • Message and metadata together: I want to avoid storing the message in one place and the metadata in another. Keeping them together simplifies development and maintenance.

  • Native Logic Apps integration: preferably I can leverage an Azure service that has native and smooth integration with Azure Logic Apps. It must ensure that we can reliably assign messages to a specific batch and that we can easily remove them from the message store.

  • Multiple batch release triggers: I want to support multiple ways to decide when a batch can be released.
    > # Messages: send out batches containing X messages each
    > Time: send out a batch at a specific time of the day
    > External Trigger: release the batch when an external trigger is received

Solution

After some analysis, I was convinced that Azure Service Bus queues are a good fit:

  • External message store: the messages can be queued for a long time in an Azure Service Bus queue.

  • Message and metadata together: the message is placed together with its properties on the queue. Each batch configuration can have its own queue assigned.

  • Native Logic Apps integration: there is a Service Bus connector to receive multiple messages inside one Logic App instance. With the peek-lock pattern, you can reliably assign messages to a batch and remove them from the queue.

  • Multiple batch release triggers:
    > # Messages: In the Service Bus connector, you can choose how many messages you want to receive in one Logic App instance

    > Time: Service Bus has a great property, ScheduledEnqueueTimeUtc, which ensures that a message only becomes visible on the queue from a specific moment in time. This is a great way to schedule messages to be released at a specific time, without the need for an external scheduler.

    > External Trigger: The Logic App can easily be instantiated via the native HTTP Request trigger.

 

Implementation

Batching Store

The goal of this workflow is to put the message on a specific queue for batching purposes. This Logic App is very straightforward to implement: add a Request trigger to receive the messages that need to be batched and use the Send Message Service Bus connector to send the message to a specific queue.

In case you want to release the batch only at a specific moment in time, you must provide a value for the ScheduledEnqueueTimeUtc property in the advanced settings.
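
If you also feed the queue from custom code, the same ScheduledEnqueueTimeUtc concept applies there. A minimal sketch, assuming the Microsoft.Azure.ServiceBus client library and a hypothetical connection string and queue name:

    using System;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Azure.ServiceBus;

    class BatchingStore
    {
        static async Task QueueForBatchingAsync(string xmlPayload)
        {
            // Hypothetical connection string and queue name.
            var client = new QueueClient("<service-bus-connection-string>", "batching-queue");

            var message = new Message(Encoding.UTF8.GetBytes(xmlPayload))
            {
                // The message only becomes visible on the queue at this moment,
                // e.g. tomorrow at 06:00 UTC, so it can only be batched from then on.
                ScheduledEnqueueTimeUtc = DateTime.UtcNow.Date.AddDays(1).AddHours(6)
            };

            await client.SendAsync(message);
            await client.CloseAsync();
        }
    }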

Batching Release

This is the more complex part of the solution. The first challenge is to receive, for example, 3 messages in one Logic App instance. My first attempt failed, because the Service Bus receive trigger and action apparently behave differently:

  • When one or more messages arrive in a queue: this trigger receives messages in a batch from a Service Bus queue, but it creates a separate Logic App instance for every message. This is not desired for our scenario, but it can be very useful in high-throughput scenarios.

  • Get messages from a queue: this action can receive multiple messages in a batch from a Service Bus queue. This results in an array of Service Bus messages inside one Logic App instance. This is exactly what we want for this batching exercise!

Let's use the peek-lock pattern to ensure reliability and receive 3 messages in one batch:

As a result, we get this JSON array back from the Service Bus connector:

The challenge is to parse this array, decode the base64 content in the ContentData and create a valid XML batch message from it. I tried several complex Logic App expressions, but soon realized that Azure Functions is better suited to take care of this complicated parsing. I created the following Azure Function, as a Generic Webhook C# type:
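
A minimal sketch of what such a function could look like (assuming the Logic App posts the connector's output array as the request body; the batch envelope's root element and namespace are invented for illustration):

    #r "Newtonsoft.Json"

    using System;
    using System.IO;
    using System.Net;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;
    using System.Xml.Linq;
    using Newtonsoft.Json.Linq;

    public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
    {
        // The Logic App posts the array returned by "Get messages from a queue".
        var json = await req.Content.ReadAsStringAsync();
        var messages = JArray.Parse(json);
        log.Info($"Batching {messages.Count} messages");

        // Hypothetical batch envelope; replace the root name and namespace
        // with whatever your target schema expects.
        XNamespace ns = "http://example.com/batch";
        var batch = new XElement(ns + "Batch");

        foreach (var message in messages)
        {
            // ContentData holds the base64-encoded message body.
            var bytes = Convert.FromBase64String((string)message["ContentData"]);

            // Loading from a stream honours the payload's own XML declaration;
            // only the root element is appended to the batch envelope.
            using (var stream = new MemoryStream(bytes))
            {
                batch.Add(XDocument.Load(stream).Root);
            }
        }

        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(
                $"{new XDeclaration("1.0", "utf-8", null)}{Environment.NewLine}{batch}",
                Encoding.UTF8,
                "application/xml")
        };
    }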

Let's consume this function now from within our Logic App.  There is seamless integration with Logic Apps, which is really great!


As an output of the GetBatchMessage Azure Function, I get the following XML :-)

Large Messages

This solution is very nice, but what about large messages? Recently, I wrote a Service Bus connector that uses the claim check pattern, which exchanges large payloads via Blob Storage. In this batching scenario we can also leverage this functionality. Once I have open sourced this project, I'll update this blog with a working example. Stay tuned for more!

Conclusion

This is a great and flexible way to perform batching within Logic Apps. It really demonstrates the power of the Better Together story with Azure Logic Apps, Service Bus and Functions. I'm sure this is not the only way to perform batching in Logic Apps, so do not hesitate to share your solution for this common integration challenge in the comments section below!

I hope this gave you some fresh insights into the capabilities of Azure Logic Apps!
Toon

Categories: Azure
Tags: Logic Apps
written by: Toon Vanhoutte

Posted on Wednesday, April 26, 2017 4:17 PM

Pim Simons by Pim Simons

Since the introduction of BizTalk 2013 R2, Microsoft has supplied an out of the box JSON encoder pipeline component. I've used this component many times in the past, but recently ran into an issue with it.

The issue popped up at one of our projects, where we had to deliver a JSON file according to the specifications of an external party. The schema had multiple fields defined as decimal, but for some reason some of the decimals came out as strings. The difference is that a decimal value does not have quotes surrounding the actual value, whereas a string does.
To recreate the issue, I created a very simple schema (which is specified below) and a send pipeline containing only the out of the box JSON Encoder.
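
For illustration, a hypothetical schema with this kind of shape (invented names and namespace; "Field1" occurs twice, first as a string and then as a decimal) could look as follows:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://example.com/jsontest"
               xmlns="http://example.com/jsontest"
               elementFormDefault="qualified">
      <xs:element name="Root">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Field1" type="xs:string" />
            <xs:element name="Level1">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="Field1" type="xs:decimal" />
                  <xs:element name="Field2" type="xs:decimal" />
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>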

I've chosen to base this scenario on receiving an XML file and sending a JSON file. For this I created a simple messaging-only solution with a file-based Receive Port and file-based Send Port, where the routing is done based on BTS.ReceivePortName. To test this setup I used the following test message.



This is where the issue shows itself. The JSON that is sent by BizTalk is not equal to the expected JSON output. See the comparison and the highlighted difference below.
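
For illustration (with hypothetical values), the difference boils down to this:

    Expected:  { "Field1": "abc", "Level1": { "Field1": 12.34, "Field2": 56.78 } }
    Actual:    { "Field1": "abc", "Level1": { "Field1": "12.34", "Field2": 56.78 } }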

This is very strange behavior, since both Level1/Field1 and Level1/Field2 are specified as a decimal, and yet Field1 is parsed as a string and Field2 is parsed as a decimal.
The important thing to note is that I have an element called "Field1" on multiple levels in the schema: the first occurrence has the type string, the second one has the type decimal.
What appears to be happening is that, when nodes with the same name occur on different levels in your schema, the JSON Encoder always takes the type of the first occurrence of a node with that name. In our case the first occurrence of "Field1" in our schema is defined as a string, which is why the second occurrence of the "Field1" node is incorrectly written as a string in our output.
To prove this behavior I renamed the second occurrence of the "Field1" node to "Field3"; this time the output was as expected.

This can obviously be fixed very easily by renaming the fields. However, I often find myself in the situation that the XSD cannot be changed, as it is defined by an external party. It turns out that the out of the box JSON Encoder uses an old version of the Newtonsoft.Json library which I cannot find in the Newtonsoft.Json repository on GitHub, so it is probably a Microsoft fork of the Newtonsoft.Json library.

This was all developed and tested on a BizTalk 2016 machine, but I suspect this bug has been present since the introduction of the out of the box JSON Encoder pipeline component with BizTalk 2013 R2.

To solve this issue I had to write my own custom JSON Encoder pipeline component where I used the latest version of the Newtonsoft.Json library.

In fact, this issue has been raised with Microsoft via the BizTalk Server UserVoice pages. You can find the topic here. If you agree, go there and show your support by voting for this issue.

Categories: BizTalk
written by: Pim Simons

Posted on Tuesday, April 25, 2017 11:02 AM

Stijn Degrieck by Stijn Degrieck

"Europe is far too dependent on Microsoft." I thought I accidentally clicked on an old article, perhaps from the end of the last century. At that time, Microsoft was in trouble for abusing its dominant market position to stave off competition. It was the start of a series of legal battles both in the States and in Europe, culminating in the Windows Media Player saga. You know, that thing you may have used to watch video on a pc, if you didn’t skip it entirely because you belong to the YouTube generation. Microsoft was fined a massive sum by Europe in 2004, but continued to resist strongly until 2012. In the end, they subsided. Or that is what we would like to believe.

Back to today. According to a group of investigative journalists, the intensive collaboration with Microsoft makes Europe vulnerable, for instance because our data is in the hands of an American company. And we would regret that, now that our American allies seem less steadfast. A German member of the European Parliament called for immediate action to force the mighty Microsoft to its knees. Comparing IT with aviation, where Europe broke Boeing's dominance with the launch of the Airbus, he called for an "ICT Airbus". A nice one-liner, and maybe a beautiful dream for European chauvinists, but utter nonsense in the end.

The world in the 1970s cannot be compared to the here and now. Of course, technological innovations were made and we pushed forward, but the rate of change was lower and the impact was much smaller. Moore's Law, anyone?

Changing a sector is not the same as overthrowing a whole economy. It shows little insight into our connected and globalized society to propose such a change of mind. And it's out of touch with reality: in spite of earlier attempts to control Microsoft, it is still one of the world's largest (IT) companies. Like it or not, the whole world has been running on Windows for 30 years.

Another question is whether Microsoft is really such a patriotic American company. Ultra large companies like Facebook, Google and Amazon do not only transcend geographic boundaries, but mental boundaries as well. Wasn’t Facebook called 'the largest country in the world' because it has more 'residents' than China? Globalization on that scale questions all the old paradigms, which our politicians love for obvious reasons.

Large companies tend to be very committed to their 'citizens'. They have an eye for local needs and expectations. For example, Microsoft has data centers worldwide to ensure quality of service and data protection. The company was recently proved right in a lawsuit against a magistrate in New York, who had ordered it to hand over data (e-mails) from a server based in Ireland as part of an investigation. Microsoft won the plea, with the full support of the Irish government.

To the current CEO Satya Nadella, a man born in India, Microsoft is not so much a business as an ecosystem. He wants to build the world's best cloud platform, open to anyone, at any time and any location. And he does what he can to fulfill that promise. For example, Microsoft's employees are leading the ranking on GitHub, an online platform for open source developers who share code with the community. No one has more active developers on that platform than Microsoft. Not even Facebook or Google. And still, we tend to fear Microsoft.

Fear is a bad counselor and protectionism is a weak strategy. The only question that really matters to Europe is: how do we make sure that the next Microsoft, Google or Facebook has its roots in European soil? That is, if you see yourself as a European rather than a world citizen.

Note: This opinion was first published on SmartBiz.be on 20 April 2017 (in Dutch). 

Categories: Opinions
Tags: Microsoft
written by: Stijn Degrieck

Posted on Monday, April 17, 2017 3:18 PM

Luis Delgado by Luis Delgado

Dates are always important, but in the context of IoT projects they are even more relevant. The reason is that IoT clients are mostly human-less terminals: machines with no understanding of time. For example, if a client application shows the end user a wrong date, the user will sooner or later see the problem and correct it. Machines will never identify a date as being incorrect, so the problem can become endemic to your solution and go unnoticed for a long time.

Having incorrect dates will screw up your data. Not knowing the point in time at which a data observation was recorded will render any historical and time-series analysis useless. Hence, we at Codit spend significant time making sure that the definition, serialization and interpretation of time are correct from the very beginning of the IoT value chain. The following are some basic principles for achieving this.

Add a gateway timestamp to all data observations

In general, we assume that data observations generated by machines will be accompanied by a timestamp generated by the originating machine. This is generally true. However, we have noted that the clocks of machines cannot be trusted. This is because, in general, operators of equipment attach little importance to the correctness of a machine's internal clock. Typically, machines do not need precise clocks to deliver the function they were designed for. We have seen machines in the field transmit dates with the wrong time offset, the wrong day, and even the wrong year. Furthermore, most machines are not connected to networks outside their operational environment, meaning they have no access to an NTP server to reliably synchronize their clocks.

If you connect your machines to the Internet through a field gateway, we highly recommend adding a receivedInGateway timestamp upon receiving a data point at the gateway. Gateways have to be connected to the Internet, so they have access to NTP servers and can generally provide reliable DateTime timestamps.

A gateway timestamp can even allow you to rescue high-resolution observations that are plagued by a machine with an incorrect clock. Suppose, for example, that you get the following data in your cloud backend:

You can see that the originating machine's clock is wrong. You can also see that the datetime stamps are being sent with sub-second precision. You cannot trust the sub-second precision of the "receivedInGateway" value because of network latency. However, you can safely assume that the sub-second precision at the machine is correct, and you can use the gateway's timestamp to correct the wrong datetimes for high-precision analysis (in this case, the .128 and .124 sub-second measurements).
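
A minimal sketch of that correction idea (hypothetical method and parameter names, assuming both values are available as DateTimeOffset):

    using System;

    static class ObservationTimeCorrector
    {
        // Combine the trusted wall clock of the gateway with the trusted
        // sub-second fraction reported by the machine.
        public static DateTimeOffset Correct(DateTimeOffset machineTime, DateTimeOffset receivedInGateway)
        {
            // Drop the gateway's sub-second part: it is polluted by network latency.
            var gatewayWholeSeconds = new DateTimeOffset(
                receivedInGateway.Year, receivedInGateway.Month, receivedInGateway.Day,
                receivedInGateway.Hour, receivedInGateway.Minute, receivedInGateway.Second,
                receivedInGateway.Offset);

            // Re-attach the machine's sub-second fraction, which we do trust.
            var subSecond = machineTime - new DateTimeOffset(
                machineTime.Year, machineTime.Month, machineTime.Day,
                machineTime.Hour, machineTime.Minute, machineTime.Second,
                machineTime.Offset);

            return gatewayWholeSeconds + subSecond;
        }
    }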

Enforce a consistent DateTime serialization format

Dates can become very complicated very quickly. Take a look at the following datetime representations:

  • 2017-04-15T11:40:00Z: follows the ISO8601 serialization format
  • Sat Apr 15 2017 13:40:00 GMT+0200 (W. Europe Daylight Time): a typical way dates are serialized on the web
  • 04/15/2017 11:40:00: date serialization in American culture
  • 15/04/2017 13:40:00GMT+0200: date serialization with an explicit GMT offset, common in European cultures

All of these dates represent the same point in time. However, if you get a mixture of these representations in your data set, your data scientists will probably spend a significant number of hours cleaning up the datetime mess in your data set.

We recommend that our customers standardize their datetime representations on the ISO8601 standard:

YYYY-MM-DDTHH:mm:ss.sssZ

This is probably the only datetime format that the web has adopted as a de facto standard, and it is even documented by the ECMAScript body HERE.

Note the "Z" at the end of the string. We recommend customer to always transmit their dates in Zulu time. This is because analytics is done easier when you can assume that all time points belong to the same time offset. If that were not the case, your data team will have to write routines to normalize the dates in the data set. Furthermore, Zulu time does not suffer from time jumping scenarios for geographies that switch summer time on and off during the year.

(By the way, for those of you wondering: Zulu time, GMT and UTC are, for practical purposes, the same thing. Also, none of them observe daylight saving changes.)
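
In .NET, for example, producing such timestamps is a one-liner; a minimal sketch:

    using System;
    using System.Globalization;

    class Iso8601Demo
    {
        static void Main()
        {
            var utcNow = DateTime.UtcNow;

            // Matches the recommended wire format YYYY-MM-DDTHH:mm:ss.sssZ exactly;
            // the literal "Z" is correct here because the value is UTC.
            var wireFormat = utcNow.ToString("yyyy-MM-ddTHH:mm:ss.fffZ", CultureInfo.InvariantCulture);

            // The round-trip specifier "o" is also ISO8601 (7 fractional digits, "Z" for UTC).
            var roundTrip = utcNow.ToString("o", CultureInfo.InvariantCulture);

            Console.WriteLine(wireFormat); // e.g. 2017-04-15T11:40:00.000Z
            Console.WriteLine(roundTrip);  // e.g. 2017-04-15T11:40:00.0000000Z
        }
    }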

At the very least, if they don’t want to use UTC time, we ask customers to add a correct time offset to their timestamps:

2017-04-15T13:40:00+02:00

However, in the field, we typically find timestamps with no time offset, like this:

2017-04-15T13:40:00

The problem with datetimes without a time offset is that, by definition, they have to be interpreted as local time. This is relatively easy to manage when working on a client/server application, where you can use the local system time (PC or server). However, since a lot of IoT is related to analytics, it will be close to impossible to determine the correct point in time of a data observation whose timestamp does not include a time offset.

Make sure that your toolset supports the DateTime serialization format

This might sound trivial, but sometimes you do find quirky implementations of ISO8601 among software vendors. For instance, as of this writing, Microsoft Azure SQL Server partially supports ISO8601 as a serialization format for DateTime2 types. However, this applies only to the ISO8601 literal format; the compact format of ISO8601 is not supported by SQL Server. So if you do depend on SQL for your analytics and storage, make sure you don't standardize on the ISO8601 compact form.

Conclusion

Dates are easy for humans to interpret, but they can be quite complex to deal with in computer systems. Don't let the apparent triviality of dates (from a human perspective) fool you into underestimating the importance of defining proper, standardized DateTime practices. In summary:

  • Machine clocks cannot be trusted. If you are using a field gateway, make sure you add a gateway timestamp.
  • Standardize on a commonly understood datetime serialization format, such as ISO8601.
  • Make sure your date serialization includes a time offset.
  • Prefer to work with Zulu/UTC/GMT times instead of local times.
  • Ensure your end-to-end tooling supports the datetime serialization format you have selected.

Categories: Technology
Tags: IoT
written by: Luis Delgado

Posted on Friday, April 14, 2017 1:27 PM

Tom Kerkhove by Tom Kerkhove

As you might have noticed, a few months ago Codit Belgium moved to a brand new office in Zuiderpoort, near the center of Ghent.

Because of that, we've built an internal visitor system running on Azure.
Keep on reading to learn all about it!

One of the centerpieces, and my favorite, is our Codit Wall of Employees:


For these new offices, Codit needed a visitor system that allows external people to check in, notifies employees that their visitor has arrived, etc. The biggest requirement was the ability to list all the external people currently in the office, for scenarios such as a fire.

That's how Alfred came to life, our personal butler that assists you when you arrive in our office.

Thanks to our cloudy visitor platform in Microsoft Azure, codenamed Santiago, Alfred is able to assist our visitors but also to provide reporting on who is in the building, send notifications, etc.

We started off with our very own Codit Hackathon: dedicated teams worked on features and got introduced to new technologies, while more experienced colleagues taught others how to achieve their goals.

Every Good Backend Needs A Good Frontend

For Alfred, we chose to build a Universal Windows Platform (UWP) app that is easy to use for our visitors. To prevent people from messing with our Surface, we are even running it in kiosk mode.

Behind the scenes, Alfred just communicates with our backend via our internal API catalog served by Azure API Management (APIM going forward).

This allows Alfred to easily authenticate with a subscription key towards Azure API Management, after which Azure APIM forwards the request to our physical API, authenticating with a certificate. This way we can fully protect our physical API while consumers can still easily authenticate with Azure APIM.
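
As an illustration, a client such as Alfred could call the API through APIM roughly like this (hypothetical base URL and operation; the subscription key travels in APIM's default Ocp-Apim-Subscription-Key header):

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public class VisitorApiClient
    {
        private readonly HttpClient _http;

        public VisitorApiClient(string subscriptionKey)
        {
            // Hypothetical APIM base URL.
            _http = new HttpClient { BaseAddress = new Uri("https://example.azure-api.net/visitors/") };
            _http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        }

        // Hypothetical operation: register a new check-in.
        public Task<HttpResponseMessage> CheckInAsync(HttpContent visitor) =>
            _http.PostAsync("check-ins", visitor);
    }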

The API is the façade to our "platform" that allows visitors to check in and check out, sends notifications upon check-in, provides a list of all offices and employees, etc. It is hosted as a Web App sharing the same App Service Plan on which our Lunch Order website is running, to optimize costs.

We are using Swagger to document the API for a couple of reasons:

  1. It is crucial that we provide a self-explanatory API that enables developers to see at a glance what the API offers and what to expect. As of today, only Alfred is using it, but if a colleague wants to build a new product on top of the API or needs to change the platform, everything should be clear.
  2. Using Swagger enables us to make the integration with Azure API Management easier, as we can create Products by importing the Swagger definition.

Storing Company Metadata in Azure Document DB

The information about the company is provided by Azure Document DB, where we use a variety of documents that describe which offices we have, who is working at Codit, what their preferred notification configuration is, etc.

We are using a simple structure where each type of information that we store has a dedicated document of a specific type, and these documents are linked to each other and grouped in one collection. By using only one collection we can group all the relevant company metadata in one place and save costs, since Azure bills RUs per collection.

As an example, we currently have an Employee-document for myself and a dedicated Notification Configuration-document that describes the notifications I've configured. If I were to have notifications configured for both Slack and SMS messages, there would be two such documents stored.

This allows us to easily add and remove documents for each notification configuration of a specific employee, instead of using one dedicated document per employee and updating specific sections of it, which would be more cumbersome.
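
A sketch of how such documents might be modeled (hypothetical property names; both document types live in the same collection and are linked by an employee id):

    // The "type" discriminator tells the documents apart, since they all
    // live in the same Document DB collection.
    public class EmployeeDocument
    {
        public string id { get; set; }
        public string type { get; set; } = "employee";
        public string name { get; set; }
        public string office { get; set; }
    }

    public class NotificationConfigurationDocument
    {
        public string id { get; set; }
        public string type { get; set; } = "notificationConfiguration";
        public string employeeId { get; set; }   // link to the employee document
        public string channel { get; set; }      // e.g. "slack" or "sms"
        public string address { get; set; }      // webhook URL or phone number
    }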

As of today, this is all static information, but in the future we will provide a synchronization process between Azure Document DB and our Azure AD. This will remove the burden of keeping our metadata up-to-date, so that when somebody joins or leaves Codit we don't have to update it manually.

Housekeeping For Our Visitors

For each new visitor that arrives we want to make their stay as comfortable as possible. To achieve this, we do some basic housekeeping now, but plan to extend this in the future.

Nowadays, when a visitor is registered, we persist an entry in Azure Table Storage for that day and visitor, so that our reporting knows who entered our office. After that we track a custom event in Azure Application Insights with some context about the visit and publish the event on an Azure Service Bus topic. This allows us to be very flexible in how we process such an event; if somebody wants to extend the current setup, they can just add a new subscription on the topic.
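
A sketch of the event-tracking and publishing part of that flow (hypothetical names; assuming the Microsoft.ApplicationInsights and Microsoft.Azure.ServiceBus packages):

    using System.Collections.Generic;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.ApplicationInsights;
    using Microsoft.Azure.ServiceBus;
    using Newtonsoft.Json;

    public class VisitorCheckedInPublisher
    {
        private readonly TelemetryClient _telemetry = new TelemetryClient();
        private readonly TopicClient _topic;

        public VisitorCheckedInPublisher(string serviceBusConnectionString)
        {
            // "visitor-checked-in" is a hypothetical topic name.
            _topic = new TopicClient(serviceBusConnectionString, "visitor-checked-in");
        }

        public async Task PublishAsync(string visitorName, string employeeId)
        {
            // Custom event with some context about the visit.
            _telemetry.TrackEvent("VisitorCheckedIn", new Dictionary<string, string>
            {
                ["visitor"] = visitorName,
                ["employee"] = employeeId
            });

            // Publish on the topic so subscribers (e.g. the notification Logic App)
            // can process the event independently.
            var payload = JsonConvert.SerializeObject(new { visitorName, employeeId });
            await _topic.SendAsync(new Message(Encoding.UTF8.GetBytes(payload)));
        }
    }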

Currently we handle each new visitor with a Logic App that fetches the notification configuration for the employee the visitor has a meeting with and notifies them via all the configured channels we support; that can be SMS, email and/or Slack.

Managing The Platform

For every software product, it goes without saying that it should also be easy to maintain and operate once it is running. To achieve this, we use a combination of Azure Application Insights, Azure Monitor and Logic Apps.

Our platform telemetry is handled by Azure Application Insights, where we send specific traces, track requests, measure dependencies and log exceptions, if any. This enables us to have one central technical dashboard for operating the platform, where we can use the Analytics feature to dive deeper into issues. In the future we will even add Release Annotations to our release pipeline to easily detect performance impact on our system.

Each resource has a certain set of Azure Alerts configured in Azure Monitor that trigger a webhook hosted by an Azure Logic App instance. This consolidates all the event handling in one central place and provides us with the flexibility to handle alerts how we want, without having to change each alert's configuration.

Securing what matters

At Codit, building secure solutions is one of our biggest priorities, if not the biggest. To achieve this, we are using Azure Key Vault to store all our authentication keys, such as the Document DB key and Service Bus keys, so that only the right people and applications can access them, while keeping track of when and how frequently they are accessed.

Each secret is automatically regenerated using Azure Automation: every day we create new keys and store them in the corresponding secret. This way the platform always uses the latest version, and leaked keys quickly become invalid, allowing us to reduce the risk.

One might say that this platform is not a big risk for leaking information, but we've applied this pattern because, in the end, we store personal information about our employees and it is good practice to be as secure as possible. Applying this approach takes minimal effort, certainly if you do it early in the project.

Security is very important, make sure you think about it and secure what matters.

Shipping With Confidence

Although Alfred and Santiago are developed as a side project, it is still important that everything we build is production-ready and that we have confidence that everything keeps working. To achieve this, we use Visual Studio Team Services (VSTS), which hosts our Git repository. People can come in, work on features they like and create a pull request once they are ready. Each pull request is reviewed by at least one person and automatically built by VSTS to make sure it builds and no tests are broken. Once everything is ready to go out the door, we can easily deploy to our environments using release pipelines.

This makes it easier for new colleagues to contribute and provides an easy way to deploy new features without having to perform manual steps.

This Is Only The Beginning

A team of colleagues was willing to spend some spare time to learn from each other, challenge each other and have constructive discussions to dig deeper into our thinking. That's what led to our first working version, ready to serve as a foundation to which we can add new features, try new things and make Alfred more intelligent.

Besides having a visitor system that is up and running, we also have a platform available where people can consume the data to play around with and to test certain scenarios with representative data. This is great if you ask me, because then you don't need to worry about the demo data and can just focus on the scenario!

To summarize, this is our current architecture but I'm sure that it is not final. 

Personally, I think that a lot of cloud projects, if not all, will never be "done"; instead we should be looking for trends that tell us how we can improve, and keep on continuously improving the platform.

Don't worry about admitting your decision was not the best one - Learn, adapt, share.

Thanks for reading,

Tom Kerkhove

Categories: Technology
written by: Tom Kerkhove