Steven De Lausnay

14 July 2022

Takeaways from the Data & AI Summit 2022 by Databricks

Last week, Databricks held the latest edition of its Data & AI Summit, bringing us the latest news from the world of data and AI. Here are some points worth remembering when implementing a data solution.

1. Data Lakehouses are here to stay

As the theme centered on Lakehouses, several sessions focused on the concept and how it is evolving and improving. The Lakehouse is the standard when building an end-to-end data platform for data engineers, data analysts, and data scientists. Recently, we have seen many improvements that make its technical implementation on Databricks even better. One interesting feature is the Photon query engine within Databricks, which delivers the fastest query response times on the platform so far. This paper explains the engine in further detail. At Codit, our reference architecture for a data platform is also based on the Lakehouse and is fully compatible with data mesh principles, so it can scale to even larger data platforms. With these improvements, we can build better and faster solutions for our customers, using a simplified architecture.

2. Delta Lake 2.0

The Lakehouse principle is built entirely on Delta Lake, which brings structure back into your data lake. During the summit, it was announced that Delta Lake will become fully open source, with the aim of driving further adoption in the market. In the past, Delta Lake was tightly linked to Databricks, to the extent that you could only use all of its features if you were using Databricks. However, we are now seeing other platforms, such as Azure Synapse, adopt Delta Lake. Starting with Delta Lake 2.0, others will also be able to implement all features, making it less dependent on Databricks. With this evolution, we hope that we can also onboard smaller companies onto a Lakehouse for their data platform.

3. Improved data governance

When building a data platform, governance is of the utmost importance, especially when your dataset is growing exponentially. Knowing which data you have and understanding its quality is key to a successful data platform implementation. Databricks has now implemented Unity Catalog, a centralized, unified governance solution for all data & AI assets. This means you can build a catalogue of your files, tables, and dashboards, but also of your ML models. It also includes additional features such as fully automated lineage for all your developed workloads (SQL, R, Python, Scala), which you can read more about here. To extend this further, Databricks has teamed up with Monte Carlo to improve overall data observability. Inspired by the proven best practices of application observability in DevOps, data observability is an organization’s ability to fully understand the health of the data in its systems. Just like its DevOps counterpart, data observability uses automated monitoring, alerting, and triaging to identify and evaluate data quality issues. You can read further details here.
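To make the idea of automated monitoring and alerting more concrete, here is a minimal sketch in plain Python. It is not how Monte Carlo or Unity Catalog work internally; the TableSnapshot type, the check_table function, and the thresholds are all hypothetical names chosen for illustration. It simply checks three common health signals of a table (volume, null rate, and freshness) and returns a list of alerts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class TableSnapshot:
    """Hypothetical summary of one table load (not a Databricks or Monte Carlo type)."""
    name: str
    row_count: int
    null_count: int          # nulls found in a column that should be populated
    last_updated: datetime   # when the table last received data


def check_table(
    snapshot: TableSnapshot,
    expected_min_rows: int,
    max_null_rate: float,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return one alert message per failed check; an empty list means healthy."""
    now = now or datetime.now(timezone.utc)
    alerts = []

    # Volume: did we receive roughly as much data as expected?
    if snapshot.row_count < expected_min_rows:
        alerts.append(
            f"{snapshot.name}: only {snapshot.row_count} rows, "
            f"expected at least {expected_min_rows}"
        )

    # Completeness: an empty table is treated as fully null.
    null_rate = snapshot.null_count / snapshot.row_count if snapshot.row_count else 1.0
    if null_rate > max_null_rate:
        alerts.append(f"{snapshot.name}: null rate {null_rate:.1%} exceeds {max_null_rate:.1%}")

    # Freshness: has the table been updated recently enough?
    if now - snapshot.last_updated > max_age:
        alerts.append(f"{snapshot.name}: last updated {snapshot.last_updated.isoformat()}")

    return alerts


# Example run with made-up numbers for a hypothetical "orders" table.
reference_time = datetime(2022, 7, 14, 12, 0, tzinfo=timezone.utc)
orders = TableSnapshot("orders", 100, 50, reference_time - timedelta(days=3))
for alert in check_table(orders, 500, 0.01, timedelta(hours=24), now=reference_time):
    print(alert)
```

In a real platform, the snapshot values would come from table metadata and query statistics, and the alerts would be routed to a triage workflow rather than printed.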

4. Latest version of MLflow

If you want to do machine learning on top of your Lakehouse, MLflow is the end-to-end tool for the complete ML lifecycle. During the summit, MLflow 2.0 was announced, bringing even more capabilities to accelerate your ML solutions. The biggest new component in MLflow 2.0 is pipelines. With this feature, it becomes even easier to move your models from development into production. Being able to implement MLOps for our customers with fewer workarounds is a big improvement. The model monitoring capabilities have also improved, making it easier to check the performance of your models and evaluate whether they need to be retrained.

5. Spark Connect

Spark was already well known as a unified engine for large-scale data analysis, especially thanks to its automatic scaling to handle ever larger data sets. However, we see that demands are changing and that edge computing is becoming more important in solutions. Spark is often too large to run on smaller edge devices, but Databricks has now announced Spark Connect. This is a client-server interface for Apache Spark, based on the DataFrame API, that decouples the client from the server. With this feature, developers can build solutions and gain access to Spark from any device.
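As an illustration of what this decoupling can look like from the client side, here is a minimal sketch. It assumes a PySpark release that ships Spark Connect (the client API became available in PySpark 3.4, after the summit) and a Spark Connect server that is already running; the host name in the usage note and the helper names remote_url and count_rows are hypothetical. The thin client only needs the connection string, while the actual work runs on the server.

```python
def remote_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect connection string of the form sc://host:port.

    15002 is the default port of the Spark Connect server; adjust it if
    your server is configured differently.
    """
    return f"sc://{host}:{port}"


def count_rows(url: str, n: int) -> int:
    """Ask a remote Spark cluster to build a DataFrame of n rows and count them."""
    # Imported lazily so this file still loads on devices without PySpark.
    from pyspark.sql import SparkSession

    # The session is a lightweight client; the query executes on the server.
    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        return spark.range(n).count()
    finally:
        spark.stop()
```

With a server reachable at a hypothetical address such as spark.example.com, a call like count_rows(remote_url("spark.example.com"), 1000) would send the query over the network and return only the result to the device.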

The summit brought many more features and announcements that will help us build even better data solutions for our customers. If you are interested in starting or building on your data journey, contact us for a chat on the future of data platforms and how they can help your organization become more data-driven. You can also watch the summit on-demand here.

*Cover image from Databricks.
