Azure Data Explorer: from the field
Updated: Mar 24, 2021
Azure Data Explorer (ADX) is a wonderful service that enables data engineers to take a simple, streamlined approach to big data exploration. Forget what you think you know about this service! On the surface, ADX is a powerful near real-time data ingestion and exploration service. IoT, log, and telemetry data that flows through a service like Azure Event Hubs can be ingested into ADX and explored blazingly fast. However, ADX is not relegated to the world of "streaming" data flows and is perfectly capable of exploring your data lake or serving your reporting platform. The goal of this article is to introduce the reader to the service and provide helpful links to get up to speed quickly. At the end, I'll show a recent use case that was delivered for a customer.
What is Azure Data Explorer?
A highly scalable data storage and query service built on Kusto EngineV3.
ADX supports KQL and T-SQL, data visualizations, and machine learning.
Why would I want to use ADX?
To explore large amounts of streaming data at blazing speeds.
Quickly explore data in a single service with visuals and ML capabilities.
Query 1 billion records in under 1 second.
When should I use ADX over Synapse Analytics or Azure Databricks?
For gaining insights from your streaming log and telemetry data.
When your team is familiar with KQL or open to learning an intuitive language.
For less-mature data engineering teams who aren't ready to leverage Spark or MPP.
Getting started with ADX is easy; check out these tutorials:
As you can see from the tutorials provided, ADX is a data storage and query engine which also provides rich visualizations to the data engineer. You create databases, tables, and schemas to define your data. You can ingest your data from Azure Event Hubs and simply query it using the Kusto Query Language. But we have only scratched the surface.
In our tutorials we ingested data via Azure Event Hubs, but there are additional methods for ingesting and working with data outside of the service. The Azure Data Factory Copy activity can be used to ingest data into ADX, either inferring the schema or using a pre-defined schema mapping created in ADX. This can be useful for reference data that helps complete the data story in ADX. ADX also supports external tables that can be pointed at your data lake. This allows the service to query that data without storing it. Again, this is great for reference data that may be resting in your data lake.
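To make the external table idea concrete, here is a minimal sketch that assembles a `.create external table` management command as text. The table name, columns, and storage connection string are hypothetical placeholders, not values from the project described later, and the `kind=storage` form is assumed to be the one your cluster accepts. The resulting text would be run in the ADX query editor or through a client library of your choice.

```python
def external_table_command(name, columns, data_format, storage_connection):
    """Build a `.create external table` management command as text.

    columns: mapping of column name -> Kusto type, e.g. {"ProcessId": "guid"}.
    storage_connection: storage connection string for the lake folder
    (a placeholder here; the real value depends on your environment).
    """
    schema = ", ".join(f"{col}:{ktype}" for col, ktype in columns.items())
    return (
        f".create external table {name} ({schema})\n"
        f"kind=storage\n"
        f"dataformat={data_format}\n"
        f"(\n"
        f"    h@'{storage_connection}'\n"
        f")"
    )


# Hypothetical reference table; every name below is an assumption.
command = external_table_command(
    name="ProcessReference",
    columns={"ProcessId": "guid", "ProcessName": "string"},
    data_format="parquet",
    storage_connection="<storage-connection-string>",
)
print(command)
```

Once such a table exists, it can be referenced in queries with `external_table('ProcessReference')`, and the data stays in the lake rather than being copied into ADX.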
ADX works well with semi-structured data. There is a new experience called one-click ingestion. Using this experience (ingesting data this way is in preview), we can ingest data from a storage container or blob, and the service assists with creating the mapping schema. This greatly improves the experience of working with semi-structured data in the service, which is functionality typically limited to SQL or Spark.
Additionally, for those who love SQL, ADX supports T-SQL. This allows die-hard SQL experts to leverage their knowledge of the language. ADX translates SQL queries to Kusto queries during execution. Kusto also provides the explain statement, which shows developers the translation from SQL to KQL and helps drive adoption of KQL. Note that there are limitations to using SQL with Kusto.
Finally, ADX supports ML and forecasting. Without being a data scientist or learning new services you can quickly perform tasks such as anomaly detection and time-series analysis.
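As a rough illustration of the time-series capability, the sketch below builds a KQL query that buckets events into a regular series with `make-series` and scores it with the built-in `series_decompose_anomalies` function. The table name, time column, lookback window, and threshold are assumptions chosen for the example, not recommendations.

```python
def anomaly_query(table, time_column, lookback="7d", step="1h", threshold=1.5):
    """Build a KQL anomaly-detection query as text.

    make-series turns raw events into an evenly spaced count series;
    series_decompose_anomalies returns anomaly flags, scores, and a baseline.
    """
    lines = [
        table,
        f"| make-series events = count() default = 0 "
        f"on {time_column} from ago({lookback}) to now() step {step}",
        f"| extend (flag, score, baseline) = "
        f"series_decompose_anomalies(events, {threshold})",
    ]
    return "\n".join(lines)


# Hypothetical table and column names.
query = anomaly_query("AppLogs", "Timestamp")
print(query)
```

The point of the sketch is that the whole analysis is one query: no separate model training step or additional service is involved.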
And now, a story from the field
Critical app telemetry and logs can assist the organization in reacting to a web app's health and functionality. Support needs information in near real-time to properly service customers.
Containerized web application
Log Analytics is a great centralized logging solution. However, when your solution requires querying massive amounts of log data in near real-time, it does not fit the use case. Log Analytics today takes roughly 2-5 minutes to ingest data. This number is constantly improving but in our case, we need the data faster. ADX also provides a lot of extra functionality to aid in the exploration of your data.
To solve this problem, the first step is to determine how to flow logs from the containerized applications to Azure Event Hubs. In this case, a Serilog sidecar was used, which replicated Serilog activity to Azure Event Hubs, allowing ADX to ingest the logs. A new Event Hubs namespace, event hub, and consumer group were created for ADX.
We created a new ADX service, making sure to enable streaming ingestion in the advanced options, and later created a streaming ingestion policy.
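The policy step can be sketched as a pair of management commands. The snippet below only assembles the command text; the table name is a hypothetical placeholder, and the commands assume streaming ingestion has already been enabled on the cluster as described above.

```python
def streaming_ingestion_commands(table):
    """Return management commands that enable and then inspect the
    streaming ingestion policy on a single table."""
    return [
        f".alter table {table} policy streamingingestion enable",
        f".show table {table} policy streamingingestion",
    ]


# "AppLogs" is an assumed table name used for illustration only.
commands = streaming_ingestion_commands("AppLogs")
for command in commands:
    print(command)
```

The same policy can also be set at the database level, in which case the tables in that database inherit it.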
The logs are JSON and the structure is predictable. Within the ADX service, we created a new database and a table with a mapping schema. The logs could then be explored, but the data story was incomplete. Within the logs are GUIDs that need to be resolved to processes in another area of the application. This data existed in a table in a SQL MI instance, which was being ingested into the organization's data lake. Using the data in the data lake, an external table was created to map these GUIDs to the required reference data.
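A minimal sketch of these steps follows, assuming a small hypothetical log schema. It derives the table definition and the JSON ingestion mapping from a single column dictionary, then builds a query that resolves the GUIDs against an external table. The column names, JSON paths, and table names are invented for illustration; the customer's actual schema is not shown in this article.

```python
import json

# Hypothetical log schema: column name -> (Kusto type, JSON path in the event).
LOG_COLUMNS = {
    "Timestamp": ("datetime", "$.Timestamp"),
    "Level": ("string", "$.Level"),
    "Message": ("string", "$.RenderedMessage"),
    "ProcessId": ("guid", "$.Properties.ProcessId"),
}


def create_table_command(table, columns):
    """Build the `.create table` command from the column dictionary."""
    schema = ", ".join(f"{name}:{ktype}" for name, (ktype, _) in columns.items())
    return f".create table {table} ({schema})"


def create_json_mapping_command(table, mapping_name, columns):
    """Build the JSON ingestion mapping that ties each column to a JSON path."""
    mapping = [
        {"column": name, "Properties": {"path": path}}
        for name, (_, path) in columns.items()
    ]
    mapping_json = json.dumps(mapping)
    return (
        f".create table {table} ingestion json mapping "
        f"'{mapping_name}' '{mapping_json}'"
    )


def lookup_query(table, external_table, key):
    """Build a query that enriches log rows with reference data from the lake."""
    return (
        f"{table}\n"
        f"| lookup kind=leftouter (external_table('{external_table}')) on {key}"
    )


table_cmd = create_table_command("AppLogs", LOG_COLUMNS)
mapping_cmd = create_json_mapping_command("AppLogs", "AppLogsMapping", LOG_COLUMNS)
enriched = lookup_query("AppLogs", "ProcessReference", "ProcessId")
print(table_cmd, mapping_cmd, enriched, sep="\n\n")
```

Keeping the column names, types, and JSON paths in one place means the table and its mapping cannot drift apart, which is the main reason for generating both from the same dictionary.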
Finally, analysts are able to write queries in ADX and export those queries for consumption in Power BI. The overall result of this design is that logs are available in ADX within 500 ms and can be queried by support.
In this article we explored the Azure Data Explorer service and its many features. This service is very powerful and enables data engineers to perform a multitude of tasks within a single service. ADX tackles big data workloads with ease thanks to the scale-out Kusto clusters under the hood. Finally, we showed a recent use case where a customer is using Azure Data Explorer and finding great value.