Microsoft Azure Data Factory

Microsoft is the world’s largest software company. Founded in 1975 and headquartered in Redmond, it has become a household name, primarily due to its Windows operating system and Office suite. Aside from these products, Microsoft has a vast range of enterprise software and cloud offerings including its own database, browser, various servers and ERP solutions. In recent years, Microsoft has focused its business on cloud-based solutions such as Azure. AI and machine learning have also become increasingly important in product development.

Azure is Microsoft’s cloud platform offering over 200 different products and services on a global physical infrastructure consisting of data centers in over 60 countries. Officially introduced in 2010, Azure – and cloud business in general – has become a very important and successful strategic pillar for Microsoft, making up 38 percent of Microsoft’s operating income. The various services offered by Microsoft are known for being very accessible to companies of all sizes and industries and can be integrated quite easily with each other. In addition to Microsoft’s own services, certified partners offer a huge variety of solutions and services through the constantly growing Azure Marketplace.

Microsoft is currently the second-largest cloud provider in the world, behind Amazon Web Services (AWS). With a recently reported annual revenue growth rate of 59 percent (AWS: 32 percent), Microsoft appears to address today’s business demands well.

Microsoft Azure has a competitive and broad offering in the data and analytics space, ranging from PaaS solutions for data and big data management and analytics to multiple AI and machine learning offerings to specialized SaaS solutions, such as Azure Purview.

Azure Data Factory is Microsoft’s offering for hybrid data integration. Hybrid means it integrates with on-premises data sources, as well as supporting multi-cloud data integration. Its main capability is to build, manage and run ETL and ELT processes at any scale, either through ‘code-free’ interactive user interfaces (Microsoft claims it is suitable for ‘citizen integrators’) or by writing custom code. Because all Azure Data Factory capabilities are exposed through APIs, they can be automated relatively easily. This supports DevOps and DataOps scenarios, where automation is key. For balance, it should be noted that the same is true of most cloud-based PaaS services from all vendors, and that most modern non-cloud solutions also provide extensive API support.
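The difference between ETL and ELT mentioned above can be illustrated with a minimal sketch. It does not use Azure Data Factory or any Azure API; an in-memory SQLite database stands in for the target store, and the sample rows, table names and function names (`run_etl`, `run_elt`) are assumptions made purely for illustration. In the ETL variant the data is cleaned before it is loaded; in the ELT variant the raw data is loaded first and transformed inside the target with SQL.

```python
import sqlite3

# Hypothetical source rows: (customer, amount as text, currency).
RAW_ROWS = [
    ("alice", "10.50", "EUR"),
    ("bob", "n/a", "EUR"),  # unparsable amount, dropped by both variants
    ("carol", "7.25", "EUR"),
]


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def run_etl(rows):
    """ETL: transform in the integration layer, then load only clean rows."""
    clean = [
        (name.title(), float(amount))
        for name, amount, _currency in rows
        if is_number(amount)
    ]
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", clean)
    return db.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()


def run_elt(rows):
    """ELT: load raw rows first, then transform inside the target with SQL."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE raw_sales (customer TEXT, amount TEXT, currency TEXT)"
    )
    db.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)
    db.execute(
        """
        CREATE TABLE sales AS
        SELECT upper(substr(customer, 1, 1)) || substr(customer, 2) AS customer,
               CAST(amount AS REAL) AS amount
        FROM raw_sales
        WHERE amount GLOB '[0-9]*'
        """
    )
    return db.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()
```

Both functions produce the same cleaned table for this sample. What differs is where the transformation runs, which in practice determines whether the integration service or the target platform carries the processing load.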

Azure Data Factory does not provide any data storage or analytical capabilities of its own. It is therefore intended to support Microsoft’s other core data products, such as Azure SQL Server, Azure Synapse (Microsoft’s DWH, BI and analytics platform), Azure Data Lake and Azure Machine Learning, although it can also run standalone.

Azure Data Factory provides over 90 built-in connectors. These cover Microsoft’s own Azure data services; Big Data sources such as Amazon Redshift and Google BigQuery; enterprise data warehouses including Oracle Exadata, Teradata and Snowflake; ERP solutions like SAP; and SaaS apps such as Salesforce, Marketo, ServiceNow and SAP C4C.

Azure Data Factory covers all aspects of ETL and ELT processes: the integration of data, data sources and systems; the design of simple to complex interdependent data pipelines; the definition of data transformation and processing of any kind; the scheduling and triggering of data pipelines; and detailed monitoring and logging. For data processing itself, many built-in capabilities are available, from simple filtering to AI workloads. In addition to the built-in capabilities and connectors, external data processing capabilities can be integrated quite easily.
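To make the idea of interdependent pipeline steps concrete, the following is a minimal, tool-agnostic sketch of how a run order can be derived from declared dependencies. It is not Azure Data Factory's pipeline format or API; the activity names and the `PIPELINE` mapping are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical activities, each mapped to the activities it depends on.
PIPELINE = {
    "copy_orders": [],
    "copy_customers": [],
    "join_orders_customers": ["copy_orders", "copy_customers"],
    "aggregate_revenue": ["join_orders_customers"],
    "publish_report": ["aggregate_revenue"],
}


def execution_order(pipeline):
    """Return one valid run order in which every activity follows its dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline):
    """Walk the activities in dependency order and return a simple run log."""
    log = []
    for name in execution_order(pipeline):
        # A real orchestrator would execute the activity here; this sketch only logs it.
        log.append(f"ran {name}")
    return log
```

Calling `run(PIPELINE)` returns a log in which both copy steps precede the join, the join precedes the aggregation, and the report is published last. A circular dependency would make `TopologicalSorter` raise `graphlib.CycleError` instead of returning an order.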

User & Use Cases

Azure Data Factory is used predominantly for ‘Data Warehousing/BI’ (84 percent), ‘Data Integration’ (74 percent), and ingesting and processing data from a ‘Data Lake’ (65 percent). Since Azure Data Factory is Microsoft’s “go-to” component for ETL purposes within the Azure cloud, it can be assumed that most users use it in conjunction with Azure SQL Server, Azure Synapse, Azure Data Lake or other data processing products within Azure.

Interestingly, although it is Microsoft’s primary data integration component, only 29 to 42 percent of users say that they use Azure Data Factory for tasks such as ‘Data Pipelining’, ‘Data Preparation for Business Users’, ‘Data Warehouse Automation’ and ‘Cloud Integration’. We assume that some customers do have these use cases but did not mention them explicitly, selecting the umbrella term ‘Data Integration’ instead.

[Chart: Current use (n=31)]

[Chart: Total number of users per company (n=31)]

[Chart: Total number of administrators per company (n=26)]

[Chart: Company size, number of employees (n=31)]

BARC’s Vendor Performance Summary contains an overview of The Data Management Survey results based on feedback from Microsoft Azure Data Factory users, accompanied by expert analyst commentary.

Microsoft Azure Data Factory

Peer Groups: Business Software Generalists (data management), Data Pipelining Products, Products to Support DW Automation
Vendor: Microsoft Corporation
Number of responses: 31
Product: MS Azure Data Factory
Offices: Worldwide
Employees: 163,000 approx.
Customers: 75 million approx.
Revenues (2020): $143 billion
Website: www.microsoft.com