Microsoft Azure Data Factory
Microsoft is the world’s largest software company by market capitalization. It was founded in 1975 and is headquartered in Redmond, Washington in the United States. It has become a household name primarily through the Windows operating system and Office application suite. Microsoft also has a vast range of enterprise software and cloud offerings including its own database, browser, various servers and ERP solutions. In recent years, Microsoft has focused its business on cloud-based solutions such as Azure. AI and machine learning have also become important in product development.
Azure is Microsoft’s cloud platform that offers more than 200 different products and services on a global physical infrastructure consisting of data centers in over 60 countries. Azure has driven much of Microsoft’s recent growth and has become a successful strategic pillar for Microsoft, with 2022 fiscal year revenue of $34 billion USD. Microsoft is the second largest cloud provider in the world, behind Amazon Web Services (AWS). With a recently reported annual revenue growth rate of 59 percent, Microsoft enjoys strong business demand for its software and services.
Companies of all sizes and industries utilize Microsoft Azure. Products scale efficiently and components can be integrated easily with each other. In addition to Microsoft’s own services, certified partners offer a huge variety of solutions and services through the constantly growing Azure Marketplace. Microsoft Azure has a competitive and broad offering in the data and analytics space, ranging from PaaS solutions for data and big data management and analytics to multiple AI and machine learning offerings to specialized SaaS solutions, such as Azure Purview for data governance.
Azure Data Factory is Microsoft’s offering for hybrid data integration, meaning it integrates with on-premises data sources and supports multi-cloud data integration. Its main capability is to build, manage and run ETL and ELT processes at any scale, with options to use ‘code-free’ interactive interfaces or write custom code. All Azure Data Factory capabilities can be automated through APIs, which is useful for DevOps and DataOps scenarios. That said, nearly all cloud-based PaaS offerings share these abilities, and most modern non-cloud solutions also provide extensive API support.
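To illustrate the API-driven automation described above, the following minimal sketch builds the HTTP request that would start a pipeline run through the Azure management REST endpoint. The URL pattern and the api-version value reflect the public Data Factory REST API as we understand it. The function name, the subscription, resource group, factory and pipeline names, and the bearer token are placeholders introduced for illustration. Nothing is sent over the network here.

```python
import json
import urllib.request

# Assumed REST API version for Azure Data Factory (V2).
API_VERSION = "2018-06-01"


def build_create_run_request(subscription_id, resource_group, factory,
                             pipeline, token, parameters=None):
    """Build (but do not send) a POST request that triggers a pipeline run.

    All identifiers are caller-supplied placeholders; the token is assumed
    to have been obtained separately from the identity provider.
    """
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory}"
        f"/pipelines/{pipeline}/createRun?api-version={API_VERSION}"
    )
    # Pipeline parameters are passed as a JSON object in the request body.
    body = json.dumps(parameters or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

In a DevOps or DataOps setting, a deployment script could pass the resulting request to `urllib.request.urlopen` after a release, so that a pipeline run starts without anyone opening the graphical interface.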
Azure Data Factory does not provide any data storage or analytical capabilities of its own. Rather, it supports Microsoft’s other core data products, such as Azure SQL Server, Azure Synapse (Microsoft’s data warehouse, BI and analytics platform), Azure Data Lake and Azure Machine Learning.
Azure Data Factory provides over 90 built-in connectors to data sources and targets. These include Microsoft’s own Azure data services; big data sources such as Amazon Redshift and Google BigQuery; enterprise data warehouses such as Oracle Exadata, Teradata and Snowflake; ERP solutions such as SAP; and SaaS applications such as Salesforce, Marketo, ServiceNow and SAP C4C.
Azure Data Factory covers all aspects of ETL and ELT processes. These include integration of data sources and systems; design of simple or complex, interdependent data pipelines; definition of data transformation and processing of any kind; scheduling and triggering of data pipelines; and detailed monitoring and logging capabilities. For data processing itself, Azure Data Factory has many built-in capabilities, from simple filtering to AI workloads. In addition to its inbuilt capabilities and connectors, external data processing capabilities can be integrated quite easily.
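As a rough picture of what an interdependent data pipeline looks like, the sketch below models a pipeline as a set of activities that declare which other activities they depend on, loosely following the JSON structure used for Data Factory pipeline definitions. The pipeline and activity names are hypothetical, and the `execution_order` helper is only a local illustration: the service resolves dependencies itself at run time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline definition: two copy steps feed one transformation.
pipeline = {
    "name": "LoadSalesData",
    "activities": [
        {"name": "CopyRawOrders", "type": "Copy", "dependsOn": []},
        {"name": "CopyRawCustomers", "type": "Copy", "dependsOn": []},
        {
            "name": "TransformSales",
            "type": "ExecuteDataFlow",
            "dependsOn": [
                {"activity": "CopyRawOrders",
                 "dependencyConditions": ["Succeeded"]},
                {"activity": "CopyRawCustomers",
                 "dependencyConditions": ["Succeeded"]},
            ],
        },
    ],
}


def execution_order(pipeline_definition):
    """Return one valid order in which the activities could run."""
    # Map each activity name to the set of activities it waits for.
    graph = {
        activity["name"]: {dep["activity"] for dep in activity["dependsOn"]}
        for activity in pipeline_definition["activities"]
    }
    return list(TopologicalSorter(graph).static_order())
```

Calling `execution_order(pipeline)` lists both copy activities before `TransformSales`. The two copy steps have no dependency on each other, so they could run in parallel.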
Azure Data Factory is a general ETL/ELT platform that can be used as a stand-alone service. However, it works particularly well when used in conjunction with one or more of Microsoft’s core data products, in particular Azure Synapse and Azure Data Lake.
Microsoft offers Azure Data Factory as a pay-as-you-go service. While pipeline orchestration and execution run on a serverless infrastructure and are charged according to discrete activities and runtime, data processing is priced by the time and scale of computations (vCore/hour) and required intermediate storage, normally a minimal part of the total costs. So, an inactive or unused Azure Data Factory set-up does not incur runtime or license costs. In July 2023, Microsoft announced Microsoft Fabric, which combines Azure Data Factory (in preview mode) with tools such as Azure Synapse Analytics and Power BI into a single offering.
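The pay-as-you-go structure described above can be sketched as a simple calculation. The rates in the usage note are invented placeholders, not Microsoft list prices, and the class and function names are hypothetical. Intermediate storage is left out because it is normally a minimal part of the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices; real prices vary by region and over time."""
    per_1000_activity_runs: float  # orchestration charge
    per_vcore_hour: float          # data processing charge


def estimate_cost(activity_runs, vcores, hours, rates):
    """Estimate usage-based cost: orchestration plus data processing."""
    orchestration = activity_runs / 1000 * rates.per_1000_activity_runs
    processing = vcores * hours * rates.per_vcore_hour
    return round(orchestration + processing, 2)
```

With illustrative rates of `Rates(per_1000_activity_runs=1.0, per_vcore_hour=0.25)`, 3,000 activity runs plus 10 hours of processing on 8 vCores come to 3.00 + 20.00 = 23.00 currency units. With zero runs and zero processing hours the estimate is zero, consistent with an unused set-up incurring no runtime costs.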
Users & Use Cases
Customers use Azure Data Factory predominantly for data integration (77 percent) and data warehousing / BI (54 percent). Unsurprisingly, many also use the platform for data warehouse automation (38 percent), data lakehouse creation (31 percent) and real-time/near-real-time data processing (23 percent). Since Azure Data Factory is Microsoft’s ‘go-to’ component for ETL within the Azure cloud, it can be assumed that most customers use it in conjunction with Azure SQL Server, Azure Synapse, Azure Data Lake or some other Azure product. Three quarters of respondents said they use Azure Data Factory several times a day.
Just under two thirds of responding companies using Azure Data Factory are large companies (with more than 2,500 employees) and almost a third are mid-sized (100 to 2,500 employees). Microsoft has likely used its existing customer relationships, built through its Office suite and Windows operating system, to increase its Azure user base, as seen in its rapid growth. The mean of 486 consumers per company dwarfs the means for the Data Warehouse Automation and Data Pipelining Products peer groups, which have 128 and 169 per company respectively. Besides Azure Data Factory having many large customers, its emphasis on automation and a ‘no-code’ environment has likely encouraged business users to adopt it, which also contributes to this relatively high mean.
BARC’s Vendor Performance Summary contains an overview of The Data Management Survey results based on feedback from Microsoft Azure Data Factory users, accompanied by expert analyst commentary.