Amazon Glue

Amazon was founded in 1994 in Washington State and is now headquartered in Seattle, WA. Originally an online bookstore, the company is now a broad-based technology provider of e-commerce, cloud computing and streaming services. It has a variety of subsidiaries, including Amazon Web Services, Whole Foods Market and MGM Entertainment. Amazon has become a household name through its online marketplace and, more recently, its streaming service. Less visibly, Amazon Web Services (AWS) leads the cloud computing industry, accounting for 32 percent of the market and working with more than 90 percent of Fortune 100 organizations.

AWS offers 200 services, including data storage, computing, machine learning and artificial intelligence applications. The AWS Marketplace lists over 4,000 data products and connects third-party sellers with potential buyers. AWS serves government organizations, large and small enterprises, educational institutions and individuals. It has over 1 million active users, of which 10 percent are large-scale organizations.
Amazon Glue was launched in 2017 as a serverless data integration service. It supports ETL, ELT, batch and streaming workloads. AWS Glue Studio offers three interfaces: a no-code drag-and-drop interface, a Jupyter Notebook-based interface and a job script editor for Python and Apache Spark jobs.
Like most AWS services, Amazon Glue has a pay-as-you-go pricing model. Customers pay only for the services they use, as they use them, and can get discounted rates with increased usage. Amazon bills compute workloads by the second, based on hourly rates. AWS products also integrate readily with one another. Amazon S3, for example, offers object storage through a web interface, with storage tiers that depend on storage duration, storage size and access frequency.
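
The per-second billing described above can be illustrated with a small calculation. This is a minimal sketch, not AWS's billing logic: it assumes capacity is measured in data processing units (DPUs), uses a hypothetical rate of 0.44 USD per DPU-hour, and ignores any minimum billing duration or volume discount. The function name `glue_job_cost` and its parameters are illustrative.

```python
def glue_job_cost(dpus: float, runtime_seconds: int, rate_per_dpu_hour: float) -> float:
    """Estimate the cost of one job run billed per second at an hourly rate.

    All inputs are assumptions supplied by the caller; check current AWS
    pricing for real rates and minimum billing durations.
    """
    if dpus < 0 or runtime_seconds < 0 or rate_per_dpu_hour < 0:
        raise ValueError("inputs must be non-negative")
    hours = runtime_seconds / 3600  # per-second usage expressed in hours
    return dpus * hours * rate_per_dpu_hour


# Hypothetical run: 10 DPUs for 15 minutes at an assumed 0.44 USD per DPU-hour.
cost = glue_job_cost(dpus=10, runtime_seconds=15 * 60, rate_per_dpu_hour=0.44)
print(f"{cost:.2f} USD")
```

Because the charge scales linearly with runtime, halving a job's duration at the same capacity halves its estimated cost under these assumptions.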
Several products make up the Amazon Glue platform. AWS Glue Studio hosts the no-code data pipeline builder. AWS Glue Data Catalog uses crawlers to gather metadata from sources and populate a centralized data catalog for data discovery and management. AWS Glue DataBrew is a visual data preparation tool that helps data analysts and data scientists clean and normalize data for analytics and machine learning (ML).
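
As a rough illustration of how a crawler populates the Data Catalog, the sketch below uses the boto3 Glue client's `create_crawler` and `start_crawler` calls. The crawler name, IAM role ARN, database name and S3 path are hypothetical placeholders, and the two helper functions are illustrative, not part of any AWS SDK. Running the second function requires boto3 and valid AWS credentials.

```python
def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build the keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,                # IAM role the crawler assumes
        "DatabaseName": database,        # Data Catalog database to populate
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request: dict) -> None:
    """Create the crawler and start one run (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values for illustration only.
request = build_crawler_request(
    name="example-sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="example_catalog_db",
    s3_path="s3://example-bucket/raw/sales/",
)
print(request["Targets"])
```

Once a crawler run finishes, the tables it discovered appear in the named catalog database, where other AWS analytics services can look them up.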
AWS Lambda complements Amazon Glue with a serverless, event-driven compute service that abstracts away server management. It runs code in response to events and manages compute resources automatically. It can be invoked from 200 AWS services and SaaS applications. Amazon Glue also has a number of automation and self-learning features. Most recently, in July 2023, AWS Glue Studio integrated with Amazon CodeWhisperer, a generative AI coding assistant, to support ETL job development. Together, these capabilities offer a comprehensive suite for building, managing and optimizing data pipelines in AWS environments. Midsized companies in particular use Amazon Glue to transform and deliver data to AWS repositories, as well as to the applications and analytical tools those repositories support.
Support for Amazon Glue comes through the AWS Partner Network, which consists of a reported 100,000 partners that build, market and sell AWS products and systems to customers. Documentation and training for the platform are offered through the Amazon website.

Users & Use Cases

All Amazon Glue customers responding to this year’s survey were midsized companies with 100-2,500 employees. This has two implications for the results presented. First, the response rate suggests that Amazon Glue has achieved significant penetration among midsized companies. These companies have sufficient resources to purchase Amazon Glue, perhaps more so than smaller companies. They are also likely to adopt new technologies: 57 percent of respondents use Amazon Glue to manage data lakehouses and 43 percent use it for data mesh. In contrast, large companies are more likely to use dedicated, long-standing ETL tools, even if they use other AWS services.

Data integration is the dominant use case, especially for (AWS) data lakehouse environments. In addition, 3 out of 10 companies use Amazon Glue for its data catalog or data intelligence capabilities. While not many companies use Glue solely as a catalog, this capability offers a valuable addition to support both business and technical users in the development, operation and control of data and data pipelines.

The second implication is that, if we extrapolate from these 21 respondents, Amazon Glue has less traction with larger companies. Its mean number of consumers, 27, is roughly five to nine times lower than the averages in its peer groups, which range from 128 to 252.

BARC’s Vendor Performance Summary contains an overview of The Data Management Survey results based on feedback from Amazon Glue users, accompanied by expert analyst commentary.

Amazon Glue

Peer Groups: Data Catalogs, Data Intelligence Platforms, Data Pipelining Products, Data Warehousing Automation Products
Number of responses: 21
Product: Amazon Glue
Employees: 1.54 million approx.
Customers: 310 million approx.