Amazon Redshift

Amazon Web Services (AWS) is a subsidiary of Amazon. It offers cloud-based services across the world on a metered pay-as-you-go basis. The foundation of today’s AWS data and analytics services was laid in 2006 with a relaunch of the AWS platform, initially featuring the Amazon S3 (Amazon Simple Storage Service) and EC2 (Elastic Compute Cloud) services, which are still essential components of the platform today. Amazon now offers a set of over 200 global cloud-based products for data processing, storage, analysis and more.

AWS has an immense partner landscape and customer base, and its infrastructure spans the globe: Amazon currently operates 81 Availability Zones (logical networks of data centers) in 25 regions. Amazon Web Services offers its various services as building blocks, so the AWS analytics services are closely integrated both with each other and with the wider AWS platform. Amazon Redshift is a relational, massively parallel processing (MPP) data warehouse with columnar storage and OLAP functionality. It is based on PostgreSQL, integrates seamlessly with data lakes built on Amazon S3, and extends the S3 object storage. Amazon Redshift can also integrate with other AWS services, for example the machine learning service Amazon SageMaker, which is now available directly from within Amazon Redshift through SQL.

Data can be loaded from S3 into Amazon Redshift, where it is prepared, stored and queried in a way that is optimized for BI/analytics workloads, and then unloaded back to S3 to be consumed from there. Amazon Redshift Spectrum (a Redshift feature) enables queries on combined data from Redshift and S3 (Data Lake) as well as direct queries to S3. Queries on file formats such as CSV, Parquet, Avro and JSON are supported, which avoids unnecessary data copies and data movement.
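The load, unload and Spectrum steps described above can be sketched as SQL statements generated from Python. This is a minimal illustration rather than a reference implementation: the helper names, table names, bucket paths and IAM role ARN are hypothetical, and only the PARQUET and CSV formats are covered. The COPY, UNLOAD and CREATE EXTERNAL SCHEMA statements follow the general shape of Redshift's SQL, but the available options should be checked against the AWS documentation before use.

```python
# Sketch only: builds Redshift SQL strings; it does not connect to a cluster.
# All identifiers passed in (tables, buckets, role ARNs) are hypothetical.

ALLOWED_FORMATS = {"PARQUET", "CSV"}  # subset chosen for this illustration


def _check_format(file_format: str) -> str:
    """Normalize and validate the file format keyword."""
    fmt = file_format.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format in this sketch: {file_format}")
    return fmt


def copy_from_s3(table: str, s3_uri: str, iam_role: str,
                 file_format: str = "PARQUET") -> str:
    """Build a COPY statement that loads S3 files into a Redshift table."""
    fmt = _check_format(file_format)
    return (f"COPY {table} FROM '{s3_uri}' "
            f"IAM_ROLE '{iam_role}' FORMAT AS {fmt};")


def unload_to_s3(query: str, s3_prefix: str, iam_role: str,
                 file_format: str = "PARQUET") -> str:
    """Build an UNLOAD statement that writes a query result back to S3."""
    fmt = _check_format(file_format)
    # UNLOAD takes the query as a quoted string, so inner quotes are doubled.
    escaped = query.replace("'", "''")
    return (f"UNLOAD ('{escaped}') TO '{s3_prefix}' "
            f"IAM_ROLE '{iam_role}' FORMAT AS {fmt};")


def create_spectrum_schema(schema: str, catalog_db: str, iam_role: str) -> str:
    """Build an external schema so S3 data can be queried via Spectrum."""
    return (f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
            f"FROM DATA CATALOG DATABASE '{catalog_db}' "
            f"IAM_ROLE '{iam_role}';")


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"  # hypothetical
    print(copy_from_s3("sales", "s3://example-bucket/sales/", role))
    print(unload_to_s3("SELECT * FROM sales WHERE region = 'EU'",
                       "s3://example-bucket/export/sales_", role))
    print(create_spectrum_schema("lake", "example_catalog_db", role))
```

Once an external schema exists, tables in S3 can be joined with local Redshift tables in ordinary SELECT statements, which is how the combined queries mentioned above avoid copying data into the warehouse.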

Amazon Redshift RA3 compute instances now also include the hardware-based Advanced Query Accelerator (AQUA), which is designed to speed up queries by up to 10 times. Newly introduced data sharing capabilities allow multiple Amazon Redshift data warehouse instances to be integrated without the need to copy or move data. This can help to improve reliability through virtual data replication and also supports highly distributed DWH or application scenarios.
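As a rough sketch of how data sharing between two Redshift instances might be set up, the following Python helpers generate the SQL for the producer side (which owns the data) and the consumer side (which reads it without copying). The helper names, share name, table names and namespace identifiers are hypothetical. The statements follow the general form of Redshift's datashare SQL, but details such as permissions and cross-account sharing are not covered here.

```python
# Sketch only: generates datashare SQL as strings; nothing is executed.
# Share names, tables and namespace GUIDs below are hypothetical.
from typing import List


def producer_statements(share: str, schema: str, tables: List[str],
                        consumer_namespace: str) -> List[str]:
    """SQL run on the producer cluster to expose tables through a datashare."""
    stmts = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    # Add each table individually so the share exposes only what is listed.
    stmts += [f"ALTER DATASHARE {share} ADD TABLE {schema}.{t};"
              for t in tables]
    stmts.append(
        f"GRANT USAGE ON DATASHARE {share} "
        f"TO NAMESPACE '{consumer_namespace}';"
    )
    return stmts


def consumer_statement(local_db: str, share: str,
                       producer_namespace: str) -> str:
    """SQL run on the consumer cluster to mount the share as a database."""
    return (f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
            f"OF NAMESPACE '{producer_namespace}';")


if __name__ == "__main__":
    for stmt in producer_statements(
            "sales_share", "public", ["orders", "customers"],
            "11111111-2222-3333-4444-555555555555"):  # hypothetical GUID
        print(stmt)
    print(consumer_statement(
        "sales_db", "sales_share",
        "66666666-7777-8888-9999-000000000000"))  # hypothetical GUID
```

The consumer then queries the shared tables through the mounted database, while the data itself stays on the producer side.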

AWS offers many data and analytics services alongside Amazon Redshift. While functional overlaps exist, each service is tailored to a specific use case. The company believes in mapping services closely to use cases and giving users the flexibility to choose the service that best fits their needs.

Amazon EMR is a flexible and cost-effective framework for analytical big data processing using technologies such as Hadoop, Spark, Presto and others. Amazon Athena is a serverless service that provides data exploration and ad hoc query capabilities on data lakes, geospatial data and service logs in S3. Overall, Amazon Redshift is suitable not only for classic data warehousing but also for complex SQL processing for reporting, business intelligence and advanced analytics purposes.

Users & Use Cases

Amazon Redshift is predominantly used for data warehousing and BI (88 percent), while one in two companies uses Redshift functionality for data integration, data warehouse automation and advanced analytics. This broad scope of analytical tasks shows that Amazon’s dedicated service for data warehousing is now being extended to cover BI and advanced analytics workloads. In contrast, only 25 percent use Redshift as a data lake, so it can be seen as an analytical engine rather than a simple data store. Interestingly, 41 percent of users claim to do self-service analytics or data discovery. This figure is surprisingly high, as the Amazon Redshift service does not directly cover this area and requires integration with other Amazon services to do so.

Redshift is mostly used by small and medium-sized companies (75 percent). This may be because many larger companies already have strategic partnerships in place with software providers and EDWH vendors. In addition, not all companies have followed the trend of migrating to the cloud yet.

Small and above all ‘new’ companies tend to avoid large infrastructure investments and use flexible, demand-oriented cloud offerings. However, one quarter of the Amazon Redshift users responding to this survey were from large companies.

Figure: Current use (n=32)

Figure: Total number of users per company (n=32)

Figure: Total number of administrators per company (n=23)

Figure: Company size (number of employees) (n=31)

BARC’s Vendor Performance Summary contains an overview of The Data Management Survey results based on feedback from Amazon Redshift users, accompanied by expert analyst commentary.


Amazon Redshift

Peer Groups: Analytical Database Products, Business Software Generalists (data management), Data Warehouse Technologies
Vendor: Amazon Web Services
Number of responses: 32
Product: Amazon Redshift
Offices: Worldwide
Employees: Not disclosed
Customers: Worldwide
Revenues (2020): $45.37 billion
Website: www.aws.amazon.com/redshift