Amazon Redshift
Amazon Web Services (AWS) is a subsidiary of Amazon. It offers cloud-based services worldwide on a metered, pay-as-you-go basis. The foundation of today’s AWS data and analytics services was laid in 2006 with a relaunch of the AWS platform, initially featuring Amazon S3 (Simple Storage Service) and Amazon EC2 (Elastic Compute Cloud), both of which remain essential components of the platform. Amazon now offers over 200 cloud-based products for data processing, storage, analysis and more. Its partner landscape is immense, as is the number of AWS customers. Amazon currently operates 87 Availability Zones (each a logical network of data centers) in 27 regions around the world.
The services are closely integrated with each other. Amazon Redshift is a relational, massively parallel processing (MPP) data warehouse with columnar storage and OLAP functionality. It is based on PostgreSQL and integrates seamlessly with a data lake built on Amazon S3. The service extends the S3 object storage. Amazon Web Services offers its services as building blocks, so Amazon Redshift can integrate with other AWS services. One example is the machine learning service Amazon SageMaker, which is now also available directly from within Amazon Redshift through SQL.
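To make the SQL route to SageMaker more concrete, the following is a minimal sketch that assembles a CREATE MODEL statement of the kind used by the Redshift ML feature. The helper name, the model, table, column and function names, the IAM role ARN and the bucket are all hypothetical placeholders, and the inputs are assumed to be trusted (no SQL escaping is attempted).

```python
def build_create_model_statement(
    model: str,
    training_query: str,
    target_column: str,
    function_name: str,
    iam_role_arn: str,
    s3_bucket: str,
) -> str:
    """Assemble a Redshift ML CREATE MODEL statement as a string.

    All arguments are illustrative placeholders; the statement is only
    built here, not sent to a cluster.
    """
    return (
        f"CREATE MODEL {model}\n"
        f"FROM ({training_query})\n"
        f"TARGET {target_column}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


# Hypothetical example: train a churn model and expose it as a SQL function.
statement = build_create_model_statement(
    model="customer_churn_model",
    training_query="SELECT age, plan, monthly_spend, churned FROM customer_activity",
    target_column="churned",
    function_name="predict_customer_churn",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    s3_bucket="example-ml-bucket",
)
print(statement)
```

Once such a model has been trained, the named function (here the hypothetical predict_customer_churn) could be called in ordinary SELECT statements like any other SQL function.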
Data can be loaded from S3 into Amazon Redshift and prepared, stored and queried in an optimized way for BI/analytics workloads and unloaded back to S3 to be consumed from there. Amazon Redshift Spectrum (a Redshift feature) enables queries on combined data from Redshift and S3 (Data Lake) or direct queries to S3. Queries on file formats such as CSV, Parquet, Avro and JSON are supported, thus avoiding unnecessary data copies or data movement. This is an essential feature in AWS’s quest to build a modern data and analytics architecture that is not only designed for a specific use case but is open, flexible and scalable.
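The round trip described above can be sketched as follows. This is a minimal illustration that only builds SQL strings: an UNLOAD statement that writes query results back to S3 as Parquet, an external schema definition for Redshift Spectrum, and a query joining a local table with an S3-backed external table. The helper names, schema, table, bucket and IAM role names are hypothetical, and the inputs are assumed to be trusted.

```python
def build_unload_statement(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build an UNLOAD statement that writes query results to S3 as Parquet."""
    # UNLOAD takes the query as a quoted string, so single quotes inside
    # the query are doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS PARQUET;"
    )


def build_external_schema_statement(
    schema: str, catalog_database: str, iam_role_arn: str
) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement for Redshift Spectrum."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_database}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


# Hypothetical names throughout: 'lake' is the external (S3) schema,
# 'public.customers' is a regular Redshift table.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

COMBINED_QUERY = (
    "SELECT c.customer_id, c.segment, SUM(e.amount) AS total_amount\n"
    "FROM public.customers AS c\n"
    "JOIN lake.click_events AS e ON e.customer_id = c.customer_id\n"
    "GROUP BY c.customer_id, c.segment;"
)

print(build_external_schema_statement("lake", "example_catalog_db", ROLE))
print(COMBINED_QUERY)
print(build_unload_statement(
    "SELECT * FROM public.customers WHERE segment = 'retail'",
    "s3://example-bucket/exports/customers_",
    ROLE,
))
```

The join in COMBINED_QUERY reads the external table in place, which is the point made above about avoiding extra copies of the data.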
Amazon Redshift RA3 compute instances also include a hardware-based Advanced Query Accelerator (AQUA), which is designed to speed up queries by up to 10 times. Newly introduced data sharing capabilities allow multiple Amazon Redshift data warehouse instances to be integrated without copying or moving data. This can help to improve reliability through virtual data replication and also supports highly distributed DWH or application scenarios.
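As an illustration of data sharing without copying, the sketch below lists the SQL statements a producer warehouse might run to publish a table and the statement a consumer warehouse might run to attach it. It only builds strings. The helper names, the share, schema, table and database names and the namespace identifiers are hypothetical placeholders.

```python
from typing import List


def producer_statements(
    share: str, schema: str, table: str, consumer_namespace: str
) -> List[str]:
    """Statements run on the producer side to create and grant a datashare."""
    return [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD TABLE {schema}.{table};",
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';",
    ]


def consumer_statement(local_database: str, share: str, producer_namespace: str) -> str:
    """Statement run on the consumer side to expose the share as a database."""
    return (
        f"CREATE DATABASE {local_database} "
        f"FROM DATASHARE {share} OF NAMESPACE '{producer_namespace}';"
    )


# Hypothetical namespace identifiers for the two warehouses.
PRODUCER_NS = "11111111-2222-3333-4444-555555555555"
CONSUMER_NS = "66666666-7777-8888-9999-000000000000"

for sql in producer_statements("sales_share", "public", "sales", CONSUMER_NS):
    print(sql)
print(consumer_statement("sales_from_producer", "sales_share", PRODUCER_NS))
```

After the consumer statement, the shared table would be queried through the new database name, while the data itself stays with the producer.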
To load data into Amazon Redshift, users can employ the COPY command or leverage cloud ETL services from AWS itself or one of several third parties from the broader AWS partner landscape.
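A minimal sketch of the COPY route is shown below. It assembles a COPY statement that loads files from an S3 prefix into a table, using an IAM role for authorization. The helper name, table, bucket and role ARN are hypothetical, only a handful of formats are handled, and the inputs are assumed to be trusted.

```python
SUPPORTED_FORMATS = {"CSV", "PARQUET", "AVRO", "JSON"}


def build_copy_statement(
    table: str, s3_uri: str, iam_role_arn: str, file_format: str = "PARQUET"
) -> str:
    """Build a COPY statement that loads S3 files into a Redshift table."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    # For JSON and Avro, 'auto' asks COPY to map source fields to
    # columns by name.
    mapping = " 'auto'" if fmt in {"JSON", "AVRO"} else ""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt}{mapping};"
    )


# Hypothetical table, bucket and role.
print(build_copy_statement(
    "public.sales",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
))
```

The resulting string could be executed through any SQL client connected to the cluster; an ETL service would typically generate and schedule comparable statements.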
AWS offers many data and analytics services alongside Amazon Redshift. While functional overlaps exist, each service is tailored to a specific use case. The company believes in closely mapping services to very specific use cases and giving users the flexibility to choose the best service for their needs.
Amazon EMR is a flexible and cost-effective framework for analytical big data processing using technologies such as Hadoop, Spark, Presto and others. Amazon Athena is a serverless service that provides data exploration and ad hoc query capabilities on data lakes, geospatial data and service logs in S3. Overall, Amazon Redshift is suitable not only for classic data warehousing but also for complex SQL processing for reporting, business intelligence and advanced analytics purposes.

User & Use Cases
Amazon Redshift features an analytic database with broad functional coverage that enables multiple use cases. It is used for data warehousing by 67 percent of customers and advanced analytics by 56 percent. Respondents to this survey also reported using it for data integration (56 percent) and data warehouse automation (44 percent).
This broad scope of analytical tasks shows that Amazon’s dedicated service for data warehousing has been extended to cover BI and advanced analytics workloads. In contrast, only 33 percent are using Redshift as a data lake, so it can be seen as an analytical engine rather than a simple data store, although 56 percent of users stated that they use Amazon Redshift as data storage for data provisioning. It is interesting that 44 percent of users claim to do self-service analytics or data discovery. This figure is surprisingly high, as the Amazon Redshift service does not directly cover this area and requires the use of, and integration with, other Amazon services. This usage may be aimed more at supporting data experts in building advanced analytics solutions.
Redshift is mainly used by medium-sized companies (56 percent). A low median and mean number of users indicate small to medium usage scenarios. Accordingly, usage is mainly confined to one or several divisions, and none of the respondents to this year’s survey claim to use the platform company-wide. On that basis, we can conclude that Amazon Redshift is a valuable analytical engine for specific, dedicated use case scenarios.
Figure: Use cases (n=27)
Figure: Extent of usage in the company (n=15)
Figure: Total number of users per company (n=27)
Figure: Total number of developers per company (n=15)
Figure: Number of users using Amazon Redshift (n=27)

Want to see the whole picture?
BARC’s Vendor Performance Summary contains an overview of The Data Management Survey results based on feedback from Amazon Redshift users, accompanied by expert analyst commentary.