BARC’s Hadoop and Data Lakes survey showed that almost a third of participants use commercial products or Hadoop distributions to implement Hadoop projects. Further analysis of the software selection process shows that 61 percent choose in-house deployments of Hadoop distributions, while only a few rely on managed services (11 percent), cloud platforms (9 percent) or appliances (10 percent). Analysis and visualization are clearly the domain of commercial tools (64 percent). A closer look at the individual tool categories reveals further interesting insights.


Figure: Tools for implementing Hadoop by software category (n=141)


Commercial tools and Hadoop distributions clearly dominate the categories of data integration and data quality (48 percent), system management (41 percent) and, above all, advanced analytics and visualization (64 percent). Data storage is the only area where use of the open source Apache Hadoop framework is very high compared to commercial tools or Hadoop distributions. Data storage with the Hadoop Distributed File System (HDFS) is one of Hadoop's original core functions and is therefore the most familiar to users.
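To illustrate why HDFS storage is the part of the stack users know best, the following minimal sketch shows the basic file round trip that any Hadoop deployment supports, using Hadoop's Java FileSystem API. The NameNode address and file path are illustrative assumptions; in a real cluster the connection settings would normally come from core-site.xml rather than being set in code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class HdfsStorageExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address, for illustration only.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/landing/example.txt");

            // Write a small file into the distributed file system.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("raw record stored in the data lake".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back to confirm the round trip.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
        }
    }
}

This simple put/get pattern is the foundation that most data lake projects start from, which helps explain why open source HDFS is used so widely for storage even where commercial tools cover the other categories.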

About half of participants use no tools at all for streaming (53 percent) or for governance and security (46 percent). There appear to be no clearly established products for these categories yet.

 

The full report, Hadoop and Data Lakes: Use Cases, Benefits and Limitations, is available free of charge on request.