Investigating Apache Hama: a bulk synchronous parallel computing framework

Abstract
The quantity of digital data is growing exponentially, and efficiently processing such massive data is an increasingly challenging task. Recently, academia and industry have recognized the limitations of the predominant Hadoop framework in several application domains, such as complex algorithmic computation, graph processing, and streaming data. This widely known map-shuffle-reduce paradigm has become a bottleneck in addressing the challenges of big data trends. The demand for research and development of novel massive computing frameworks is growing rapidly, and a systematic illustration and analysis of this space, with highlights of potential research areas, is vital and much in demand by researchers in the field. We therefore explore one of the emerging and promising distributed computing frameworks, Apache Hama: a top-level project under the Apache Software Foundation and a pure bulk synchronous parallel (BSP) framework for processing massive scientific computations, e.g. graph, matrix, and network algorithms. The objectives of this contribution are twofold. First, we outline the current state of the art, identify the challenges, and frame some research directions for researchers and application developers. Second, we present real-world use cases of Apache Hama to illustrate its potential, specifically to the industrial community.
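In the BSP model that Hama implements, workers proceed in supersteps: each worker computes locally, sends messages to its peers, and then waits at a global barrier; messages become visible only in the next superstep. The sketch below is not Hama code (Hama programs extend its `BSP` class and call `peer.send()` and `peer.sync()`); it is a minimal plain-Java illustration of the superstep pattern, with `CyclicBarrier` standing in for the synchronization barrier. The class and method names (`BspSketch`, `bspSum`) are invented for this example.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

// Minimal BSP-style sketch (NOT Hama's API): each worker sends its value to
// every peer, the barrier ends the superstep, and the next superstep reads
// the delivered messages -- the structure Hama provides via BSPPeer.sync().
public class BspSketch {
    @SuppressWarnings("unchecked")
    public static int bspSum(int[] values) {
        int n = values.length;
        // One mailbox per worker; messages are read only after the barrier.
        ConcurrentLinkedQueue<Integer>[] mailboxes = new ConcurrentLinkedQueue[n];
        for (int i = 0; i < n; i++) mailboxes[i] = new ConcurrentLinkedQueue<>();
        CyclicBarrier barrier = new CyclicBarrier(n); // stand-in for sync()
        int[] results = new int[n];
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    // Superstep 1: send local value to all peers' mailboxes.
                    for (ConcurrentLinkedQueue<Integer> box : mailboxes)
                        box.add(values[id]);
                    barrier.await(); // bulk-synchronous barrier between supersteps
                    // Superstep 2: compute on the messages delivered above.
                    int sum = 0;
                    for (int v : mailboxes[id]) sum += v;
                    results[id] = sum;
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return results[0]; // every worker holds the same global sum
    }
}
```

Because no worker reads its mailbox until the barrier has passed, the result is deterministic regardless of thread scheduling; this separation of computation and communication phases is what makes BSP programs easier to reason about than free-form message passing.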
| Original language | English |
|---|---|
| Pages (from-to) | 4190-4205 |
| Number of pages | 16 |
| Journal | Journal of Supercomputing |
| Volume | 73 |
| Issue number | 9 |
| DOIs | |
| State | Published - 1 Sep 2017 |
Keywords
- Apache Hama
- BSP
- Bulk synchronous parallel
- Distributed computing
- Hadoop
- MapReduce