Gao, X.; Wang, H., and Li, X., 2019. Application of open-source big-data framework in marine information processing. In: Li, L.; Wan, X.; and Huang, X. (eds.), Recent Developments in Practices and Research on Coastal Regions: Transportation, Environment and Economy. Journal of Coastal Research, Special Issue No. 98, pp. 187–190. Coconut Creek (Florida), ISSN 0749-0208.

Open-source big-data systems can use a variety of processing technologies. For workloads that require only batch processing, Hadoop, which suits less time-sensitive work and is less expensive than other solutions, is a good choice. For workloads that require only stream processing, Storm supports a wider range of languages and achieves very low latency, but in its default configuration it can produce duplicate results and cannot guarantee ordering. Samza's tight integration with YARN and Kafka provides greater flexibility, easier sharing among multiple teams, and simpler replication and state management. The most suitable solution depends mainly on the state of the data to be processed, the time requirements for processing, and the desired result. In particular, choosing between a full-featured solution and one that focuses primarily on a single kind of project requires a careful trade-off. As the field matures and these tools become widely accepted, similar issues need to be considered when evaluating any emerging and innovative solution.
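
The selection guidance above can be sketched as a small decision helper. This is only an illustration of the abstract's reasoning, not a method taken from the paper: the `Workload` class, its field names, the `suggest_framework` function, and the rule of preferring Samza when a deployment already runs YARN and Kafka are all hypothetical assumptions. Mixed batch-and-stream workloads are not covered by the abstract, so the sketch returns no suggestion for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a processing workload."""
    needs_batch: bool
    needs_stream: bool
    # Assumption: an existing YARN and Kafka deployment favors Samza.
    uses_yarn_and_kafka: bool = False


def suggest_framework(workload: Workload) -> Optional[str]:
    """Return a framework name following the abstract's guidance, or None."""
    if workload.needs_batch and not workload.needs_stream:
        # Batch only: time sensitivity is low, so Hadoop is suggested.
        return "Hadoop"
    if workload.needs_stream and not workload.needs_batch:
        if workload.uses_yarn_and_kafka:
            # Stream only, on a stack where Samza's integration applies.
            return "Samza"
        # Stream only: low latency, but duplicates and reordering
        # are possible in the default configuration.
        return "Storm"
    # Mixed or empty workloads are outside what the abstract describes.
    return None


if __name__ == "__main__":
    print(suggest_framework(Workload(needs_batch=True, needs_stream=False)))
    print(suggest_framework(Workload(needs_batch=False, needs_stream=True)))
```

A real evaluation would also weigh the state of the data, the time requirements, and the desired result, which a boolean flag cannot capture.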
