Map Reduce

NISHI KUMARI

Associate Project Manager @ HuQuo

Published Oct 7, 2024

MapReduce is a Java-based, distributed execution framework within the Apache Hadoop Ecosystem.

A MapReduce program is composed of a map procedure , which performs filtering and sorting (such as sorting students by first name into queues, one queue for each name), and a reduce method, which performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance .

Recommended by LinkedIn

WAT IS SPARK

Ashish Ranjan 1 year ago

Apache Spark

Kumar Preeti Lata 3 months ago

Apache Spark

Vanshika Munshi 4 months ago

The model is a specialization of the split-apply-combine strategy for data analysis.It is inspired by the map and reduce functions commonly used in functional programming ,although their purpose in the MapReduce framework is not the same as in their original forms.T he key contributions of the MapReduce framework are not the actual map and reduce functions (which, for example, resemble the 1995 Message Passing Interface standard's reduce and scatter operations), but the scalability and fault-tolerance achieved for a variety of applications due to parallelization. As such, a single-threaded implementation of MapReduce is usually not faster than a traditional (non-MapReduce) implementation; any gains are usually only seen with multi-threaded implementations on multi-processor hardware.The use of this model is beneficial only when the optimized distributed shuffle operation (which reduces network communication cost) and fault tolerance features of the MapReduce framework come into play. Optimizing the communication cost is essential to a good MapReduce algorithm.

MapReduce libraries have been written in many programming languages, with different levels of optimization. A popular open-source implementation that has support for distributed shuffles is part of Apache Hadoop . The name MapReduce originally referred to the proprietary Google technology, but has since been genericized . By 2014, Google was no longer using MapReduce as their primary big data processing model,and development on Apache Mahout had moved on to more capable and less disk-oriented mechanisms that incorporated full map and reduce capabilities.

Map Reduce

NISHI KUMARI

Associate Project Manager @ HuQuo

Recommended by LinkedIn

More articles by this author

Insights from the community

Others also viewed

Spark Vs Hadoop Map Reduce

WHAT IS SPARK

WHAT IS SPARK

5 Fundamentals of Apache Spark

Spark

Lets learn Spark...

Understanding the MapReduce Workflow: A Detailed Guide

Getting started with Apache Spark

Apache Spark vs. Hadoop MapReduce

Explore topics

Recommended by LinkedIn

Artificial Neural Networks

Nov 16, 2024

Manual Testing

Nov 15, 2024

MEAN Stack Developer

Nov 14, 2024

End - User Computing

Nov 13, 2024

Web Developer

Nov 12, 2024

Chief Operating Officer

Nov 11, 2024

LangChain

Nov 9, 2024

Kafka

Nov 6, 2024

Maven

Nov 5, 2024

What is business communication?

Nov 4, 2024

Insights from the community

Others also viewed

Spark Vs Hadoop Map Reduce

WHAT IS SPARK

WHAT IS SPARK

5 Fundamentals of Apache Spark

Spark

Lets learn Spark...

Understanding the MapReduce Workflow: A Detailed Guide

Getting started with Apache Spark

Apache Spark vs. Hadoop MapReduce

Explore topics