About the Course
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.MapReduce (the algorithm) and an implementation of MapReduce. Hadoop MapReduce is an implementation of the algorithm developed and maintained by the Apache Hadoop project. It is helpful to think about this implementation as a MapReduce engine, because that is exactly how it works. You provide input (fuel), the engine converts the input into output quickly and efficiently, and you get the answers you need.
Hadoop MapReduce includes several stages, each with an important set of operations helping to get to your goal of getting the answers you need from big data. The process starts with a user request to run a MapReduce program and continues until the results are written back to the HDFS. MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
A MapReduce program is composed of a Map() procedure (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() method that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
Big data means really a big data, it is a collection of large datasets that cannot be processed using traditional computing techniques. Big data is not merely a data, rather it has become a complete subject, which involves various tools, technqiues and frameworks.
This course covers the importance of Big Data, how to setup Node Hadoop pseudo clusters, work with the architecture of clusters, run multi-node clusters on Amazons EMR, work with distributed file systems and operations including running Hadoop on HortonWorks Sandbox and Cloudera. Students will also learn advanced Hadoop development, MapReduce concepts, using MapReduce with Hive and Pig, and know the Hadoop ecosystem among other important lessons.
What We learn
- Become literate in Big Data terminology and Hadoop.
- Understand the Distributed File Systems architecture and any implementation such as Hadoop Distributed File System or Google File System
- Use the HDFS shell
Currently there is no syllabus details available for this course.