CST4070 Applied Data Analytics Tools, Practical Big Data Handling, Cloud Distribution - Middlesex University
Big Data
General information - You are required to submit your work via the dedicated Unihub assignment link by the specified deadline. This link will 'timeout' at the submission deadline. Your work may not be accepted as an email attachment if you miss this deadline. Therefore, you are strongly advised to allow plenty of time to upload your work prior to the deadline.
You are required to solve the Tasks illustrated below. Each Task should be accompanied by:
a. A short introduction where you describe the problem and your high-level solution.
b. Your step-by-step process supported by screenshots. Each screenshot needs to be accompanied by a short explanatory text.
c. Finally, if appropriate, conclude each task with a brief summary of what you have done.
Tasks - Follow the lab instructions to install Apache Hadoop into a virtual server running Linux Ubuntu Server. Once you have Apache Hadoop installed and running, complete the following tasks.
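For reference, once your scripts are ready, a Hadoop Streaming job is typically launched with a command of the following shape (the input and output paths are illustrative, and the exact jar name depends on your Hadoop version):

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files mapper.py,reducer.py \
    -mapper "python3 mapper.py" \
    -reducer "python3 reducer.py" \
    -input /user/ubuntu/input \
    -output /user/ubuntu/output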
Task 1 - Implement one executable Hadoop MapReduce job that counts the total number of words having an even number of characters and the total number of words having an odd number of characters. For example, if the input text is 'Hello world', the output should be even: 0, odd: 2, because both 'Hello' and 'world' contain an odd number of characters. Whereas, if the input is 'My name is Alice', the output should be even: 3, odd: 1.
The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
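As a starting point, a minimal sketch of the two scripts is shown below. The file names mapper.py and reducer.py are illustrative, and the reducer assumes a single reduce task so that both totals appear in one output file:

mapper.py

#!/usr/bin/env python3
# Emits one key-value pair per word: "even" or "odd" depending on word length.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        key = "even" if len(word) % 2 == 0 else "odd"
        print("%s\t1" % key)

reducer.py

#!/usr/bin/env python3
# Sums the counts per key. Both counters start at zero so a missing key
# (e.g. no even-length words in the input) is still reported, matching
# the required output format.
import sys

counts = {"even": 0, "odd": 0}
for line in sys.stdin:
    key, value = line.strip().split("\t")
    counts[key] += int(value)
print("even: %d, odd: %d" % (counts["even"], counts["odd"]))

Before submitting the job to Hadoop, the pipeline can be tested in the Ubuntu shell with:

cat input.txt | python3 mapper.py | sort | python3 reducer.py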
Task 2 - Implement one executable Hadoop MapReduce job that receives as input a .csv table with the structure 'StudentId, Module, Grade' and outputs, for each student, the minimum grade, the maximum grade, and the total number of modules the student has passed.
Therefore, if your input is:
StudentId | Module | Grade
S001 | Statistic | 75
S002 | Statistic | 72
S001 | Big Data | 78
S003 | Big Data | 66
S001 | Programming | 70
S002 | Programming | 55
S001 | Machine Learning | 65
S002 | Machine Learning | 61
Your output needs to be:
StudentId | MinGrade | MaxGrade | Modules
S001 | 65 | 78 | 4
S002 | 55 | 72 | 3
S003 | 66 | 66 | 1
The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
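A possible sketch follows, again with illustrative file names. The pass mark of 40 is an assumption; every grade in the sample data meets it, so Modules equals the number of rows per student:

mapper.py

#!/usr/bin/env python3
# Emits StudentId -> Grade, skipping the CSV header row.
import sys

for line in sys.stdin:
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 3 or fields[0] == "StudentId":
        continue
    print("%s\t%s" % (fields[0], fields[2]))

reducer.py

#!/usr/bin/env python3
# Input arrives grouped by StudentId, so we aggregate per student and
# emit the result whenever the key changes.
import sys

PASS_MARK = 40  # assumption: threshold for a passed module

def emit(student, min_g, max_g, passed):
    print("%s\t%d\t%d\t%d" % (student, min_g, max_g, passed))

current = None
min_g = max_g = passed = 0
for line in sys.stdin:
    student, grade = line.strip().split("\t")
    grade = int(grade)
    if student != current:
        if current is not None:
            emit(current, min_g, max_g, passed)
        current, min_g, max_g, passed = student, grade, grade, 0
    else:
        min_g = min(min_g, grade)
        max_g = max(max_g, grade)
    if grade >= PASS_MARK:
        passed += 1
if current is not None:
    emit(current, min_g, max_g, passed)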
Task 3 - Implement one executable Hadoop MapReduce job that receives as input two .csv tables with the following structures:
User: UserId, Name, DOB
Follows: UserIdFollower, UserIdFollowing
The MapReduce job needs to perform the following SQL query:
select F.UserIdFollower as UserId, U1.Name as NameFollower, U2.Name as NameFollowing
from Follows as F
join User as U1 on U1.UserId = F.UserIdFollower
join User as U2 on U2.UserId = F.UserIdFollowing
where U2.DOB <= '2002-03-01'
Therefore, if the two original tables are:
UserId | Name | DOB
U001 | Alice | 2005-01-05
U002 | Tom | 2001-02-07
U003 | John | 1998-06-02
U004 | Alex | 2006-02-01
UserIdFollower | UserIdFollowing
U001 | U002
U001 | U003
U002 | U001
U002 | U004
U003 | U001
U004 | U001
The final table needs to be:
UserId | NameFollower | NameFollowing
U001 | Alice | Tom
U001 | Alice | John
The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
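One way to approach this in Hadoop Streaming is a reduce-side join. The sketch below distinguishes the two tables by their column count (User rows have three fields, Follows rows have two) and, for simplicity, assumes a single reduce task and that the User table fits in the reducer's memory; file names are illustrative:

mapper.py

#!/usr/bin/env python3
# Tags each record with its source table so the reducer can join them.
# User rows have 3 columns, Follows rows have 2; header rows are skipped.
import sys

for line in sys.stdin:
    fields = [f.strip() for f in line.strip().split(",")]
    if not fields or fields[0].startswith("UserId"):
        continue
    if len(fields) == 3:
        # User: UserId, Name, DOB -> keyed by UserId
        print("%s\tU\t%s\t%s" % (fields[0], fields[1], fields[2]))
    elif len(fields) == 2:
        # Follows: follower, following -> keyed by the followed user
        print("%s\tF\t%s" % (fields[1], fields[0]))

reducer.py

#!/usr/bin/env python3
# Buffers the User table, then resolves each Follows edge and applies the
# DOB filter on the followed user. ISO dates compare correctly as strings,
# so a plain string comparison implements DOB <= cutoff.
import sys

CUTOFF = "2002-03-01"
users = {}   # UserId -> (Name, DOB)
edges = []   # (follower, following)

for line in sys.stdin:
    parts = line.strip().split("\t")
    if parts[1] == "U":
        users[parts[0]] = (parts[2], parts[3])
    else:
        edges.append((parts[2], parts[0]))

for follower, following in edges:
    if follower in users and following in users:
        name_follower = users[follower][0]
        name_following, dob_following = users[following]
        if dob_following <= CUTOFF:
            print("%s\t%s\t%s" % (follower, name_follower, name_following))

Run against the sample tables, this produces the two expected rows (U001/Alice/Tom and U001/Alice/John). Buffering the User table in one reducer is only a pragmatic simplification for small inputs; a fully distributed solution would need a second MapReduce pass to join in the follower names.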