
CST4070 Applied Data Analytics - Tools, Practical Big Data Handling, Cloud Distribution - Middlesex University

Assignment - Big Data

You are required to submit your work via the dedicated Unihub assignment link by the specified deadline. This link will 'time out' at the submission deadline, and your work may not be accepted as an email attachment if you miss it. Therefore, you are strongly advised to allow plenty of time to upload your work prior to the deadline.

You are required to solve the tasks illustrated below. Each task should be accompanied by:

A short introduction describing the problem and your high-level solution.

Your step-by-step process, supported by screenshots. Each screenshot needs to be accompanied by a short explanatory text.

If necessary, conclude each task with a brief summary of what you have done.

Your submission needs to be unique

When solving your tasks, you are required to name your files using your first name (e.g., if your name is Alice, you may name your task 1 file accordingly) so as to make your submission unique. Your explanatory text also needs to be unique.

Tasks

Follow the lab instructions to install Apache Hadoop into a virtual server running on Linux Ubuntu Server. Once you have Apache Hadoop installed and running, execute the following tasks.

Task 1

Implement one executable Hadoop MapReduce job that counts the total number of words having an even and an odd number of characters. As an example, if the input text is

Hello world

the output should report two odd-length words and no even-length words, because both Hello and world contain an odd number of characters. Whereas, if the input is

My name is Alice

the output should report three even-length words (My, name, is) and one odd-length word (Alice).

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested on Linux Ubuntu before running them on Hadoop MapReduce.
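The brief does not prescribe an implementation, but a minimal Hadoop Streaming sketch for this task might look like the following. The function names and the tab-separated "parity\t1" record format are illustrative assumptions, not requirements from the brief:

```python
# Hypothetical mapper/reducer sketch for counting words with an even vs.
# odd number of characters. In a real Hadoop Streaming job, each function
# would live in its own script (e.g. mapper.py, reducer.py) and read its
# records from sys.stdin; here they take an iterable of lines for testing.
from itertools import groupby

def mapper(lines):
    """Emit 'parity<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            parity = "even" if len(word) % 2 == 0 else "odd"
            yield f"{parity}\t1"

def reducer(lines):
    """Sum the counts per parity key; Hadoop delivers keys sorted."""
    pairs = (line.split("\t") for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"
```

Locally, the shuffle-and-sort phase can be simulated with a pipeline such as `cat input.txt | python3 mapper.py | sort | python3 reducer.py`, which is exactly the kind of Ubuntu test the brief asks for before submitting the job to Hadoop.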

Task 2

Implement one executable Hadoop MapReduce job that receives as input a .csv table with the structure 'StudentId, Module, Grade' and returns as output the minimum and maximum grade of each student, along with the total number of modules the student has passed.

Therefore, if your input is:

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested on Linux Ubuntu before running them on Hadoop MapReduce.
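One possible shape for this job is sketched below. The pass mark of 40 is an assumption (the brief does not state the threshold), as are the output column order and the header-skipping logic:

```python
# Hypothetical mapper/reducer sketch for Task 2: per-student minimum
# grade, maximum grade, and number of passed modules. PASS_MARK = 40 is
# an assumed UK-style threshold, not specified in the brief.
from itertools import groupby

PASS_MARK = 40  # assumption: a module is passed with a grade of 40+

def mapper(lines):
    """Emit 'StudentId<TAB>Grade' per CSV row; skip header/bad rows."""
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # header line such as 'StudentId,Module,Grade'
        student, _module, grade = fields
        yield f"{student}\t{grade}"

def reducer(lines):
    """For each student (keys arrive sorted): min, max, passed count."""
    pairs = (line.split("\t") for line in lines)
    for student, group in groupby(pairs, key=lambda kv: kv[0]):
        grades = [int(g) for _, g in group]
        passed = sum(1 for g in grades if g >= PASS_MARK)
        yield f"{student}\t{min(grades)}\t{max(grades)}\t{passed}"
```

As with Task 1, the pair can be tested on Ubuntu with a `mapper | sort | reducer` pipeline before being submitted as a Hadoop Streaming job.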

Task 3

Implement one executable Hadoop MapReduce job that receives as input two .csv tables with the following structure:

User: UserId, Name, DOB
Follows: UserIdFollower, UserIdFollowing

The MapReduce job needs to perform the following SQL query:

Therefore, if the two original tables are:

The final table needs to be

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested on Linux Ubuntu before running them on Hadoop MapReduce.
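Since the SQL query itself is not reproduced above, the sketch below assumes a simple equi-join that attaches each follower's Name to the rows of Follows (joining on User.UserId = Follows.UserIdFollower); adapt the reducer if the actual query differs. The 'U'/'F' record tags are an illustrative convention for a standard reduce-side join:

```python
# Hypothetical reduce-side join sketch for Task 3. Assumption: the query
# joins Follows to User on the follower's id so the output pairs each
# follower's Name with the id they follow. Tags 'U' and 'F' mark which
# table a record came from, since both tables pass through one mapper.
from itertools import groupby

def mapper(lines):
    """Key every CSV row by the join attribute and tag it by table."""
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) == 3:            # User: UserId, Name, DOB
            user_id, name, _dob = fields
            yield f"{user_id}\tU,{name}"
        elif len(fields) == 2:          # Follows: Follower, Following
            follower, following = fields
            yield f"{follower}\tF,{following}"

def reducer(lines):
    """For each join key, pair the User name with every Follows row."""
    pairs = (line.split("\t") for line in lines)
    for _user_id, group in groupby(pairs, key=lambda kv: kv[0]):
        name, follows = None, []
        for _, value in group:
            tag, payload = value.split(",", 1)
            if tag == "U":
                name = payload      # the single matching User record
            else:
                follows.append(payload)
        if name is not None:
            for following in follows:
                yield f"{name}\t{following}"
```

Because the reducer buffers the group before joining, it does not depend on whether the 'U' or 'F' records arrive first within a key, which keeps the local `mapper | sort | reducer` test faithful to Hadoop's shuffle behaviour.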

Attachment:- Applied Data Analytics.rar
