SET11121 - Data Wrangling - Edinburgh Napier University
Learning outcome 1: Critically evaluate the tools and techniques of data storage, interfacing, aggregation and processing
Learning outcome 2: Select and apply a range of specialised data types, tools and techniques for data storage, interfacing, aggregation and processing
Learning outcome 3: Employ specialised techniques for dealing with complex data sets
Learning outcome 4: Design, develop and critically evaluate data-driven applications in Python
Part A
You will need to perform a literature review of recent approaches to word embeddings, such as Word2Vec, GloVe and ELMo. Pick three approaches, explain how each one works and critically compare them. Your references should come from international venues (such as conferences and journals). You can search for papers on Google Scholar or through the university library (online).
Your report must adhere to citation guidelines. You may use any reference style you prefer; however, we strongly advise you to use APA.
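As a starting point for the "how they work" part of the review, the sketch below shows how the skip-gram variant of Word2Vec turns raw text into (centre word, context word) training pairs. The function name `skipgram_pairs`, the example sentence and the window size are illustrative assumptions and are not part of the coursework; the real method additionally trains a network on these pairs, which is omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (centre, context) pairs for every token within `window` positions.

    Hypothetical helper for illustration only; it covers pair generation,
    not the training of the embedding vectors themselves.
    """
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs


# Made-up example sentence; any tokenised text would do.
sentence = "you are a kind person".split()
pairs = skipgram_pairs(sentence, window=1)
for centre, context in pairs:
    print(centre, "->", context)
```

A larger window yields more pairs per word and tends to capture broader topical context, which is one of the parameters you may wish to discuss when comparing approaches.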
Part B
Using the provided dataset, you will need to develop and evaluate abusive language detection models. Your proposed model must be a Convolutional Neural Network with an appropriate embedding layer as its first layer.
Specifically:
Your developed approach must include:
A word embedding approach chosen from the literature (i.e. from Part A).
A convolutional neural network (CNN). You must motivate the choice of CNN architecture, e.g. by basing it on one proposed in the literature.
Evidence of fine-tuning, i.e. results from a series of experiments that show parameter tuning.
Evaluation of the model using appropriate methods.
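To make the required layer order concrete, here is a minimal sketch of the forward pass of a text CNN: embedding lookup, 1D convolution with ReLU, max-over-time pooling and a sigmoid output. It is written in plain Python so that each step is visible. The vocabulary, the sizes (`EMB_DIM`, `N_FILTERS`, `KERNEL`) and the random, untrained weights are all assumptions made for illustration; an actual submission would more likely use a deep learning framework, initialise the embedding from the approach chosen in Part A, and train the weights on the dataset.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy vocabulary and sizes, chosen only for readability.
VOCAB = {"<pad>": 0, "<unk>": 1, "you": 2, "are": 3, "kind": 4, "awful": 5}
EMB_DIM, N_FILTERS, KERNEL = 4, 3, 2


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Untrained stand-ins for learned parameters (bias terms omitted for brevity).
embedding = rand_matrix(len(VOCAB), EMB_DIM)             # one row per word
filters = [rand_matrix(KERNEL, EMB_DIM) for _ in range(N_FILTERS)]
out_weights = [random.uniform(-0.5, 0.5) for _ in range(N_FILTERS)]


def encode(text, length=5):
    """Map words to ids, then pad or truncate to a fixed length."""
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in text.lower().split()]
    return (ids + [VOCAB["<pad>"]] * length)[:length]


def forward(token_ids):
    # 1. Embedding layer: look up one vector per token.
    x = [embedding[i] for i in token_ids]
    # 2. Convolution over word windows, ReLU, then max-over-time pooling.
    pooled = []
    for f in filters:
        activations = []
        for start in range(len(x) - KERNEL + 1):
            s = sum(f[k][d] * x[start + k][d]
                    for k in range(KERNEL) for d in range(EMB_DIM))
            activations.append(max(0.0, s))
        pooled.append(max(activations))
    # 3. Dense output unit with sigmoid: a score between 0 and 1.
    logit = sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


prob = forward(encode("you are awful"))
print(f"untrained score: {prob:.3f}")
```

Because the weights are random, the printed score carries no meaning. The values of `KERNEL`, `N_FILTERS` and `EMB_DIM` are examples of the parameters a series of tuning experiments could vary.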
The goal of this exercise is not to produce a state-of-the-art model. If your chosen model performs poorly by your selected metric, do not worry; this is not what we are testing. Which model you use, and how you evaluate it, is up to you. It should, however, be appropriately motivated and evaluated (you will be tested on those aspects). Your solution should be sensible: you should be able to explain why it tests something of impact to the problem.
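One way to motivate a choice of metric is to compare several on the same predictions. The sketch below computes accuracy, precision, recall and F1 for a binary task. The function name `binary_metrics` and the labels are hypothetical: they assume, purely for illustration, that abusive examples (label 1) are the minority class, which may or may not hold for the provided dataset.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = abusive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 8 non-abusive, 2 abusive, and a model that always predicts 0.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In this made-up case the model scores 0.8 on accuracy while detecting no abusive examples at all (recall and F1 are both 0), which is the kind of observation that can justify reporting more than one metric.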
Attachment: Data Wrangling.rar