Module overview
The module provides an introduction to data analytics and data mining. It will combine practical work using R and SQL with an introduction to some of the theory behind standard data mining techniques.
Aims and Objectives
Learning Outcomes
Knowledge and Understanding
Having successfully completed this module, you will be able to demonstrate knowledge and understanding of:
- The basic principles of database design.
- The statistical ideas underpinning data mining.
- Algorithms that can be used for classification, association analysis, clustering and text analytics.
Learning Outcomes
Having successfully completed this module you will be able to:
- Run basic queries to obtain information from a database
- Use R for data manipulation and visualisation
- Carry out a practical data mining project.
- Apply data mining algorithms using R.
Transferable and Generic Skills
Having successfully completed this module you will be able to:
- Present the results of a complex data analysis to a non-expert.
- Work as part of a technical team on a data mining project.
Syllabus
Part 1: Data Analytics
This part covers extracting data from a database, preliminary analysis (including plotting to better understand the underlying features of the data) and preprocessing. It also introduces reporting in R, including R Markdown.
Part 2: Statistical Methods for Data Mining
The statistical ideas underlying data mining, including maximum likelihood estimation, linear and logistic regression, principal components analysis, and measures of similarity and dissimilarity.
Part 3: Text Mining
Processing text data, bag-of-words representations and word-frequency analysis (including tf-idf), with the option to cover topic modelling using Latent Dirichlet Allocation (LDA).
Part 4: Classification
The main focus is on decision trees and boosting algorithms, with an introduction to random forests.
Part 5: Association Analysis
The Apriori algorithm and the generation of frequent itemsets.
Part 6: Cluster Analysis
K-means algorithm and hierarchical clustering.
Learning and Teaching
Teaching and learning methods
The module will be taught using a mixture of lectures and computer workshops: 2 hours of lectures and 2 hours of computer workshops per week for the duration of Semester 1.
Type | Hours |
---|---|
Workshops | 22 |
Lectures | 22 |
Independent Study | 106 |
Total study time | 150 |
Resources & Reading list
Textbooks
Ritchie C. Relational Database Principles. Letts Educational.
Rolland FD. The Essence of Databases. Prentice Hall.
Assessment
Assessment strategy
Assessment is by coursework assignments, comprising a mix of individual and group work.
Summative
This is how we’ll formally assess what you have learned in this module.
Method | Percentage contribution |
---|---|
Coursework | 15% |
Coursework | 15% |
Coursework | 40% |
Coursework | 30% |
Referral
This is how we’ll assess you if you don’t meet the criteria to pass this module.
Method | Percentage contribution |
---|---|
Coursework assignment(s) | 100% |
Repeat Information
Repeat type: Internal & External