Department: PhD in Computer Science
Module Description: This module gives students an in-depth understanding of the theories and issues in analytics and big data. In addition to covering big data technologies such as MapReduce, Hadoop and HDFS, the module covers how big data is collected, stored (relational algebra operators vs SQL syntax, data mining using SQL), and analysed (statistical, visualization, classification, and clustering techniques). Students will also be exposed to special types of datasets, including graphs and time series, and will learn about the main challenges faced when dealing with big data. Practical case studies will be used for illustration.
Leskovec, J., Rajaraman, A. and Ullman, J. D. (2020). Mining of massive datasets. 3rd edn. Cambridge: Cambridge University Press.
Cattell, R. (2010). Scalable SQL and NoSQL Data Stores. SIGMOD Record, vol. 39(4).
Dave, M. and Gianey, H. (2016). Different clustering algorithms for big data analytics: a review. In Proceedings of the International Conference on System Modeling & Advancement in Research Trends (SMART), (pp. 328-333). IEEE.
Domingos, P. (2012). A few useful things to know about machine learning. CACM, vol. 55(10).
Lazer, D., Kennedy, R., King, G. and Vespignani, A. (2014). The parable of Google flu trends: traps in big data analysis. Science, vol. 343(6176), pp. 1203-1205.
Lohr, S. (2014). Google flu trends: the limits of big data. NYTimes, March 28.
Wu, X. et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, vol. 14.
Benford's Law (Wikipedia)
Frequentist vs. Bayesian Perspectives explained, Jake Vanderplas, UW eScience Institute
http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/
http://jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/
http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/
http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/