Department: PhD in Computer Science
Module Description: This module gives students the opportunity to gain an in-depth understanding of the theories and issues in analytics and big data. It covers how big data is collected, stored, and analysed, and the main challenges that arise when working with it. Practical case studies are used for illustration.
There is no textbook.
Cattell, R. (2010). Scalable SQL and NoSQL Data Stores. SIGMOD Record, vol. 39(4).
Dave, M. and Gianey, H. (2016). Different clustering algorithms for big data analytics: a review. In Proceedings of the International Conference on System Modeling & Advancement in Research Trends (SMART), (pp. 328-333). IEEE.
Domingos, P. (2012). A few useful things to know about machine learning. CACM, vol. 55(10).
Lazer, D., Kennedy, R., King, G. and Vespignani, A. (2014). The parable of Google flu trends: traps in big data analysis. Science, vol. 343(6176), pp. 1203-1205.
Lohr, S. (2014). Google flu trends: the limits of big data. NYTimes, March 28.
Wu, X. et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, vol. 14.
Benford's Law (Wikipedia)
Frequentist vs. Bayesian Perspectives explained, Jake Vanderplas, UW eScience Institute
http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/
http://jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/
http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/
http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/
Hans Rosling, The Joy of Stats
Pat Hanrahan, Tools for Data Enthusiasts