Department: PhD in Computer Science
Module Description: The main aim of the module is to present the newest developments in natural language processing (NLP) based on machine learning (ML) algorithms and techniques. The majority of human knowledge is currently stored as unstructured text. Abstracts, reviews, descriptions, posts, emails, and tweets together form a huge corpus of data that cannot be analyzed manually. Such textual corpora exist in almost all domains of science and technology. Computational methods for text analysis are collectively known as NLP. In recent years we have witnessed a true revolution in NLP, driven by machine learning methods designed specifically to tackle NLP challenges. During the lectures, students will learn basic NLP methods (such as tokenization, lemmatization, and stemming), basic representation methods (such as one-hot encoding and TF-IDF), and corpus-based techniques (such as word and sentence vectors and transformer language models). We will also discuss methods and recent research directions in sentiment and emotion analysis in text, named entity recognition, machine translation, and sequence-to-sequence learning, among others.
Jurafsky, D. & Martin, J. H. (2014). Speech and language processing. 2nd edn. Pearson Prentice Hall.
Goldberg, Y. (2017). Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies, 10(1), pp. 1–309.
Clark, A., Fox, C. & Lappin, S. (eds.) (2012). The handbook of computational linguistics and natural language processing (Vol. 118). John Wiley & Sons.
Vajjala, S., Majumder, B., Gupta, A. & Surana, H. (2020). Practical natural language processing: a comprehensive guide to building real-world NLP systems. O'Reilly Media.
Bird, S., Klein, E. & Loper, E. (2009). Natural language processing with Python. O'Reilly.
Habash, N. Y. (2010). Introduction to Arabic natural language processing. Synthesis Lectures on Human Language Technologies, 3(1), pp. 1–187.