Data Science Foundations


Quantitative approaches have a common concern: How can others be confident that our statistical approaches have been brought to bear on appropriate datasets? This course, INF3104H: Data Science Foundations, focuses on the ‘data’ of data science. It develops in students an appreciation for the many ways in which dealing with a dataset can get out of hand, and establishes approaches to ensure data science is conducted in ways that engender trusted findings. It focuses not only on statistical modelling, but also on everything that comes before modelling, and by focusing on those earlier steps it places modelling and analysis on a firmer foundation. In assessment, students will conduct end-to-end data science projects using real-world data, enabling them to fully understand potential pitfalls.

The focus of the learning will be on:

1. actively reading and considering relevant literature;

2. actively using the statistical programming language R in real-world conditions;

3. gathering, cleaning, and preparing datasets; and

4. choosing and implementing statistical models and evaluating their estimates.

Note: PhD students only