Course Descriptions
Please refer to the current undergraduate catalog for full course descriptions. If any discrepancy exists between the descriptions below and those listed in the undergraduate catalog, requirements in the catalog always prevail.
Finding Eligible Data Science Courses
On YES, to find all courses approved for credit in the Data Science minor, select the “Advanced” link next to the search box, open the “Class Attributes” drop-down box at the bottom right of the advanced search page, and choose “Eligible for Data Science.”
NOTE: Check YES for the most up-to-date course descriptions, prerequisites, exclusions, credit hours, and (for A&S) AXLE categories.
Introduction to Data Science
DS 1000 Data Science: How Data Shape Our World. Accessible, engaging, applied introduction to data science for students from all colleges and majors. Data summary and data visualization; causality and correlation; sampling, resampling, and uncertainty; prediction with linear regression; classification, clustering, and machine learning; ethics. Topics introduced with real-world datasets using a statistical programming language for hands-on experience in data science. Note: DS 1000 will be taught for the first time in the Fall 2021 semester; students can substitute HOD 3200 Introduction to Data Science or PSCI 2300 Introduction to Data Science for Politics with permission from the Director of Undergraduate Data Science.
Computer Programming
See “What Programming Course to Take?”
DS 1100 / CS 1100 Applied Programming and Problem Solving with Python. Foundations of computing using Python. Programming fundamentals. Designing, debugging, running programs. Scalar, vector, and matrix computations for scientific computing and data science. Numeric and text processing. Basic data visualization techniques. Intended for students other than computer science and computer engineering majors. Note: DS / CS 1100 will be taught for the first time in the Fall 2021 semester.
CS 2201 Program Design and Data Structures (in C++). The study of elementary data structures, their associated algorithms and their application in problems; rigorous development of programming techniques and style; design and implementation of programs with multiple modules, using good data structures and good programming style. Prerequisite: CS 1101 (in Java) or 1104 (in Python).
CS 2204 Program Design and Data Structures for Scientific Computing (in Python). Data structures and their associated algorithms in application to computational problems in science and engineering. Time and memory complexity; dynamic memory structures; sorting and searching; advanced programming and problem-solving strategies; efficient software library use. Prerequisite: CS 1104 (in Python).
Introduction to Statistics
DS 2100 Statistics for Data Science. Introduction to descriptive and inferential statistics using computational and resampling approaches from data science. Probability, measurement, random variables, distributions, central tendency, variability, confidence intervals, measures of uncertainty, estimation, prediction, hypothesis testing and inference, replicability, power, effect size, t-tests, correlation, univariate ANOVA, and simple linear regression. Examples from a wide range of disciplines. Note: DS 2100 is expected to be taught for the first time in Fall 2021 or Spring 2022.
BME 2400 Quantitative Methods I: Statistical Analysis. Application of modern computing methods to the parametric and nonparametric statistical analysis of biomedical data. Probability, sampling, estimation, analysis of variance, single and multivariable regression, and the principles of hypothesis testing, experimental design and clinical trials are emphasized. No credit for students who have earned credit for BME 3200. Prerequisite: MATH 2300. Corequisite: CS 1101 or 1103 or 1104.
BSCI 3270 Statistical Methods in Biology. An introduction to statistical methods used in the analysis of biological experiments, including the application of computer software packages. Emphasis on testing of hypotheses and experimental design. Topics include descriptive statistics, analysis of variance, regression, correlation, contingency analysis, and the testing of methods for sampling natural populations. Prerequisite: BSCI 1511.
CE 3300 Risk, Reliability, and Resilience Engineering. Fundamental concepts in probability and statistical inference. Counting methods, discrete and continuous random variables, and their associated distributions. Sampling distributions, point estimation, confidence intervals, and hypothesis testing. Applications of probability and statistics to risk, reliability, and resilience of engineering systems. Prerequisite: MATH 2300.
ECON 1500 Economic Statistics. The use of quantitative data in understanding economic phenomena. Probability, sampling, inference, and regression analysis. Not open to students who have earned credit for 1510. Total credit for this course and 1510 will not exceed 3 credit hours. Credit hours reduced from second course taken (or from test or transfer credit) as appropriate. Prerequisite: MATH 1100, 1200, or 1300.
ECON 1510 Intensive Economic Statistics. Quantitative techniques in economic analysis. Probability, sampling, inference, and multiple regression. Not open to students who have earned credit for 1500. Total credit for this course and 1500 will not exceed 3 credit hours. Credit hours reduced from second course taken (or from test or transfer credit) as appropriate. Prerequisite: MATH 1100, 1200, or 1300.
MATH 2810 Probability and Statistics for Engineering. Discrete and continuous probability functions, cumulative distributions. Normal distribution. Poisson distribution and Poisson process. Conditional probability and Bayes’ formula. Point estimation and interval estimation. Hypothesis testing. Covariance and correlation. Linear regression theory and the principle of least squares. Monte Carlo methods. Prerequisite: MATH 2300, 2310, or 2501.
MATH 2821 Introduction to Applied Statistics. Review of basic applied statistics. Analysis of variance as a technique for interpreting experimental data. Generalized likelihood ratio principle, confounding, multiple comparisons, introduction to response surface methodology, and nonparametric methods. Experimental designs: completely randomized, nested, orthogonal contrasts, randomized block, Latin squares, factorial, and fractional factorial. Prerequisite: MATH 2810 or 2820.
PSY 2100 Quantitative Methods. Principles and methods for the statistical analysis of experiments, with emphasis on applications in psychology. Descriptive and inferential statistics. Prerequisite: 1111 section 1, 2, or 3 or 1200; or a major in Child Development, Child Studies, or Cognitive Studies.
PSY-PC 2110 Introduction to Statistical Analysis. Introductory course emphasizes selection, application, and interpretation of measures of relative frequency, location, dispersion, and association. Approaches to statistical inferences are emphasized.
SOC 2100 Statistics for Social Scientists. Descriptive and inferential statistics with social science research applications. Sampling issues; describing data with measures of central tendencies and dispersion; hypothesis testing using categorical and continuous indicators; multivariate techniques for continuous, categorical, and time dependent data. Limited to majors and minors in Sociology, Public Policy Studies, and Communication of Science and Technology, with preference given to Sociology majors and minors.
Data Science Fundamentals
DS 3100 Fundamentals of Data Science. Obtaining, manipulating, processing, cleaning, wrangling, visualizing, and analyzing data; effectively communicating results from data analyses. Imputation, multiple linear and logistic regression, regularization, dimensionality reduction, maximum likelihood, model selection, general linear model. Ethics, privacy, and security in data science. Statistical computing for data science in R. Prerequisites: A course in introductory computer programming (CS 1100, 1101, 1104, or equivalent). A course in introductory statistics (DS 2100, BME 2400, BSCI 3270, CE 3300, ECON 1500 or 1510, MATH 2810 or 2821, PSY 2100, PSY-PC 2110, SOC 2100, or equivalent).
Machine Learning
DS 3262 / CS 3262 Applied Machine Learning. Fundamentals of machine learning with emphasis on practical applications to data science problems. Supervised learning (linear and logistic regression, decision trees, support vector machines, neural networks, and deep learning), unsupervised learning (feature selection, data clustering, dimensionality reduction); ethical principles and social implications of machine learning. Intended for students other than computer science majors. Prerequisites: One of CS 1100, CS 2201, or CS 2204; one of DS 2100, BME 2400, BSCI 3270, CE 3300, ECON 1500 or 1510, MATH 2810 or 2820 or 2821, PSY 2100, PSY-PC 2110, SOC 2100. Note: DS / CS 3262 is expected to be taught for the first time in Spring 2022.
CS 4262 Foundations of Machine Learning. Theoretical and algorithmic foundations of supervised learning, unsupervised learning, and reinforcement learning. Linear and nonlinear regression, kernel methods, support vector machines, neural networks and deep learning methods, instance-based methods, ensemble classifiers, clustering and dimensionality reduction, value and policy iteration. Explainable AI, ethics, and data privacy. Prerequisite: CS 3251; one of MATH 2810, 2820, or 3640; one of MATH 2410, 2500, 2501, or 2600.
ECON 3750 Econometrics for Big Data. Econometric methods for analyzing large datasets. Model selection, regularization, classification, resampling, tree-based methods, and support vector machines. Forecasting stock prices, prediction of housing prices, and determination of wages. Prerequisite: ECON 3010 or 3012; either ECON 3032, 3035, 3050; or MATH 2820L with MATH 2810 or 2820.
MATH 3670 Mathematical Data Science. Linear methods for regression and classification, bias-variance tradeoff, and basis expansions and regularization. Kernel methods, support vector machines, dimension reduction, and clustering algorithms. Prerequisite: one of MATH 2810, 2820, or 3641; and one of MATH 2410, 2501, or 2600.
Electives
A. Intermediate / Advanced Programming, Modeling, Simulation
ASTR 3800 Structure Formation in the Universe. Observational and theoretical aspects of extragalactic astronomy. Measurements of galaxies and of the large-scale structure of the universe from galaxy surveys. Expansion history of universe; roles of dark matter and energy. Growth of density fluctuations in universe due to gravity. Cosmological N-body simulations and formation of dark matter halos. Physics of galaxy formation. Experimental probes of dark matter and energy. Prerequisite: One of PHYS 1501, 1601, or 1911; and one of PHYS 1502, 1602, or 1902; and one of MATH 2400, 2420, or 2610; and one of CS 1101, 1103, or 1104.
BME 4310 Modeling Living Systems for Therapeutic Bioengineering. Computer modeling and simulation in therapeutic bioengineering processes. Building computer models and using modern modeling software tools. Numerical techniques to solve differential equations, and origin of mathematical models for biotransport, biomechanics, tumor/virus growth dynamics, and model-based medical imaging techniques. Prerequisite: MATH 2400 or 2420; CS 1101 or 1103 or 1104; BME 2100.
BSCI 3271 Programming for Biologists. Modern biological research generates rich datasets that cannot be efficiently analyzed without computer programming. In this course, you will learn the fundamentals of Matlab, a widely used programming language in the field. You will gain hands-on coding experience working with real biological datasets. The course will cover a broad range of topics, from fundamental syntax and coding practices to data visualization and analysis. Upon exiting the course, you will be equipped to integrate programming into your own research. This course assumes no coding experience but is also appropriate for students looking to brush up and expand on their existing programming knowledge.
CHBE 4830 Molecular Simulation. Modern tools of statistical mechanics, such as Monte Carlo and molecular dynamics simulation, and variations. Methods, capabilities, and limitations of molecular simulation and applications to simple and complex fluids relevant to the chemical and related processing industries. Prerequisite: CHBE 3200, CHEM 3300.
CHEM 5410 Molecular Modeling Methods. Computer simulation studies of molecules with emphasis on applications to biological molecules and complexes. Background theory, implementation details, capabilities, and practical limitations. Includes one three-hour laboratory per week. Prerequisite: CHEM 3300 and 3310.
CHEM 5420 Computational Structural Biochemistry. Theoretical and practical aspects of protein sequence alignments, secondary structure prediction, comparative modeling, protein-protein and protein-ligand docking. Structure-based drug design, virtual screening, quantitative structure activity relations, cheminformatics, and pharmacophore mapping in therapeutic development. Prerequisite: CHEM 3310.
EES 4760 Agent and Individual Based Computational Modeling. Applications in natural, social, and behavioral sciences and engineering. Designing, programming, and documenting models. Using models for experiments. Examples from environmental science, ecology, economics, urban planning, and medicine. Familiarity with basic statistics and proficiency in algebra are expected.
MATH 3660 Mathematical Modeling in Economics. Modeling microeconomic problems of supply and demand, profit maximization, and Nash equilibrium pricing. Auctions and bargaining models. Statistical models and data analysis. Computational experiments. Prerequisite: 2300, 2310, or 2501.
ME 4271 Fundamentals of Robotic Manipulators. History and application of robots. Robot configurations including mobile robots. Spatial descriptions and transformations of objects in three-dimensional space. Forward and inverse manipulator kinematics. Task and trajectory planning, simulation and off-line programming. Prerequisite: MATH 2410.
ME 4284 Modeling and Simulation of Dynamic Systems. Incorporates bond graph techniques for energy-based lumped-parameter systems. Includes modeling of electrical, mechanical, hydraulic, magnetic and thermal energy domains. Emphasis on multi-domain interaction. Prerequisite: ME 3234.
ME 4263 Computational Fluid Dynamics and Multiphysics Modeling. Computational modeling of viscous fluid flows and thermal-fluid-structure interaction. Computational techniques including finite-difference, finite-volume, and finite-element methods; accuracy, convergence, and stability of numerical methods; turbulence modeling; rotating machinery; multiphase flows; and multiphysics modeling. Prerequisite: ME 3224.
ME 4275 Finite Element Analysis. Development and solution of finite element equations for solid mechanics and heat transfer problems. Commercial finite element and pre- and post-processing software. Two lectures and one three-hour laboratory each week. Prerequisite: CE 2205, MATH 2420.
PHYS 3790 Computational Physics. Topics in modern physics analyzed exclusively with computer programs. Finite difference approaches to the Schrödinger and Maxwell equations. Solutions of nonlinear equations. Molecular dynamics. Monte Carlo simulations. Growth models and random walks. Prerequisite: Any three of PHYS 2255, 2275, 2290, 3200, 3651.
PSY 4218 Computational Cognitive Modeling. Computational modeling of human perception and cognition. Model implementation, parameter estimation, and statistical model evaluation; developing and testing new models; stochastic processes, simulation, and Monte Carlo methods; high-performance computing. Recommended: prior (or concurrent) completion of PSY 3120, 3760, 3775, 3780, or NSC 3270. Prerequisite: one of CS 1101, 1103, or 1104.
PSY 4219 Scientific Computing for Psychological and Brain Sciences. Computer programming, scientific computing methods, and high performance computing applied to psychological and brain sciences problems, such as experimental control, data analysis and visualization, image and signal processing, optimization, and simulation. Some prior coursework in psychology or neuroscience recommended. Prerequisite: DS 1100/CS 1100 or CS 1101 or 1104.
PSY 4775 Models of Memory. Mathematical and computational models of the cognitive processes underlying human memory. Attribute-based models, instance theories, neural network models, retrieved-context models, executive function, and working memory models. Methods of fitting models to empirical data. Prerequisite: PSY 3775, and one of CS 1101, 1103, or 1104.
SC 3250 Scientific Computing Toolbox. Use of computational tools in multiple science and engineering domains. Simulations of complex physical, biological, social, and engineering systems, optimization and evaluation of simulation models, Monte Carlo methods, scientific visualization, high performance computing, or data mining. Prerequisite: CS 2201 or 2204; MATH 1100 or higher.
SC 3260 High Performance Computing. Parallel computing, grid computing, GPU computing, data communication, high performance security issues, performance tuning on shared-memory-architectures. Prerequisite: CS 2201 or 2204.
B. Intermediate / Advanced Probability, Statistics, and Data Analysis
ASTR 8070 Astrostatistics. Statistical and computational techniques for data-mining and inference in an astronomical context. Probability theory, comparison of frequentist and Bayesian inference. Strategies for data exploration and visualization. Approaches to regression, parameter estimation, and model selection (e.g. Markov chain Monte Carlo). Overview of time-series analysis and deep-learning techniques.
BIOS 6311 Principles of Modern Biostatistics. This is the first in a two-course series designed for students who seek to develop skills in modern biostatistical reasoning and data analysis. Students learn the statistical principles that govern the analysis of data in the health sciences and biomedical research. Traditional probabilistic concepts and modern computational techniques will be integrated with applied examples from biomedical and health sciences. Statistical computing uses software packages STATA and R; prior familiarity with these packages is helpful but not required. Topics include: types of data, tabulation of data, methods of exploring and presenting data, graphing techniques (boxplots, q-q plots, histograms), indirect and direct standardization of rates, axioms of probability, probability distributions and their moments, properties of estimators, the Law of Large Numbers, the Central Limit Theorem, theory of confidence intervals and hypothesis testing (one-sample and two-sample problems), paradigms of statistical inference (Frequentist, Bayesian, Likelihood), introduction to non-parametric techniques, bootstrapping and simulation, sample size calculations, and basic study design issues. One-hour lab required; students are required to take 6311L concurrently. Prerequisite: Calculus I.
BIOS 6312 Modern Regression Analysis. This is the second in a two-course series designed for students who seek to develop skills in modern biostatistical reasoning and data analysis. Students learn modern regression analysis and model-building techniques from an applied perspective. Theoretical principles will be demonstrated with real-world examples from biomedical studies. This course requires substantial statistical computing in software packages STATA and R; familiarity with at least one of these packages is required. The course covers regression modeling for continuous outcomes, including simple linear regression, multiple linear regression, and analysis of variance with one-way, two-way, three-way, and analysis of covariance models. It also gives a brief introduction to models for binary outcomes (logistic models), ordinal outcomes (proportional odds models), count outcomes (Poisson/negative binomial models), and time-to-event outcomes (Kaplan-Meier curves, Cox proportional hazard modeling). Incorporated into the presentation of these models are subtopics such as regression diagnostics, nonparametric regression, splines, data reduction techniques, model validation, parametric bootstrapping, and a very brief introduction to methods for handling missing data. One-hour lab required; students are required to take 6312L concurrently. Prerequisite: Biostatistics 6311 or equivalent; familiarity with STATA and R software packages.
BIOS 6341 Fundamentals of Probability. The first in a two-course series (6341 – 6342), Fundamentals of Probability introduces and explores the probabilistic framework underlying statistical theory. Students learn probability theory, the formal language of uncertainty, and its application to everyday statistical concepts and analysis methods. Students will validate analytical solutions and explore limit theorems using R software. This course covers probability axioms, probability and sample space, events and random variables, transformation of random variables, probability inequalities, independence, discrete and continuous distributions, expectations and variances, conditional expectation, moment generating functions, random vectors, convergence concepts (in probability, in law, almost surely), Central Limit Theorem, weak and strong Law of Large Numbers, extreme value distributions, order statistics, and exponential family.
BIOS 6342 Contemporary Statistical Inference. The second in a two-course series (6341 – 6342), Contemporary Statistical Inference introduces and explores the fundamental inferential framework for parameter estimation, testing hypotheses, and interval estimation. Students learn classical methods of inference (hypothesis testing), and modes of inference (Frequentist, Bayesian and Likelihood approaches) and their surrounding controversies. Topics include: delta method, sufficiency, minimal sufficiency, exponential family, ancillarity, completeness, conditionality principle, Fisher’s Information, Cramer-Rao inequality, hypothesis testing (likelihood ratios test, most powerful test, optimality, Neyman-Pearson lemma, inversion of test statistics), Likelihood principle, Law of Likelihood, Bayesian posterior estimation, Interval estimation (confidence intervals, support intervals, credible intervals), basic asymptotic and large sample theory, maximum likelihood estimation, resampling techniques (e.g., bootstrap).
BIOS 7362 Advanced Statistical Inference and Statistical Learning. This course is an in-depth examination of modern inferential tools. Topics include high-order asymptotics, Edgeworth expansions, nonparametric statistics, quasi-likelihood and estimating equations theory, multivariate classification methods, resampling techniques, statistical learning, methods and theory of high-dimensional data, expectation-maximization (EM) algorithms, and Gibbs sampling. Concepts are illustrated in biomedical applications whenever possible.
BIOS 8366 Advanced Statistical Computing. Course covers numerical optimization, Markov chain Monte Carlo (MCMC), expectation-maximization (EM) algorithms, Gaussian processes, Hamiltonian Monte Carlo, and data augmentation algorithms, with applications for model fitting and techniques for dealing with missing data. Prerequisite: BIOS 6341 and BIOS 6342 or permission of instructor.
BME 4420 Quantitative and Functional Imaging. Quantitative analysis of non-invasive imaging techniques to assess the structure and function of tissues in the body. Applications of computed tomography, positron emission tomography, ultrasound, and magnetic resonance imaging to tissue characterization. Measurement of lesion volume, cardiac output, organ perfusion, brain function, and receptor density. Prerequisite: CS 1101 or 1103 or 1104; PHYS 1602; MATH 2400.
BSCI 5890 Special Topics in Biological Sciences: Big Data for Biologists. The class will focus on data manipulation, visualization, and analysis in R, with particular attention to multidimensional datasets (e.g., ‘omics datasets, large health datasets). There will be a small amount of lecture, but most of the in-class time will be spent on problem sets and hands-on coding activities.
CE 4320 Data Analytics for Engineers. Programming, analysis, and visualization of real data for the purposes of informing decision making in engineering problems. Statistical modeling in a practical and applied perspective; application of data analytics to bridge the gap between data and decisions; fundamentals of design of experiments. Prerequisite: CE 3300 or MATH 2810 or MATH 2820.
CSET 3410 Telling Stories with Data. Public understanding of complex issues in the age of big-data. Data analysis using Tableau. Narratives created as personal stories, posters, games, maps, sculptures, and sound from large data sets. Emotive, aesthetic, and practical effects of different presentation methods; metrics for assessing impact. Studio approach with case studies, and hands-on work with tools and technologies. No credit for students who completed CSET 3890-01 offered fall 2020.
ECON 3032 Applied Econometrics. Quantitative economic analysis with emphasis on multivariate regression. Measurement, specification, estimation, inference, prediction, and interpretation of econometric models. Experience with data and computer applications. Not open to students who have earned credit for 3035 or 3050. Total credit for this course and 3035 will not exceed 3 credit hours. Total credit for this course and 3050 will not exceed 3 credit hours. Credit hours reduced from second course taken (or from test or transfer credit) as appropriate. Prerequisite: 1020; either 1500, 1510, or both MATH 2820L and either MATH 2810 or 2820; and either MATH 1201 or 1301.
ECON 3035 Econometric Methods. Properties and problems in estimating economic relationships with multiple regression. Statistical and econometric theory to address empirical questions. Hands-on experience with economic data analysis with programming in statistical software. Prerequisite: ECON 1020, either ECON 1500, 1510, or both MATH 2820L and either MATH 2810 or 2820; and either MATH 1201 or 1301.
ECON 3330 Economics of Risk. Decision making under risk and uncertainty. Expected utility, risk aversion, and the value of information. Investments, insurance, and lotteries. Moral hazard and adverse selection. Prospect theory. Prerequisite: ECON 3010 (or 3012) with either ECON 1500 or 1510; or MATH 2820L with either MATH 2810 or 2820.
ECON 4050 Topics in Econometrics. Emphasis on applications. May include generalized method of moments, empirical likelihood, resampling methods, and nonparametric techniques. Prerequisite: ECON 3032, 3035, or 3050.
EES 3310 Global Climate Change. Scientific principles and policy applications. Earth’s past; evidence of human impact; future climate change; and economic, social, and ecological consequences. Economic, technological, and public policy responses. Serves as repeat credit for EES 2110. Prerequisite: one of EES 1030, 1080, 1510, BSCI 1510, CHEM 1601, ECON 1010, ES 1401 or PHYS 1501, 1601, 1901.
MATH 3640 Probability. Combinatorics, probability models (binomial, Poisson, normal, gamma, etc.), stochastic independence, generating functions, limit theorems and types of convergence, bivariate distributions, transformations of variables. Markov processes and applications. Prerequisite: MATH 2810 or 2820. Corequisite: MATH 2410, 2501, or 2600.
MATH 3641 Mathematical Statistics. Distribution theory, order statistics, theory of point estimation and hypothesis testing, normal univariate inference, Bayesian methods, sequential procedures, regression, nonparametric methods. Students interested in applications may take MATH 2820L. Prerequisite: MATH 3640.
MATH 4650 Financial Stochastic Processes. The theory of stochastic processes and applications to financial economics. Brownian motion; martingales; Itô’s Lemma; stochastic integration. Monte Carlo simulations with variance reduction techniques. Applications include discrete-time option pricing and delta hedging. Prerequisite: MATH 3650 and either MATH 2810, 2820, or 3640.
PPS 3200 Research Methods for Public Policy Analysis. Surveys in public policy analysis. Types, design, modes of implementation, sampling strategies, and data collection. Data management, cleaning, and analysis.
PPS 3250 Advanced Quantitative Methods for Public Policy. Causal inference, the empirical toolkit for public policy analysis. Potential outcomes framework, multivariate regression, matching estimators, randomized controlled trials, instrumental variables, difference-in-differences, and regression discontinuity.
PSCI 2310 Understanding Policy Data: Analysis and Interpretation. This course addresses the methodology of empirical social science research, with an emphasis on research design and interpretation – the inferences we can draw from a given study and how broadly they apply. We will cover all aspects of the research process – from identifying a research question to collecting data to establishing and implementing a suitable model for analysis. We will also be active consumers of social science reporting – how is this type of research characterized in the news media?
PSCI 3249 American Public Opinion and American Politics. Origins and effects of public opinion on politics in the United States. Influence of values, emotion, prejudice, and news information on individual political views. Prerequisite or corequisite: 1100, 1101, 1102, 1103, or 1150.
PSCI 3893 Selected Topics in American Government – Media & Data in American Politics. This course provides an overview of the media’s role in American political life. Unlike typical courses on media and politics, this course will not focus exclusively on the effects of political campaigns. Instead, we will construe this topic more broadly to include the effects of media on public opinion of all kinds, but with a focus on recent phenomena, ranging from the rise of partisan media and soft news to fake news and social media. Prerequisite or corequisite: 1100, 1101, 1102, 1103, or 1150.
PSY 4220 Bayesian Modeling with Python. Statistical and cognitive modeling. Models of memory, psychophysics, categorization, and decision-making. Probabilistic programming in Python. Bayesian parameter estimation and model comparison.
PSY-PC 2120 Statistical Analysis. Second course in statistics for undergraduates. Multifactor analysis of variance designs (including repeated measures), and goodness of fit and contingency analyses. Prerequisite: PSY-PC 2110 or PSY 2100.
PSY-PC 3722 Psychometric Methods. Covers the fundamental concepts of psychological measurement and testing, examines a sample of the most important psychometric instruments in current use, provides observation of testing, and considers knowledge essential to making wise use of testing information in research and applied child development settings. Prerequisites: PSY-PC 1250 or 1205/1207 or PSY 1200 and PSY-PC 2110 or PSY 2100.
PSY-PC 3724 Psychometrics. The basic objectives of this course are for students to learn the fundamental concepts, methods, and principles of educational and psychological measurement. Particular attention will be devoted to reliability and validity issues underlying psychometric theory from original sources, and how psychometric theory relates to the assessment of individual differences or human psychological diversity more generally. Students should choose between PSY-PC 3722 and this course inasmuch as credit for both is not allowed. This course is more demanding in that students will be reading original sources; it is especially relevant to students seeking advanced training in the social sciences or research careers. Prerequisite: PSY-PC 2110 or PSY 2100 and PSY-PC 2120.
PSY-PC 3738 Introduction to Item Response Theory. Students are introduced to the basic concepts of educational and psychological measurement, classical test theory (CTT), and item response theory (IRT). These concepts will be taught with practice by illustrating the construction of tests. Prerequisite: PSY-PC 2110 or PSY 2100 and PSY-PC 3722.
PSY-PC 3743 Factor Analysis. This course covers primarily Exploratory Factor Analysis (EFA), which is extensively used in psychology, education, medicine, and management to investigate the underlying dimensionality of unobserved constructs (e.g., intelligence, psychopathology). The theory behind factor analysis is covered alongside hands-on application to data, exposure to uses of factor analysis in the applied literature, and instruction in popular EFA software. Key topics include model specification, fit and evaluation, rotation methods, questionnaire development, sample size and power issues, and extensions to confirmatory factor models. Prerequisite: PSY-PC 2110 or PSY 2100 (or equivalent), and PSY-PC 2120.
PSY-PC 3749 Applied Nonparametric Statistics. This course covers nonparametric statistical methods useful when the assumptions of ordinary parametric statistics are not met, and for developing custom statistical techniques useful when other methods do not exist. Coverage is given to distribution-free procedures, sign tests, contingency tables, median tests, chi-square and other goodness-of-fit tests, rank correlations, randomness tests, ordinal regression, Monte Carlo methods, resampling methods (bootstrap and jackknife), tests of independence, 1-sample, 2-sample, and k-sample methods, permutation tests, function smoothing, and splines. Emphasis is placed on underlying theory, application to data, and software. Prerequisite: PSY-PC 2110 or PSY 2100 (or equivalent), and PSY-PC 2120.
PSY-GS 8867 Multivariate Statistics (Formerly PSY-PC 3746). This course covers several classical multivariate statistical methods. Topics include principal component analysis with rotation, canonical correlations, multidimensional scaling, correspondence analysis and hierarchical clustering. Matrix algebra and basics about multivariate data will be taught at the beginning. The course has both theoretical and applied components. R will be used as the primary computing tool. Prerequisite: PSY-PC 2110 or PSY 2100 (or equivalent), and PSY-PC 2120.
PSY-PC 3737 Structural Equation Modeling. This course introduces the basic principles of path analysis, confirmatory factor analysis, and latent variable structural modeling, which constitute a powerful set of statistical tools for examining correlational, observational, and even experimental data in the social sciences. Computer techniques for conducting these analyses will also be taught: the LISREL program in particular, but AMOS will also be introduced.
PSY-PC 3732 Latent Growth Curve Modeling. The analysis of longitudinal data (repeated measurements on the same people over time) is central for evaluating many theories in social science and educational research. This applied course will focus on one flexible and powerful approach for analyzing within-individual change over time and between-individual differences in change: the latent growth curve model. Emphasis will be placed on applications to real data, interpretation of results, and attaining a solid understanding of the statistical model. Prerequisite: PSY-PC 2110 or PSY 2100 (or equivalent), and PSY-PC 2120.
PSY-PC 3727 Modern Robust Statistical Methods. Covers modern statistical methods designed to handle violations of statistical assumptions that can compromise classic parametric procedures. More specifically, the student will learn about the classic assumptions of independence, normality, and equal variances that underlie many standard procedures, and become familiar with modern methods that perform vastly better than the classic procedures when assumptions are violated, yet offer few performance penalties under many realistic situations where assumptions are met. Prerequisite: PSY-PC 2110 or PSY 2100 (or equivalent), and PSY-PC 2120.
PSY-PC 7878 Statistical Consulting. The objective of this course is to prepare students for providing statistical consulting in collaborative applied research settings. Statistical consulting skills are increasingly vital for research and analytic jobs in industry, education, medicine, and academia. Yet a variety of data analysis experiences beyond formal methodological coursework are needed to hone statistical consulting skills. Students work in a mentored environment on statistical and theoretical problems confronted by applied researchers in real data analysis settings within the social sciences and education. Students work in small groups or individually on consulting projects and also have opportunities for providing constructive feedback on others’ projects. This course will synthesize and further develop students’ understanding of how to translate subject-matter questions into statistical language, select an appropriate statistical method, research and develop workable solutions to new problems, write an analysis plan, and effectively communicate results through oral and written reports. This course will not only focus on the content of statistical consulting but also on the process – covering how to communicate effectively, professionally, and ethically with clients about expectations, responsibilities, hypotheses, analyses, and results. Permission of Instructor required.
C. Machine Learning, Visualization, Data Science
ANTH 3050 Special Topics: A.I. and Material Culture. Applying artificial intelligence to archaeology. Statistics, machine learning, and deep learning. Hands-on examples from archaeological excavations as well as laboratory studies. (Offered Fall 2023)
ANTH 3261 Introduction to Geographic Information Systems and Remote Sensing. Computerized graphics and statistical procedures to recognize and analyze spatial patterning. Spatial data-collection, storage and retrieval; spatial analysis and graphic output of map features. Integration of satellite imagery with data from other sources through hands-on experience. Assumes basic knowledge of computer hardware and software.
ANTH 3867 Digital Archaeology. Laboratory analysis of archaeological artifacts using digital methods. Three dimensional modeling of artifacts, digital photography, artifact technical diagramming. Virtual Reality and other representational frameworks. PXRF compositional analysis. Artifact cleaning, labeling, and preservation techniques. Assemblage curation and integration with databases. Preparation of artifacts for exhibition. Ethics of curation, representation, repatriation.
ASTR 8080 Data Mining in Large Astronomical Surveys. The manipulation and analysis of catalog-level data from large astronomical surveys. Survey observations, cross-matching catalogs, statistical analysis, version control. Emphasis on development of code and best practices.
BME 3890 Computational Genomics. The course covers computational algorithms for processing and analyzing genomic data including genome assembly, alignment, haplotype phasing, single cell RNA-Seq analysis, and multi-omics. Both algorithms and biological background necessary for engineering students to appreciate their application will be covered. Students will also get familiar with current software tools for the analysis of real sequencing data.
BME 4420 Quantitative and Functional Imaging. Quantitative analysis of non-invasive imaging techniques to assess the structure and function of tissues in the body. Applications of computed tomography, positron emission tomography, ultrasound, and magnetic resonance imaging to tissue characterization. Measurement of lesion volume, cardiac output, organ perfusion, brain function, and receptor density. Prerequisite: CS 1101 or 1103 or 1104; PHYS 1602; MATH 2400.
BMIF 6310 Foundations of Bioinformatics. This survey course introduces students to the experimental context and implementation of key algorithms in bioinformatics. The class begins with a review of basic biochemistry and molecular biology. The group will then focus on algorithms for matching and aligning biological sequences, given the context of molecular evolution. The emphasis will move from comparing sequences to the systems developed to enable high-throughput DNA sequencing, genome assembly, and gene annotation. Gene products will be the next focus as students consider the algorithms supporting proteomic mass spectrometry and protein structure inference and prediction. The informatics associated with transcriptional microarrays for genome-wide association studies will follow. Finally, the class will examine biological networks, including genetic regulatory networks, gene ontologies, and data integration. Formal training in software development is helpful but not required. Students will write and present individual projects. Undergraduates need the permission of the instructor to enroll.
BMIF 6315 Methodological Foundations of Biomedical Informatics. In this course, students will develop foundational concepts of computation and analytical thinking that are instrumental in solving challenging problems in biomedical informatics. The course will use lectures and projects directed by co-instructors and guest lecturers.
BMIF 7380 Data Privacy in Biomedicine. This course introduces students to concepts for evaluating and constructing technologies that protect personal privacy in data collected for primary care and biomedical research. Material in this course touches on topics in biomedical knowledge modeling, data mining, policy design, and law. Prerequisite: students are expected to be proficient in writing basic software programs, although no specific programming language is required.
BSCI 3272 Genome Science. Aims and importance of the science. Retrieval of genome data from public databases; experimental and computational methods used in analysis of genome data and their annotation. Functional aspects of genomics, transcriptomics, and proteomics; use of phylogenetics and population genomics to infer evolutionary relationships and mechanisms of genome evolution. Prerequisite: BSCI 1511.
CS 3265 Introduction to Database Management Systems. Logical and physical organization of databases. Data models and query languages, with emphasis on the relational model and its semantics. Data independence, security, integrity, concurrency. Prerequisite: CS 2201.
CS 3891 Special Topics: Data Visualization. Data visualization is concerned with the mapping of data into visual representations in order to support humans in gaining insight. One form of visualization is a tabular spreadsheet. Yet, there are other ways to visualize data that would more quickly lead to insight. For instance, we might map each data sample to a point, or summarize subsets of the data with bars. Certain visual representations are designed to leverage the human visual processing system for efficiently identifying patterns, and often these patterns reflect the analyses we care about. For instance, we might identify a correlation between two variables by plotting our data as a set of points, or identify an outlier by observing how a lone bar is really small or tall in a bar plot. This is the power of data visualization: the efficient processing of visual representations that can amplify one’s cognition for understanding data. Prerequisites: CS 2201 or CS 2204; CS 2212 or MATH 2410.
CS 3891 Special Topics: Social Network Analysis. Explores recent research on the analysis of social networks and on models and algorithms that are used to abstract their properties and make predictions. Key topics covered in this course are: Graph models; Network centrality measurements; Computational methods of link prediction, clustering and classification on graphs, and network diffusion; Deep learning on graphs including network embedding and graph neural network models and applications; Case studies in signed networks and knowledge graphs. Prerequisites: CS 3250 and CS 3251.
CS 3891 Special Topics: Computing and the Environment. This course studies applications of computer science to problems in sustainability. Sustainability topics include wildlife and other environmental monitoring and protection, energy use in buildings and devices, sustainable design, waste management, and planet and ocean modeling. The computing topics used to address these issues include artificial intelligence, machine learning, optimization, game theory, mobile computing, robotics, sensor networks, computer vision and acoustics, and algorithm and hardware design. The course has seminar components based on readings, as well as exercises, programming assignments, and projects. Prerequisite: CS 3251.
CS 3892 Special Topics: Projects in Machine Learning. Students work in small groups on the specification, design, implementation, and testing of a sizeable machine learning or deep learning project. Projects can be either application- or algorithm-oriented. Prerequisite: one of CS 4262, CS 3891 Deep Learning, or EECE 3891 Statistical Pattern Recognition.
CS 4260 Artificial Intelligence. Principles and programming techniques of artificial intelligence. Strategies for searching, representation of knowledge and automatic deduction, learning, and adaptive systems. Survey of applications. Prerequisite: CS 3250, CS 3251; MATH 2810 or 2820 or 3640.
CS 4266 Topics in Big Data. Principles and practices of big data processing and analytics. Data storage, databases, and data modeling techniques; data processing and querying; data analytics and applications of machine learning using these systems. Prerequisite: CS 3251.
CS 4267 Deep Learning. Models, algorithms, mathematical tools, and machine learning concepts used in deep learning. Modern practical deep feedforward, convolutional, and recurrent networks. Regularization for deep learning and optimization. Practical design methods. Prerequisite: CS 3250, CS 3251; MATH 2410 or 2600; MATH 2810 or 2820.
CS 6362 Advanced Machine Learning. Theory and algorithms for designing systems that learn from data including modern machine learning methods that take advantage of increased complexity to provide improved performance. Data types, data pre-processing, measures of similarity and dissimilarity. Supervised learning: decision trees, logistic regression, support vector machines, Bayesian methods, and neural networks; unsupervised learning: partitional, hierarchical, density-based, and graph clustering algorithms. Feature selection for classification and clustering. Evaluation methods. Reinforcement learning: Markov Decision processes, dynamic programming, Monte Carlo methods, TD-learning. Prerequisite: CS 4262 or 5262 or 6360.
CS 8395 Visual Analytics & Machine Learning. This course is a research seminar on topics related to visual analytics and machine learning. Visual analytics is an area of data visualization that is concerned with improving a human’s analytic process, or how one makes sense of data for a given problem: understanding, reasoning, and making decisions about a provided dataset and a given problem domain. Visual analytics, in particular, is concerned with combining automated processes with human-driven processes that are built around data visualization – visual representations of data, and ways to interact with data. Given the rapid growth in machine learning over the last decade, research in visual analytics has witnessed similar growth in leveraging machine learning in a variety of ways. This course will cover topics that live at the interface of visual analytics and machine learning, exposing you to the basics of visual analytics, how machine learning can be used to enhance visual analytics, and how visual analytics can help machine learning. Prerequisites: sufficient background in machine learning, basic understanding of deep learning methods.
CS 8395-02 Special Topics – Selected Topics in Deep Learning. Deep learning has become an inseparable part of many tasks in AI/ML, ranging from language understanding and speech and image recognition to machine translation, planning, game playing, and autonomous driving, among many others. Expertise in deep learning is in high demand both in academic and industrial settings. This course provides a well-rounded and hands-on introduction to deep learning and its numerous applications, allowing students to gain foundational knowledge on the topic and understand much of the current literature. More importantly, the course views deep learning through geometric principles that will enable us to unify different concepts in deep learning, making it relevant to students with a broad spectrum of research interests. Finally, the students will get practical experience in building neural networks using Python and PyTorch. Familiarity with multivariate calculus, linear algebra, probability theory, and Python 3 is suggested. Prerequisite: CS 5262 or equivalent.
DS 3891 Special Topics in Data Science – Intro to Generative Artificial Intelligence. Generative artificial intelligence (AI) models encompass ChatGPT, GPT4 and successors, and related transformer-based deep learning models. This introductory course aims to provide students with a comprehensive understanding of generative AI models and their applications. This course introduces the theory and use of these models, including prompt engineering, the building of agents and bots, and the basics of the architecture and training of transformer models. Students will explore the fundamental concepts, techniques, and algorithms used to create generative AI systems, such as deep learning and attention mechanisms. Through a combination of interactive lectures, hands-on exercises, and projects, students will gain practical experience in designing, training, and evaluating generative AI models for various tasks.
ECE 4363 Applied Statistical Machine Learning. Application of mathematical techniques that form the foundation of machine learning and artificial intelligence. Probability and statistics, applications of Bayes theorem, matrix analysis, LMS and maximum likelihood estimation. Classification techniques, linear and basis function regressions. Estimation and sampling of probability distributions. Data partitioning and n-fold cross-validation. Recursive Bayesian estimation. Methods of dimensionality reduction. The perceptron, kernel methods, support vector machines, and Gaussian processes. Prerequisite: One of MATH 2810, MATH 2820, or BME 2400.
ECE 4354 Computer Vision. Vision as a computational problem. Theories of vision, inverse optics, image representation, and solutions to ill-posed problems. Prerequisite: EECE 4353.
ECON 3750 Econometrics for Big Data. Econometric methods for analyzing large datasets. Model selection, regularization, classification, resampling, tree-based methods, and support vector machines. Forecasting stock prices, prediction of housing prices, and determination of wages. Prerequisite: ECON 3010 or 3012; either ECON 3032, 3035, 3050; or MATH 2820L with MATH 2810 or 2820.
HIST 1590 Artificial Intelligence and Society. History and overview of artificial intelligence and robotics: socioeconomic, political, ethical, cultural, and environmental implications. Theories of scientific and technological innovation. Benefits and risks of advanced informatic machines. Ethical responsibilities of researchers. Short-term and long-term policies for governance and regulation.
MATH 3130 Fourier Analysis. Fourier series topics including convolution, Poisson kernels, Dirichlet kernels, and pointwise and mean-square convergence. Integral transforms including one-dimensional and multidimensional Fourier integrals, Fourier inversion formula and Plancherel theorem, Poisson summation formula, Radon transform, and X-ray transform. Fourier analysis on Abelian groups including finite Fourier analysis and fast Fourier transform. Applications to signal processing, Shannon sampling theory, and/or compressed sensing. Prerequisite: Either MATH 2501; or both MATH 2300 (or 2310) with either MATH 2410 or 2600.
MATH 3670 Mathematical Data Science. Linear methods for regression and classification, bias-variance tradeoff, and basis expansions and regularization. Kernel methods, support vector machines, dimension reduction, and clustering algorithms. Prerequisite: one of MATH 2810, 2820, or 3641; and one of MATH 2410, 2501, or 2600.
MATH 4620 Linear Optimization. Linear programming and its applications. Formulation of linear programs. The simplex method, duality, complementary slackness, dual simplex method, and sensitivity analysis. The ellipsoid method. Interior point methods. Applications to networks, management, engineering, and physical sciences. Familiarity with computer programming is expected. Prerequisite: either MATH 2410, 2600, or 2501.
MATH 4630 Nonlinear Optimization. Mathematical modeling of optimization problems. Theory of unconstrained and constrained optimization, including convexity and the Karush-Kuhn-Tucker conditions. Derivative- and non-derivative-based methods. Familiarity with computer programming is expected. Prerequisite: MATH 2501; or both MATH 2300 (or 2310) and either MATH 2410 or 2600.
MHS 3890 Special Topics – Introduction to Data Visualization. The course will cover introductory data manipulation and data visualization using the programming language R. Participants will learn how to load, manipulate, and merge data, as well as how to perform basic statistical analysis. The following topics will be covered: data types, sorting and filtering data, if-else statements, loops, and functions. The second part will cover the types and rules of data visualization and how to produce data visualizations in R. The types of visualizations covered: bar plots, line plots, scatter plots, histograms, matrix visualization, and maps. The class will use micro-level health data as well as demographic macro-level data from the human mortality and fertility databases. Participants will complete an independent project and give a final presentation. The class assumes basic knowledge of statistics, including linear regression. A prior statistics course or experience with statistical analysis highly recommended.
NSC 3270 Computational Neuroscience. Theoretical, mathematical, and simulation models of neurons, neural networks, or brain systems. Computational approaches to analyzing and understanding data such as neurophysiological, electrophysiological, or brain imaging. Demonstrations simulating neural models. Recommended: NSC 2201. Prerequisite: either CS 1101 or 1103 or 1104, and either MATH 1200 or 1300.
PSY-PC 3751 Exploratory and Graphical Data Analysis. Exploratory Data Analysis (EDA) is a modern statistical paradigm developed by John Tukey in the 1970s. EDA emphasizes fitting mathematical models to data as preliminary to the traditional hypothesis testing approach used in confirmatory analyses. Hallmarks of EDA include graphical methods, residual analysis, robust/resistant statistical methods, and data re-expression/transformation. But EDA is actually a whole philosophy of data analysis, and includes treatment of ethics and propriety in research. In this class we study EDA as it has developed over the past four decades. We also do a great deal of EDA. Each student develops an “EDA Portfolio” of different data analysis and graphical analysis projects. Included within the course is treatment of “big data” and data mining approaches, and also discussion of the current “replication crisis” and its emphasis on Questionable Research Practices (QRPs); EDA provides a certain type of prescriptive treatment of QRPs. Prerequisites: PSY-PC 2110 or PSY-PC 2120 or PSY-PC 3735.
PSY-PC 7500-03 Special Topics Psychology and Human Development – Neural Network Models of Cognitive Development. The combined enrollment capacity of PSY-GS 8430-01 and PSY-PC 7500-03 is 30. Although seats may appear open in a section, if the combined capacity has been reached, the course will close. The goal of the course is to introduce the basic principles of parallel distributed processing (also known as connectionist or artificial neural-network modeling) and to illustrate how this framework has been used to provide novel insights into the processes and mechanisms supporting knowledge acquisition in infants and young children. Prerequisites: A course in developmental psychology as well as a course in (or familiarity with) basic differential calculus.
SOC 3242 AI in Social Systems. Artificial intelligence across social institutions. Inequality reflected and perpetuated through data and automation. Cutting-edge questions and problems from a sociological perspective.
D. Research Hours in Data Science
DS 3850 Undergraduate Research in Data Science. Development of a research project by an individual student under the direction of a faculty sponsor. Project must involve (1) development, evolution, or implementation of data science methods, (2) application of data science to one or more fields in the physical, life, or social sciences, engineering, arts, or humanities, or (3) study of the impact of data on society and its institutions. Consent of both the faculty sponsor and Director of Undergraduate Data Science required.