CMPS 4430 Introduction to Data Science
What is Data Science?
Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data. This analysis helps data scientists to ask and answer questions like what happened, why it happened, what will happen, and what can be done with the results. (Definition by Amazon)
Ethical problems in data science
Data Summaries: Data Central Tendency, Standard Deviation / Variance
Data Plots / Visualized Summarization: Basic Plots, Boxplot, Histogram, Quantile plots, Heatmap / Mesh
Data Visualization
Simpson’s Paradox
Basic Tools
Measurements: Distance Measure, Similarity/Correlation
Statistical Analysis: Z-test, t-test, U-test, Statistical Dependence, p-value, Confidence Interval, ANOVA table
Data Preprocessing: Normalization, Data Sampling, Data Cleaning
Evaluations: Regression Problems, Classification Problems, Clustering Problems
Regression: Simple Linear Regression, Polynomial Regression, Curve Fitting, Logistic Regression
Optimization Problems
Classic Optimization: Brute Force, Greedy Algorithm, Dynamic Programming
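Stochastic Algorithms: GA (Genetic Algorithm), ES (Evolution Strategy), DE (Differential Evolution), PSO (Particle Swarm Optimization), ACO (Ant Colony Optimization)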
Learning Types
Statistical Modeling