Chengwei LEI, Ph.D., Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield

 

CMPS 4430 Introduction to Data Science

 

 


 

What is Data Science

Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data. This analysis helps data scientists to ask and answer questions like what happened, why it happened, what will happen, and what can be done with the results. (Definition by Amazon)

 



Ethical problems in data science

Before we start

Introduction

Probability Basics

Data Summaries
Link

 
Data Central Tendency
Standard Deviation / Variance
 
 Data Plots / Visualized Summarization
Link


Basic Plots
Boxplot
Histogram
Quantile plots
Heatmap / Mesh
 
Data Visualization
Link


Simpson’s Paradox
 

  
 Basic Tools
 
Measurements
Link

 
Distance Measure
Similarity/Correlation
 
Statistical Analysis
Link


Z-test
t-test
U-test
Statistical Dependence
p-value
Confidence Interval
ANOVA table

   
Data Preprocessing
Link

Normalization
Data Sampling
Data Cleaning
  
Evaluations
Link

Regression Problems
Classification Problems
Clustering Problems
    

 

Regression
Link

Simple Linear Regression
Polynomial Regression
Curve Fitting
Logistic Regression
 

 

 Optimization Problems
 
Classic Optimization
Link

 
Brute Force
Greedy Algorithm
Dynamic Programming
 
Stochastic Algorithms
Link

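Genetic Algorithm (GA)
Evolution Strategies (ES)
Differential Evolution (DE)
Particle Swarm Optimization (PSO)
Ant Colony Optimization (ACO)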

 

 

 

Learning Types

 

Statistical Modeling