Knowledge Discovery and Data Mining Process

- Recap: Data Types and Representation

Exploratory Data Analysis
Analyzing data sets to summarize their main characteristics,
often using statistical graphics and other data visualization methods
- Univariate analysis: histogram, boxplot
- Multivariate analysis: scatter plot, correlation analysis, heatmap
- Dimensionality reduction: PCA (Principal Component Analysis), t-SNE (t-distributed Stochastic Neighbor Embedding)
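
As a small illustration of the correlation analysis listed above, the Pearson coefficient between two variables can be computed with the standard library alone. This is only a sketch: the helper name `pearson_r` and the sample values are made up for illustration, and in practice a library such as NumPy or pandas would normally be used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration; it does not handle a constant
    sequence (zero variance), which would divide by zero.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up sample: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. A scatter plot of the same two variables is the usual visual check.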
Measuring
- Central tendency: mean, median, mode
- Dispersion
  - Quartiles: Q1 (25th percentile), Q2 (50th percentile, the median), Q3 (75th percentile); the 100th percentile is the maximum
  - Interquartile range: IQR = Q3 - Q1
  - Five-number summary: min, Q1, median, Q3, max
  - Variance: $\sigma^2$
  - Standard deviation: $\sigma$
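
The dispersion measures above can be sketched with Python's `statistics` module. The sample data is made up, and two choices here are assumptions rather than something these notes specify: the `"inclusive"` quartile method (several quartile conventions exist and give slightly different values) and the population form of variance (divide by N).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

# Quartile cut points Q1, Q2, Q3; "inclusive" treats the data as the
# whole population (an assumption, other conventions exist)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Five-number summary: min, Q1, median, Q3, max
five_number = (min(data), q1, statistics.median(data), q3, max(data))

# Population variance (sigma^2) and standard deviation (sigma)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print("five-number summary:", five_number)
print("IQR:", iqr)
print("variance:", variance, "std dev:", std_dev)
```

For a sample rather than a full population, `statistics.variance` and `statistics.stdev` divide by N - 1 instead.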
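Symmetric vs. Skewed Data
- Symmetric (unimodal): Mean = Median = Mode
- Positively skewed (long right tail): typically Mode < Median < Mean
- Negatively skewed (long left tail): typically Mean < Median < Mode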
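
A quick way to see the positively skewed ordering is to compute the three measures on a small right-skewed sample. The data below is made up for illustration, and the ordering is a rule of thumb rather than a guarantee for every data set.

```python
import statistics

# Made-up right-skewed sample: most values are small, a few are large
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle of the sorted data
mean = statistics.mean(data)      # pulled toward the long right tail

print("mode:", mode, "median:", median, "mean:", mean)
print("Mode < Median < Mean:", mode < median < mean)
```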

Measures
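- Distributive measure:
computed by partitioning the data into smaller subsets, computing the measure on each subset, and merging the partial results
ex. sum and count
- Algebraic measure:
computed by applying an algebraic function to one or more distributive measures
ex. mean = sum/count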
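
The difference between the two kinds of measure can be sketched as follows: sum and count are computed independently on each partition and merged, and the mean is then derived from those two merged values. The function names, the data, and the way it is partitioned are all hypothetical.

```python
def partial_sum_count(chunk):
    """Distributive step: sum and count of a single partition."""
    return sum(chunk), len(chunk)

def merge(partials):
    """Merge per-partition (sum, count) pairs into global totals."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count

data = [3, 8, 1, 9, 4, 6, 2, 7]                  # made-up values
partitions = [data[0:3], data[3:5], data[5:8]]   # arbitrary split

# Distributive: each partition is processed on its own, then merged
total, count = merge([partial_sum_count(p) for p in partitions])

# Algebraic: the mean is a function of two distributive measures
mean = total / count

print("sum:", total, "count:", count, "mean:", mean)
```

Because only the small `(sum, count)` pairs need to be combined, the partitions could be processed on different machines or at different times without revisiting the raw data.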