Big Data
- 3V: high Volume, Velocity, Variety
- in the real world: government, industry
- modern: VLDB (Very Large DataBase)
- parallel processing has become important
- ex. IPA: 10x faster, 6x less memory, 20k times more accurate, easily parallelized
- Storage
- Centralized Storage
- Distributed Storage
- inexpensive
- slow
- → needs a new programming model that minimizes data transfer
- MapReduce: move the operations instead of the data
- run the operations on the data nodes
- AI vs Big Data
- AI
- Big Data
- Table, Graph, Matrix, Tensor, Sequence, …
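The MapReduce idea noted above (move the operations, not the data) can be sketched as a single-process simulation. This is only an illustration under stated assumptions: the node names, the sample lines, and the helpers `map_on_node` and `reduce_counts` are hypothetical, and a real framework would run the map step on separate machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster: each "data node" holds its own partition of the data.
data_nodes = {
    "node-a": ["big data needs parallel processing", "data moves slowly"],
    "node-b": ["move operations not data"],
    "node-c": ["parallel processing of big data"],
}

def map_on_node(lines):
    """Map step: runs where the data lives and returns a small partial result."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(left, right):
    """Reduce step: merges two partial word counts into one."""
    return left + right

# Only the compact per-node counts travel over the "network", not the raw lines.
partials = [map_on_node(lines) for lines in data_nodes.values()]
total = reduce(reduce_counts, partials, Counter())

print(total["data"])  # 4
```

Each node ships a short word-count summary instead of its raw text, which is where the saving in data transfer comes from.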
Data Type
- Table
- records (rows), attributes (columns)
- vector, matrix, tensor
- graph, network
- sequence, ordered data
- time-series
- Image and multimedia
- Spatial data
- ex. map, spatiotemporal (space and time) data
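As a rough illustration of how some of the data types listed above differ in shape, the sketch below represents each one with plain Python containers. All values and variable names are made up for the example; real systems would typically use dedicated libraries instead.

```python
# Table: records (rows) with named attributes (columns).
table = [
    {"city": "Seoul", "temp_c": 3.5},
    {"city": "Busan", "temp_c": 7.0},
]

# Matrix: a 2-D grid of numbers; a tensor adds more axes.
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# Graph: nodes plus edges, stored here as an adjacency list.
graph = {"A": ["B", "C"], "B": ["C"], "C": []}

# Time series: a sequence of (timestamp, value) pairs ordered by timestamp.
series = [(0, 3.5), (1, 3.9), (2, 4.2)]

# A few properties that follow from each representation.
columns = sorted(table[0].keys())
shape = (len(matrix), len(matrix[0]))
edge_count = sum(len(neighbors) for neighbors in graph.values())
is_ordered = all(a[0] < b[0] for a, b in zip(series, series[1:]))

print(columns, shape, edge_count, is_ordered)
```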
Data Model = Structure + Constraint + Operation
- Structures
- rows, columns
- nodes, edges
- key-value pairs
- sequence of bytes
- Constraints
- all rows have the same number of columns
- all values in a column have the same type
- a child cannot have two parents (tree)
- Operations