Big Data and Hadoop – Part 2

February 04, 2014

Apache Hadoop is a fast-growing big data framework.

Advantages:

Problems with Traditional Large-Scale Systems
- Processor-bound: complex processing traditionally meant buying bigger computers (distributed computing changed this)
- Programming complexity
- Keeping data and processes in sync
- Finite Bandwidth
- Partial Failures
Distributed Systems: The Data Bottleneck
- Traditionally, data is stored in a central location
- Data is copied to processors at runtime
- Fine for limited amounts of data
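To make the bottleneck concrete, here is a rough back-of-the-envelope sketch of how long it takes just to copy data from central storage to a processor. The 1 Gbit/s link speed and the data sizes are assumptions chosen for illustration, and protocol overhead and disk speed are ignored.

```python
def transfer_seconds(data_bytes: float, link_bits_per_second: float) -> float:
    """Time to move data_bytes over a link, ignoring protocol overhead."""
    return data_bytes * 8 / link_bits_per_second


GIGABIT = 1e9  # assumed link speed: 1 Gbit/s

# Illustrative data sizes (decimal units), not figures from any real system.
for label, size in [("1 GB", 1e9), ("1 TB", 1e12), ("100 TB", 1e14)]:
    seconds = transfer_seconds(size, GIGABIT)
    print(f"{label}: {seconds:,.0f} s ({seconds / 3600:.1f} h)")
```

Under these assumptions, copying time grows in direct proportion to data size, so a step that is negligible at 1 GB dominates the job at 100 TB. This is why moving the processing to the data becomes attractive.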
Types of Analysis with Hadoop:
- Text Mining
- Index Building
- Graph Creation and Analysis
- Pattern Recognition
- Collaborative Filtering
- Prediction Models
- Sentiment Analysis
- Risk Assessment
Nature of Analysis
- Batch Processing
- Parallel Execution
- Distributed Data
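These three properties can be sketched with a small word count in the map/reduce style. This is plain Python, not the Hadoop API: the function names are hypothetical, the sample lines are made up, and threads in one process stand in for separate nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(lines):
    """Map step: count words within one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(partitions, workers=4):
    # Partitions are independent, so the map calls can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_counts(partials)


# Hypothetical input, already split the way it would be across nodes.
partitions = [
    ["big data and hadoop", "hadoop stores data"],
    ["hadoop processes data in batches"],
]
totals = word_count(partitions)
print(totals["hadoop"], totals["data"])
```

The whole input is processed in one pass (batch), each partition is counted independently (parallel), and only the small per-partition counts are combined at the end, rather than the raw data being moved (distributed).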

Hadoop Users:

BlackBerry – Data growth; ad hoc analysis queries took too much time
CBS Interactive – Cross-site and historical analysis, identifying user patterns
Nokia – 3G digital modeling, usage patterns on mobile apps
Telemetry:
Opower – Smart meter readings
Chevron – Acoustic sampling from oil wells
Major video game company – Player actions in a multiplayer online game
Orbitz – Hotel search ranking, targeted recommendations, web page performance tracking
eBay – Search results based on titles, descriptions, images, seller and buyer data
