Big Data and Hadoop – Part 2
Apache Hadoop is a fast-growing big data framework.
Advantages:
- Problems with Traditional Large-Scale Systems
- Processor-bound: complex processing used to require ever-bigger computers (this changed with distributed computing)
- Programming complexity
- Keeping data and processes in sync
- Finite Bandwidth
- Partial Failures
- Distributed Systems: The Data Bottleneck
- Traditionally, data is stored in a central location
- Data is copied to processors at runtime
- Fine for a limited amount of data
- Types of Analysis with Hadoop:
- Text Mining
- Index Building
- Graph Creation and Analysis
- Pattern Recognition
- Collaborative Filtering
- Prediction Models
- Sentiment Analysis
- Risk Assessment
- Nature of Analysis
- Batch Processing
- Parallel Execution
- Distributed Data
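
The three traits listed under Nature of Analysis (batch processing, parallel execution, distributed data) can be illustrated with a small word count in the map/shuffle/reduce style. This is only a minimal single-machine sketch in plain Python, not Hadoop's actual API: the `PARTITIONS` list and the function names are hypothetical, and a thread pool stands in for the cluster nodes that would each process their own block of data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: each string stands in for a block of data
# stored on a different node of a cluster.
PARTITIONS = [
    "big data needs big clusters",
    "hadoop processes big data in batches",
    "data stays where it is stored",
]


def map_partition(text):
    """Map step: emit a (word, 1) pair for every word in one partition."""
    return [(word, 1) for word in text.split()]


def shuffle(mapped):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(partitions):
    """Run the whole batch: map partitions in parallel, then shuffle and reduce."""
    # The thread pool is an assumption standing in for worker nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_partition, partitions))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    counts = word_count(PARTITIONS)
    print(counts["big"], counts["data"])
```

Each partition is mapped independently, so the work goes to where the data already sits and only the small (word, count) pairs are brought together, rather than copying all the raw data to a central processor.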
Hadoop Users:
- BlackBerry – Growth of data; ad hoc analysis queries took too much time
- CBS Interactive – Cross-site and Historical Analysis, Identifying User Patterns
- Nokia – 3G Digital Modeling, Usage Patterns on Mobile Apps
- Telemetry
- OPower: SmartMeter Readings
- Chevron: Acoustic Sampling from Oil Wells
- Major Video Game Company – Player Actions in Multiplayer Online Games
- Orbitz – Hotel Search Ranking, Targeted Recommendations, Web Page Performance Tracking
- eBay – Search results based on titles, descriptions, images, seller and buyer data