Big Data and Hadoop – Part 2

Apache Hadoop is a fast-growing big data framework.
Advantages:
  • Problems with Traditional Large-Scale Systems
    • Processor-bound: complex processing relied on ever-bigger computers (since replaced by distributed computing)
    • Programming complexity
    • Keeping data and processes in sync
    • Finite Bandwidth
    • Partial Failures
  • Distributed Systems: The Data Bottleneck
    • Traditionally, data is stored in a central location
    • Data is copied to processors at runtime
    • Fine for limited amounts of data
  • Types of Analysis with Hadoop:
    • Text Mining
    • Index Building
    • Graph Creation and Analysis
    • Pattern Recognition
    • Collaborative Filtering
    • Prediction Models
    • Sentiment Analysis
    • Risk Assessment
  • Nature of Analysis
    • Batch Processing
    • Parallel Execution
    • Distributed Data
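Hadoop combines these three properties (batch processing, parallel execution, distributed data) in its MapReduce processing model. The following is a minimal single-machine sketch in Python, not Hadoop's actual Java API. Input "splits" stand in for data blocks stored on different nodes, a thread pool stands in for parallel map tasks, and the hypothetical `map_reduce` helper runs the map, shuffle, and reduce phases. It is applied to two of the analysis types listed above: word counting (text mining) and building an inverted index (index building).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_reduce(splits, mapper, reducer, workers=4):
    """Hypothetical helper: map each split in parallel, group by key, then reduce."""
    # Map phase: every split is processed independently. Threads only stand in
    # for cluster nodes here; on a real cluster each split runs on its own machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda split: list(mapper(split)), splits))

    # Shuffle phase: collect all intermediate values that share a key.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)

    # Reduce phase: combine the grouped values, emitting keys in sorted order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Text mining example: count how often each word occurs.
def word_count_mapper(split):
    for line in split:
        for word in line.lower().split():
            yield word, 1


def word_count_reducer(word, counts):
    return sum(counts)


# Index building example: map each word to the documents that contain it.
def index_mapper(split):
    for doc_id, text in split:
        for word in set(text.lower().split()):
            yield word, doc_id


def index_reducer(word, doc_ids):
    return sorted(doc_ids)


if __name__ == "__main__":
    # Two splits, as if the input were stored as two blocks on two nodes.
    lines = [["big data big ideas"], ["data is distributed"]]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
    # {'big': 2, 'data': 2, 'distributed': 1, 'ideas': 1, 'is': 1}

    docs = [[(1, "hadoop stores data"), (2, "hadoop processes data")],
            [(3, "spark processes data")]]
    print(map_reduce(docs, index_mapper, index_reducer)["data"])
    # [1, 2, 3]
```

The mapper and reducer are the only parts that change between the two analyses; the split-map-shuffle-reduce pipeline stays the same. That separation is what lets a framework like Hadoop distribute the work without the programmer handling synchronization or partial failures directly.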

Hadoop Users:
  • BlackBerry – Growing data volumes; ad hoc analysis queries took too much time
  • CBS Interactive – Cross-site and historical analysis, identifying user patterns
  • Nokia – 3G digital modeling, usage patterns of mobile apps
  • Telemetry
  • OPower – Smart meter readings
  • Chevron – Acoustic sampling from oil wells
  • Major video game company – Player actions in multiplayer online games
  • Orbitz – Hotel search ranking, targeted recommendations, web page performance tracking
  • eBay – Search results based on titles, descriptions, images, and seller and buyer data
