Let’s move on to the definition of Big Data, starting with size, since some people care about size. It’s important to realize that there’s no “set in stone” definition of how much data is needed for it to be considered big data.
That said, look at many of the technologies out there and the workloads they’re really designed to handle. Then look at the technologies that are not classified as big data and the workloads they’re designed for.
When you compare the two, you start to see that the range of hundreds of terabytes roughly defines the threshold where you really start to require big data technology and where the more conventional technologies don’t work quite as well.
It makes sense at first to understand some of the scenarios where Big Data is produced.
Big data has been discussed in the mainstream media almost on a daily basis. It certainly stands to reason that it must exist. The question is, where is it coming from?
Now, certainly, web and internet scenarios, including the analysis of web logs and clickstream data, are a canonical example that most people are able to identify pretty quickly. Other sources include:
- Sentiment Analysis (Social Media)
- Buying patterns
- Fraud Detection; forensic analyses
- Machine-learning-based investment strategies and their iteration
- Healthcare research (hospitals can take advantage of big data analyses to determine how best to distribute services, and research labs can use it to help understand genomic information)
- Supply chain scenarios produce tons of data, given the prevalence of RFID tags and the number of scanners in supply chain facilities that scan those tags and the articles they are attached to.
- Cell towers produce all kinds of data, both about the calls they connect and complete and about the devices that pass near the towers: how long they stay, what the signal strength is, and which platforms, model numbers, and brand names those devices are associated with.
- Familiar names like Twitter, Facebook, LinkedIn, etc.
All of these things are fairly modern, but we can go back even further. Consider supermarket checkout scanning: UPC codes, which actually date back to the 1970s, can produce Big Data too. That underlines a really important point, which is that Big Data isn’t really new. We’ve always had it. What we haven’t done is keep it and analyze it. What’s changing now is that we are keeping that data and analyzing it.
So the question is what has given rise to that?