What is Big Data?
- Yazmin T. Montana
- Sep 13, 2021
- 2 min read
Updated: Jun 8, 2022
Big Data is a collection of information so large that it does not fit into traditional databases and cannot be processed by a single computer. Data comes from a variety of sources, for example traditional databases, server log files, social media engagement, website visits and page views, email copy, or virtually anything that can be uploaded to a computer.
In order to find valuable insights from a vast amount of unorganized data, we must process it in a way that traditional databases can't.
Big Data is about finding useful insights hidden in undetected patterns and relationships. It is only very recently that analyzing very large volumes of data has become affordable and practical; in the past, the computing power required made it impossible.
A key particularity of Big Data is that the tools and methods used to explore it typically don't work well on small databases. You can always gain new insights if you analyze 10,000 parameters instead of 10, and if you do it every second instead of once a week.
More data with simple algorithms is better than great algorithms with less data.
When a limited set of samples has to be used to reduce the computational load, many useful patterns and correlations become difficult to find. Larger volumes of data, however, allow us to accept a less precise structure, because size compensates for accuracy.
Often, understanding the general pattern of our data and being able to reveal a trend is more important in real-life cases than having precise information on the exact details.
How are the insights in Big Data found?
Scientists, programmers, engineers, and analysts use statistical analysis and/or data mining algorithms that detect correlations.
Correlation is the statistical relationship between different data values: If two data values have a strong relationship, one value is likely to change when the other does.
With Big Data, you will know what insights were found but not why they occurred. Nevertheless, sufficient data volumes can support conclusions even when no explanation is available.
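To make the idea of detecting a correlation concrete, here is a minimal Python sketch; the column names (page views and purchases) and the numbers are invented for illustration, not taken from real data:

```python
import numpy as np

# Toy data: daily page views and purchases (illustrative values only).
page_views = np.array([120, 340, 560, 780, 990, 1200])
purchases = np.array([3, 9, 14, 21, 25, 31])

# Pearson correlation coefficient: close to +1 means the two values tend
# to rise and fall together, close to -1 means they move in opposite
# directions, and close to 0 means no clear linear relationship.
r = np.corrcoef(page_views, purchases)[0, 1]
print(f"correlation between page views and purchases: {r:.2f}")
```

A strong correlation like this one tells you that the two values move together; as noted above, it does not tell you why.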
When speaking of Big Data, we often discuss the 3 V's:
Volume (Data size)
Velocity (Speed of data flow)
Variety (How unstructured the information is)
In the same way, certain algorithms are often discussed:
5 advanced analytics algorithms for Big Data
Linear regression (Can be performed on any statistical software, including Excel)
Logistic regression (Requires more sophisticated statistical software, such as Minitab)
Classification trees (It is preferable to develop an algorithm for the specific usage of the data)
K-Nearest neighbors
K-Means clustering (a short code sketch of this one follows the list)
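To show what the last item looks like in practice, here is a minimal K-Means clustering sketch in Python using scikit-learn; the library choice and the two feature columns (monthly spend and visits) are assumptions for the example, not something the list specifies:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up feature rows: [monthly spend, visits per month] for six customers.
X = np.array([
    [15, 2], [18, 3], [16, 2],     # low spend, few visits
    [80, 12], [85, 14], [90, 11],  # high spend, many visits
])

# Ask K-Means for 2 clusters and see which cluster each customer lands in.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] -- the two spending groups separate cleanly
```

scikit-learn exposes linear regression, logistic regression, classification trees, and K-Nearest neighbors through the same fit/predict interface, so any of the five algorithms above can be tried with a one-line change of estimator.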

