What is the difference between Data Science, Big Data, Data Mining and Machine Learning?

On average, about 1.7 megabytes of new information will be created every second for every human being on the planet. According to IBM, 2.5 billion gigabytes (GB) of data was generated every day in 2012. Facebook users send on average 31.25 million messages and view 2.77 million videos every minute. Our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes.
It is not practical to handle such a large amount of data with traditional methods such as SQL on an RDBMS; processing the data that way would take far too long.
Also, the data is not of a single, structured type; much of it is unstructured. Let’s understand these two terms.

Structured Data: Structured data is data that can be easily organized. It is clean, analytical and usually stored in databases. Some types of structured data are machine generated, such as data that comes from medical devices (heart rate, blood pressure), manufacturing sensors (rotations per minute, temperature), or web server logs (number of times a page is visited). Structured data can also be human generated: data such as age, zip code, and gender.

Unstructured Data: Unstructured data refers to information that either does not have a predefined data model or is not organized in a predefined manner. Much of this data comes from social media, and even the body of an email is unstructured. Examples include word processing files, PDF files, spreadsheets, digital images, video, audio and social media posts.
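
To make the contrast concrete, here is a minimal Python sketch. The records, the email text and the regular expression are all made up for illustration; the point is only that structured data can be queried by field name, while free text has to be parsed first.

```python
import re

# Structured: every record has the same named fields (hypothetical values).
structured_rows = [
    {"age": 34, "zip_code": "10001", "heart_rate": 72},
    {"age": 51, "zip_code": "94105", "heart_rate": 88},
]

# Unstructured: free text with no predefined fields (hypothetical email body).
email_body = "Hi team, my heart rate hit 120 during the run. See you at 5 pm!"

# Querying structured data is a direct field lookup.
average_heart_rate = (
    sum(row["heart_rate"] for row in structured_rows) / len(structured_rows)
)

# Pulling a comparable number out of free text needs parsing, and this
# pattern is only a guess at how the value happens to be phrased.
match = re.search(r"heart rate hit (\d+)", email_body)
heart_rate_from_text = int(match.group(1)) if match else None

print(average_heart_rate)
print(heart_rate_from_text)
```

If the email had phrased the reading differently, the pattern would simply fail to match, which is the practical difficulty with unstructured data.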
Big Data technology was introduced to deal with unstructured data.
Big Data refers to volumes of data so large that they cannot be processed effectively with traditional applications.
Big Data is often characterized by the 3Vs: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed.
The processing of Big Data begins with raw data that isn’t aggregated or organized and is most often impossible to store in the memory of a single computer.
Hadoop is one of the frameworks for Big Data. It is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. HBase, Hive and Pig are among the most widely used software packages in the Hadoop ecosystem.
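
The idea of splitting work across machines can be sketched in plain Python. This is not Hadoop code and uses no Hadoop API; it only imitates the map-then-reduce style of processing on a single machine, with two made-up text chunks standing in for blocks of a large file stored on different nodes.

```python
from collections import Counter
from functools import reduce

# Hypothetical chunks, standing in for blocks of a file on different machines.
chunks = [
    "big data needs big storage",
    "data science needs data",
]

def map_chunk(chunk):
    """Map step: count words inside one chunk, independently of the others."""
    return Counter(chunk.split())

def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right

# Each chunk could be counted on a separate machine; here it is a plain loop.
partial_counts = [map_chunk(chunk) for chunk in chunks]
word_counts = reduce(merge_counts, partial_counts, Counter())

print(word_counts)
```

Because each chunk is counted without looking at the others, no single computer ever needs to hold the whole data set in memory.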

Data Science is another buzzword that comes up around large data sets. Also known as data-driven science, it is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization. Data Science encompasses anything related to data cleansing, preparation, and analysis; it is an umbrella term for techniques used when trying to extract insights and information from data.
Big Data is about processing and storing data, whereas Data Science is about analysing data and generating insights.

Machine Learning

Traditional Programming: Data and program are run on the computer to produce the output.

Machine Learning: Data and output are run on the computer to create a program. This program can be used in traditional programming.
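
A small Python sketch may help here. The temperature conversion task, the sample values and the function names are all chosen for illustration only. In the first function a person writes the rule; in the second, the rule is estimated from example inputs and outputs using ordinary least squares, and the result is a function that can be called like any hand-written program.

```python
def fahrenheit_rule(celsius):
    """Traditional programming: a person writes the rule."""
    return celsius * 9 / 5 + 32

def learn_linear_rule(inputs, outputs):
    """Machine learning: estimate slope and intercept from data and outputs."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    def learned_rule(x):
        # The "program" produced by learning.
        return slope * x + intercept

    return learned_rule

# Example data (inputs) and desired outputs.
celsius_samples = [0, 10, 20, 30, 40]
fahrenheit_samples = [32, 50, 68, 86, 104]

learned = learn_linear_rule(celsius_samples, fahrenheit_samples)
print(fahrenheit_rule(25), learned(25))
```

Both functions give the same answer on this clean data, but only the first one required someone to know the conversion formula in advance.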

Machine learning algorithms are mainly classified into four types:

Supervised learning (also called inductive learning): Training data includes desired outputs. For example, each message is labeled "this is spam" or "this is not", so the learning is supervised.

Unsupervised learning: Training data does not include desired outputs. An example is clustering. It is hard to tell what is good learning and what is not.

Semi-supervised learning: Training data includes a few desired outputs.

Reinforcement learning: Learning from rewards that follow a sequence of actions. It is popular with AI researchers and is the most ambitious type of learning.
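
The first two types can be contrasted with a short Python sketch. The feature (a count of suspicious words per message), the values and the two toy algorithms are assumptions made for illustration: a nearest-centroid classifier stands in for supervised learning and a one-dimensional two-means clustering for unsupervised learning. Semi-supervised and reinforcement learning are not shown.

```python
def train_nearest_centroid(features, labels):
    """Supervised: use the given labels to compute one mean per class."""
    sums, counts = {}, {}
    for x, label in zip(features, labels):
        sums[label] = sums.get(label, 0) + x
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}

    def predict(x):
        # Pick the class whose mean is closest to x.
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return predict

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)
    group_low, group_high = list(values), []
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_high:
            break  # all values fell into one group
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high

# Hypothetical feature: number of suspicious words in each message.
suspicious_word_counts = [0, 1, 1, 7, 8, 9]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]

predict = train_nearest_centroid(suspicious_word_counts, labels)
print(predict(6), predict(2))

# The same numbers without labels: the groups are found, but not named.
print(two_means(suspicious_word_counts))
```

The clustering step recovers the same two groups here, but it cannot say which group is spam; that name only comes from the labels used in the supervised case.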

[Figure: Venn diagram showing the relation between the terms discussed above, including machine learning and artificial intelligence]

Difference between Data Science and Data Mining

https://qph.ec.quoracdn.net/main-qimg-9acf2c9c0fe3f064fce1171489ba24e6-c

Data science means research and the development of new theories and discoveries, while data mining means using existing technologies and techniques to discover useful patterns in data. The two complement each other to accomplish a common task. Sometimes, during data mining, the task at hand cannot be accomplished, cannot be done efficiently, or fails to uncover new patterns; that is where a data scientist comes in with new research and techniques.
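
As a small illustration of "discovering useful patterns" with an existing technique, the Python sketch below counts which pairs of items appear together in shopping baskets. The baskets and the support threshold are made up for this example, and simple pair counting is only one of many data mining techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep only pairs seen in at least min_support baskets (assumed threshold).
min_support = 3
frequent_pairs = {
    pair: count for pair, count in pair_counts.items() if count >= min_support
}

print(frequent_pairs)
```

Here the only pair that clears the threshold is bread and milk, which is the kind of pattern a data mining step is meant to surface.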

Image source: Google

GeekyMe
