10.1 Introduction to Data Science
What is Data Science?
- Almost everything that has something to do with data
- Data Science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies
A little history of Data Science and how the word was coined
In the 1970s, researchers at IBM introduced the relational database model, and by the 1980s relational databases were widely used to store data such as customer details and payroll records.
Soon after, people began asking what useful information could be mined from this data, and the term data mining was coined.
The paper From Data Mining to Knowledge Discovery in Databases defines data mining as the application of specific algorithms for extracting patterns from data.
In the 1990s, as computer science was advancing at incredible speed, people began combining data mining with computer science, and the term data science came into use.
With the boom in the digital world, huge amounts of data are being collected: every click, every scroll, and every visit is collected and stored. Why? Business intelligence. Data science is used to pick out patterns in this data and predict, for example, whether and when a customer will buy a product.
Data Science Workspace - Building Intelligent Services on Adobe Experience Platform
With Data Science Workspace, Adobe Experience Platform allows you to streamline and accelerate data-to-insights with...
- Integrated access to data stored in Platform
- The computing power essential for machine learning/AI & managing big datasets
- Prebuilt ML recipes to accelerate the move to AI-driven experiences
- Simplified authoring, reuse, and modification of recipes for data scientists of varied skill levels
- Intelligent service publishing and sharing in just a few clicks, plus monitoring and retraining for continuous optimization of personalized experiences
Data scientists of all skill levels will achieve insights faster and create more effective digital experiences sooner.
Data Science Workspace Terminology
- Algorithm : Standard Data Science techniques such as classification, regression, k-means clustering, etc.
- Feature : An individual measurable property or characteristic of a phenomenon being observed.
- Feature Engineering : The process of using domain knowledge to convert raw data into a form usable for analysis.
- Recipe : A proprietary algorithm or an ensemble of algorithms to help solve specific business problems.
- Instance : An occurrence of the recipe configured with the right data definition to help solve specific business problems – one recipe can create many instances.
- Trained Model : An instance of the recipe that is trained using historical data to learn from. The trained model finds patterns in the training data to predict the target.
- Service : A service created from a trained model, to be used in building experiments.
- Jupyter Notebook : An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
- Hyper-parameters : High-level properties of a model that, unlike standard model parameters learned during training, are usually fixed before training begins. Examples: depth of a decision tree, number of hidden layers, learning rate, etc.
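To make a few of these terms concrete, here is a minimal sketch in plain Python. It is not Data Science Workspace code and uses no Platform API. The records, field names, and the functions `engineer_features`, `train`, and `predict` are all hypothetical, chosen only for illustration. The sketch turns raw records into features (feature engineering), fits a small logistic regression classifier (the algorithm) using a learning rate and epoch count chosen up front (hyper-parameters), and returns learned weights (the trained model).

```python
import math

# Hypothetical raw data: one record per customer, with a known outcome.
raw_records = [
    {"visits": 1, "days_since_purchase": 60, "bought": 0},
    {"visits": 2, "days_since_purchase": 90, "bought": 0},
    {"visits": 3, "days_since_purchase": 45, "bought": 0},
    {"visits": 8, "days_since_purchase": 5, "bought": 1},
    {"visits": 9, "days_since_purchase": 10, "bought": 1},
    {"visits": 7, "days_since_purchase": 20, "bought": 1},
]


def engineer_features(record):
    """Feature engineering: convert a raw record into numeric features."""
    return [
        record["visits"] / 10.0,  # scaled visit count
        1.0 if record["days_since_purchase"] <= 30 else 0.0,  # recent-buyer flag
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, targets, learning_rate, epochs):
    """Fit logistic regression with batch gradient descent.

    learning_rate and epochs are hyper-parameters: fixed before training.
    weights and bias are model parameters: learned from the data.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, targets):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(model, record):
    """Return the predicted probability that the customer buys."""
    weights, bias = model
    x = engineer_features(record)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Train on historical data, then score a new, unseen customer.
X = [engineer_features(r) for r in raw_records]
y = [r["bought"] for r in raw_records]
model = train(X, y, learning_rate=0.5, epochs=2000)
print(predict(model, {"visits": 9, "days_since_purchase": 3}))
```

Changing `learning_rate` or `epochs` changes how the model is trained, while the weights and bias are whatever training produces from the historical data. That is the distinction between hyper-parameters and model parameters in the list above.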
Let's continue with a hands-on exercise.
Next Step: 10.2 Churn Prediction: Data Preparation