Diving into the realm of data science can feel like navigating a labyrinth of hype and reality. So in this article, I’ll help demystify this scientific discipline by offering a concise definition of data science, delving into its origins, clarifying what it is and what it is not, and shedding light on the scientific method used to transform raw data into valuable insights.
Indeed, it’s easy to get tangled in the web of buzz and misconceptions surrounding data science. However, make no mistake; this field is on a meteoric path! Job growth for data scientists is expected to rise by an astounding 36% between 2021 and 2031, paving the way for 13,500 new positions annually. Moreover, the data science software tools and platforms market is worth hundreds of billions of dollars and is expanding quickly. And brace yourselves – 90% of the massive amounts of data we have now was created in just the past couple of years. As a result, the field of data science is set to grow at an unprecedented rate.
“Data are becoming the new raw material of business.” Craig Mundie
A Data Science Definition Grounded In Science, Not Hype.
Have you heard the terms “data science”, “big data”, “data analytics” or “data scientist” but are not sure what they mean? Don’t worry, you’re not alone! The term data science is quite new and came into common use only in this century. Sadly, a few marketers have used it to promote their products, leading to different meanings for data-related words. To get through this confusion, we need a clear definition of data science. I think the following definition is the most accurate one I have found.
“Data science is the discipline of making data useful.” Cassie Kozyrkov, Chief Decision Scientist, Google
I like this short definition because it focuses on the essence of what data science is. First, it highlights that data science is a science that uses the scientific method. Second, it is a discipline with its own domain knowledge, methodologies, and tool sets. Lastly, this definition explicitly states that this science is focused on data and the problem of how to make data useful. For more detail, see the 3 discussion points below, which explain and break down the merits of this definition of data science.
1. Data Science Is A Science Because Data Scientists Use The Scientific Method.
This short definition of data science may not fully explain what it is, but it defines it well. First and foremost, it is a science because there are actual data scientists working in this discipline. Thus, data science must be a science. However, “Data Scientist” is another overused term: many people hold the title of Data Scientist but are not data scientists. Specifically, a “true” scientist is one who uses the scientific method to conduct basic and applied research. So a definition of data scientist is as follows:
Data Scientist – “Uses scientific methods (data science) to liberate and create meaning from raw data.” Data Science Association’s “Professional Code of Conduct”
2. It is a Discipline Because It Has Its Own Domain Knowledge and Tool Sets.
A data scientist is someone with expertise in data analytics, statistics, and artificial intelligence (AI). Additionally, as part of the scientific method, data scientists use AI tools such as machine learning, along with data science methodologies, to solve problems and advance the science. Lastly, the end goal of this process is to make data useful.
3. Putting It All Together – A Detailed Data Science Definition.
To put it all together, below is a more detailed definition of data science.
“Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a broad range of application domains. Data science is related to data mining, machine learning, big data, computational statistics and analytics.” Wikipedia
For a more detailed discussion on the definition of data science, see Cassie Kozyrkov’s What On Earth Is Data Science?. Also, see SC Tech Insights’ Data Analytics vs Data Science – Know the Most Important Differences for more details on the differences between data science and other data disciplines.
To Better Understand The Data Science Definition, Know Its Short, Explosive History.
Data science has come a long way since its humble beginnings in the 1960s. In particular, as computers began to rapidly generate and store data, a whole new field of science emerged from the academic discipline of statistics. Fast forward to the 21st century, and data science is now an integral part of the scientific world. See the timeline below.
- 1962 – The Science Of Data Proposed. John Tukey proposed a reformation of academic statistics. In “The Future of Data Analysis,” he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or “data analysis.”
- Latter Half Of 20th Century. Academia continued to debate whether to expand the boundaries of theoretical statistics or to create a new discipline with an emphasis on data preparation and presentation. The debate also focused on placing more emphasis on prediction rather than inference in statistics. During this time, the term data science began to be used within the academic community.
- 21st Century To Present. The term “data science” is now commonly used and the body of knowledge continues to expand. For instance, the Data Science Journal debuted in 2002 and the scientific discipline continues to mature. In 2017, David Donoho, a professor of statistics at Stanford, classified this growing field into 6 divisions of data science known as Greater Data Science (GDS). See the next section for more details on GDS. Also, further in this article, there is a description of the steps that a data scientist goes through using the scientific method.
For a more detailed description of the origins and growth of data science see Journal of Computational and Graphical Statistics’ 50 Years of Data Science.
Greater Data Science Definition – The 6 Divisions Of Data Science Described
Data science, its domain knowledge, and its software tool sets are growing rapidly, as is the amount of data in general. Additionally, data scientists apply data science across multiple disciplines, organizations, and industries. David Donoho, a professor of statistics at Stanford, describes and classifies the various activities of data science in 50 Years of Data Science. Specifically, he describes GDS (Greater Data Science), the science of learning from data, as divided into six divisions. These six divisions of data science are:
- Exploring and Preparing Data
- Representing and Transforming Data
- Computing with Data
- Modeling Data
- Presenting Data
- Science about Data Science
See SC Tech Insights’ article, What Is Data Science? – Its Fundamentals Described In 6 Parts for details.
The Scientific Process of Data Science.
Recently, data scientists have begun to document the unique scientific method for data science. As with any scientific process, it starts with a statement of the problem. This problem is usually a business problem that the business stakeholders have tasked the data scientist to research and solve.
Also, there are many variations of the scientific method that data scientists use to make data useful and develop applications for business. For example, AWS uses a data science process called OSEMN: 1) Obtain Data; 2) Scrub Data; 3) Explore Data; 4) Model Data; 5) Interpret Results. For a more detailed explanation of data science methods, see SC Tech Insights’ The Scientific Method – What Does A Data Scientist Do Every Day?.
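To make the five OSEMN steps a little more concrete, here is a minimal Python sketch that walks a tiny data set through each step. Everything in it is an assumption made for illustration: the delivery data is made up, the function names simply mirror the step names, and a hand-computed least-squares line stands in for whatever model a real project would use. It is not AWS's implementation of the process.

```python
import csv
import io
import statistics

# Hypothetical raw data standing in for a real source (file, API, database).
RAW_CSV = """distance_km,delivery_hours
10,2.0
20,3.9
30,
40,8.1
50,10.0
"""


def obtain(raw_text):
    """1) Obtain Data: read raw records from a CSV source."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def scrub(rows):
    """2) Scrub Data: drop incomplete rows and convert text to numbers."""
    clean = []
    for row in rows:
        if all(value not in (None, "") for value in row.values()):
            clean.append((float(row["distance_km"]), float(row["delivery_hours"])))
    return clean


def explore(pairs):
    """3) Explore Data: summarize the cleaned data."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return {
        "rows": len(pairs),
        "mean_distance_km": statistics.mean(xs),
        "mean_delivery_hours": statistics.mean(ys),
    }


def model(pairs):
    """4) Model Data: fit a least-squares line y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def interpret(slope, intercept, distance_km):
    """5) Interpret Results: turn the model into a statement for stakeholders."""
    predicted = slope * distance_km + intercept
    return (
        f"Each extra km adds about {slope:.2f} hours; "
        f"a {distance_km} km delivery should take roughly {predicted:.1f} hours."
    )


if __name__ == "__main__":
    pairs = scrub(obtain(RAW_CSV))
    print(explore(pairs))
    slope, intercept = model(pairs)
    print(interpret(slope, intercept, 60))
```

Keeping each step in its own function mirrors how the process is described: the output of one step is the input of the next, so any single step can be swapped out (a different data source, a different model) without touching the others.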
“We’re entering a new world in which data may be more important than software.” Tim O’Reilly
Greetings! As an independent supply chain tech expert with 30+ years of hands-on experience, I take great pleasure in providing actionable insights to logistics leaders. My background includes implementing hundreds of innovative solutions using emerging technologies and a data-centric development approach. I have also provided business intelligence (BI) solutions for thousands of shippers.