Diving into the realm of data science can feel like navigating a labyrinth of hype and reality. So in this article, I’ll help to demystify this scientific discipline by offering a concise data science definition and looking into its origins. Also, I’ll clarify what it is and what it is not. Lastly, I’ll shed light on the scientific method that data scientists use to build tools that can transform raw data into valuable insights.
Indeed, it’s easy to get tangled in the web of buzz and misconceptions surrounding data science. However, make no mistake; this field is on a meteoric path! Job growth for data scientists is expected to rise by an astounding 36% between 2021 and 2031, paving the way for 13,500 new positions annually. Moreover, the data science software tools and platforms market is worth hundreds of billions of dollars and is expanding quickly. Further, roughly 90% of the massive amount of data we have now was created in just the past couple of years. As a result, the field of data science will continue to grow at an unprecedented rate.
“Data are becoming the new raw material of business.”
Craig Mundie
1. A Data Science Definition Grounded In Science, Not Hype.
I’m sure you are familiar with the terms “data science”, “big data”, “data analytics”, or “data scientist”, but do you know what they actually mean? Don’t worry, you’re not alone! The term data science is quite new and has come into common use only in this century. Sadly, some marketers have used it to promote their products, which has led to different meanings for data-related words. To get through this confusion, we need a clear definition of data science. The following is what I consider to be the most accurate definition available.
Data Science Definition
“Data science is the discipline of making data useful.”
Cassie Kozyrkov, Chief Decision Scientist, Google
I like this short definition because it focuses on the essence of what data science is. First, it highlights that data science is a scientific discipline and thus uses the scientific method. Second, it is a discipline with its own domain knowledge, methodologies, and tool sets. Lastly, this definition explicitly states that this science is focused on data and the problem of how to make data useful. For more detail, see the three discussion points below, which explain and break down the merits of this definition of data science.
a. Data Science Is A Science Because Data Scientists Use The Scientific Method.
First and foremost, data science is a science because data scientists use the scientific method as part of this discipline. On the other hand, the title “Data Scientist” is often misused. This is because many businesses and organizations bestow the title of Data Scientist on members of their staff who do not really do data scientist work. Specifically, a “true” scientist is one who uses the scientific method to conduct basic and applied research. A good definition of data scientist is as follows:
“Uses scientific methods (data science) to liberate and create meaning from raw data.”
Data Science Association’s “Professional Code of Conduct”
For more discussion on the scientific method and data scientists, see my article, Scientific Method Example: Data Scientists Use An Unique Way To Achieve The Best Results.
b. It is a Discipline Because It Has Its Own Domain Knowledge and Tool Sets.
A data scientist is someone with expertise in data analytics, statistics, and artificial intelligence (AI). Additionally, as part of the scientific method, data scientists use AI tools such as machine learning (ML) along with data science methodologies to solve problems and advance the science within their domain. Lastly, the end goal of this process is to make data useful.
c. Putting It All Together – A Detailed Data Science Definition.
To put it all together, below is a more detailed definition of data science.
“Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a broad range of application domains. Data science is related to data mining, machine learning, big data, computational statistics and analytics.”
Wikipedia
For more discussion on the definition of data science, see Cassie Kozyrkov’s What On Earth Is Data Science?. Also, see SC Tech Insights’ Data Analytics vs Data Science – Know the Most Important Differences for more details on the differences between data science and other data disciplines. Lastly, see the graphic below for an entertaining decision tree on what data science is and what it is not.
The Data Science Adventure
2. To Better Understand The Data Science Definition, Know Its Short, Explosive History.
Data science has come a long way since its humble beginnings in the 1960s. In particular, as computers began to rapidly generate and store data, a whole new field of science emerged from the academic discipline of statistics. Fast forward to the 21st century, and data science is now an integral part of the scientific world. See the timeline below.
- 1962 – The Science Of Data Proposed. John Tukey proposed a reformation of academic statistics. In “The Future of Data Analysis,” he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or “data analysis.”
- Latter Half of the 20th Century. Subsequently, academia continued to debate whether to expand the boundaries of theoretical statistics or to create a new discipline with an emphasis on data preparation and presentation. The debate also focused on placing more emphasis on prediction rather than inference in statistics. During this time, the term data science began to be used within the academic community.
- 21st Century To Present. The term “data science” is now commonly used and the body of knowledge continues to expand. For instance, the Data Science Journal debuted in 2002 and the scientific discipline continues to mature. In 2017, David Donoho, a professor of statistics at Stanford, classified this growing field into six divisions of data science known as Greater Data Science (GDS). See the next section for more details on GDS. Also, further in this article, there is a description of the steps that a data scientist goes through using the scientific method.
Also, for a more detailed description of the origins and growth of data science see David Donoho’s article 50 Years of Data Science.
3. Greater Data Science Definition – The Six Divisions Of Data Science Described.
Data science, its domain knowledge, and its software tool sets are rapidly growing. Also, the amount of data is growing exponentially. Further, data scientists apply data science across multiple disciplines, organizations, and industries. To help us better understand what data science is, David Donoho, a professor of statistics at Stanford, describes and classifies the various activities of data science in detail in his article, 50 Years of Data Science. Specifically, he describes GDS (Greater Data Science), the science of learning from data, as divided into six divisions. These six divisions of data science include:
The Six Divisions of Data Science
- Exploring and Preparing Data
- Representing and Transforming Data
- Computing with Data
- Modeling Data
- Visualizing and Presenting Data
- Science about Data Science
For a more detailed discussion, see SC Tech Insights’ article, What Is Data Science? – Its Fundamentals Described In 6 Parts.
4. The Scientific Process of Data Science.
Recently, data scientists have begun to document the unique scientific method for data science. As with any scientific process, the process starts with a statement of the problem. Moreover, this problem is usually a business problem where the business stakeholders have tasked data scientists to research and come up with a possible solution.
Also, there are many variations of the scientific method that data scientists use to make data useful and develop applications for business. For example, AWS uses a data science process called OSEMN: 1) Obtain Data; 2) Scrub Data; 3) Explore Data; 4) Model Data; 5) Interpret Results. For a more detailed explanation of data science methods, see SC Tech Insights’ The Scientific Method – What Does A Data Scientist Do Every Day?.
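To make the five OSEMN steps a little more concrete, here is a minimal Python sketch of how they might fit together. Everything in it is an assumption for illustration only: the tiny in-memory data set, the function names, and the choice of a simple least-squares line as the "model" are all hypothetical, and a real project would pull data from files, databases, or APIs and likely use far richer tools.

```python
from statistics import mean

# Hypothetical raw records: (x, y) pairs; None marks a missing value.
RAW_ROWS = [(1.0, 2.1), (2.0, 3.9), (None, 5.0), (3.0, 6.2), (4.0, 7.8)]


def obtain():
    """1) Obtain Data: here we simply copy an in-memory sample."""
    return list(RAW_ROWS)


def scrub(rows):
    """2) Scrub Data: drop records that have a missing field."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def explore(rows):
    """3) Explore Data: compute a few simple summary statistics."""
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_x": mean(xs), "mean_y": mean(ys)}


def model(rows):
    """4) Model Data: fit y = slope * x + intercept by least squares."""
    xs, ys = zip(*rows)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def interpret(slope, intercept):
    """5) Interpret Results: turn the numbers into a plain statement."""
    return (
        f"Each extra unit of x is associated with about {slope:.2f} "
        f"more units of y (baseline {intercept:.2f})."
    )


def run_pipeline():
    """Run the five steps in order and return their outputs."""
    rows = scrub(obtain())
    summary = explore(rows)
    slope, intercept = model(rows)
    return summary, slope, intercept, interpret(slope, intercept)


if __name__ == "__main__":
    print(run_pipeline()[3])
```

Keeping each step as its own small function mirrors the way the process is described: each stage can be inspected, tested, or swapped out without touching the others.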
“We’re entering a new world in which data may be more important than software.”
Tim O’Reilly
Greetings! As an independent supply chain tech expert with 30+ years of hands-on experience, I take great pleasure in providing actionable insights and solutions to logistics leaders. My focus is to drive transformation within the logistics industry by leveraging emerging LogTech, applying data-centric solutions, and increasing interoperability within supply chains. I have a wide range of experience, including successfully leading the development of 100s of innovative software solutions across supply chains and delivering business intelligence (BI) solutions to 1,000s of shippers.