What is Data?
January 22, 2024
Data science is the application of methods from statistics, computer science, and related fields to produce information and knowledge from data in order to solve domain-specific problems and make decisions.
Because it sits at the intersection of methods fields (statistics, CS, etc.) and domain application fields (biology, finance, insurance, sports, etc.), data science is necessarily an interdisciplinary field.
Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from potentially noisy, structured, or unstructured data.
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
While there is not yet a consensus on what precisely constitutes data science, three professional communities, all within computer science and/or statistics, are emerging as foundational to data science: (i) Database Management enables transformation, conglomeration, and organization of data resources, (ii) Statistics and Machine Learning convert data into knowledge, and (iii) Distributed and Parallel Systems provide the computational infrastructure to carry out data analysis.
Good question!
Data is anything that can be observed, stored (most often digitally as numbers or characters), and recalled.
Here, “anything” could be, but is not limited to: measurements, artifacts, and proxies for the state of nature.
Information, especially facts or numbers, collected to be examined and considered and used to help decision-making, or information in an electronic form that can be stored and used by a computer.
In common usage, data is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally.
Data refers to raw, unprocessed facts and figures without context. It is the foundation for all subsequent layers but holds limited value in isolation.
Information is organized, structured, and contextualized data. Information is useful for answering basic questions like “who,” “what,” “where,” and “when.”
Knowledge is the result of analyzing and interpreting information to uncover patterns, trends, and relationships. It provides an understanding of “how” and “why” certain phenomena occur.
Wisdom is the ability to make well-informed decisions and take effective action based on understanding of the underlying knowledge.
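The four layers above can be traced through a small example. The following is a minimal sketch in Python, not a prescribed method: the numbers are made up, and the assumption that they are daily high temperatures in degrees Celsius is purely illustrative, as are all of the variable names.

```python
from statistics import mean

# Data: raw, unprocessed figures without context (made-up values).
raw = [18.5, 19.0, 21.5, 23.0, 24.5]

# Information: the same figures, organized and contextualized.
# Assumption for illustration: these are daily high temperatures in
# degrees Celsius for one work week. This answers "what" and "when".
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
information = dict(zip(days, raw))

# Knowledge: a pattern uncovered by analyzing the information.
# Here, the average day-to-day change describes the trend.
daily_changes = [later - earlier for earlier, later in zip(raw, raw[1:])]
average_change = mean(daily_changes)
trend = "warming" if average_change > 0 else "cooling or flat"

# Wisdom: a decision informed by that knowledge (a hypothetical rule).
if trend == "warming":
    decision = "schedule the outdoor event late in the week"
else:
    decision = "schedule the outdoor event early in the week"

print(information)
print(f"average daily change: {average_change} ({trend})")
print(decision)
```

Each step adds something the previous layer lacked: labels and units turn data into information, a computed trend turns information into knowledge, and a rule for acting on the trend stands in for wisdom.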
A data product is really anything downstream of data itself that is useful in some way.
Good data products will help an end-user solve a problem or make a decision.
What fields are not using data?
How do data scientists apply statistical and computational methods?