Thinking and Talking about Data

When creating software products, it's common to think about the rows of data required in an application. Frequently developers, product managers, and other stakeholders will discuss what "columns", "rows", and "tables" their application requires directly.

There are multiple problems with thinking about software as "data" in "rows". Language is the most powerful abstraction available, and being careless about how we communicate can have big impacts on our application, team, and company.


Writing Software

A GenAI’s attempt at “a group of people and a whiteboard designing software”

Data vs Information

First, let's address thinking and talking directly about data. Data are raw individual facts. A row in a database is data; perhaps one contains a user ID and a transaction ID, as in a simple join table. Such a row is useful to have, but discussing only the data exposes a problem: it has no meaning on its own.

When data is given context and interpretation, it becomes "information", which is far more powerful. By joining our simple table to Users and Transactions, we can find not only the other details about this transaction, but also the user who initiated it. Suddenly, these two small data points become more informative. Once we know that "This user had a problem with this transaction", we can pull in related details such as their payment details, line items in the order, or shipping information. All of these data points are relatively meaningless on their own, but become rich with detail when contextualized into information.
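As a minimal sketch of the join described above, the following uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample values (users, transactions, user_transactions, 'Ada', and so on) are assumptions made up for illustration, not a schema from any real application.

```python
import sqlite3

# In-memory database with three hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount_cents INTEGER, status TEXT);
CREATE TABLE user_transactions (user_id INTEGER, transaction_id INTEGER);
INSERT INTO users VALUES (1, 'Ada');
INSERT INTO transactions VALUES (10, 2599, 'failed');
INSERT INTO user_transactions VALUES (1, 10);
""")

# Raw data: two IDs with no meaning on their own.
raw_row = conn.execute(
    "SELECT user_id, transaction_id FROM user_transactions"
).fetchone()

# Information: the same row, contextualized by joining to Users and Transactions.
informative_row = conn.execute("""
SELECT u.name, t.amount_cents, t.status
FROM user_transactions AS ut
JOIN users AS u ON u.id = ut.user_id
JOIN transactions AS t ON t.id = ut.transaction_id
""").fetchone()

print(raw_row)
print(informative_row)
```

The first result is just a pair of integers; the second tells us who was involved, how much money was at stake, and that something went wrong.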

Raw Data

A GenAI’s attempt at representing “abstract representation of raw unorganized data”

Data is raw, meaningless, and valueless without context. Information is rich and powerful. Data imbued with context becomes valuable information.

Knowing now that information is what brings us value, our goal should be to talk about the information our application and users require.

Talking About Information

How can we talk about our application if not by talking about its data? The practices of Domain-Driven Design specifically encourage creating a common vocabulary for each application's business domain. This "ubiquitous language" should be used not only when talking to domain experts about requirements, but also in the software itself to guide implementation. The language is refined through conversations with domain experts, carefully defining the nuances of each Noun (typically a class), Verb (a method, service, etc.), and Adjective (a property). The ubiquitous language becomes a model of a domain and its business value. Where this vocabulary requires more detail, the software will also require detail. If externalities can be abstracted in the language, then so too can they be abstracted in the software.
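To make the Noun, Verb, and Adjective mapping concrete, here is a small sketch in Python. The domain terms (Customer, Transaction, dispute, DISPUTED) are hypothetical, chosen to echo the earlier "this user had a problem with this transaction" example; a real ubiquitous language would come from conversations with your own domain experts.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TransactionStatus(Enum):
    """Adjectives the business uses to describe a transaction (assumed terms)."""
    SETTLED = "settled"
    DISPUTED = "disputed"


@dataclass
class Transaction:
    """Noun: something the domain experts talk about by name."""
    transaction_id: int
    amount_cents: int
    status: TransactionStatus = TransactionStatus.SETTLED  # Adjective
    dispute_reason: Optional[str] = None


@dataclass
class Customer:
    """Noun: the person who initiates transactions."""
    name: str
    transactions: list[Transaction] = field(default_factory=list)

    def dispute(self, transaction: Transaction, reason: str) -> None:
        """Verb: 'the customer disputes a transaction'."""
        if transaction not in self.transactions:
            raise ValueError("Customers can only dispute their own transactions")
        transaction.status = TransactionStatus.DISPUTED
        transaction.dispute_reason = reason


purchase = Transaction(transaction_id=10, amount_cents=2599)
ada = Customer(name="Ada", transactions=[purchase])
ada.dispute(purchase, reason="charged twice")
print(purchase.status.value)
```

Nothing here mentions rows, columns, or tables. The code reads like the sentence a stakeholder would say, and storage decisions can be made later.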

Eric Evans's classic "Domain-Driven Design", available on Amazon

Domain-Driven Design is an all-encompassing development philosophy with many confusing technical details. I encourage you to learn more about DDD overall, but you can begin creating a "ubiquitous language" for your application domain immediately. When discussing requirements, avoid technical details and prefer normal business domain language. This is a skill that takes practice, both individually and as a team, and can require a lot of soft skills. Requests frequently arrive as implementation directives, e.g. "Add column X to table Y", and we must learn to back up and discover "Users need to be able to do P when Q", which can be a difficult conversation. Clarifying questions about What, Why, When, etc. become more common as we work out how each new feature request fits into the ubiquitous language.

Sometimes the technical solution is obvious, but we can still benefit from focusing on the business requirements with our language. By choosing to make the business language our priority, we may discover a single word in our requirements that fundamentally changes our implementation. When we talk about requirements as "Add column X to table Y", we lose any chance to discover misalignment between the business domain and our software. If we instead talk to our stakeholders and discover "Users need to see all Xs for a given Y", we can more easily see that the X might need to live in another table, or be a list, or be modeled in some other way so that our software aligns with the word "all".
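The effect of the word "all" can be sketched with two small Python models. Here X is assumed to be a shipping address and Y a customer; both names are hypothetical stand-ins, since the text above deliberately leaves X and Y abstract.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerWithColumn:
    """What "Add column X to table Y" produces: exactly one X per Y."""
    name: str
    shipping_address: str


@dataclass
class Customer:
    """What "Users need to see all Xs for a given Y" asks for: many Xs per Y."""
    name: str
    shipping_addresses: list[str] = field(default_factory=list)

    def add_shipping_address(self, address: str) -> None:
        self.shipping_addresses.append(address)


ada = Customer(name="Ada")
ada.add_shipping_address("1 Analytical Way")
ada.add_shipping_address("2 Engine Street")
print(ada.shipping_addresses)
```

The first model can only ever hold one address, so a second address would overwrite the first. The requirement phrased in business language exposes that mismatch before any schema is written.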

The ubiquitous language itself is dense with information. In addition, this language is the context that gives your raw data any value. While you can get very far without treating this language as a priority, your language is your software's architectural blueprint whether you pay attention to it or not.

Information in "Rows"

When we talk about information in "rows", we are limiting ourselves to a very narrow view of our application's data. It removes the context that makes our data "information", implies a relational database, and fundamentally constrains our design choices.

By thinking about our application's information as interconnected entities rather than isolated rows, we can better understand how different parts of our system relate to one another. This allows us to create more robust and flexible software that can adapt to changing business requirements over time; this is the goal of Domain-Driven Design and its ubiquitous language.

Thinking about information in "rows" also implies the use of a relational database. While relational databases are powerful and widely used, they are not always the best choice for every application. By focusing on rows of data, we may miss opportunities to use other types of databases or data structures that better fit our domain. Some applications are best served by a document database like MongoDB, a search tool like OpenSearch, or a key-value store like DynamoDB.
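As a rough illustration of how the same information can take a non-row shape, the sketch below holds an order as a single nested document (the form a document database might store) and flattens it into relational-style rows only when needed. The field names and values (order_id, line_items, sku) are assumptions for this example, and no real database client is used.

```python
import json

# One nested document: the order, its customer, and its line items together.
order_document = {
    "order_id": 10,
    "customer": {"name": "Ada"},
    "line_items": [
        {"sku": "BOOK-1", "quantity": 1},
        {"sku": "PEN-7", "quantity": 3},
    ],
}


def to_rows(document):
    """Flatten the document into (order_id, sku, quantity) line item rows."""
    return [
        (document["order_id"], item["sku"], item["quantity"])
        for item in document["line_items"]
    ]


rows = to_rows(order_document)
print(json.dumps(order_document, indent=2))
print(rows)
```

Neither shape is inherently better. The point is that "rows" is one possible representation of the order, chosen after the domain is understood rather than assumed from the start.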

Problem Space vs Solution Space

The Double Diamond design process proposed by the UK's Design Council explicitly splits the design process into phases that explore first the problem space and then the solution space.

Double Diamond Design, graphic by me

The concepts of "data" and "rows" exist in the solution space: the set of all possible solutions to a given problem within the context of a particular problem space. Talking about low-level concepts like data and rows keeps us firmly planted in a narrow subset of our solution space. In the example above, implying a relational database model can severely limit our solution's capabilities. If we only address the solution space, it can be difficult to fully define the actual problem; as we all know, bad requirements lead to mistakes, miscommunications, and rewriting software.

Thinking about the higher level concepts of our application helps us stay in the problem space as we understand what we're trying to implement. A problem space refers to the set of all possible problems that could be solved by a given system or application. This includes not only the specific business requirements that must be met, but also any technical constraints or limitations that may impact how those requirements can be implemented.

Make sure that when you or your team is exploring the problem space, you explicitly stay in the problem space. Leave the solution space for implementation time.

Language Abstraction

Though it may not be obvious, language is the most powerful abstraction available to us when creating software products. By being mindful of how we communicate we can create a common vocabulary for our business domain that becomes a model of its value. This ubiquitous language helps us stay in the problem space as we understand the needs of our stakeholders. By remaining in the problem space, we can not only hone our understanding of our business but also free our solution space of unnecessary restrictions. When we can communicate clearly about our actual problems, everyone can be sure we're solving the right problem the right way.
