Jake Worth

Prefer Real Data For Software Development

Published: August 18, 2022 • Updated: February 08, 2024

When testing with data, I prefer realistic data rather than random data. This might sound like something everyone agrees on, but it isn’t.

Here’s the pattern I’m challenging: when testing forms, checking data presentation, or even seeding data, a programmer (often in a rush) will enter gibberish or jargon instead of something a real user would enter.

An example might be an address form, where we mash some combination of the home row letters into each field. In this case, I prefer to enter “123 Milwaukee Ave.” or similar into the first address field, “Chicago” in the city field, and so on.

Or in a CMS data field for a page header, a programmer might type “Jake Testing the Header”. In this case, I prefer to enter “Dashboard”, or whatever would be appropriate in the context.

Or, when seeding a database, a programmer might set every customer’s email to “foo@bar.com”. Here, I’ll often prefer to generate a series of realistic, unique emails like “cyrus-1999@example.com”.
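As a rough illustration of that last example, here is a minimal sketch of a seed helper. The function name seedEmails, the list of first names, and the starting year are all hypothetical choices made for this example; they don’t come from any particular library or seeding tool.

```javascript
// Hypothetical seed helper: the names and year range are illustrative only.
const FIRST_NAMES = ["cyrus", "amara", "lena", "mateo", "priya", "jonas"];

// Builds `count` realistic-looking addresses on example.com.
function seedEmails(count, startYear = 1980) {
  const emails = [];
  for (let i = 0; i < count; i++) {
    // Cycle through the names so the list can be shorter than `count`.
    const name = FIRST_NAMES[i % FIRST_NAMES.length];
    // The year increments on every row, so no two addresses collide.
    const year = startYear + i;
    emails.push(`${name}-${year}@example.com`);
  }
  return emails;
}

console.log(seedEmails(3));
// [ 'cyrus-1980@example.com', 'amara-1981@example.com', 'lena-1982@example.com' ]
```

The helper is deliberately deterministic, so the same seed run always produces the same customers. A faker-style library would work too; the point is only that each row looks like something a person might have typed.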

Why Does This Matter?

Yes, it’s a subtle preference. I’ll make three arguments in its favor:

  • Realistic data stresses the software in realistic ways
  • It’s easier to work with
  • It presents a polished environment

Realistic Stress

Real data stresses the software realistically. Perhaps your UI doesn’t display gibberish well: you entered a three-character email address into the form, there were no validations to stop you, and now the page looks broken.

Is the presentational aspect a problem that’s worth solving? Probably not. No customer is going to type ‘abc’ into a form for their email, and we shouldn’t allow it. So the presentation of ‘abc’ is a situation the customer should never experience. But by entering that data, we’re considering it.

Unrealistic data brings havoc to a test suite, too. When all your fixture users are named “Walter White”, you end up writing assertions on an index page like expect(getAllByText("Walter White")).toHaveLength(6). Inexplicable magic number aside, this is a great vector for bugs to sneak into the UI: if one row shows the wrong user, the count still passes.
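Here is a minimal sketch of the alternative, with distinct fixture names and one check per user. To keep it runnable without a test framework, renderIndex and countRows are hypothetical stand-ins for the real page and the real query, and the user names are invented for this example.

```javascript
// Hypothetical fixtures: each user has a distinct, realistic name.
const users = [
  { id: 1, name: "Maria Alvarez" },
  { id: 2, name: "Devon Okafor" },
  { id: 3, name: "Hannah Lindqvist" },
];

// Stand-in for the rendered index page: one row of text per user.
function renderIndex(list) {
  return list.map((user) => user.name);
}

// Counts how many rendered rows match a name exactly.
function countRows(rows, name) {
  return rows.filter((row) => row === name).length;
}

const rows = renderIndex(users);
for (const user of users) {
  // Each name should appear exactly once; a missing or duplicated row fails loudly.
  if (countRows(rows, user.name) !== 1) {
    throw new Error(`Expected exactly one row for ${user.name}`);
  }
}
```

In a Testing Library suite, the same idea is roughly one getByText call per fixture name, since that query throws when it finds zero matches or more than one. Either way, the assertion now names the user it expects instead of counting identical strings.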

When you enter realistic data, you feel the customer’s pain. Maybe it’s easy to use the default state in your address form, but when you try to change it, you realize it isn’t accessible, and changing it triggers a bunch of aggressive inline validations. Almost every real customer will experience this. But you won’t, unless you’re using realistic data.

Easier to Work With

Real data is more pleasant to work with. I prefer to look at a website, even in development, with data that looks real. It helps me understand the experience and empathize with my users. Development environments full of gibberish make it difficult to tell what’s actually broken from what was just hasty data entry.

Polished

Lastly, real data feels polished. I share a lot of screenshots with my team. I prefer that those images, and those of my teammates, look real. You never know where a screenshot from your development environment might end up. I’ve participated in more than a few demos where a key stakeholder has derailed the presentation with a question like: “Hang on; a customer created a recipe called ‘Everything Broken Hotfix Please Work!!!’? We need to fix that ASAP.”

Wrapping Up

Yes, it takes more time to think of something realistic to type. I think that it’s almost always worth it, and it becomes a habit. Once you’ve spent an hour on a bug report with a title like “The homepage is broken” and found out the root cause was nonsensical CMS data entry, you might agree.

What are your thoughts on realistic test data? Let me know!

