A simple and quick way to generate fake data for your Python tests.

When you test your Python apps, you will often need dummy data that is as close as possible to real user input but does not contain personally identifiable information (PII), for security reasons. This may seem trivial, but we developers tend to be bad at making up data: we end up using our own details, some movies or games that we like, or data from somebody on the team or a real customer. It is also essential that the data is not repeated on every test run, so that the tests are exercised against different inputs, because that is what your users will do in the end. If you are using Django, one possible solution is fixtures, or factories built with FactoryBoy, but this means a lot of extra work and the generated data is static. However, there is a package named faker https://github.com/joke2k/faker that can create fake data very quickly, whether for your tests, to populate a database for a demo, or for whatever else you need. In this post, I will show the basics of this package and how handy it is to have in your toolset.


Setting Up

In this post, I will use Python 3.6 (because there is no reason to use Python 2 anymore 😊) and pipenv to create a virtualenv and install packages; I wrote about it in a previous post. Open a terminal window and follow these steps:

Getting hands-on code

Let’s consider a case where we have a Customer object with some attributes and methods, and we want to write tests for those methods:

class Customer:

    def __init__(self, name, address, state, zipcode, contact_name,
                 phone_number, email, notes):
        self.name = name
        self.address = address
        self.state = state
        self.zipcode = zipcode
        self.contact_name = contact_name
        self.phone_number = phone_number
        self.email = email
        self.notes = notes

    def full_address(self):
        return f'{self.address} {self.state} {self.zipcode}'

    def contact(self):
        return f'{self.contact_name} <{self.email}>'

Now let’s create a simple, trivial test case illustrating how to generate fake attributes for our Customer object:

import unittest

from faker import Faker

from customer import Customer

class TestCustomer(unittest.TestCase):

    def setUp(self):
        # Build a customer from freshly generated fake data before each test
        fake = Faker()
        self.customer = Customer(
            name=fake.company(),
            address=fake.street_address(),
            state=fake.state(),
            zipcode=fake.zipcode(),
            contact_name=fake.name_female(),
            phone_number=fake.phone_number(),
            email=fake.company_email(),
            notes=fake.text(max_nb_chars=20),
        )

    def test_full_address(self):
        expected = f'{self.customer.address} {self.customer.state} {self.customer.zipcode}'
        self.assertEqual(expected, self.customer.full_address())

    def test_contact(self):
        expected = f'{self.customer.contact_name} <{self.customer.email}>'
        self.assertEqual(expected, self.customer.contact())

Let’s run the tests with:

(faker_testing-rhVJ83Ej) ʄ faker_testing >> python -m unittest
Ran 2 tests in 0.049s


Advanced Usage

As you can see above, faker is very easy to use. There are more advanced things that you can do with it:

Generating data for some specific locale

Let’s try generating some data in Spanish:

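from faker import Faker

fake = Faker('es_ES')

# Print a name, an address and a phone number, one per line
print(fake.name(), fake.address(), fake.phone_number(), sep='\n')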

(faker_testing-rhVJ83Ej) ʄ faker_testing >> python fake_es.py
Joan Morán Ávila
Pasadizo Felix Escamilla 13 Apt. 22
Granada, 18242
+34 686 985 516

Creating custom providers

This example illustrates how to create a provider for the planets of the Solar System:

from faker.providers import BaseProvider

class PlanetProvider(BaseProvider):

    __provider__ = 'planet'
    __lang__ = 'en_US'

    planets = ['Neptune', 'Mars', 'Mercury', 'Venus', 'Earth', 'Jupiter', 'Saturn', 'Uranus']

    def planet(self):
        return self.random_element(self.planets)

It can be used like this:

from faker import Faker

from planet_provider import PlanetProvider

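fake = Faker()
# Register the custom provider so that fake.planet() becomes available
fake.add_provider(PlanetProvider)

print(fake.planet())

(faker_testing-rhVJ83Ej) ʄ faker_testing >> python planet_test.py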

Always generating the same data

Sometimes you need the same data on every run, for tests that need verifiable input/output. Faker lets you do this with a seed (it can be any number): seeding the generator with the same value always produces the same data, like:

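from faker import Faker

fake = Faker()
Faker.seed(4321)  # any number works; 4321 is only an example value (older releases used fake.seed(4321))

print(fake.name())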

(faker_testing-rhVJ83Ej) ʄ faker_testing >> python faker_seed.py
Jessica Smith
(faker_testing-rhVJ83Ej) ʄ faker_testing >> python faker_seed.py
Jessica Smith
(faker_testing-rhVJ83Ej) ʄ faker_testing >> python faker_seed.py
Jessica Smith

As you can see, running the program always returns the same data.


To learn more about faker and see all the available generators, you can refer to the documentation at: