Test data generation with RapidRep®

03.05.2018

Generate your test data for the Back-End with RapidRep – uncomplicated, lasting, and highly flexible.

A central step in the testing process is testing with a large number of records. Not least for reasons of data protection, however, your test data should not be your production data. This is especially relevant in view of the EU-wide General Data Protection Regulation, which applies from 25 May 2018, and the supplementary national “new Federal Data Protection Act” (“neue Bundesdatenschutzgesetz”) in Germany.¹ Random, fictitious data is required. However, data generated with online generators or low-compliance software is often insufficient to fully cover your own use cases. Generating the necessary test data is therefore often a time-consuming and costly process, and complex data relations can complicate it further.

RapidRep offers an easy solution. With RapidRep sets of rules, you create realistic, random data for the back-end that meets your requirements and the new data protection regulations, can be adapted flexibly, and can therefore cover a large number of use cases. You are not limited to a small number of records, a single target system, or a single output format. Since RapidRep also includes a solution for data quality evaluation, you can optionally check the quality of your test data at any time.

The procedure for creating test data with RapidRep essentially comprises two steps: the description of a data model for your test data and the creation of a RapidRep set of rules for test data generation. The rules must then be included in a RapidRep report definition to control the output of the data.²

Step 1: Identify the data model and target tables

First, determine which data you need for the test. This normally follows from your particular use case, and in the best case you already have a corresponding data model. If not, identify a data model for your test data as the first step. From this model you can derive the target tables and columns to be filled with test data, as well as their relational dependencies. (The following example uses spreadsheets as output, but RapidRep also supports other output formats, such as CSV.)

(Figure 1: Data model for the test data generation example)

In this example, the "Customer" record is the starting point on which the other records depend. Like an order, for example, a customer should be uniquely identifiable: no customer should appear twice in the "Customer" table, and exactly one customer must be assigned to each order, invoice and address. A customer, on the other hand, can have several addresses, e.g. a different delivery address for a particular order. The model shows that we want to create the tables "Address", "Customer", "Order", "Order_Position", "Invoice" and "Payment" for our data.
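The constraints described above can be sketched in a few lines of Python. This is only an illustration, not RapidRep functionality: the column names (customer_id, address_id, order_id) and the helper check_model are assumptions, since the article names the tables but not all of their columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int  # hypothetical key column
    country: str
    language: str


@dataclass(frozen=True)
class Address:
    address_id: int
    customer_id: int  # every address belongs to exactly one customer


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # every order belongs to exactly one customer


def check_model(customers, addresses, orders):
    """Return a list of violated constraints (empty if the data is consistent)."""
    problems = []
    customer_ids = [c.customer_id for c in customers]
    if len(customer_ids) != len(set(customer_ids)):
        problems.append("duplicate customer")
    known = set(customer_ids)
    for a in addresses:
        if a.customer_id not in known:
            problems.append(f"address {a.address_id} has no customer")
    for o in orders:
        if o.customer_id not in known:
            problems.append(f"order {o.order_id} has no customer")
    return problems


customers = [Customer(1, "DE", "de"), Customer(2, "IT", "it")]
# Customer 1 has two addresses, which the model allows.
addresses = [Address(10, 1), Address(11, 1), Address(12, 2)]
# Order 101 points to a customer that does not exist.
orders = [Order(100, 1), Order(101, 3)]
```

Calling check_model(customers, addresses, orders) reports the order that references a missing customer, while several addresses per customer pass without complaint.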

But how are the values for these target tables determined, and how can the resulting test data be made realistic (e.g. by following statistical frequencies)? To answer this, the next step transfers the data model into a RapidRep set of rules that contains all the rules needed to generate the corresponding test data.

Step 2: Create RapidRep set of rules for test data generation

For the sake of clarity and accessibility, RapidRep rules are created in Excel and have a flexible structure. They should be understandable to everyone involved in order to ensure the transparency and credibility of the results. More information about the properties of the rules and about model-based testing with RapidRep can be found here: Model-based testing with the RapidRep Test Suite.

The set of rules for test data generation includes a worksheet that depicts the data model, a worksheet with raw data, and various worksheets with rules and specifications for how they are used. The raw data consists of the lists of values used as the data source for the target tables and columns. Through the step-by-step evaluation of the rules, this initial raw data is transformed into the expected result, in this case the test data. The following graphic shows a selection of the rules used in our test data generation example.

(Figure 2: Excerpt from the set of rules for the test data generation example)

RapidRep rules have the following common properties:

Rules have the form: If (condition) -> Then (action).
Each rule is uniquely identifiable via the attribute RULE_ID (Figure 2, column 3).
With the attribute ORDER_ID, the order in which the rules are evaluated can be changed if required (Figure 2, column 2; not relevant in our example).
Each rule performs a very specific task in a set of rules (functional aspect).
Each rule must have at least one attribute that the rule engine can use to evaluate the If-condition.
Rules may have as many other attributes as needed.
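
These properties can be illustrated with a small Python sketch. It is not the RapidRep rule engine: the class Rule, the functions evaluation_order and apply_rules, and the assumption that rules are evaluated by ORDER_ID first and RULE_ID second are all hypothetical and serve only to make the "If (condition) -> Then (action)" form concrete.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    rule_id: int                          # RULE_ID: must be unique
    condition: Callable[[Record], bool]   # the If part
    action: Callable[[Record], None]      # the Then part
    order_id: int = 0                     # ORDER_ID: optional reordering (assumed default 0)
    attributes: Dict[str, Any] = field(default_factory=dict)  # any further attributes


def evaluation_order(rules: List[Rule]) -> List[Rule]:
    """Check RULE_ID uniqueness and sort by (ORDER_ID, RULE_ID)."""
    ids = [r.rule_id for r in rules]
    if len(ids) != len(set(ids)):
        raise ValueError("RULE_ID must be unique")
    return sorted(rules, key=lambda r: (r.order_id, r.rule_id))


def apply_rules(rules: List[Rule], record: Record) -> List[int]:
    """Evaluate the rules step by step; return the RULE_IDs whose action ran."""
    fired = []
    for rule in evaluation_order(rules):
        if rule.condition(record):
            rule.action(record)
            fired.append(rule.rule_id)
    return fired


rules = [
    Rule(2, lambda r: r.get("Country") == "DE", lambda r: r.update(Language="de")),
    Rule(1, lambda r: "Country" not in r, lambda r: r.update(Country="DE")),
]
record: Record = {}
fired = apply_rules(rules, record)
```

Although rule 2 is listed first, rule 1 is evaluated first because of its lower RULE_ID, so the "Country" value is already present when rule 2 checks its condition.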

In the stepwise evaluation of the rules, the rule with RULE_ID 1 is applied first in the example shown. It refers to the "Customer" table (see Figure 2, column "Aspect") and fills the "Country" column (see Figure 2, column "Target"). The value that RapidRep writes to the column is determined by the function specified in the "Source" column. This function selects a value from the value list "Country" according to given probability weightings (see Figure 3 below). Where the values are found and with which weighting they are selected is specified on other worksheets ("Import specifications", "Enumerations").

(Figure 3: Value list "Country" for the test data generation example)
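
The idea of a weighted selection from a value list can be sketched in Python as follows. The countries and weights in COUNTRY_WEIGHTS are invented for illustration and are not the values from Figure 3; pick_weighted is a hypothetical helper, not a RapidRep function.

```python
import random
from collections import Counter

# Hypothetical value list: value -> weight (here chosen to add up to 100).
COUNTRY_WEIGHTS = {"DE": 50, "AT": 15, "CH": 15, "IT": 10, "FR": 10}


def pick_weighted(weights, rng):
    """Pick one value; the chance of each value is proportional to its weight."""
    values = list(weights)
    return rng.choices(values, weights=[weights[v] for v in values], k=1)[0]


# A seeded generator makes the sample reproducible.
rng = random.Random(42)
sample = Counter(pick_weighted(COUNTRY_WEIGHTS, rng) for _ in range(10_000))
```

With a large enough sample, the counts in sample approach the given weightings, which is what makes the generated column look statistically realistic.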

The subsequent rules in the example in Figure 2 build on this selection and use it to determine the value of the "Language" column of the "Customer" table. If, for example, the value "IT" is selected for "Country", rules 2-5 are evaluated as "failed" because their pre-condition (see Figure 2, column 5, "Pre-Condition") is not fulfilled. Depending on the random percentage likelihood set in the condition (see Figure 2, column 6, "Condition"), the value "it", "de" or "fr" is then entered in the target column "Language". If the value for "Country" does not match any of the predefined countries, rules 2-12 are counted as "failed" and the rule with RULE_ID 13 applies, i.e. in this case "en" is entered as the value in the "Language" column.
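
A minimal Python sketch of this evaluation, under stated assumptions: the rule table below is invented (the countries of rules 2-5, all probability ranges, and the omission of rules 9-12 are not taken from Figure 2), and one random number p between 0 and 1 is assumed to be drawn per record and compared against each rule's range.

```python
# (RULE_ID, country in the pre-condition, probability range [low, high), language)
LANGUAGE_RULES = [
    (2, "CH", 0.00, 0.60, "de"),
    (3, "CH", 0.60, 0.85, "fr"),
    (4, "CH", 0.85, 0.95, "it"),
    (5, "CH", 0.95, 1.00, "rm"),
    (6, "IT", 0.00, 0.90, "it"),
    (7, "IT", 0.90, 0.95, "de"),
    (8, "IT", 0.95, 1.00, "fr"),
]
FALLBACK_RULE_ID, FALLBACK_LANGUAGE = 13, "en"


def choose_language(country, p):
    """Return (language, RULE_ID that applied, RULE_IDs evaluated as failed)."""
    failed = []
    for rule_id, rule_country, low, high, language in LANGUAGE_RULES:
        # Pre-condition: the country matches; condition: p falls into the range.
        if rule_country == country and low <= p < high:
            return language, rule_id, failed
        failed.append(rule_id)
    # No rule matched: the fallback rule sets the default language.
    return FALLBACK_LANGUAGE, FALLBACK_RULE_ID, failed
```

For example, choose_language("IT", 0.5) skips rules 2-5 because their pre-condition names another country, and a country without any rule, such as "US", falls through to the fallback rule. In practice p would come from a random number generator, e.g. random.random().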

In this way, all columns of the target tables specified in our data model are filled with fictitious, random values, which are stored as lists in the worksheet "Raw data". The functions used, for example to select the country, are created in the RapidRep report definition, which interprets the rules; which function is used where can be traced in the rules. Once the set of rules and the report definition have been created, RapidRep can generate random test data whenever needed, always according to fully user-defined requirements that can therefore be adapted to any special case.
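
To illustrate generation on demand and a choice of output format outside of RapidRep, here is a rough Python sketch that creates dependent records and writes them as CSV. The raw data list RAW_COUNTRIES, the column names, and the functions generate and to_csv are assumptions made for this example; the seed parameter is an added design choice that makes a run repeatable.

```python
import csv
import io
import random

RAW_COUNTRIES = ["DE", "AT", "IT"]  # hypothetical raw data list


def generate(n_customers, max_orders, seed):
    """Create customers and their orders; every order references an existing customer."""
    rng = random.Random(seed)
    customers, orders = [], []
    for customer_id in range(1, n_customers + 1):
        customers.append({"Customer_ID": customer_id,
                          "Country": rng.choice(RAW_COUNTRIES)})
        for _ in range(rng.randint(0, max_orders)):
            orders.append({"Order_ID": len(orders) + 1,
                           "Customer_ID": customer_id})
    return customers, orders


def to_csv(rows, columns):
    """Serialize a list of dict rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


customers, orders = generate(n_customers=5, max_orders=3, seed=1)
customer_csv = to_csv(customers, ["Customer_ID", "Country"])
```

Running generate again with the same seed yields the same records, while a different seed yields a new random set that still respects the customer-order relation.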

Conclusion: Test data generation with RapidRep

To make sure your processes and data processing work and comply with the new regulations, you should test with as many data records as possible. Of course, this test data must also comply with the provisions of the EU General Data Protection Regulation and the "new Federal Data Protection Act". Test data that is not only fictitious but also matches your use cases and correctly maps the relational dependencies is essential.
With RapidRep sets of rules, you create test data that exactly matches your requirements:

they are fictitious;
they are random;
they are comprehensible;
they can correspond to the frequency distributions of general and customer-specific statistics;
they can be adapted to individual circumstances (implementation of your own data model).

Further advantages:

The number of test data records is not limited to a few hundred;
the output format and the target system can be adjusted;
sets of rules can easily be copied and modified for other scenarios;
the quality of test data can be directly tested with RapidRep’s integrated solutions.

You do not need any other test data generation software: you can generate your test data directly with RapidRep while taking advantage of all the other benefits of working with the RapidRep suites.

If you need support for creating your test data with RapidRep, please contact us and we will advise you!

Footnotes:
1: See for example https://www.datenschutz-wiki.de/BDSG_2018 (in German)

2: These are typical work steps with RapidRep that go beyond the topic discussed here. Information is provided by other articles from this website, our forum and the RapidRep documentation.