Usage

The Data Quality Rules repository allows you to evaluate the data quality of Solvency 2, VNS and FTK supervisory reports before you submit them to DNB. You can either apply the rules directly to the XBRL-instance that you want to submit to DNB, or you can apply the rules at an earlier stage in your reporting process.

If you want to apply the rules to internal data sets, then you have to make sure that the format of the data satisfies the requirements set out in Data format requirements.

If you want to apply the rules to the XBRL-instance, you first need to convert the instance to Pandas files (see Converting XBRL instances to Pandas and CSV). The resulting files will be in the correct data format. We recommend starting with this option.

Evaluation of the rule sets is done with a simple command line interface. We added Jupyter notebooks with a step-by-step description of all parts of the code, in case you want to understand the separate steps or include some of them in your internal reporting process.

Currently we have three rule sets available:

  • additional rules (DNB’s additional validation rules for Solvency II reports);
  • financial data rules; and
  • rule sets for report comparison.

All rule sets are evaluated with DNB’s data-patterns package. With this package we generated most of the rule sets published here, based on reports that were previously submitted to us.

Converting XBRL instances to Pandas and CSV

We use a command line interface to convert an XBRL instance to Pandas and CSV. You can run the XBRL to Pandas and CSV converter with:

python src\instance2csv.py

The script will ask for the desired taxonomy, instance file, output directory, and whether you want verbose column names instead of rc-codes (row-column codes). To evaluate the rules you need the templates with the rc-codes. You can also run the script with the command line arguments --taxo, --instance, --output and --verbose_labels.
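
For example, a non-interactive run could look like this (the taxonomy identifier and file locations are illustrative; use the values that match your environment):

python src\instance2csv.py --taxo EIOPA_2_4_0 --instance qrs_270_instance.xbrl --output results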

Make sure you use the corresponding version of the taxonomy for your instance.

For each template in the XBRL instance the results are exported to a Pandas pickle-file and a csv-file. The pickle-file maintains the correct indices, whereas the csv-file does not; so if you want to access the data, read the pickle-files.

The csv-files and the pickle-files are stored in a subdirectory identical to the name of the XBRL-instance (without extension).

All closed-axis tables in the XBRL-instance are combined and stored in one pickle-file (named after the XBRL-instance) in the same subdirectory as the individual pickle-files.

The easiest way to access the data of a separate template is to read the corresponding pickle-file with:

import pandas as pd

df = pd.read_pickle(filename)  # filename is the path to the template's pickle-file

Evaluating additional Rules

To run the additional rules use:

python solvency2-rules\apply1.py

You can run DNB’s additional rules for the following Solvency II reports:

  • Annual Reporting Solo (ARS); and
  • Quarterly Reporting Solo (QRS)

The command line interface will ask for the entrypoint (ARS or QRS), the directory where the reports (in pickle-files) are stored, the output type (confirmation, exceptions or both) and the output directory. You can also run the script with the command line arguments --entrypoint, --report_dir, --output_type and --output_dir.
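
For example (the directory names are illustrative):

python solvency2-rules\apply1.py --entrypoint QRS --report_dir results\qrs_270_instance --output_type exceptions --output_dir results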

We distinguish two types of tables (illustrated in the sketch after this list):

  • With a closed axis, e.g. the balance sheet: an entity reports only one balance sheet per period
  • With an open axis, e.g. the list of assets: an entity reports several ‘rows of data’ in the relevant table
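
A simplified sketch of how the two table types look after conversion (the column names, indices and values below are made up):

import pandas as pd

# closed axis: one row per report, each datapoint is a separate column
closed = pd.DataFrame({'S.02.01.02.01,R0030,C0010': [1234.5]})

# open axis: several rows of data within the same template
open_axis = pd.DataFrame({'S.06.02.01.01,C0040': ['asset-1', 'asset-2', 'asset-3']})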

To evaluate the patterns we use a PatternMiner object (part of the data_patterns package) and run the analyze function.
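
A minimal sketch of this step (the file names are hypothetical, and the exact PatternMiner arguments may differ between versions of the data_patterns package):

import pandas as pd
import data_patterns

# the rule sets in this repository are published as Excel files (hypothetical file name)
df_patterns = pd.read_excel('qrs_patterns.xlsx')

# a converted report template (see Converting XBRL instances to Pandas and CSV)
df = pd.read_pickle('results/qrs_270_instance.pickle')

# evaluate the predefined patterns against the report data
miner = data_patterns.PatternMiner(df_patterns=df_patterns)
df_results = miner.analyze(df)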

Evaluating financial Data Rules

To run the financial data rules use:

python solvency2-rules/apply2.py

The command line interface will ask for the directory where the reports (in pickle-files) are stored and the output directory. You can also run the script with the command line arguments --report_dir, --output_type and --output_dir.
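
For example (the directory names are illustrative):

python solvency2-rules/apply2.py --report_dir results/qrs_270_instance --output_type exceptions --output_dir results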

With output_type you can choose to output confirmations only, exceptions only, or all results.

If, given the output_type, no output was generated (for example, no exceptions when the output type is exceptions only), then no Excel output file is generated.

The rules are related to the following tables:

  • S.06.02.01.01 (Information on positions held)
  • S.06.02.01.02 (Information on assets)
  • S.06.02.01.01 (Information on positions held) and S.06.02.01.02 (Information on assets)
  • S.08.01.01.01 (Information on positions held) and S.08.01.01.02 (Information on derivatives)
  • S.08.01.01.02 (Information on derivatives)

Evaluating rule sets for report comparison

The idea of the report comparison rules is a bit more involved than that of the additional validation rules from DNB and the financial data validation rules. The report comparison rule sets evaluate differences between two report sets (for example ARS-2020 and ARS-2019), whereas the other rule sets evaluate a single report set (for example QRS-2020-Q1).

The goal is to detect differences in whether data points are reported (data points that were included in one period but not in the other) and to detect significant changes in the values of data points between two periods. As such, it is not unusual that some data points are included in one period and not in another, or that some data points change significantly between two periods. Because of that we only included data points for which it is highly unusual that they are included in one period and not in the other (occurring in less than 5% of previous reports), and data points for which it is highly unusual (less than 5% of previous reports) that they change significantly (by more than 10%) between two periods.

You can compare two quarterly reports or two annual reports, but you cannot compare a quarterly report with an annual report, even if they have corresponding data points.

To run the additional rules use:

python solvency2-rules\apply3.py

The command line interface will ask for the rule set that you want to apply (compare two QRS-reports or compare two ARS-reports), the entity category (Schade, Herverzekeraar, Leven), the two directories where the two reports are located and the output directory. You can also run the script with the command line arguments --rule_set, --entity_category, --report_dir_1, --report_dir_2, --output_type and --output_dir.
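
For example (the directories and entity category are illustrative):

python solvency2-rules\apply3.py --rule_set ARS --entity_category Leven --report_dir_1 results\ars_2019 --report_dir_2 results\ars_2020 --output_type exceptions --output_dir results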

With output_type you can choose to output confirmations only, exceptions only, or all results.

You cannot test these rules with the example instances provided by EIOPA because the instances of subsequent periods contain different LEI-codes.

Rule thresholds

Currently we have set the threshold at zero decimals, which means that a rule is satisfied if the absolute difference between the expected value and the reported value is lower than 1.5e0 (i.e. 1.5). This threshold is set at the level of the rule. This is a slightly different approach than the one applied in the XBRL validation rules, where thresholds are set at the level of separate data points. This means that it is possible that an exception to a DQR rule is triggered while the corresponding XBRL rule is not, because the latter applies higher thresholds.
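
A minimal sketch of such a rule-level tolerance check (the rule and the values are made up):

# with zero decimals, a rule holds if the absolute difference stays below 1.5
def rule_satisfied(expected, reported, tolerance=1.5):
    return abs(expected - reported) < tolerance

# example: a sum rule {A} + {B} = {C} with A=100, B=50 and a reported C of 151
print(rule_satisfied(expected=100 + 50, reported=151))  # True: |150 - 151| = 1 < 1.5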

Logging

Logging is performed for each rule set and is set to logging.INFO. Output is written to results\rule-set-1.log, results\rule-set-2.log or results\rule-set-3.log, depending on the rule set that is evaluated. The logging of the data-patterns package is stored in the same file, so you can see the result of each rule and pattern evaluation. The log level is currently not an input parameter, but you can change it in the apply-files in solvency2-rules.
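
If you need more detail, a minimal sketch of the kind of change you could make in an apply-file (the actual lines in those files may look slightly different):

import logging

# raise the log level to DEBUG and keep writing to the rule set's log file
logging.basicConfig(filename='results/rule-set-1.log', level=logging.DEBUG)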

The format of the patterns and rules files

The rule sets described above are all stored in the same format in Excel files. The columns are described here (an example follows the list):

  • pattern_id: the name of the pattern or rule that was applied. A pattern or rule can apply to more than one part of the report, but has the same form.
  • cluster: the group or cluster to which the pattern or rule applies (optional), for example life, non-life or reinsurance.
  • pattern_def: the definition of the pattern. The definition uses a simple syntax with references to the datapoints and should be relatively easy to read.
  • support: the number of occurrences that satisfy this pattern or rule in reports that were previously submitted to DNB.
  • exceptions: the number of occurrences that do not satisfy this pattern or rule in reports submitted to DNB.
  • confidence: the support divided by (support plus exceptions). This is an indicator of how exceptional this pattern or rule is. If the confidence is one, then the pattern or rule is satisfied in all cases. If the confidence is lower than one, then this could point to an error in the data or to an unusual but acceptable circumstance that led to the exception. Only patterns with very high confidence are published in this repository.
  • pattern_status: the status of the pattern or rule, i.e. blocking taxonomy rule, non-blocking taxonomy rule, validation rule or statistical validation rule.
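
As an illustration of how the support, exceptions and confidence columns relate (the file name is hypothetical):

import pandas as pd

# read a published rule set and recompute the confidence from its own columns
df_rules = pd.read_excel('qrs_patterns.xlsx')
confidence = df_rules['support'] / (df_rules['support'] + df_rules['exceptions'])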

You can find the documentation of the data-patterns package here.

The format of the results files

The outputs of the rule set evaluations are all stored in the same format in Excel files (a pandas example follows the column list).

  • the first columns describe the index of the report
  • result_type: true if the pattern or rule is satisfied, false if the pattern or rule is not satisfied
  • pattern_id: the name of the pattern or rule that was applied. A pattern or rule can apply to one or more parts of the report, but has the same form.
  • cluster: the group or cluster to which the pattern or rule applies (optional), for example life, non-life or reinsurance.
  • support: the number of occurrences that satisfy this pattern or rule in the report.
  • exceptions: the number of occurrences that do not satisfy this pattern or rule in the report.
  • confidence: the support divided by (support plus exceptions).
  • pattern_def: the definition of the pattern. The definition uses a simple syntax with references to the datapoints and should be relatively easy to read.
  • P values: the values of the data points in the left-hand side of the pattern or rule (in case of an if-then rule: the if part)
  • Q values: the values of the data points in the right-hand side of the pattern or rule (in case of an if-then rule: the then part)
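
Because the results are plain Excel files with these columns, you can, for example, select only the exceptions with pandas (the file name is hypothetical, and we assume that result_type is read as booleans):

import pandas as pd

# keep only the rows where the pattern or rule was not satisfied
df_results = pd.read_excel('results.xlsx')
exceptions = df_results[df_results['result_type'] == False]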

Data format requirements

If you want to apply the rules to internal data sets, then you have to make sure that the data is in the correct format; a sketch of a conforming file follows the Solvency 2 list below.

Solvency 2

  • the template name follows the standard Solvency 2 code, for example S.02.01.02.01 and S.28.02.01.02;
  • the file name of an individual template is the template name plus an extension (.csv or .pickle), for example S.01.02.07.01.pickle;
  • the file name of the combined closed-axis templates is the instance file name plus an extension, for example qrs_270_instance.pickle (the example instance for QRS);
  • the column names and the index names for all templates have the following format: {reporting template name},R????,C???? or {reporting template name},C????, depending on the definition; for example S.02.01.02.01,R0030,C0010 or S.06.02.01.01,C0040;
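
A minimal sketch of a conforming closed-axis template file for internal data (the value is made up):

import pandas as pd

# column names follow the format {reporting template name},R????,C????
df = pd.DataFrame({'S.02.01.02.01,R0030,C0010': [1234.5]})

# the file name is the template name plus the extension
df.to_pickle('S.02.01.02.01.pickle')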

VNS

  • the template name follows the standard VNS code with prefix FTK, for example FTK.T2A or FTK.T4B;
  • the file name of an individual template is the template name plus an extension (.csv or .pickle), for example FTK.T2A.pickle;
  • the file name of the combined closed-axis templates is the instance file name plus an extension, for example Verzekeraars Nationale Staten Sample Fixed Interval Instance VNS-JR 1.pickle;
  • the column names and the index names for all templates have the following format: {reporting template name},R???,C???; for example T2A,R030,C010 or T0,R010,C010;

FTK

  • the template name follows the standard FTK code with prefix FTK, for example FTK.K101-1 or FTK.K209B;
  • the file name of an individual template is the template name plus an extension (.csv or .pickle), for example FTK.K101-1.pickle;
  • the file name of the combined closed-axis templates is the instance file name plus an extension, for example DNB-NR_FTK-2019-06_2019-12-31_MOD_FTK-BEL.pickle (the example instance for FTK-BEL);
  • the column names and the index names for all templates have the following format: {reporting template name},R???,C??? or {reporting template name},C???, depending on the definition; for example FTK.K101-1,R010,C010 or FTK.K209B,C150;