Summarise the `diffdf::diffdf` results in a compact table

Usage

get_diffdf_sum(dta)

mydiff(base_dta, compare_dta, key = c("numind", "hhid"), tollerance = 0.01)

get_MAD(dta)

compare_two(dta_base, dta_compare, tol = 0.01, keys = c("hhid"))

Arguments

dta

An element of the list returned by diffdf::diffdf(). The names of such elements usually start with `VarDiff`.

base_dta

base data frame

compare_dta

data frame to compare with base

key

Key columns, present in both data frames, that are used to match rows for row-by-row comparison. If NULL, the data frames are compared by row order.

tolerance

numerical tolerance for each comparison (0.01 by default).
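
To illustrate why the key columns matter, here is a minimal base R sketch contrasting key-based and row-order comparison. It does not call the functions documented here. The data, the column name `income`, and the simple absolute-difference threshold are assumptions made for illustration; the tolerance rule applied by diffdf::diffdf() may differ.

```r
# Toy data: the same households, but listed in a different row order.
# Column names and values are illustrative only.
base_dta    <- data.frame(hhid = c(1, 2, 3), income = c(100, 200, 300))
compare_dta <- data.frame(hhid = c(3, 1, 2), income = c(300, 100.005, 250))
tol <- 0.01

# Key-based comparison: align rows on `hhid` before comparing values.
aligned <- merge(base_dta, compare_dta, by = "hhid",
                 suffixes = c("_base", "_compare"))
diff_by_key <- abs(aligned$income_base - aligned$income_compare) > tol

# Row-order comparison: row 1 against row 1, row 2 against row 2, and so on.
diff_by_row <- abs(base_dta$income - compare_dta$income) > tol

sum(diff_by_key)  # 1: only hhid 2 differs by more than the tolerance
sum(diff_by_row)  # 3: every row looks different because rows are misaligned
```

With keys, the 0.005 difference for `hhid` 1 falls within the tolerance and only one genuine difference remains; without keys, the shuffled row order makes every row appear different.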

Value

`get_diffdf_sum()` returns a data frame that summarises the differences between `B` (Base data frame) and `C` (Compare data frame) for each variable present in the data. The summary data frame consists of the following columns:

* `VARIABLE` is the name of the variable compared;

* `Diff` is the number of rows that differ between Base and Compare for each variable, given the provided tolerance;

* `Diff excl NA` is the number of rows where both values in Base and Compare are non-NA and the difference between them is above the tolerance;

* `NA-!0 B*C|C*B[Total]` is the number of rows that have an **NA (missing) value in Base and a non-zero value in Compare**; the number after "|" is the same count for Compare versus Base, and the total is reported in brackets;

* `NA-0 B*C|C*B[Total]` is the number of rows that have an **NA (missing) value in Base and a zero value in Compare**; the number after "|" is the same count for Compare versus Base, and the total is reported in brackets;

* `MAD` is the mean absolute deviation between Base and Compare (not rounded), i.e. $\sum_i |\text{Base}_i - \text{Compare}_i| / N$, where $i$ is the index of a row that differs between Base and Compare given the tolerance and $N$ is the number of such rows (any missing/NA values are converted to zero);

* `Mean ABS dev/Base` is the mean absolute deviation between Base and Compare, $\sum_i |\text{Base}_i - \text{Compare}_i| / N$, divided by the base value and multiplied by 100 (rounded), where $i$ is the index of a row that differs between Base and Compare given the tolerance.
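
The column definitions above can be sketched in base R on a pair of toy vectors. This is an illustration of the definitions as described, not the package's implementation. The vectors, the variable names, and the choice to apply the tolerance check after converting NA to zero are assumptions.

```r
# Toy Base and Compare values for one variable (illustrative only).
base    <- c(10,   NA, NA, 5,  7,     NA)
compare <- c(10.5, 3,  0,  NA, 7.001, NA)
tol <- 0.01

# `Diff excl NA`: both values present and the difference above the tolerance.
both_present <- !is.na(base) & !is.na(compare)
diff_excl_na <- sum(both_present & abs(base - compare) > tol)

# NA in one data frame and a non-zero value in the other, in both directions.
na_nonzero_bc <- sum(is.na(base) & !is.na(compare) & compare != 0)
na_nonzero_cb <- sum(is.na(compare) & !is.na(base) & base != 0)

# NA in one data frame and a zero value in the other, in both directions.
na_zero_bc <- sum(is.na(base) & !is.na(compare) & compare == 0)
na_zero_cb <- sum(is.na(compare) & !is.na(base) & base == 0)

# MAD: convert NA to zero, keep rows that differ by more than the tolerance,
# then average the absolute deviations over those rows.
base0    <- ifelse(is.na(base), 0, base)
compare0 <- ifelse(is.na(compare), 0, compare)
abs_dev  <- abs(base0 - compare0)
differs  <- abs_dev > tol
mad      <- sum(abs_dev[differs]) / sum(differs)

diff_excl_na                   # 1
na_nonzero_bc + na_nonzero_cb  # 2
na_zero_bc + na_zero_cb        # 1
mad                            # (0.5 + 3 + 5) / 3
```

Note that under this assumption a row with NA in Base and zero in Compare contributes nothing to `MAD`, because both values are zero after conversion.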

Functions

  • mydiff(): Compares base and compare data using diffdf::diffdf() by matching columns only.

  • get_MAD(): Computes summary statistics about the differences in a single value

  • compare_two(): Compares two data sets and prints a knittable summary