hooqu.checks
Use this API to add checks to existing Checks.
-
class hooqu.checks.Check(level: hooqu.checks.CheckLevel, description: str, constraints: Tuple[hooqu.constraints.constraint.Constraint, ...] = <factory>)
-
add_constraint(constraint)
Returns a new Check object with the given constraint added to the constraints list.
- Parameters
constraint (Constraint) – New constraint to be added.
- Return type
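The "returns a new Check object" behaviour can be pictured with a small standalone sketch. It does not use hooqu: `MiniCheck` is a hypothetical stand-in, and modelling it as a frozen dataclass that holds a tuple is an assumption suggested by the `Tuple[...] = <factory>` default in the class signature.

```python
from dataclasses import dataclass, field, replace
from typing import Tuple


@dataclass(frozen=True)
class MiniCheck:
    """Hypothetical stand-in for a Check; this is not the hooqu class."""

    description: str
    constraints: Tuple[str, ...] = field(default_factory=tuple)

    def add_constraint(self, constraint: str) -> "MiniCheck":
        # Build a new object rather than mutating the existing one.
        return replace(self, constraints=self.constraints + (constraint,))


base = MiniCheck("basic checks")
extended = base.add_constraint("size > 5")
```

In this sketch `base` keeps its empty constraint tuple, so the original object can be safely reused after a constraint is added.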
-
contains_credit_card_number(column, assertion=<function is_one>, hint=None)
Checks the compliance of a column's values with a credit card number pattern.
- Parameters
column (str) – Name of the column that should be checked.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean. The input is the fraction of values in the column that match the pattern.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
-
contains_email(column, assertion=<function is_one>, hint=None)
Checks the compliance of a column's values with an e-mail pattern.
- Parameters
column (str) – Name of the column that should be checked.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean. The input is the fraction of values in the column that match the pattern.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
-
contains_url(column, assertion=<function is_one>, hint=None)
Checks the compliance of a column's values with a URL pattern.
- Parameters
column (str) – Name of the column that should be checked.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean. The input is the fraction of values in the column that match the pattern.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
-
evaluate(context)
Evaluates this check on the computed metrics.
- Parameters
context (AnalyzerContext) – Result of the metrics computation.
- Return type
-
has_completeness(column, assertion, hint=None)
Creates a constraint that asserts on the completeness of a column.
- Parameters
column (str) – Column to run the assertion on.
assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
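As a rough illustration of what a completeness assertion receives, the standalone sketch below computes a fraction of present values and passes it to a callable. It does not call hooqu. Treating completeness as "non-missing entries divided by all entries" and counting both `None` and NaN as missing are assumptions, and `completeness` and `at_least_90_percent` are hypothetical names.

```python
import math
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are neither None nor NaN (assumed definition)."""
    if not values:
        raise ValueError("completeness is undefined for an empty column")
    present = sum(
        1
        for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )
    return present / len(values)


def at_least_90_percent(fraction: float) -> bool:
    # Plays the role of the `assertion` argument: float in, bool out.
    return fraction >= 0.9


column = [1.0, None, 3.0, 4.0]
fraction = completeness(column)
passed = at_least_90_percent(fraction)
```

Here three of the four entries are present, so the assertion receives 0.75 and rejects it.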
-
has_max(column, assertion, hint=None)
Creates a constraint that asserts on the maximum of the column.
- Parameters
column (str) – Column to run the assertion on.
assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-
has_mean(column, assertion, hint=None)
Creates a constraint that asserts on the mean of the column.
- Parameters
column (str) – Column to run the assertion on.
assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-
has_min(column, assertion, hint=None)
Creates a constraint that asserts on the minimum of the column.
- Parameters
column (str) – Column to run the assertion on.
assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-
has_pattern(column, pattern, assertion=<function is_one>, name=None, hint=None)
Checks for pattern compliance. Given a column name and a regular expression, defines a Check on the average compliance of the column's values with the regular expression.
- Parameters
column (str) – Name of the column that should be checked.
pattern (Union[str, Pattern]) – The column's values will be checked for a match against this pattern.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean. The input is the fraction of values in the column that match the pattern.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
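The idea of "average compliance" with a regular expression can be sketched with the standard `re` module. This is a standalone illustration and not hooqu's implementation: the helper `pattern_compliance` is hypothetical, the use of `re.search` (rather than a full match) is an assumption, and the body of `is_one` is only a guess at what a default assertion with that name would do.

```python
import re
from typing import Sequence, Union


def pattern_compliance(values: Sequence[str], pattern: Union[str, re.Pattern]) -> float:
    """Fraction of values in which the pattern is found (assumed semantics)."""
    regex = re.compile(pattern)  # also accepts an already compiled pattern
    matches = sum(1 for value in values if regex.search(value))
    return matches / len(values)


def is_one(fraction: float) -> bool:
    # Assumed behaviour: every value must comply.
    return fraction == 1.0


codes = ["AB-12", "CD-34", "bad", "EF-56"]
fraction = pattern_compliance(codes, r"^[A-Z]{2}-\d{2}$")
all_comply = is_one(fraction)
relaxed = fraction >= 0.7  # a custom assertion would tolerate some misses
```

With one non-matching value out of four, the compliance is 0.75, so the strict assertion fails while a relaxed threshold passes.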
-
has_quantile(column, q, assertion, hint=None)
Creates a constraint that asserts on the quantile of the column. Note that the quantile calculation uses “nearest” interpolation, meaning that the closest value in the column is returned.
- Parameters
column (str) – Column to run the assertion on.
q (float) – The quantile to calculate, which must be between 0 and 1 inclusive.
assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
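To make "nearest" interpolation concrete, here is a minimal standalone sketch that always returns a value that exists in the data. It is not hooqu's code: `nearest_quantile` is a hypothetical helper, and the position formula `q * (n - 1)` and the tie-breaking of Python's `round` are assumptions.

```python
from typing import Sequence


def nearest_quantile(values: Sequence[float], q: float) -> float:
    """Return the existing value closest to the q-th quantile position."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1 inclusive")
    if not values:
        raise ValueError("cannot compute a quantile of an empty column")
    data = sorted(values)
    position = q * (len(data) - 1)  # fractional index into the sorted data
    return data[round(position)]    # snap to the nearest actual element


column = [10.0, 1.0, 3.0, 2.0, 4.0]
median = nearest_quantile(column, 0.5)
p90 = nearest_quantile(column, 0.9)
```

Unlike linear interpolation, the 0.9 quantile here is an observed value (the position 3.6 snaps to index 4) rather than a blend of two neighbours.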
-
has_size(assertion, hint=None)
Creates a constraint that calculates the data frame size and runs the assertion on it.
- Parameters
assertion (Callable[[int], bool]) – A callable that receives an integer and returns a boolean. The callable receives the size (number of rows) and returns a boolean based on whether it satisfies a condition, e.g. lambda sz: sz > 5.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-
has_standard_deviation(column, assertion, hint=None)
Creates a constraint that asserts on the standard deviation of the column. Note that, unlike the pandas default, this calculates the population standard deviation, i.e. with zero delta degrees of freedom (ddof=0). NaNs are ignored when performing the calculation.
- Parameters
column (str) – Column to run the assertion on.
assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
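The difference between the population (ddof=0) and sample (ddof=1) standard deviation matters when choosing assertion thresholds. The following standalone sketch uses only the standard library to show both on the same data, dropping NaNs first. The data values are made up for illustration.

```python
import math
import statistics

values = [2.0, 4.0, float("nan"), 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Ignore NaNs before computing, as described above.
clean = [v for v in values if not math.isnan(v)]

population_std = statistics.pstdev(clean)  # divides by n      (ddof=0)
sample_std = statistics.stdev(clean)       # divides by n - 1  (ddof=1)
```

The population figure is always the smaller of the two, so a threshold tuned against a sample standard deviation may be slightly too loose for this constraint.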
-
has_sum(column, assertion, hint=None)
Creates a constraint that asserts on the sum of the column.
- Parameters
column (str) – Column to run the assertion on.
assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-
has_uniqueness(columns, assertion, hint=None)
Creates a constraint that asserts on uniqueness in a single or combined set of key columns.
- Parameters
columns (Union[Sequence[str], str]) – Column or columns to run the assertion on.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean. The input is the fraction of unique values in columns.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
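For a combined set of key columns, the "fraction of unique values" can be read as the share of rows whose key combination occurs exactly once. That reading is an assumption, as is the hypothetical helper `uniqueness` below; the sketch is standalone and does not call hooqu.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def uniqueness(rows: Sequence[Tuple[Hashable, ...]]) -> float:
    """Share of rows whose key tuple appears exactly once (assumed definition)."""
    if not rows:
        raise ValueError("uniqueness is undefined for an empty data set")
    counts = Counter(rows)
    occurring_once = sum(1 for count in counts.values() if count == 1)
    return occurring_once / len(rows)


# Each tuple is the combination of two key columns for one row.
keys = [(1, "a"), (1, "a"), (2, "b"), (3, "c")]
fraction = uniqueness(keys)
fully_unique = fraction == 1.0
```

Two of the four rows carry a key that appears only once, so an assertion would receive 0.5 under this definition.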
-
is_complete(column, hint=None)
Creates a constraint that asserts on the completeness of a column.
- Parameters
column (str) – Column to run the assertion on.
- Return type
-
is_contained_in(column, allowed_values, assertion=<function is_one>, hint=None)
Asserts that every non-null value in a column is contained in a set of predefined values. Note that this only works on a set of string sequences.
- Parameters
column (str) – Column to run the assertion on.
allowed_values (Sequence[Union[str, int]]) – Allowed values for the column.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
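A membership check over non-null values can be sketched as follows. The helper `contained_fraction` is hypothetical, `None` stands in for a null value, and returning 1.0 for a column with no non-null values is an assumption; none of this is taken from hooqu's source.

```python
from typing import Optional, Sequence


def contained_fraction(values: Sequence[Optional[str]], allowed_values: Sequence[str]) -> float:
    """Fraction of non-null values that belong to the allowed set."""
    allowed = set(allowed_values)
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 1.0  # assumption: nothing to violate the constraint
    return sum(1 for v in non_null if v in allowed) / len(non_null)


statuses = ["open", "closed", None, "unknown"]
fraction = contained_fraction(statuses, ["open", "closed", "pending"])
```

The null entry is skipped, so the result is 2/3 rather than 2/4.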
-
is_contained_in_range(column, lower_bound, upper_bound, include_lower_bound=True, include_upper_bound=True, hint=None)
Asserts that the non-null values in a numeric column fall into the predefined interval.
- Parameters
column (str) – Column to run the assertion on.
lower_bound (float) – Lower bound of the interval.
upper_bound (float) – Upper bound of the interval.
include_lower_bound (bool) – Is a value equal to the lower bound allowed?
include_upper_bound (bool) – Is a value equal to the upper bound allowed?
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
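The effect of the two inclusion flags is easiest to see on values that sit exactly on the bounds. The sketch below is a standalone illustration with a hypothetical helper, `in_range_fraction`; it mirrors the parameter names above but is not hooqu's implementation, and it uses `None` for null values.

```python
import operator
from typing import Optional, Sequence


def in_range_fraction(
    values: Sequence[Optional[float]],
    lower_bound: float,
    upper_bound: float,
    include_lower_bound: bool = True,
    include_upper_bound: bool = True,
) -> float:
    """Fraction of non-null values that fall inside the interval."""
    # Pick closed or open comparisons depending on the flags.
    lower_ok = operator.ge if include_lower_bound else operator.gt
    upper_ok = operator.le if include_upper_bound else operator.lt
    non_null = [v for v in values if v is not None]
    inside = sum(
        1 for v in non_null if lower_ok(v, lower_bound) and upper_ok(v, upper_bound)
    )
    return inside / len(non_null)


ratios = [0.0, 0.5, 1.0, None]
closed = in_range_fraction(ratios, 0.0, 1.0)
half_open = in_range_fraction(ratios, 0.0, 1.0, include_upper_bound=False)
open_interval = in_range_fraction(ratios, 0.0, 1.0, False, False)
```

With both bounds included every non-null value passes; excluding a bound drops the value that sits exactly on it.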
-
is_non_negative(column, assertion=<function is_one>, hint=None)
Creates a constraint that asserts that a column contains no negative values.
- Parameters
column (str) – Column to run the assertion on.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-
is_positive(column, assertion=<function is_one>, hint=None)
Creates a constraint that asserts that a column contains positive values.
- Parameters
column (str) – Column to run the assertion on.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-
is_unique(column, hint=None)
Creates a constraint that asserts on the uniqueness of a column.
- Parameters
column (str) – Column to run the assertion on.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-
satisfies(column_condition, constraint_name, assertion=<function is_one>, hint=None)
Creates a constraint that evaluates the column_condition and runs the assertion on the result. This is useful for complex or custom checks that are better described using a valid expression.
- Parameters
column_condition (str) – The column expression to be evaluated. If using a pandas data frame, this expression is evaluated with pandas.eval.
constraint_name (str) – A name that summarizes the check being made. This name is used to name the metrics for the analysis being done.
assertion (Callable[[float], bool]) – Callable that receives a float and returns a boolean.
hint (Optional[str]) – A hint that provides additional context on why a constraint could have failed.
- Return type
-