hooqu.checks

Use this API to define Checks and add constraints to existing Checks.

class hooqu.checks.Check(level: hooqu.checks.CheckLevel, description: str, constraints: Tuple[hooqu.constraints.constraint.Constraint, ...] = <factory>)[source]
add_constraint(constraint)[source]

Returns a new Check object with the given constraint added to the constraints list.

Parameters

constraint (Constraint) – New constraint to be added

Return type

Check
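Because add_constraint returns a new Check rather than modifying the existing one, the return value has to be captured. The sketch below illustrates that pattern with a hypothetical stand-in class, MiniCheck, which is not part of hooqu. It assumes constraints are held in an immutable tuple, as the class signature above suggests, and uses plain strings in place of real Constraint objects.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MiniCheck:
    """Hypothetical stand-in for Check; not hooqu's implementation."""

    description: str
    constraints: Tuple[str, ...] = ()

    def add_constraint(self, constraint: str) -> "MiniCheck":
        # Build a new object; the original instance is left untouched.
        return replace(self, constraints=self.constraints + (constraint,))


base = MiniCheck("orders")
extended = base.add_constraint("size > 0")
```

Here base still has no constraints, while extended carries the added one.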

evaluate(context)[source]

Evaluates this check against the computed metrics.

Parameters

context (AnalyzerContext) – result of the metrics computation

Return type

CheckResult

has_completeness(column, assertion, hint=None)[source]

Creates a constraint that asserts on the completeness of a column.

Parameters
  • column (str) – Column to run the assertion on.

  • assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable
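The assertion callable receives a single float and decides whether the constraint passes. The sketch below shows the idea in plain Python, assuming completeness means the fraction of non-missing entries in the column. The helper names completeness and at_least_90_percent are hypothetical, and None stands in for a missing value.

```python
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are not None (assumed definition)."""
    if not values:
        raise ValueError("empty column")
    return sum(1 for v in values if v is not None) / len(values)


def at_least_90_percent(fraction: float) -> bool:
    # An assertion in the Callable[[float], bool] shape described above.
    return fraction >= 0.9


column = [1.0, None, 3.0, 4.0]
fraction = completeness(column)          # 3 of 4 entries are present
passed = at_least_90_percent(fraction)   # the assertion rejects this column
```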

has_max(column, assertion, hint=None)[source]

Creates a constraint that asserts on the maximum of the column

Parameters
  • column (str) – Column to run the assertion on.

  • assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable

has_mean(column, assertion, hint=None)[source]

Creates a constraint that asserts on the mean of the column.

Parameters
  • column (str) – Column to run the assertion on.

  • assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable

has_min(column, assertion, hint=None)[source]

Creates a constraint that asserts on the minimum of the column

Parameters
  • column (str) – Column to run the assertion on.

  • assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable

has_quantile(column, q, assertion, hint=None)[source]

Creates a constraint that asserts on the quantile of the column. Note that the quantile is calculated using “nearest” interpolation, meaning that the closest actual value in the column is returned rather than an interpolated one.

Parameters
  • column (str) – Column to run the assertion on.

  • q (float) – The q-th quantile to calculate which must be between 0 and 1 inclusive.

  • assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable
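To make “nearest” interpolation concrete, the following sketch picks the existing value whose position is closest to the requested quantile instead of interpolating between neighbours. The function nearest_quantile is a hypothetical illustration, not hooqu's implementation, and its handling of exact midpoints (Python's round) is an assumption that may differ from the library.

```python
from typing import Sequence


def nearest_quantile(values: Sequence[float], q: float) -> float:
    """Return the q-th quantile by snapping to the nearest sorted position."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1 inclusive")
    if not values:
        raise ValueError("empty column")
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted data.
    position = q * (len(ordered) - 1)
    # Snap to the closest existing index instead of interpolating.
    return ordered[round(position)]


prices = [50, 10, 40, 20, 30]
low = nearest_quantile(prices, 0.3)    # position 1.2 snaps to index 1
high = nearest_quantile(prices, 0.9)   # position 3.6 snaps to index 4
```

The value handed to the assertion is therefore always one that actually occurs in the column.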

has_size(assertion, hint=None)[source]

Creates a constraint that calculates the data frame size and runs the assertion on it.

Parameters
  • assertion (Callable[[int], bool]) – A callable that receives an int and returns a boolean. The callable receives the size (number of rows) and returns whether it satisfies a condition, e.g. lambda sz: sz > 5.

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable

has_standard_deviation(column, assertion, hint=None)[source]

Creates a constraint that asserts on the standard deviation of the column. Note that, unlike the pandas default (ddof=1), this calculates the population standard deviation, i.e. with zero delta degrees of freedom (ddof=0). NaNs are ignored when performing the calculation.

Parameters
  • column (str) – Column to run the assertion on.

  • assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable
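The difference between the population (ddof=0) and sample (ddof=1) standard deviation can change whether an assertion passes on small columns. The sketch below reproduces both with the standard library. population_std is a hypothetical helper that mirrors the behaviour described above (drop NaNs, divide by N); it is not hooqu's own code.

```python
import math
import statistics
from typing import Sequence


def population_std(values: Sequence[float]) -> float:
    # Drop NaNs first, then divide by N (ddof=0) rather than N - 1.
    clean = [v for v in values if not math.isnan(v)]
    return statistics.pstdev(clean)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, float("nan")]
pop = population_std(data)               # ddof=0, NaN ignored
sample = statistics.stdev(data[:-1])     # ddof=1, matching the pandas default
```

The sample value is always the larger of the two, so an upper-bound assertion tuned against pandas output may be looser than intended here.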

has_sum(column, assertion, hint=None)[source]

Creates a constraint that asserts on the sum of the column.

Parameters
  • column (str) – Column to run the assertion on.

  • assertion (Callable[[float], bool]) – A callable that receives a float and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable

has_uniqueness(columns, assertion, hint=None)[source]

Creates a constraint that asserts on uniqueness in a single or combined set of key columns.

Parameters
  • columns (Union[Sequence[str], str]) – Column or columns to run the assertion on

  • assertion (Callable[[float], bool]) – Callable that receives a float input parameter and returns a boolean. The input is the fraction of unique values in columns.

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed
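The value passed to the assertion is a fraction between 0 and 1. The sketch below shows one way such a fraction can be computed over combined key columns. It assumes that “unique” means a key combination that occurs exactly once and that the denominator is the total number of rows; both are assumptions, and uniqueness is a hypothetical helper rather than hooqu's implementation.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def uniqueness(rows: Sequence[Tuple[Hashable, ...]]) -> float:
    """Fraction of rows whose key combination occurs exactly once (assumed)."""
    if not rows:
        raise ValueError("no rows")
    counts = Counter(rows)
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(rows)


# Each tuple is one row's values for the chosen key columns.
keys = [("a", 1), ("a", 2), ("b", 1), ("a", 1)]
fraction = uniqueness(keys)   # ("a", 1) repeats, so only 2 of 4 rows count
```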

is_complete(column, hint=None)[source]

Creates a constraint that asserts that a column is complete, i.e. contains no missing values.

Parameters

column (str) – Column to run the assertion on.

Return type

CheckWithLastConstraintFilterable

is_contained_in(column, allowed_values, assertion=<function is_one>, hint=None)[source]

Asserts that every non-null value in a column is contained in a set of predefined values. Note that this only works on a set of string sequences.

Parameters
  • column (str) – Column to run the assertion on

  • allowed_values (Sequence[Union[str, int]]) – Allowed values for the column

  • assertion (Callable[[float], bool]) – Callable that receives a float input parameter and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable
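With the default assertion, the constraint only passes when every non-null value is in allowed_values. The sketch below models that behaviour in plain Python. contained_fraction is a hypothetical helper, the local is_one is a stand-in assumed to require exactly 1.0, and the choice of denominator (non-null values only) is an assumption.

```python
from typing import Optional, Sequence, Union

Value = Union[str, int]


def contained_fraction(values: Sequence[Optional[Value]], allowed: Sequence[Value]) -> float:
    """Share of non-null values found in allowed (illustrative helper)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("no non-null values")
    allowed_set = set(allowed)
    return sum(1 for v in non_null if v in allowed_set) / len(non_null)


def is_one(fraction: float) -> bool:
    # Local stand-in for the default assertion; assumed to require exactly 1.0.
    return fraction == 1.0


clean = contained_fraction(["open", "closed", None, "open"], ["open", "closed"])
dirty = contained_fraction(["open", "pending", None, "open"], ["open", "closed"])
```

Passing a custom assertion such as lambda f: f >= 0.9 would tolerate a small share of unexpected values instead of failing on the first one.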

is_contained_in_range(column, lower_bound, upper_bound, include_lower_bound=True, include_upper_bound=True, hint=None)[source]

Asserts that the non-null values in a numeric column fall into the predefined interval

Parameters
  • column (str) – Column to run the assertion on

  • lower_bound (float) – lower bound of the interval

  • upper_bound (float) – upper bound of the interval

  • include_lower_bound (bool) – is a value equal to the lower bound allowed?

  • include_upper_bound (bool) – is a value equal to the upper bound allowed?

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable
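The two include_* flags decide whether values sitting exactly on a bound are accepted. A minimal sketch of that comparison logic follows; in_range is a hypothetical helper for illustration only.

```python
def in_range(
    value: float,
    lower_bound: float,
    upper_bound: float,
    include_lower_bound: bool = True,
    include_upper_bound: bool = True,
) -> bool:
    """Check one value against an interval with configurable endpoints."""
    above = value >= lower_bound if include_lower_bound else value > lower_bound
    below = value <= upper_bound if include_upper_bound else value < upper_bound
    return above and below


closed = in_range(0, 0, 10)                                 # [0, 10] accepts 0
half_open = in_range(0, 0, 10, include_lower_bound=False)   # (0, 10] rejects 0
```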

is_non_negative(column, assertion=<function is_one>, hint=None)[source]

Creates a constraint that asserts that a column contains no negative values

Parameters
  • column (str) – Column to run the assertion on

  • assertion (Callable[[float], bool]) – Callable that receives a float input parameter and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable

is_positive(column, assertion=<function is_one>, hint=None)[source]

Creates a constraint that asserts that a column contains positive values.

Parameters
  • column (str) – Column to run the assertion on

  • assertion (Callable[[float], bool]) – Callable that receives a float input parameter and returns a boolean

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable
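The practical difference between is_positive and is_non_negative is how zero is treated, and the assertion parameter can relax the default requirement that every row complies. The sketch below assumes “positive” means strictly greater than zero. fraction_satisfying and mostly are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence


def fraction_satisfying(values: Sequence[float], predicate: Callable[[float], bool]) -> float:
    """Share of values for which predicate holds (illustrative helper)."""
    if not values:
        raise ValueError("empty column")
    return sum(1 for v in values if predicate(v)) / len(values)


quantities = [3, 0, 7, 12]
positive = fraction_satisfying(quantities, lambda v: v > 0)        # zero fails
non_negative = fraction_satisfying(quantities, lambda v: v >= 0)   # zero passes


def mostly(fraction: float) -> bool:
    # A relaxed assertion that tolerates up to 30% non-compliant rows.
    return fraction >= 0.7
```

With these values a strict "equals 1.0" assertion fails for the positive case, while mostly accepts it.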

is_unique(column, hint=None)[source]

Creates a constraint that asserts that the values in a column are unique.

Parameters
  • column (str) – Column to run the assertion on

  • hint (Optional[str]) – A hint to provide additional context why a constraint could have failed

Return type

CheckWithLastConstraintFilterable

class hooqu.checks.CheckLevel(value)[source]

An enumeration.

class hooqu.checks.CheckResult(check: Any, status: hooqu.checks.CheckStatus, constraint_results: Sequence[hooqu.constraints.constraint.ConstraintResult] = <factory>)[source]

class hooqu.checks.CheckStatus(value)[source]

An enumeration.

class hooqu.checks.CheckWithLastConstraintFilterable(level, description, constraints, create_replacement)[source]
where(query)[source]

Defines a filter to apply before evaluating the previous constraint.

Parameters

query (str) – A pandas query string to evaluate.

Returns

A filtered Check
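The effect of a filter is that the metric for the preceding constraint is computed on the matching rows only. The sketch below models this in plain Python: a list comprehension stands in for a pandas query string such as "country == 'DE'", and both the data and the completeness helper are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def completeness(rows: List[Row], column: str) -> float:
    """Fraction of rows with a non-missing value in column (assumed definition)."""
    if not rows:
        raise ValueError("no rows")
    return sum(1 for r in rows if r[column] is not None) / len(rows)


rows: List[Row] = [
    {"country": "DE", "vat_id": "DE123"},
    {"country": "DE", "vat_id": None},
    {"country": "US", "vat_id": None},
]

overall = completeness(rows, "vat_id")

# Python stand-in for the query string "country == 'DE'".
german_rows = [r for r in rows if r["country"] == "DE"]
filtered = completeness(german_rows, "vat_id")
```

The same assertion can therefore pass or fail depending on whether it is evaluated on the whole data frame or on the filtered subset.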