hooqu.analyzers
Subpackages
Module contents
-
class hooqu.analyzers.Analyzer(*args, **kwds)
Bases: abc.ABC, typing.Generic
-
calculate(data, aggregate_with=None, save_states_with=None)
Runs the preconditions, then calculates and returns the metric.
- Parameters
data (DataFrameLike) – Data frame being analyzed.
aggregate_with – Loader for previous states to include in the computation (optional).
save_states_with – Used to persist internal states (optional).
- Returns
The computed metric, or a failure metric if the preconditions fail.
-
-
class hooqu.analyzers.Completeness(column, where=None)
Bases: hooqu.analyzers.analyzer.StandardScanShareableAnalyzer
-
class hooqu.analyzers.Compliance(instance, predicate, where=None)
Bases: hooqu.analyzers.analyzer.NonScanAnalyzer
Compliance is a measure of the fraction of rows that comply with the given column constraint. E.g., if the constraint is “att1>3” and the data frame has 5 rows with an att1 value greater than 3 and 10 rows under 3, a DoubleMetric with the value 0.33 would be returned.
- Parameters
instance (str) – Unlike other column analyzers (e.g. Completeness), this analyzer cannot infer the metric instance name from the column name. The constraint given here can also refer to multiple columns, so a metric instance name describing what the analysis is for should be provided.
predicate (str) – Predicate that can be understood by DataFrameLike.eval.
where (Optional[str]) – Additional filter to apply before the analyzer is run.
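The arithmetic behind the 0.33 figure above can be sketched in plain Python. This is only an illustration of the ratio, not Hooqu's implementation: the `compliance` helper and the list-of-dicts data are hypothetical stand-ins for the analyzer and a DataFrameLike.

```python
# Hypothetical stand-in data: 5 rows satisfy "att1 > 3", 10 rows do not.
rows = [{"att1": 4}] * 5 + [{"att1": 1}] * 10


def compliance(rows, predicate):
    """Return the fraction of rows for which predicate(row) is true.

    Empty input is not handled in this sketch.
    """
    matches = sum(1 for row in rows if predicate(row))
    return matches / len(rows)


fraction = compliance(rows, lambda row: row["att1"] > 3)
print(round(fraction, 2))  # prints 0.33 (5 matching rows out of 15)
```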
-
class hooqu.analyzers.FrequenciesAndNumRows(*args, **kwds)
Bases: hooqu.analyzers.analyzer.State
-
class hooqu.analyzers.MaxState(*args, **kwds)
Bases: hooqu.analyzers.analyzer.DoubledValuedState
-
class hooqu.analyzers.Maximum(column, where=None)
Bases: hooqu.analyzers.analyzer.StandardScanShareableAnalyzer
-
class hooqu.analyzers.Mean(column, where=None)
Bases: hooqu.analyzers.analyzer.StandardScanShareableAnalyzer
-
class hooqu.analyzers.MeanState(*args, **kwds)
Bases: hooqu.analyzers.analyzer.DoubledValuedState
-
class hooqu.analyzers.MinState(*args, **kwds)
Bases: hooqu.analyzers.analyzer.DoubledValuedState
-
class hooqu.analyzers.Minimum(column, where=None)
Bases: hooqu.analyzers.analyzer.StandardScanShareableAnalyzer
-
class hooqu.analyzers.NonScanAnalyzer(*args, **kwds)
Bases: hooqu.analyzers.analyzer.Analyzer
Analyzer that does not need to run any aggregation and can extract the information straight from the dataframe. This is a special implementation of Hooqu for the Size Analyzer.
-
class hooqu.analyzers.NumMatches(*args, **kwds)
Bases: hooqu.analyzers.analyzer.DoubledValuedState
-
class hooqu.analyzers.NumMatchesAndCount(*args, **kwds)
Bases: hooqu.analyzers.analyzer.DoubledValuedState
A state for computing ratio-based metrics. It holds the number of rows that match a predicate and the overall number of rows.
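To illustrate why a ratio-based state keeps both counts rather than the ratio itself, here is a minimal sketch. The `MatchesAndCount` class, its field names, and its `merge` and `ratio` methods are hypothetical and are not taken from Hooqu's source; they only show that two such states can be combined by adding the counts, which would not be possible with two bare ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchesAndCount:
    """Hypothetical state: rows matching a predicate and total rows seen."""

    num_matches: int
    count: int

    def merge(self, other: "MatchesAndCount") -> "MatchesAndCount":
        # Counts are additive, so states from separate data chunks combine exactly.
        return MatchesAndCount(
            self.num_matches + other.num_matches,
            self.count + other.count,
        )

    def ratio(self) -> float:
        # Assumption of this sketch: an empty state yields NaN.
        return self.num_matches / self.count if self.count else float("nan")


part_a = MatchesAndCount(num_matches=3, count=10)
part_b = MatchesAndCount(num_matches=2, count=5)
merged = part_a.merge(part_b)
print(merged, merged.ratio())
```

Averaging the two partial ratios (0.3 and 0.4) would give 0.35, whereas the merged state gives 5/15, which is the ratio over all rows.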
-
class hooqu.analyzers.PatternMatch(column, pattern, where=None)
Bases: hooqu.analyzers.analyzer.StandardScanShareableAnalyzer
-
class hooqu.analyzers.Quantile(column, quantile, where=None)
Bases: hooqu.analyzers.analyzer.StandardScanShareableAnalyzer
Quantile analyzer that computes the quantile using a linear interpolation, i.e. returning a value within the column.
- column:
Column in DataFrameLike for which the quantile is analyzed.
- quantile:
Quantile to compute. Must be in the interval [0, 1], where 0.5 would be the median.
- where:
Additional filter to apply before the analyzer is run.
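One common reading of "linear interpolation" for quantiles is the scheme below, which interpolates between the two nearest order statistics. Whether Hooqu uses exactly this scheme is an assumption; the `quantile_linear` helper is hypothetical and written in plain Python for illustration only.

```python
import math


def quantile_linear(values, q):
    """Quantile via linear interpolation between the two nearest sorted values."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in the interval [0, 1]")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional position of the quantile within the sorted data.
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    weight = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


median = quantile_linear([10, 20, 30, 40, 50], 0.5)
q25 = quantile_linear([10, 20, 30, 40], 0.25)
print(median, q25)  # prints 30.0 17.5
```

Under this scheme the result always lies between the column's minimum and maximum, but it need not equal an observed value.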
-
class hooqu.analyzers.QuantileState(*args, **kwds)
Bases: hooqu.analyzers.analyzer.DoubledValuedState
-
Bases: hooqu.analyzers.analyzer.Analyzer
An analyzer that runs a set of aggregation functions over the data and can share scans over the data.
-
class hooqu.analyzers.StandardDeviation(column, where=None)
Bases: hooqu.analyzers.analyzer.StandardScanShareableAnalyzer
Calculates the population standard deviation (degrees of freedom = 0) on the specified column. NaNs are ignored in the calculations.
Note that, unlike pandas, whose std defaults to the sample formula (ddof=1), this uses the population formula (ddof=0).
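The difference between the two conventions can be seen with the standard library alone. This sketch does not call Hooqu; it uses `statistics.pstdev` (population, ddof=0) and `statistics.stdev` (sample, ddof=1) on made-up values, dropping NaNs first as the description above says the analyzer does.

```python
import math
import statistics

# Made-up column values, including one NaN that is ignored.
values = [2.0, 4.0, 4.0, float("nan"), 4.0, 5.0, 5.0, 7.0, 9.0]
clean = [v for v in values if not math.isnan(v)]

population_sd = statistics.pstdev(clean)  # divides by n (ddof=0)
sample_sd = statistics.stdev(clean)       # divides by n - 1 (ddof=1)

print(population_sd, sample_sd)
```

For these eight values the squared deviations from the mean of 5 sum to 32, so the population result is sqrt(32/8) = 2.0 while the sample result is sqrt(32/7), which is slightly larger.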
-
class hooqu.analyzers.StandardDeviationState(*args, **kwds)
Bases: hooqu.analyzers.analyzer.DoubledValuedState
-
class hooqu.analyzers.Sum(column, where=None)
Bases: hooqu.analyzers.analyzer.StandardScanShareableAnalyzer