Common sql bugfixes and improvements (#26761)
* Fix BigQueryTableCheckOperator test

* Remove job_id generation in table/col check operators

  The job_id is automatically generated by hook.insert_job() if an empty string is passed, so job_id generation in the operator is removed in favor of the existing code.

* Rework SQL query building

  SQL query building is moved to the init() method of the column and table check operators to lessen the amount of duplicate code in the child operator. It also has the added effect of, ideally, passing a more complete query to OpenLineage.

  In doing the above, the column check operator had to be reworked and now matches the logic of the table check operator in terms of returning multiple rows and only sending one query to the database.

* Remove self.sql overwrite parameter in BigQuery check operators.

* Add option to fail check operators without retries

  Adds a new parameter, retry_on_failure, and a new function to determine if operators should retry or not on test failure.

* Rename and reorder private functions, fix typo

* Update airflow/providers/common/sql/operators/sql.py

* Small updates to helper functions

* Use _raise_exception when query returns 0 rows

* Update tests and operator code

  Updates tests to reflect changes in operator code, and fixed bugs in operators as well. Mainly moving the code to check for failed tests into the column and table check operators as it works slightly differently for each and doesn't make much sense as a top-level function.

* Fix _failed_checks() to match update in parent operators

* Insert quotes around check_name table op's sql template to fix sql error

* Remove unnecessary list comprehension

* Added assertions in existing column and table tests

* Add new tests for TableCheckOperator

* Update operator logic and tests

  Adds "where" option in checks dictionaries for column and table operators, which may be renamed. This allows for check-level partitioning, whereas the partition_clause param will always be for all checks. New tests are added for this addition.

* Add testing for column check operator and line edits

  Cleans up operator and adds testing for new generator function.

* Change name 'where' to 'partition_clause' in check dictionaries

* Update docs and use f strings

* Edit operator docstring

* Move _raise_exception to base class to simplify method

* Updates from code review

* Update BigQuery Check operators according to code review

* Rewrite data-building loop to generator

* Add new accept_none argument to column check operator

  The new argument, defaulting to true, will convert Nones returned from the query to 0s so numeric calculations can be performed correctly. This allows empty tables to be handled as a row of zeroes. Additional documentation is also supplied

* Fix formatting issues after rebase

Signed-off-by: Benji Lampel <benjamin@astronomer.io>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
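
For readers skimming the message, two of the behaviors it describes can be sketched in plain Python: check-level partitioning alongside the operator-wide partition_clause, and the accept_none conversion of None results to 0. This is only an illustration, not the operator code from the commit. The helper names `where_clause` and `none_to_zero` are hypothetical, and joining the two clauses with AND is an assumption rather than something the message states.

```python
from typing import Optional, Sequence


def where_clause(global_partition: Optional[str], check_partition: Optional[str]) -> str:
    """Build a WHERE clause from the operator-wide and check-level partition clauses.

    Hypothetical helper. Assumption: when both clauses are present they are
    combined with AND; the commit message does not spell this out.
    """
    parts = [p for p in (global_partition, check_partition) if p]
    return f"WHERE {' AND '.join(parts)}" if parts else ""


def none_to_zero(row: Sequence[Optional[float]], accept_none: bool = True) -> list:
    """Convert None values in a result row to 0 when accept_none is set.

    Hypothetical helper mirroring the described accept_none behavior, so that
    an empty table can be treated as a row of zeroes in numeric comparisons.
    """
    if not accept_none:
        return list(row)
    return [0 if value is None else value for value in row]


if __name__ == "__main__":
    print(where_clause("ds = '2022-09-01'", "amount > 0"))
    print(none_to_zero([None, 3, None]))
```

With accept_none disabled the row is returned unchanged, so any None would still need to be handled by the caller before numeric checks are applied.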