Common sql bugfixes and improvements (#26761)
* Fix BigQueryTableCheckOperator test

Signed-off-by: Benji Lampel <benjamin@astronomer.io>

* Remove job_id generation in table/col check operators

The job_id is automatically generated by hook.insert_job()
if an empty string is passed, so job_id generation in the
operator is removed in favor of the existing code.

Signed-off-by: Benji Lampel <benjamin@astronomer.io>

* Rework SQL query building

SQL query building is moved to the __init__() method of the column
and table check operators to reduce duplicated code in the child
operators. Ideally, it also passes a more complete query to
OpenLineage.

In doing the above, the column check operator had to be reworked and
now matches the logic of the table check operator in terms of
returning multiple rows and only sending one query to the database.
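
A minimal sketch of the idea described above, not the operator's actual code: each check becomes one SELECT, the SELECTs are joined with UNION ALL so only one query is sent, and the query is built in __init__() so it exists before execution. The template, the class name TableCheckSketch, and the helper build_table_check_query are hypothetical names chosen for illustration.

```python
from typing import Dict

# Hypothetical template: one SELECT per check, each returning one row
# of (check_name, check_result).
CHECK_TEMPLATE = (
    "SELECT '{check_name}' AS check_name, "
    "CASE WHEN {check_statement} THEN 1 ELSE 0 END AS check_result "
    "FROM {table}"
)


def build_table_check_query(table: str, checks: Dict[str, str]) -> str:
    """Combine every check into one statement so the database is queried once."""
    selects = [
        CHECK_TEMPLATE.format(check_name=name, check_statement=statement, table=table)
        for name, statement in checks.items()
    ]
    return "\nUNION ALL\n".join(selects)


class TableCheckSketch:
    """Illustrative stand-in for a check operator; not an Airflow class."""

    def __init__(self, table: str, checks: Dict[str, str]) -> None:
        self.table = table
        self.checks = checks
        # Built here rather than at execution time, so the complete SQL
        # is available to anything that inspects the object beforehand.
        self.sql = build_table_check_query(table, checks)


op = TableCheckSketch(
    "orders",
    {"row_count": "COUNT(*) > 0", "positive_total": "MIN(total) >= 0"},
)
print(op.sql)
```

The query returns one row per check, which is the "multiple rows, one query" shape the message refers to.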

* Remove self.sql overwrite parameter in BigQuery check operators.

* Add option to fail check operators without retries

Adds a new parameter, retry_on_failure, and a new function to
determine if operators should retry or not on test failure.
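
As a rough illustration of what such a switch could look like: the sketch below picks between a retryable and a non-retryable error based on retry_on_failure. Both exception classes and the function name raise_check_exception are hypothetical stand-ins; Airflow itself provides AirflowFailException for failing a task without retrying, but how the operators use it is not shown in this message.

```python
class CheckFailed(Exception):
    """Hypothetical stand-in for an ordinary, retryable task failure."""


class CheckFailedNoRetry(Exception):
    """Hypothetical stand-in for a failure that should skip remaining retries."""


def raise_check_exception(message: str, retry_on_failure: bool = True) -> None:
    """Raise the retryable error by default, the non-retryable one otherwise."""
    if retry_on_failure:
        raise CheckFailed(message)
    raise CheckFailedNoRetry(message)


try:
    raise_check_exception("row_count check failed", retry_on_failure=False)
except CheckFailedNoRetry as err:
    print(f"would fail without retrying: {err}")
```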

* Rename and reorder private functions, fix typo

* Update airflow/providers/common/sql/operators/sql.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

* Small updates to helper functions

* Use _raise_exception when query returns 0 rows

* Update tests and operator code

Updates tests to reflect changes in operator code and fixes bugs
in the operators. The main change moves the code that checks for
failed tests into the column and table check operators, since it
works slightly differently for each and doesn't make much sense as
a top-level function.

* Fix _failed_checks() to match update in parent operators

* Insert quotes around check_name in the table check operator's SQL template to fix a SQL error

* Remove unnecessary list comprehension

* Added assertions in existing column and table tests

* Add new tests for TableCheckOperator

* Update operator logic and tests

Adds "where" option in checks dictionaries for column and table
operators, which may be renamed. This allows for check-level
partitioning, whereas the partition_clause param will always be
for all checks. New tests are added for this addition.
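
One way to picture the two levels of partitioning, as a sketch under assumptions: the operator-level clause applies to every check, and a check-level clause, when present, is combined with it. The helper name build_where and the choice to join the clauses with AND are illustrative and not taken from the operator code.

```python
from typing import Optional


def build_where(operator_partition: Optional[str], check_partition: Optional[str]) -> str:
    """Combine the operator-wide clause with an optional per-check clause."""
    # Drop whichever clause is missing or empty; keep the order stable.
    clauses = [clause for clause in (operator_partition, check_partition) if clause]
    return f"WHERE {' AND '.join(clauses)}" if clauses else ""


# The operator-level clause applies to all checks; the second is check-specific.
print(build_where("ds = '2022-10-26'", "region = 'EU'"))
print(build_where("ds = '2022-10-26'", None))
```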

* Add testing for column check operator and line edits

Cleans up operator and adds testing for new generator function.

* Change name 'where' to 'partition_clause' in check dictionaries

* Update docs and use f-strings

* Edit operator docstring

* Move _raise_exception to base class to simplify method

* Updates from code review

* Update BigQuery Check operators according to code review

* Rewrite data-building loop to generator

* Add new accept_none argument to column check operator

The new argument, defaulting to True, converts None values returned
from the query to 0 so numeric calculations can be performed
correctly. This allows empty tables to be handled as a row of zeroes.

Additional documentation is also supplied.
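
A small sketch of the None handling described above, assuming a query result arrives as a sequence of values. The function name normalize_row is hypothetical; only the accept_none flag and its default come from the commit message.

```python
from typing import Any, List, Sequence


def normalize_row(row: Sequence[Any], accept_none: bool = True) -> List[Any]:
    """Replace None with 0 when accept_none is set; otherwise leave values as-is."""
    if not accept_none:
        return list(row)
    return [0 if value is None else value for value in row]


# Aggregates such as MIN or SUM over an empty table come back as None.
empty_table_result = (None, None, None)
print(normalize_row(empty_table_result))
```

With accept_none disabled the None values are passed through unchanged, so a later numeric comparison would have to handle them explicitly.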

* Fix formatting issues after rebase

Signed-off-by: Benji Lampel <benjamin@astronomer.io>
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
denimalpaca and uranusjr committed Oct 26, 2022
1 parent e6c8c07 commit 87eb46b
Showing 5 changed files with 413 additions and 185 deletions.