Introduction
BigTesty is a framework for creating integration tests with BigQuery on real, short-lived infrastructure.
Integration and end-to-end tests are a robust way to validate that SQL queries work as expected.
No emulator is involved: the queries are executed directly in the BigQuery engine.
BigTesty isolates the tests for each execution to prevent collisions.
Multiple developers or CI/CD pipelines can execute tests at the same time.
The infrastructure created for the tests is ephemeral by default, but it can be kept if needed, to analyse the results in BigQuery.
After each test run, a result report is returned that indicates the passing and failing cases.
Diagram illustrating how BigTesty works:
Getting started
BigTesty is available as a Python package and can be installed from PyPI:
pip install bigtesty
How to run tests
You need to be authenticated with Google Cloud Platform before running the commands.
We recommend authenticating with Application Default Credentials:
gcloud auth application-default login
The current GCP user needs the required privileges to perform the actions in BigQuery and in the GCS bucket specified as the backend URL.
Example of code structure
The root test folder
This folder contains all the test definition files and the test scenarios. The format is JSON.
Example of a scenario file with a nominal case, definition_spec_failure_by_feature_name_no_error.json:
{
  "description": "Test of monitoring data",
  "scenarios": [
    {
      "description": "Nominal case find failure by feature name",
      "given": [
        {
          "input_file_path": "monitoring/given/input_failures_feature_name.json",
          "destination_dataset": "monitoring",
          "destination_table": "job_failure"
        }
      ],
      "then": [
        {
          "fields_to_ignore": [
            "\\[\\d+\\]\\['dwhCreationDate']"
          ],
          "actual_file_path": "monitoring/when/find_failures_by_feature_name.sql",
          "expected_file_path": "monitoring/then/expected_failures_feature_name.json"
        }
      ]
    }
  ]
}
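The fields_to_ignore entry in this definition is written with doubled backslashes because it sits inside a JSON string. The sketch below decodes it and tries it against a few path-like strings. The shape of those paths (for example root[0]['dwhCreationDate']) is an assumption made for illustration only; this document does not state what BigTesty matches the pattern against.

```python
import json
import re

# The value exactly as it appears in the JSON definition file.
raw_json_value = r'''"\\[\\d+\\]\\['dwhCreationDate']"'''

# JSON decoding turns each doubled backslash into a single one,
# which leaves an ordinary regular expression.
pattern = json.loads(raw_json_value)

# Hypothetical row/field paths, assumed for illustration only.
candidate_paths = [
    "root[0]['dwhCreationDate']",
    "root[12]['dwhCreationDate']",
    "root[0]['featureName']",
]

# Keep the paths that the pattern would select as "to ignore".
ignored = [path for path in candidate_paths if re.search(pattern, path)]

print(pattern)
print(ignored)
```

Under that assumption, the pattern targets the dwhCreationDate field of every row, whatever its index, which would fit a column whose value changes on each run.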
In this example, there is only one scenario with 3 blocks:
- description: a short label for the scenario
- given: a list of input test data to ingest into the BigQuery tables. The input data can be provided in a separate JSON file or embedded directly: input_file_path for a separate file and input for an embedded object
- then: a list of objects containing the SQL query to test and the expected data: actual/actual_file_path => SQL query | expected/expected_file_path => expected data
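The given/then layout can be checked before a run with a few lines of Python. This is a minimal sketch, not part of BigTesty: check_definition is a hypothetical helper, and the rule that each entry carries exactly one of the inline or file-path keys is an assumption drawn from the description of the blocks.

```python
from __future__ import annotations


def check_definition(definition: dict) -> list[str]:
    """Return a list of problems found in a test definition (hypothetical helper)."""
    problems: list[str] = []

    def exactly_one(entry: dict, inline_key: str, path_key: str) -> bool:
        # Assumption: an entry sets either the inline key or the file-path key, not both.
        return (inline_key in entry) != (path_key in entry)

    for s_idx, scenario in enumerate(definition.get("scenarios", [])):
        for g_idx, given in enumerate(scenario.get("given", [])):
            if not exactly_one(given, "input", "input_file_path"):
                problems.append(
                    f"scenario {s_idx}, given {g_idx}: set one of input / input_file_path"
                )
        for t_idx, then in enumerate(scenario.get("then", [])):
            if not exactly_one(then, "actual", "actual_file_path"):
                problems.append(
                    f"scenario {s_idx}, then {t_idx}: set one of actual / actual_file_path"
                )
            if not exactly_one(then, "expected", "expected_file_path"):
                problems.append(
                    f"scenario {s_idx}, then {t_idx}: set one of expected / expected_file_path"
                )
    return problems


# The nominal scenario from this section, reduced to the keys being checked.
definition = {
    "description": "Test of monitoring data",
    "scenarios": [
        {
            "description": "Nominal case find failure by feature name",
            "given": [
                {
                    "input_file_path": "monitoring/given/input_failures_feature_name.json",
                    "destination_dataset": "monitoring",
                    "destination_table": "job_failure",
                }
            ],
            "then": [
                {
                    "actual_file_path": "monitoring/when/find_failures_by_feature_name.sql",
                    "expected_file_path": "monitoring/then/expected_failures_feature_name.json",
                }
            ],
        }
    ],
}

print(check_definition(definition))
```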
The root tables folder
This folder contains the resources for the BigQuery datasets and tables to create.
For example, all the BigQuery schemas are stored in this folder.
The tables config file
This config file lists all the BigQuery datasets and tables to create, in JSON format.
Example:
[
  {
    "datasetId": "monitoring",
    "datasetRegion": "EU",
    "datasetFriendlyName": "Monitoring Dataset",
    "datasetDescription": "Monitoring Dataset description",
    "tables": [
      {
        "tableId": "job_failure",
        "autodetect": false,
        "tableSchemaPath": "schema/monitoring/job_failure.json",
        "partitionType": "DAY",
        "partitionField": "dwhCreationDate",
        "clustering": [
          "featureName",
          "jobName",
          "componentType"
        ]
      }
    ]
  }
]
In this example, we have a dataset called monitoring with its metadata.
This dataset contains a table called job_failure with its metadata. Some fields, such as tableSchemaPath, can point to the BigQuery schemas stored in the root tables folder.
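To see which datasets and tables a config file declares, it can be walked with plain Python. This is a minimal sketch: list_tables is a hypothetical helper, not a BigTesty function, and resolving tableSchemaPath relative to the root tables folder is an assumption based on the description of that folder.

```python
import json
from pathlib import PurePosixPath


def list_tables(tables_config: list, root_tables_folder: str) -> list:
    """Return (datasetId, tableId, schema path) tuples for every declared table."""
    root = PurePosixPath(root_tables_folder)
    result = []
    for dataset in tables_config:
        for table in dataset.get("tables", []):
            schema_path = table.get("tableSchemaPath")
            # Assumption: the schema path is relative to the root tables folder.
            resolved = str(root / schema_path) if schema_path else None
            result.append((dataset["datasetId"], table["tableId"], resolved))
    return result


# The config from this section, reduced to the keys used here.
config_text = """
[
  {
    "datasetId": "monitoring",
    "datasetRegion": "EU",
    "tables": [
      {
        "tableId": "job_failure",
        "autodetect": false,
        "tableSchemaPath": "schema/monitoring/job_failure.json"
      }
    ]
  }
]
"""

tables = list_tables(json.loads(config_text), "/opt/bigtesty/tests/tables")
for dataset_id, table_id, schema in tables:
    print(dataset_id, table_id, schema)
```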
Run with CLI
To launch the tests, we need to pass the 3 parameters described in the previous section on the command line:
- root-test-folder: the root folder containing all the testing files
- root-tables-folder: the root folder containing all the files needed to create the datasets and tables in BigQuery (JSON schemas...)
- tables-config-file: the JSON configuration file that lists all the datasets and tables to create in BigQuery
We also need the common GCP parameters:
- project: the GCP project ID
- region: the GCP region
BigTesty creates its ephemeral infrastructure internally with Infrastructure as Code, and the backend that hosts the state must be a Cloud Storage bucket.
The backend URL is passed as a CLI parameter:
- iac-backend-url
By default, the infrastructure created by BigTesty for the tests is ephemeral, but we can keep it with an optional parameter:
- keep-infra
It can sometimes be useful to keep the created infrastructure alive, so that developers, data engineers and data scientists can analyse the results with SQL queries.
The tests can be executed with the following command line:
bigtesty test \
    --project $PROJECT_ID \
    --region $LOCATION \
    --iac-backend-url gs://$IAC_BUCKET_STATE/bigtesty \
    --root-test-folder $(pwd)/examples/tests \
    --root-tables-folder $(pwd)/examples/tests/tables \
    --tables-config-file $(pwd)/examples/tests/tables/tables.json
All the test files shown in this documentation are available in the examples folder at the root of the BigTesty repo.
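When the tests are launched from a script rather than typed by hand, the same command can be assembled in Python. This is a minimal sketch that only reuses the flags shown in the command above; build_bigtesty_command and run_tests are hypothetical helper names, and the project, region and bucket values are placeholders.

```python
import shlex
import subprocess
from pathlib import Path


def build_bigtesty_command(
    project: str,
    region: str,
    iac_backend_url: str,
    tests_root: Path,
) -> list:
    """Build the argument list for `bigtesty test` from the documented flags."""
    tables_root = tests_root / "tables"
    return [
        "bigtesty", "test",
        "--project", project,
        "--region", region,
        "--iac-backend-url", iac_backend_url,
        "--root-test-folder", str(tests_root),
        "--root-tables-folder", str(tables_root),
        "--tables-config-file", str(tables_root / "tables.json"),
    ]


def run_tests(command: list) -> None:
    # check=True raises CalledProcessError if the command exits with a non-zero status.
    subprocess.run(command, check=True)


# Placeholder values, to be replaced with a real project, region and bucket.
command = build_bigtesty_command(
    project="my-project",
    region="europe-west1",
    iac_backend_url="gs://my-state-bucket/bigtesty",
    tests_root=Path("examples/tests"),
)

# Print the command in a copy-pasteable form instead of running it.
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting issues when a folder path contains spaces.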
Run with Docker
Instead of passing the arguments on the CLI, we can also pass them as environment variables.
export PROJECT_ID={{project_id}}
export LOCATION={{region}}
export IAC_BACKEND_URL=gs://{{gcs_state_bucket}}/bigtesty
export ROOT_TEST_FOLDER=/opt/bigtesty/tests
export ROOT_TABLES_FOLDER=/opt/bigtesty/tests/tables
export TABLES_CONFIG_FILE_PATH=/opt/bigtesty/tests/tables/tables.json
docker run -it \
    -e GOOGLE_PROJECT=$PROJECT_ID \
    -e GOOGLE_REGION=$LOCATION \
    -e IAC_BACKEND_URL=$IAC_BACKEND_URL \
    -e TABLES_CONFIG_FILE="$TABLES_CONFIG_FILE_PATH" \
    -e ROOT_TEST_FOLDER=$ROOT_TEST_FOLDER \
    -e ROOT_TABLES_FOLDER="$ROOT_TABLES_FOLDER" \
    -v $(pwd)/examples/tests:/opt/bigtesty/tests \
    -v $(pwd)/examples/tests/tables:/opt/bigtesty/tests/tables \
    -v $HOME/.config/gcloud:/opt/bigtesty/.config/gcloud \
    groupbees/bigtesty test
Some explanations:
All the parameters are passed as environment variables.
We also need to mount as volumes:
- the tests root folder: -v $(pwd)/examples/tests:/opt/bigtesty/tests
- the tables root folder: -v $(pwd)/examples/tests/tables:/opt/bigtesty/tests/tables
- the gcloud configuration: -v $HOME/.config/gcloud:/opt/bigtesty/.config/gcloud
When authentication is done with Application Default Credentials via the command gcloud auth application-default login, a short-lived credential is generated in the local gcloud configuration: $HOME/.config/gcloud
To avoid using a long-lived service account key, we can share the local gcloud configuration with the Docker container by mounting it as a volume: -v $HOME/.config/gcloud:/opt/bigtesty/.config/gcloud
With this technique, the container is authenticated to Google Cloud securely, as the current user of your shell session.
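Before starting the container, it can help to confirm that the credentials file actually exists, since mounting an empty gcloud folder would leave the container unauthenticated. This is a minimal sketch, assuming a Linux or macOS layout where gcloud auth application-default login writes application_default_credentials.json under $HOME/.config/gcloud; gcloud_volume_args is a hypothetical helper.

```python
import tempfile
from pathlib import Path

ADC_FILE_NAME = "application_default_credentials.json"
CONTAINER_GCLOUD_DIR = "/opt/bigtesty/.config/gcloud"


def gcloud_volume_args(home: Path) -> list:
    """Return the `-v` arguments for the gcloud config, or fail if ADC is missing."""
    config_dir = home / ".config" / "gcloud"
    if not (config_dir / ADC_FILE_NAME).is_file():
        raise FileNotFoundError(
            f"{ADC_FILE_NAME} not found in {config_dir}; "
            "run `gcloud auth application-default login` first"
        )
    return ["-v", f"{config_dir}:{CONTAINER_GCLOUD_DIR}"]


# Demo against a throwaway directory standing in for $HOME.
with tempfile.TemporaryDirectory() as tmp:
    fake_home = Path(tmp)
    fake_config = fake_home / ".config" / "gcloud"
    fake_config.mkdir(parents=True)
    (fake_config / ADC_FILE_NAME).write_text("{}")
    print(gcloud_volume_args(fake_home))
```

In real use, Path.home() would replace the temporary directory.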
Run with Cloud Build
export PROJECT_ID={{project_id}}
export LOCATION={{region}}
export IAC_BACKEND_URL=gs://{{gcs_state_bucket}}/bigtesty
gcloud builds submit \
    --project=$PROJECT_ID \
    --region=$LOCATION \
    --config examples/ci/cloud_build/run-tests-cloud-build.yaml \
    --substitutions _IAC_BACKEND_URL=$IAC_BACKEND_URL \
    --verbosity="debug" .