import os
os.chdir('/home/thulasiram/personal/going_deep_and_wide/GiveDirectly/gx_tutorials/great_expectations')
Create Checkpoint
Use this notebook to configure a new Checkpoint and add it to your project:
Checkpoint Name: data_quality_demo_checkpoint
from ruamel.yaml import YAML
import great_expectations as gx
from pprint import pprint
yaml = YAML()
context = gx.get_context()
Create a Checkpoint Configuration
If you are new to Great Expectations or the Checkpoint feature, you should start with SimpleCheckpoint because it includes default configurations like a default list of post validation actions.
In the cell below we have created a sample Checkpoint configuration using your configuration and SimpleCheckpoint to run a single validation of a single Expectation Suite against a single Batch of data.
To keep it simple, we are just choosing the first available instance of each of the following items you have configured in your Data Context:
* Datasource
* DataConnector
* DataAsset
* Partition
* Expectation Suite
Of course, this is purely an example; you may edit it to your heart’s content.
My configuration is not so simple - are there more advanced options?
Glad you asked! Checkpoints are very versatile. For example, you can validate many Batches in a single Checkpoint, validate Batches against different Expectation Suites or against many Expectation Suites, and control the specific post-validation actions based on Expectation Suite / Batch / results of validation, among other features. Check out our documentation on Checkpoints for more details and for instructions on how to implement other more advanced features, including using the Checkpoint class:
- https://docs.greatexpectations.io/docs/reference/checkpoints_and_actions
- https://docs.greatexpectations.io/docs/guides/validation/checkpoints/how_to_create_a_new_checkpoint
- https://docs.greatexpectations.io/docs/guides/validation/checkpoints/how_to_configure_a_new_checkpoint_using_test_yaml_config
my_checkpoint_name = "data_quality_demo_checkpoint"  # This was populated from your CLI command.

yaml_config = f"""
name: {my_checkpoint_name}
config_version: 1.0
class_name: SimpleCheckpoint
run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
validations:
  - batch_request:
      datasource_name: data_quality_demo
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02.csv
      data_connector_query:
        index: -1
    expectation_suite_name: data_quality_expectation_demo
"""
print(yaml_config)
name: data_quality_demo_checkpoint
config_version: 1.0
class_name: SimpleCheckpoint
run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
validations:
  - batch_request:
      datasource_name: data_quality_demo
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02.csv
      data_connector_query:
        index: -1
    expectation_suite_name: data_quality_expectation_demo
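The run_name_template value in the configuration above is made of strftime-style directives followed by literal text. The sketch below shows how such a template expands, assuming Great Expectations applies standard strftime formatting to the run time. The fixed timestamp is a made-up value chosen only so the result is reproducible.

```python
from datetime import datetime, timezone

# Template copied from the Checkpoint configuration above.
run_name_template = "%Y%m%d-%H%M%S-my-run-name-template"

# Hypothetical run time; a real run would use the time the Checkpoint executes.
run_time = datetime(2019, 2, 1, 12, 30, 45, tzinfo=timezone.utc)

# strftime replaces the % directives and leaves the literal suffix untouched.
run_name = run_time.strftime(run_name_template)
print(run_name)  # 20190201-123045-my-run-name-template
```

Because the timestamp comes first, run names sort chronologically when listed alphabetically.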
Customize Your Configuration
The following cells show examples for listing your current configuration. You can replace values in the sample configuration with these values to customize your Checkpoint.
# Run this cell to print out the names of your Datasources, Data Connectors and Data Assets
pprint(context.get_available_data_asset_names())
{'data_quality_demo': {'default_inferred_data_connector_name': ['yellow_tripdata_sample_2019-01.csv',
'yellow_tripdata_sample_2019-02.csv'],
'default_runtime_data_connector_name': ['my_runtime_asset_name']}}
context.list_expectation_suite_names()
['data_quality_expectation_demo']
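As one way to use the names listed above, the following sketch builds a validations list with one entry per Data Asset under a chosen Data Connector, all checked against the same Expectation Suite. The helper build_validations is hypothetical (it is not part of Great Expectations), and the dictionary is copied by hand from the output above so the example runs without a Data Context.

```python
def build_validations(available_assets, datasource_name, data_connector_name, suite_name):
    """Return one validation entry per data asset under the given connector."""
    asset_names = available_assets[datasource_name][data_connector_name]
    return [
        {
            "batch_request": {
                "datasource_name": datasource_name,
                "data_connector_name": data_connector_name,
                "data_asset_name": asset_name,
                # index -1 selects the last available Batch, as in the sample config.
                "data_connector_query": {"index": -1},
            },
            "expectation_suite_name": suite_name,
        }
        for asset_name in asset_names
    ]


# Copied from the output of context.get_available_data_asset_names().
available_assets = {
    "data_quality_demo": {
        "default_inferred_data_connector_name": [
            "yellow_tripdata_sample_2019-01.csv",
            "yellow_tripdata_sample_2019-02.csv",
        ],
        "default_runtime_data_connector_name": ["my_runtime_asset_name"],
    }
}

validations = build_validations(
    available_assets,
    datasource_name="data_quality_demo",
    data_connector_name="default_inferred_data_connector_name",
    suite_name="data_quality_expectation_demo",
)
for validation in validations:
    print(validation["batch_request"]["data_asset_name"])
```

The resulting list has the same shape as the validations key in the sample configuration, so it could replace the single entry there if you want both monthly files validated in one Checkpoint.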
Test Your Checkpoint Configuration
Here we will test your Checkpoint configuration to make sure it is valid.
The test_yaml_config() function is meant to enable fast dev loops. If your configuration is correct, this cell will show a message that you successfully instantiated a Checkpoint. You can keep editing your Checkpoint config YAML and re-running the cell until the new config is valid.
If you would rather configure your Checkpoint in Python than in YAML, you can call context.add_checkpoint() and specify all the required parameters.
my_checkpoint = context.test_yaml_config(yaml_config=yaml_config)
Attempting to instantiate class from config...
Instantiating as a SimpleCheckpoint, since class_name is SimpleCheckpoint
Successfully instantiated SimpleCheckpoint
Checkpoint class name: SimpleCheckpoint
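For the Python route mentioned above, a minimal sketch is to assemble the same keys the YAML configuration uses as a plain dictionary and unpack it into context.add_checkpoint(). The helper checkpoint_kwargs is hypothetical, and the final call is left commented out because it needs a configured Data Context.

```python
def checkpoint_kwargs(name, validations):
    """Mirror the keys of the sample YAML Checkpoint configuration."""
    return {
        "name": name,
        "config_version": 1.0,
        "class_name": "SimpleCheckpoint",
        "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template",
        "validations": validations,
    }


kwargs = checkpoint_kwargs(
    name="data_quality_demo_checkpoint",
    validations=[
        {
            "batch_request": {
                "datasource_name": "data_quality_demo",
                "data_connector_name": "default_inferred_data_connector_name",
                "data_asset_name": "yellow_tripdata_sample_2019-02.csv",
                "data_connector_query": {"index": -1},
            },
            "expectation_suite_name": "data_quality_expectation_demo",
        }
    ],
)
print(sorted(kwargs))

# With a Data Context available, the dictionary is unpacked as keyword arguments:
# context.add_checkpoint(**kwargs)
```

Keeping the configuration as a dictionary makes it easy to build the validations list programmatically before registering the Checkpoint.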
Review Your Checkpoint
You can run the following cell to print out the full yaml configuration. For example, if you used SimpleCheckpoint this will show you the default action list.
print(my_checkpoint.get_config(mode="yaml"))
name: data_quality_demo_checkpoint
config_version: 1.0
template_name:
module_name: great_expectations.checkpoint
class_name: Checkpoint
run_name_template: '%Y%m%d-%H%M%S-my-run-name-template'
expectation_suite_name:
batch_request: {}
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: []
evaluation_parameters: {}
runtime_configuration: {}
validations:
  - batch_request:
      datasource_name: data_quality_demo
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02.csv
      data_connector_query:
        index: -1
    expectation_suite_name: data_quality_expectation_demo
profilers: []
ge_cloud_id:
expectation_suite_ge_cloud_id:
Add Your Checkpoint
Run the following cell to save this Checkpoint to your Checkpoint Store.
context.add_checkpoint(**yaml.load(yaml_config))
{
  "action_list": [
    {
      "name": "store_validation_result",
      "action": {
        "class_name": "StoreValidationResultAction"
      }
    },
    {
      "name": "store_evaluation_params",
      "action": {
        "class_name": "StoreEvaluationParametersAction"
      }
    },
    {
      "name": "update_data_docs",
      "action": {
        "class_name": "UpdateDataDocsAction",
        "site_names": []
      }
    }
  ],
  "batch_request": {},
  "class_name": "Checkpoint",
  "config_version": 1.0,
  "evaluation_parameters": {},
  "module_name": "great_expectations.checkpoint",
  "name": "data_quality_demo_checkpoint",
  "profilers": [],
  "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template",
  "runtime_configuration": {},
  "validations": [
    {
      "batch_request": {
        "datasource_name": "data_quality_demo",
        "data_connector_name": "default_inferred_data_connector_name",
        "data_asset_name": "yellow_tripdata_sample_2019-02.csv",
        "data_connector_query": {
          "index": -1
        }
      },
      "expectation_suite_name": "data_quality_expectation_demo"
    }
  ]
}
Run Your Checkpoint & Open Data Docs (Optional)
You may wish to run the Checkpoint now and review its output in Data Docs. If so, uncomment and run the following cell.
context.run_checkpoint(checkpoint_name=my_checkpoint_name)
context.open_data_docs()