Create Checkpoint

Use this notebook to configure a new Checkpoint and add it to your project:

Checkpoint Name: data_quality_demo_checkpoint

import os
os.chdir('/home/thulasiram/personal/going_deep_and_wide/GiveDirectly/gx_tutorials/great_expectations')
from ruamel.yaml import YAML
import great_expectations as gx
from pprint import pprint

yaml = YAML()
context = gx.get_context()

Create a Checkpoint Configuration

If you are new to Great Expectations or the Checkpoint feature, start with SimpleCheckpoint, because it ships with sensible defaults such as a default list of post-validation actions.

In the cell below we have created a sample Checkpoint configuration using your configuration and SimpleCheckpoint to run a single validation of a single Expectation Suite against a single Batch of data.

To keep it simple, we are just choosing the first available instance of each of the following items you have configured in your Data Context:

* Datasource
* DataConnector
* DataAsset
* Partition
* Expectation Suite

Of course, this is purely an example; you may edit it to your heart’s content.

My configuration is not so simple - are there more advanced options?

Glad you asked! Checkpoints are very versatile. For example, you can validate many Batches in a single Checkpoint, validate Batches against different Expectation Suites or against many Expectation Suites, and control the specific post-validation actions based on Expectation Suite / Batch / results of validation, among other features. Check out our documentation on Checkpoints for more details and for instructions on how to implement other more advanced features, including using the Checkpoint class:

- https://docs.greatexpectations.io/docs/reference/checkpoints_and_actions
- https://docs.greatexpectations.io/docs/guides/validation/checkpoints/how_to_create_a_new_checkpoint
- https://docs.greatexpectations.io/docs/guides/validation/checkpoints/how_to_configure_a_new_checkpoint_using_test_yaml_config
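As a rough illustration of validating many Batches in a single Checkpoint, the sketch below builds a configuration dictionary with one entry in `validations` per data asset. It uses only the keys that appear in the sample configuration in this notebook. The helper name `build_checkpoint_config` and the checkpoint name `multi_batch_demo_checkpoint` are hypothetical and are not part of Great Expectations.

```python
from typing import Dict, List


def build_checkpoint_config(
    name: str,
    datasource_name: str,
    data_connector_name: str,
    asset_to_suite: Dict[str, str],
) -> dict:
    """Hypothetical helper: one validation per (data asset, suite) pair."""
    validations: List[dict] = []
    for data_asset_name, suite_name in asset_to_suite.items():
        validations.append(
            {
                "batch_request": {
                    "datasource_name": datasource_name,
                    "data_connector_name": data_connector_name,
                    "data_asset_name": data_asset_name,
                    # Same query as the sample config: index -1, i.e. the
                    # last Batch the connector lists for this asset.
                    "data_connector_query": {"index": -1},
                },
                "expectation_suite_name": suite_name,
            }
        )
    return {
        "name": name,
        "config_version": 1.0,
        "class_name": "SimpleCheckpoint",
        "validations": validations,
    }


multi_batch_config = build_checkpoint_config(
    name="multi_batch_demo_checkpoint",  # hypothetical checkpoint name
    datasource_name="data_quality_demo",
    data_connector_name="default_inferred_data_connector_name",
    asset_to_suite={
        "yellow_tripdata_sample_2019-01.csv": "data_quality_expectation_demo",
        "yellow_tripdata_sample_2019-02.csv": "data_quality_expectation_demo",
    },
)
print(len(multi_batch_config["validations"]))
```

A dictionary of this shape could then be serialized to YAML or passed to the context as keyword arguments, in the same way the single-validation configuration is handled in the rest of this notebook.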

my_checkpoint_name = "data_quality_demo_checkpoint" # This was populated from your CLI command.

yaml_config = f"""
name: {my_checkpoint_name}
config_version: 1.0
class_name: SimpleCheckpoint
run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
validations:
  - batch_request:
      datasource_name: data_quality_demo
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02.csv
      data_connector_query:
        index: -1
    expectation_suite_name: data_quality_expectation_demo
"""
print(yaml_config)

name: data_quality_demo_checkpoint
config_version: 1.0
class_name: SimpleCheckpoint
run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
validations:
  - batch_request:
      datasource_name: data_quality_demo
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02.csv
      data_connector_query:
        index: -1
    expectation_suite_name: data_quality_expectation_demo
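The `run_name_template` value in the configuration above is made of `strftime` directives followed by literal text. As a minimal sketch, assuming the template is expanded against the time of the run, the standard library shows what a resulting run name would look like. The fixed timestamp here is invented for the example.

```python
from datetime import datetime

run_name_template = "%Y%m%d-%H%M%S-my-run-name-template"

# Invented timestamp, used only to make the output reproducible.
fixed_run_time = datetime(2019, 2, 1, 12, 30, 45)

example_run_name = fixed_run_time.strftime(run_name_template)
print(example_run_name)  # 20190201-123045-my-run-name-template
```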

Customize Your Configuration

The following cells show examples for listing your current configuration. You can replace values in the sample configuration with these values to customize your Checkpoint.

# Run this cell to print out the names of your Datasources, Data Connectors and Data Assets
pprint(context.get_available_data_asset_names())
{'data_quality_demo': {'default_inferred_data_connector_name': ['yellow_tripdata_sample_2019-01.csv',
                                                                'yellow_tripdata_sample_2019-02.csv'],
                       'default_runtime_data_connector_name': ['my_runtime_asset_name']}}
context.list_expectation_suite_names()
['data_quality_expectation_demo']
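The nested dictionary printed above maps each Datasource to its Data Connectors, and each Data Connector to its Data Assets. The sketch below flattens that structure into (datasource, data connector, data asset) triples, which are the three values a `batch_request` needs. The dictionary is copied from the output above; the helper name `iter_asset_triples` is hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Copied from the get_available_data_asset_names() output shown above.
available_names: Dict[str, Dict[str, List[str]]] = {
    "data_quality_demo": {
        "default_inferred_data_connector_name": [
            "yellow_tripdata_sample_2019-01.csv",
            "yellow_tripdata_sample_2019-02.csv",
        ],
        "default_runtime_data_connector_name": ["my_runtime_asset_name"],
    }
}


def iter_asset_triples(
    names: Dict[str, Dict[str, List[str]]]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (datasource_name, data_connector_name, data_asset_name)."""
    for datasource_name, connectors in names.items():
        for data_connector_name, assets in connectors.items():
            for data_asset_name in assets:
                yield datasource_name, data_connector_name, data_asset_name


triples = list(iter_asset_triples(available_names))
for triple in triples:
    print(triple)
```

In a live notebook, `available_names` would instead be the return value of `context.get_available_data_asset_names()`.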

Test Your Checkpoint Configuration

Here we will test your Checkpoint configuration to make sure it is valid.

This test_yaml_config() function is meant to enable fast dev loops. If your configuration is correct, this cell will show a message that you successfully instantiated a Checkpoint. You can continually edit your Checkpoint config yaml and re-run the cell to check until the new config is valid.

If you prefer to configure your Checkpoint in Python rather than YAML, you can call context.add_checkpoint() and pass all the required parameters directly.
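A minimal sketch of the Python route follows. It writes the same settings as the sample YAML as a plain dictionary and passes them to `context.add_checkpoint()` as keyword arguments, which mirrors the `context.add_checkpoint(**yaml.load(yaml_config))` call used later in this notebook. The wrapper name `add_checkpoint_from_dict` is hypothetical, and `context` is assumed to be the Data Context created at the top of the notebook.

```python
# Same values as the sample YAML configuration, expressed as a dict.
checkpoint_config = {
    "name": "data_quality_demo_checkpoint",
    "config_version": 1.0,
    "class_name": "SimpleCheckpoint",
    "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template",
    "validations": [
        {
            "batch_request": {
                "datasource_name": "data_quality_demo",
                "data_connector_name": "default_inferred_data_connector_name",
                "data_asset_name": "yellow_tripdata_sample_2019-02.csv",
                "data_connector_query": {"index": -1},
            },
            "expectation_suite_name": "data_quality_expectation_demo",
        }
    ],
}


def add_checkpoint_from_dict(context, config: dict):
    """Hypothetical wrapper: forward the dict as keyword arguments."""
    return context.add_checkpoint(**config)


# Usage in the notebook (requires a real Data Context):
# add_checkpoint_from_dict(context, checkpoint_config)
```

Keeping the configuration in a dictionary makes it easy to build or modify programmatically, at the cost of losing the quick edit-and-retest loop that test_yaml_config() offers for YAML.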

my_checkpoint = context.test_yaml_config(yaml_config=yaml_config)
Attempting to instantiate class from config...
    Instantiating as a SimpleCheckpoint, since class_name is SimpleCheckpoint
    Successfully instantiated SimpleCheckpoint


Checkpoint class name: SimpleCheckpoint

Review Your Checkpoint

You can run the following cell to print out the full yaml configuration. For example, if you used SimpleCheckpoint this will show you the default action list.

print(my_checkpoint.get_config(mode="yaml"))
name: data_quality_demo_checkpoint
config_version: 1.0
template_name:
module_name: great_expectations.checkpoint
class_name: Checkpoint
run_name_template: '%Y%m%d-%H%M%S-my-run-name-template'
expectation_suite_name:
batch_request: {}
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: []
evaluation_parameters: {}
runtime_configuration: {}
validations:
  - batch_request:
      datasource_name: data_quality_demo
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02.csv
      data_connector_query:
        index: -1
    expectation_suite_name: data_quality_expectation_demo
profilers: []
ge_cloud_id:
expectation_suite_ge_cloud_id:

Add Your Checkpoint

Run the following cell to save this Checkpoint to your Checkpoint Store.

context.add_checkpoint(**yaml.load(yaml_config))
{
  "action_list": [
    {
      "name": "store_validation_result",
      "action": {
        "class_name": "StoreValidationResultAction"
      }
    },
    {
      "name": "store_evaluation_params",
      "action": {
        "class_name": "StoreEvaluationParametersAction"
      }
    },
    {
      "name": "update_data_docs",
      "action": {
        "class_name": "UpdateDataDocsAction",
        "site_names": []
      }
    }
  ],
  "batch_request": {},
  "class_name": "Checkpoint",
  "config_version": 1.0,
  "evaluation_parameters": {},
  "module_name": "great_expectations.checkpoint",
  "name": "data_quality_demo_checkpoint",
  "profilers": [],
  "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template",
  "runtime_configuration": {},
  "validations": [
    {
      "batch_request": {
        "datasource_name": "data_quality_demo",
        "data_connector_name": "default_inferred_data_connector_name",
        "data_asset_name": "yellow_tripdata_sample_2019-02.csv",
        "data_connector_query": {
          "index": -1
        }
      },
      "expectation_suite_name": "data_quality_expectation_demo"
    }
  ]
}

Run Your Checkpoint & Open Data Docs (Optional)

You may wish to run the Checkpoint now and review its output in Data Docs. If so, run the following cell (uncomment it first if it is commented out).

context.run_checkpoint(checkpoint_name=my_checkpoint_name)
context.open_data_docs()
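When the Checkpoint runs in a script rather than interactively, it is usually more useful to inspect the result than to open Data Docs. The sketch below assumes that the object returned by `context.run_checkpoint()` supports `result["success"]`; treat that as an assumption to verify against your installed version. The function name `run_and_check` is hypothetical.

```python
def run_and_check(context, checkpoint_name: str) -> bool:
    """Run a Checkpoint and report whether every validation passed.

    Assumes the returned result can be indexed with "success".
    """
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    success = bool(result["success"])
    status = "passed" if success else "failed"
    print(f"Checkpoint {checkpoint_name!r} {status}")
    return success


# Usage in the notebook (requires a real Data Context):
# if not run_and_check(context, my_checkpoint_name):
#     raise SystemExit(1)
```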