Scheduling Queries in BigQuery


Queries can be run automatically on a schedule through the BigQuery Data Transfer service. This can be used, for example, to clean up data that is dumped into BigQuery daily and reshape it into a more convenient format for downstream services.

This is a step-by-step guide on how to schedule queries through the Google Console UI.

In our toy example, we receive daily dumps into a date-sharded BigQuery table, and we want to extract two ID columns into a separate cumulative table that will contain the entire set of user_ids present in our data. We write a query that processes one day's worth of data and appends to the cumulative table any user_ids it does not yet contain. We then schedule the query and back-fill it on historical data.
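For concreteness, the toy setup could look like the sketch below. This DDL is hypothetical; the dataset name custom_data and the column names match the sample query further down in this guide.

-- Hypothetical setup for the toy example.
-- One day's dump shard; a new shard arrives daily as schedule_test_YYYYMMDD.
CREATE TABLE `custom_data.schedule_test_20221014` (
  one_id STRING,      -- the raw user identifier in the daily dump
  another_id STRING   -- the second identifier we want to map it to
);

-- The cumulative destination table the scheduled query appends to.
CREATE TABLE `custom_data.mapping_test` (
  user_id STRING,
  real_id STRING
);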

Official Documentation

Feel free to make use of the official documentation provided by Google:

  • Enabling the BigQuery Data Transfer service

  • Scheduling Queries

  • Formatting the query schedule

Enable the BigQuery Data Transfer service

Navigate to the BigQuery Data Transfer API in the list of APIs, and click the Enable button. If you see a Manage button instead, the service is already enabled.
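Alternatively, the service can be enabled from the command line; a minimal sketch, where YOUR_PROJECT_ID is a placeholder for your own project:

# Enable the BigQuery Data Transfer API for the project.
gcloud services enable bigquerydatatransfer.googleapis.com --project=YOUR_PROJECT_ID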

Permissions

The account scheduling or running the query must have the following permissions:

  • bigquery.jobs.create

  • bigquery.datasets.update

  • bigquery.transfers.get

  • bigquery.transfers.update

The bigquery.admin role includes these permissions.

It is possible to create a custom role with these permissions and assign it to a (service) account so that it may schedule queries, by going to the Roles tab of Google Console IAM & Admin and clicking Create Role at the top.

Name the new role appropriately, and add the aforementioned permissions.

Then, attach the role to the account on the main IAM page by clicking Edit Principal, clicking Add Another Role, and selecting the newly created role.
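The same can be done with gcloud; a hedged sketch, where the role ID scheduledQueryRunner and the service-account address are placeholders:

# Create a custom role carrying just the four required permissions.
gcloud iam roles create scheduledQueryRunner --project=YOUR_PROJECT_ID \
  --title="Scheduled Query Runner" \
  --permissions=bigquery.jobs.create,bigquery.datasets.update,bigquery.transfers.get,bigquery.transfers.update

# Attach the custom role to the account that will own the scheduled query.
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member=serviceAccount:scheduler@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --role=projects/YOUR_PROJECT_ID/roles/scheduledQueryRunner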

Writing the query

Navigate to the Scheduled Queries tab of the BigQuery service, and click Create A Scheduled Query (or create a scheduled query in the editor).

Write the query that you wish to be scheduled. You may run it to check whether the output is what you expect; this is separate from scheduling it, and it does not write the result into any destination (unless the query itself does so).

Below is a sample query that selects two ID columns from a date-sharded table schedule_test_* on a particular day, keeping only rows that are not already present in our destination table mapping_test:

-- Collect the distinct ID pairs from yesterday's shard.
WITH id_mapping AS (
  SELECT `one_id` AS `user_id`, `another_id` AS `real_id`
  FROM `custom_data`.`schedule_test_*`
  WHERE `_TABLE_SUFFIX` = FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))
  GROUP BY `one_id`, `another_id`
)

-- Keep only pairs not already present in the cumulative table.
SELECT `user_id`, `real_id`
FROM id_mapping
WHERE `user_id` IS NOT NULL

EXCEPT DISTINCT

SELECT `user_id`, `real_id`
FROM `custom_data.mapping_test`

We subtract one day from @run_date because we want the query to process the data that was dumped the day before the run (so when it runs on 2022-10-15, it will process the table schedule_test_20221014). We may test this by replacing @run_date with DATE('2022-10-15') when test-running the query in the editor.

You may make use of the special parameters @run_date and @run_time to insert the run date or time into the query. Unfortunately, the Google Console does not support running a query in the editor with these parameters (even though scheduling the query will work, and it will run properly). You may either test the query using the command-line bq tool, or simply replace the special parameters with a sample date and time when test-running the query, then put the parameters back in when scheduling it.
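For example, a bq invocation along these lines tests the shard-suffix logic with run_date supplied as a named parameter (a sketch using the standard --parameter=name:type:value flag):

# Test-run the shard-suffix logic with an explicit run date; prints 20221014.
bq query --use_legacy_sql=false \
  --parameter='run_date:DATE:2022-10-15' \
  'SELECT FORMAT_DATE("%Y%m%d", DATE_SUB(@run_date, INTERVAL 1 DAY)) AS shard_suffix'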

Scheduling the query

Click the Schedule button on the top bar to open a drop-down menu, and select Create new scheduled query.

Name the query appropriately.

Choose how often the query should be run, e.g. Daily.

Set a custom time at which the query will run. Mind the timezone of that time, and make sure that the data the query should process is available by the time it runs.

Set the destination dataset and table where the query results will be written, and choose Append to table if the results should be appended (and the table is cumulative), or Overwrite table if the query results should replace the table.

In our case, we named the query Cumulative user_id query, set it to run daily at 6 AM UTC, and chose to append the results to the mapping_test table (since our query outputs only rows that are not already present in it).

You may schedule the query to run under a service account in the Advanced options, and you may opt to receive email notifications when the query fails under Notification options.
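The equivalent configuration can also be created with the bq tool. Below is a hedged sketch: the JSON params mirror the UI choices above, YOUR_PROJECT_ID is a placeholder, and the query string is abbreviated (paste in the full query from the previous section):

# Create the scheduled query as a transfer config, appending to mapping_test daily at 06:00 UTC.
bq mk --transfer_config \
  --project_id=YOUR_PROJECT_ID \
  --data_source=scheduled_query \
  --display_name='Cumulative user_id query' \
  --target_dataset=custom_data \
  --schedule='every day 06:00' \
  --params='{
    "query": "<full SQL of the scheduled query from the section above>",
    "destination_table_name_template": "mapping_test",
    "write_disposition": "WRITE_APPEND"
  }'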

Managing the query & executing a back-fill

You may navigate to the Scheduled queries tab from the left-hand menu. You should see your newly scheduled query.

Clicking on it, you can check its individual runs under the Run History tab, and its configuration under the Configuration tab.

Schedule a back-fill (run the query as if it had been scheduled in the past) by clicking the Schedule Backfill button in the top-right corner, then Run for a date range, and use the calendar to select the start and end date for the historical query runs.
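If you prefer the command line, a back-fill over a date range can be requested as sketched below; the transfer-config resource name is a placeholder that you can copy from the query's Configuration tab:

# Request back-fill runs for the given date range.
bq mk --transfer_run \
  --start_time='2022-10-01T06:00:00Z' \
  --end_time='2022-10-14T06:00:00Z' \
  projects/YOUR_PROJECT_NUMBER/locations/us/transferConfigs/YOUR_CONFIG_ID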

After the operation succeeds, refresh the page; you should see the historical query runs under the Run History tab.

After the runs succeed, you may check whether the destination table contains the data you expected.
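For instance, a quick sanity check on the destination table might look like this (assuming the toy tables from this guide):

-- Sanity check: how many rows and distinct user_ids have accumulated?
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT user_id) AS distinct_user_ids
FROM `custom_data.mapping_test`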
