Thrutalk to Redshift data dictionary
Thrutalk offers 5 reports that can be synced to Redshift through Bluelink:
When setting up the pipeline in the UI, users can choose which subset of the 5 possible reports they want to sync as part of this pipeline.
Each report goes into a different table in Redshift. The table names are: callers, follow_ups, script_results, results, patchthru.
Bluelink adds the following fields to all tables:
- id: An ID for the row created by Bluelink. Bluelink creates IDs because Thrutalk does not provide unique IDs for rows. This ID is created by hashing all the fields we receive from Thrutalk.
- insert_timestamp: The time that this row was inserted into Redshift by Bluelink. This can be useful for debugging, such as seeing the last time the sync ran.
- campaign_name: Bluelink extracts this from the report filename to determine which campaign the report is for. For example, if the filename is “https://getthru-prod-dialer-reports.s3.amazonaws.com/yourorg-special-phone-bank-results-05-31-2022--06-01-2022.csv, all rows from that report will have campaign_name “yourorg-special-phone-bank”
Table-specific fields
Callers
The following fields come from Thrutalk (full documentation can be found here):
- date
- login
- name
- phone
- minutes_in_call
- minutes_in_wrap_up
- minutes_in_ready
- minutes_in_not_ready
- deliverability_error
- no_contact
- remove_number_from_list
- talked_to_correct_person
- custom_data – Note: this is not documented by Callhub but shows up in the CSV.
Follow Ups
The following fields come from Thrutalk (full documentation can be found here):
- voter_id
- sent_at__est_
- delivery_mechanism
- delivered_to
- message
Script Results
This table is not a direct sync of the Thrutalk CSV. Instead, Bluelink extracts each question and answer from the script results CSV into its own row in the script_results Redshift table, in order to create a more normalized, queryable table.
The following columns are in the table (in addition to the fields that Bluelink adds to all tables):
- checksum - This is a hash of the fields (voter_id, voter_id_type, voter_first_name, voter_last_name, voter_phone, date_called, time_called__est_). It is added by Bluelink to allow a join between results and script_results. You can join between results.checksum and script_results.checksum to see which script results correlate to each result row.
- voter_id - The Voter ID given by Thrutalk
- question - The question, as given in the header of the Thrutalk script results CSV. For example, if there is a header “Support Movement”, the value for question will be “Support Movement”.
- answer - The answer to the question, as given in the CSV row. Note: A row will only be added to this table if the answer is not null.
See Thrutalk Script Results documentation here:
Results
Fields added by Bluelink:
- checksum - This is a hash of the fields (voter_id, voter_id_type, voter_first_name, voter_last_name, voter_phone, date_called, time_called__est_). It is added by Bluelink to allow a join between results and script_results. You can join between results.checksum and script_results.checksum to see which script results correlate to each result row.
The following fields come from Thrutalk (full documentation can be found here):
- voter_id
- voter_id_type
- voter_first_name
- voter_last_name
- voter_phone
- date_called
- time_called__est_
- caller_login
- result
PatchThru
The following fields come from Thrutalk (full documentation can be found here):
- service
- voter_name
- voter_id
- voter_phone
- date
- caller_username
- transfer_start_time
- transfer_end_time
- transfer_duration
- time_in_queue
- transfer_status
- transfer_target
Note: PatchThru results will only be synced if the client enables this feature in their Thrutalk account.
General note:
This pipeline is currently only in the UI as Thrutalk -> TMC Redshift, but we can make it available for “generic Redshift” if that’s requested.