Data cloud
Last updated
Was this helpful?
Last updated
Was this helpful?
Swarmia will send your data in the form of a Parquet file to your storage bucket, starting with Amazon S3. This data can then be pulled to your choice of data warehouse such as Snowflake. The full dataset will be refreshed every 8 hours to ensure there is recent enough data when the data is pulled.
The data exported currently is information about pull requests and deployments. The export contains multiple files
pull_requests.parquet
deployments.parquet
authors.parquet
pull_request_reviews.parquet
pull_request_review_requests.parquet
pull_request_comments.parquet
The pull request data contains:
Author information
Contributing teams
Investment categories
: cycle time, progress time, time in review, and time in merge
Timestamps of pull request events: first review time, first and last commit time, etc...
Pull request comments by authors, review requests, when pull request was reviewed and by whom etc.
The deployments data contains:
List of deployed PRs in the deployment
Deployment information: version, environment, application
You can read more of these metrics in Metrics & definitions.
The setup requires you to have an Amazon S3 storage bucket to which Swarmia can push data.
Insert your S3 bucket credentials
Save the settings. After this, you can test the integration from the "Export" button in the UI.
Swarmia requires the following permissions on an S3 bucket and folder to be able to upload the exports:
s3:GetBucketLocation
s3:PutObject
An example of AWS IAM Policy for the export bucket, where bucket name
is the name of the bucket you provided use in the Data Cloud settings.
If you will access the data directly from the warehouse you most likely will need additional permissions. The following IAM Policy is the one recommended by Snowflake, which works then in both directions: Swarmia to upload and Snowflake to extract data from the same bucket.
You can load data into various data warehouse systems using their native S3 connection methods. Here are list of some commonly used systems and links to their documentation.
Example: Loading data into Snowflake
With the following SQL template can get you started with Swarmia Data Cloud and Snowflake. Use this template to create needed Snowflake stages, file formats, and data tables to start making analytics based on Swarmia data.
All export schemas are defined as follow
pull_request
field
type
required
description
id
UTF8
Y
UUID
github_number
INT64
Y
GitHub number, like 5532
author_id
UTF8
Y
Created by, UUID
title
UTF8
Y
Pull request title
description
UTF8
N
Pull request description
author_external_identifier
UTF8
Y
Created by nickname
github_repository_full_name
UTF8
Y
Repository's name
pr_status
UTF8
Y
One of the following: OPEN, CLOSED, MERGED
owner_team_ids
Array, UTF8
Y
Who owns the pull request
owner_team_names
Array, UTF8
Y
Team names of the
investment_category_ids
Array, UTF8
Y
In which investment categories pull request belongs to
investment_category_names
Array, UTF8
Y
Linked investment categories names
labels
Array, UTF8
Y
Attached labels to pull request
is_draft
BOOLEAN
Y
Is the pull request draft
is_excluded
BOOLEAN
Y
Is the pull request excluded from metrics
github_created_at
TIMESTAMP_MICROS
Y
Created at in GitHub
github_updated_at
TIMESTAMP_MICROS
Y
Updated at in GitHub
last_closed_or_merged_at
TIMESTAMP_MICROS
N
first_commit_at
TIMESTAMP_MICROS
N
Time of when first commit took place
last_commit_at
TIMESTAMP_MICROS
N
Time of when last commit took place
opened_at
TIMESTAMP_MICROS
N
When the pull request was opened
first_review_request_at
TIMESTAMP_MICROS
N
When the first review request was sent
first_reviewed_at
TIMESTAMP_MICROS
N
When somebody reviewed pull request first time
first_approved_review_at
TIMESTAMP_MICROS
N
When pull request was approved first time
cycle_time_started_at
TIMESTAMP_MICROS
Y
When cycle time started
cycle_time_ended_at
TIMESTAMP_MICROS
N
When cycle time ended
additions
INT64
N
How many lines were added
deletions
INT64
N
How many lines were deleted
cycle_time_seconds
INT64
N
How long was pull request cycle time
progress_time_seconds
INT64
N
How long pull request was in progress
review_time_seconds
INT64
N
How long pull request was in review
merge_time_seconds
INT64
N
How long it took to merge pull request
linked_swarmia_issue_id
UTF8
N
Linked swarmia issue
is_revert
Boolean
N
Did the pull request revert existing pull request
changed_files
Array, UTF8
Array of changed file names
pull_request_review_requests
field
type
required
description
id
UTF8
Y
UUID
pull_request_id
UTF8
Y
Linked pull request, UUID
requested_by_atuhor_id
UTF8
Y
Who requested review
requested_from_author_id
UTF8
N
From whom review was requested
requested_from_team_name
UTF8
N
Team whom review was requested
source_id
UTF8
Y
e.g GitHub's ID
requested_at
TIMESTAMP_MICROS
Y
When review was requested
pull_request_reviews
field
type
required
description
id
UTF8
Y
UUID
pull_request_id
UTF8
Y
Linked pull request, UUID
author_id
UTF8
Y
Reviewed by, UUID
author_external_identifier
UTF8
Y
Reviewed by's nickname
source_id
UTF8
Y
e.g GitHub's ID
state
UTF8
Y
One of the following : approved, changes_requested, commented
submitted_at
TIMESTAMP_MICROS
Y
When review was submitted
github_repository_full_name
UTF8
Y
Deprecated, repository name
source_repository_full_name
UTF8
Y
Repository name
pull_request_review_request_id
UTF8
N
Review request linked to review, UUID
pull_request_comments
field
type
required
description
id
UTF8
Y
UUID
pull_request_id
UTF8
Y
Linked pull request, UUID
source_id
UTF8
Y
e.g GitHub's ID
body_text
UTF8
Y
Comment content
created_at
TIMESTAMP_MICROS
Y
When the comment was posted
author_id
UTF8
Y
Who posted the comment, UUID
author_is_bot
BOOLEAN
Y
Is the author a bot
deployments
field
type
required
description
id
UTF8
Y
UUID
pull_request_ids
Array, UTF8
Y
Linked pull requests requests within deployment
deployment_environment
INT64
Y
E.g production
deployed_at
UTF8
Y
When deployment was done
fixes_deployment_id
UTF8
N
If deployment fixes previous deployment
fixes_version
UTF8
N
Version of the fixed deployment
deployment_version
UTF8
Y
deployment_app_name
UTF8
Y
involved_team_ids
Array, UTF8
Y
involved_team_names
Array, UTF8
Y
involved_author_ids
Array, UTF8
Y
time_to_recovery_seconds
INT64
N
change_lead_time_seconds
INT64
N
time_to_deploy_seconds
INT64
N
is_change_failure
BOOLEAN
Y
change_failure_source
UTF8
N
authors
field
type
required
description
id
UTF8
Y
UUID
name
UTF8
N
Author's name
associated_emails
Array, UTF8
Y
Author's all email addresses
DORA metrics: , ,
Navigate to
BigQuery, by using
AWS Redshift, by with
Snowflake, with
Make sure to set up the access from Snowflake to the S3 bucket by following .