Data cloud
Last updated
Was this helpful?
Last updated
Was this helpful?
Swarmia will send your data in the form of a Parquet file to your storage bucket, starting with Amazon S3. This data can then be pulled to your choice of data warehouse such as Snowflake. The full dataset will be refreshed every 8 hours to ensure there is recent enough data when the data is pulled.
The data exported currently is information about pull requests and deployments. The export contains multiple files
pull_requests.parquet
deployments.parquet
authors.parquet
pull_request_reviews.parquet
pull_request_review_requests.parquet
pull_request_comments.parquet
The pull request data contains:
Author information
Contributing teams
Investment categories
: cycle time, progress time, time in review, and time in merge
Timestamps of pull request events: first review time, first and last commit time, etc...
Pull request comments by authors, review requests, when pull request was reviewed and by whom etc.
The deployments data contains:
List of deployed PRs in the deployment
Deployment information: version, environment, application
You can read more of these metrics in Metrics & definitions.
The setup requires you to have an Amazon S3 storage bucket to which Swarmia can push data.
Insert your S3 bucket credentials
Save the settings. After this, you can test the integration from the "Export" button in the UI.
Swarmia requires the following permissions on an S3 bucket and folder to be able to upload the exports:
s3:GetBucketLocation
s3:PutObject
An example of AWS IAM Policy for the export bucket, where bucket name
is the name of the bucket you provided use in the Data Cloud settings.
If you will access the data directly from the warehouse you most likely will need additional permissions. The following IAM Policy is the one recommended by Snowflake, which works then in both directions: Swarmia to upload and Snowflake to extract data from the same bucket.
You can load data into various data warehouse systems using their native S3 connection methods. Here are list of some commonly used systems and links to their documentation.
Example: Loading data into Snowflake
With the following SQL template can get you started with Swarmia Data Cloud and Snowflake. Use this template to create needed Snowflake stages, file formats, and data tables to start making analytics based on Swarmia data.
All export schemas are defined as follow
pull_request
field
type
required
description
id
UTF8
Y
UUID
github_number
INT64
Y
GitHub number, like 5532
author_id
UTF8
Y
Created by, UUID
title
UTF8
Y
Pull request title
description
UTF8
N
Pull request description
author_external_identifier
UTF8
Y
Created by nickname
github_repository_full_name
UTF8
Y
Repository's name
pr_status
UTF8
Y
One of the following: OPEN, CLOSED, MERGED
owner_team_ids
Array, UTF8
Y
Who owns the pull request
owner_team_names
Array, UTF8
Y
Team names of the
investment_category_ids
Array, UTF8
Y
In which investment categories pull request belongs to
investment_category_names
Array, UTF8
Y
Linked investment categories names
labels
Array, UTF8
Y
Attached labels to pull request
is_draft
BOOLEAN
Y
Whether the PR is currently marked as draft in GitHub. If the PR was originally opened as draft, but has since been marked ready for review, this will be false
.
is_excluded
BOOLEAN
Y
Whether the PR is currently excluded from metrics and alerts in Swarmia. This includes both rule-based exclusions and manual exclusions.
github_created_at
TIMESTAMP_MICROS
Y
When the PR was created on GitHub. This is unaffected by whether the PR was opened as draft or not.
github_updated_at
TIMESTAMP_MICROS
Y
Last time the PR data was updated in GitHub. This includes pushing code, reviews, comments, etc.
last_closed_or_merged_at
TIMESTAMP_MICROS
N
Most recent ClosedEvent
for the PR, which covers both regular closing and merge actions. null
if the PR hasn’t been closed or merged yet. If a closed PR is reopened, this goes back to null
.
first_commit_at
TIMESTAMP_MICROS
N
The authored date (falling back to committed date) of the first commit associated with the PR.
last_commit_at
TIMESTAMP_MICROS
N
The authored date (falling back to committed date) of the last commit associated with the PR.
opened_at
TIMESTAMP_MICROS
N
When the PR was created on GitHub. This is unaffected by whether the PR was opened as draft or not.
first_review_request_at
TIMESTAMP_MICROS
N
Time of the first ReviewRequestedEvent
for the PR.
first_reviewed_at
TIMESTAMP_MICROS
N
Time of the first PullRequestReview
for the PR. This excludes reviews by bots, and comments by the author themselves.
first_approved_review_at
TIMESTAMP_MICROS
N
Time of the first PullRequestReview
for the PR, with state approved
. This excludes reviews by bots.
last_approved_review_at
TIMESTAMP_MICROS
N
Time of the last PullRequestReview
for the PR, with state approved
.
cycle_time_started_at
TIMESTAMP_MICROS
Y
Time of the first commit associated with the PR, or time when the PR was created on GitHub, whichever came earlier.
cycle_time_ended_at
TIMESTAMP_MICROS
N
Time when the PR was merged or closed. null
if the PR is still open.
additions
INT64
N
How many lines were added
deletions
INT64
N
How many lines were deleted
cycle_time_seconds
INT64
N
Amount of seconds between start and end of cycle time. If the PR is still open, the current time is considered the end.
progress_time_seconds
INT64
N
Amount of seconds between start and end of in progress time. Start is the same as cycle time start time. End is defined as, in order of preference:
Start of review time
Start of merge time
Cycle time end time
Current time
review_time_seconds
INT64
N
Amount of seconds between start and end of review time. Start is defined as:
For PRs without a ReadyForReviewEvent
or ReviewRequestedEvent
, and which don’t have approving reviews, review time is not defined
Otherwise, start is, in order of preference:
Time of first ReviewRequestedEvent
Time of latest ReadyForReviewEvent
Time when PR was opened
End is defined as, in order of preference:
Start of merge time
End of cycle time
Current time
merge_time_seconds
INT64
N
Amount of seconds between start and end of merge time. Start is defined by, in order of precedence:
For merged and approved PRs, the time of the last approval
For merged PRs with review requested, but never given, merge time is not defined
For merged PRs without reviews requested, time of last commit, or PR open time, whichever comes last End is the same as the end of cycle time. If the PR is still open, the current time is considered the end.
linked_swarmia_issue_id
UTF8
N
Linked swarmia issue
is_revert
Boolean
N
Did the pull request revert existing pull request
changed_files
Array, UTF8
Array of changed file names
pull_request_review_requests
field
type
required
description
id
UTF8
Y
UUID
pull_request_id
UTF8
Y
Linked pull request, UUID
requested_by_atuhor_id
UTF8
Y
Who requested review
requested_from_author_id
UTF8
N
From whom review was requested
requested_from_team_name
UTF8
N
Team whom review was requested
source_id
UTF8
Y
e.g GitHub's ID
requested_at
TIMESTAMP_MICROS
Y
When review was requested
pull_request_reviews
field
type
required
description
id
UTF8
Y
UUID
pull_request_id
UTF8
Y
Linked pull request, UUID
author_id
UTF8
Y
Reviewed by, UUID
author_external_identifier
UTF8
Y
Reviewed by's nickname
source_id
UTF8
Y
e.g GitHub's ID
state
UTF8
Y
One of the following : approved, changes_requested, commented
submitted_at
TIMESTAMP_MICROS
Y
When review was submitted
github_repository_full_name
UTF8
Y
Deprecated, repository name
source_repository_full_name
UTF8
Y
Repository name
pull_request_review_request_id
UTF8
N
Review request linked to review, UUID
pull_request_comments
field
type
required
description
id
UTF8
Y
UUID
pull_request_id
UTF8
Y
Linked pull request, UUID
source_id
UTF8
Y
e.g GitHub's ID
body_text
UTF8
Y
Comment content
created_at
TIMESTAMP_MICROS
Y
When the comment was posted
author_id
UTF8
Y
Who posted the comment, UUID
author_is_bot
BOOLEAN
Y
Is the author a bot
deployments
field
type
required
description
id
UTF8
Y
UUID
pull_request_ids
Array, UTF8
Y
Linked pull requests requests within deployment
deployment_environment
INT64
Y
E.g production
deployed_at
UTF8
Y
When deployment was done
fixes_deployment_id
UTF8
N
If deployment fixes previous deployment
fixes_version
UTF8
N
Version of the fixed deployment
deployment_version
UTF8
Y
deployment_app_name
UTF8
Y
involved_team_ids
Array, UTF8
Y
involved_team_names
Array, UTF8
Y
involved_author_ids
Array, UTF8
Y
time_to_recovery_seconds
INT64
N
change_lead_time_seconds
INT64
N
time_to_deploy_seconds
INT64
N
is_change_failure
BOOLEAN
Y
change_failure_source
UTF8
N
authors
field
type
required
description
id
UTF8
Y
UUID
name
UTF8
N
Author's name
associated_emails
Array, UTF8
Y
Author's all email addresses
DORA metrics: , ,
Navigate to
BigQuery, by using
AWS Redshift, by with
Snowflake, with
Make sure to set up the access from Snowflake to the S3 bucket by following .