# International Comparative Study Report on Open Source Software Publication Activities by Governments: Analysis Data

## Overview

This dataset contains the analysis data used in the report **“International Comparative Study Report on Open Source Software Publication Activities by Governments: Quantitative Analysis of Government Agency Repositories on GitHub”** published by the Information-technology Promotion Agency, Japan (IPA).

It compiles two snapshots, collected at different points in time, of repositories published on GitHub by the official accounts of government agencies and related organizations. The data is organized to enable comparison of quantitative trends in open source software (OSS) publication activities.


## Dataset Structure

| File Name                            | Content                                  | Data Acquisition Period                    | Primary Key                       | Relationship       |
| ------------------------------------ | ---------------------------------------- | ------------------------------------------ | --------------------------------- | ------------ |
| **organization_master.csv**          | Master information for the target government agency GitHub organization accounts | Fixed (as of the 2025 survey) | `organization_id` | Base table referenced by both statistics files |
| **github_org_stats_firstcommit.csv** | Snapshot A: repository statistics, including first commit date, acquired mainly from late September to early October 2025 | 2025/9/30–2025/10/3, 2025/12/17–2025/12/18 | (`organization_id`, `repository`) | One organization master row to many repository rows |
| **github_org_stats_pr.csv**          | Snapshot B: repository statistics, including issue and pull request counts, acquired in September and December 2025 | 2025/9/16–2025/9/24, 2025/12/17–2025/12/18 | (`organization_id`, `repository`) | One organization master row to many repository rows |

* The two statistics files are independent point-in-time snapshots. Because they were captured over different periods, the sets of repositories they contain may not match exactly.
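As a minimal sketch of how the files relate, the following joins statistics rows to the organization master on `organization_id` and totals stars per country. It assumes each CSV has a header row using the column names defined in this README. The inline sample rows are illustrative only: the `govuk` rows reuse the example values from the tables in this README, and the `example-agency` rows and the helper names `read_rows` and `stars_by_country` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Illustrative data only. The govuk rows reuse the README's example values;
# the organization_id 99 rows are made up for this sketch.
MASTER_CSV = """organization_id,organization_account,repository_count,country_code
12,govuk,53,GB
99,example-agency,2,JP
"""

STATS_CSV = """organization_id,repository,star,fork,branch,people,first_commit_date
12,govuk-frontend,1420,205,8,24,2015-09-01T09:30:00Z
99,example-repo-a,10,1,2,3,2020-01-15T00:00:00Z
99,example-repo-b,5,0,1,1,2021-06-30T12:00:00Z
"""


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts (assumed layout)."""
    return list(csv.DictReader(io.StringIO(text)))


def stars_by_country(master_rows, stats_rows):
    """Sum the star column per country_code via the organization_id join."""
    country_of = {row["organization_id"]: row["country_code"] for row in master_rows}
    totals = defaultdict(int)
    for row in stats_rows:
        totals[country_of[row["organization_id"]]] += int(row["star"])
    return dict(totals)


totals = stars_by_country(read_rows(MASTER_CSV), read_rows(STATS_CSV))
print(totals)
```

To run this against the published files, `read_rows` could be replaced by opening each file with `open(path, newline="")` and passing the handle to `csv.DictReader`; the file encoding is not stated in this README and would need to be checked.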


## Data Item Definitions

### 1. organization_master.csv

| Column Name                 | Description                         | Type      | Example     |
| -------------------- | -------------------------- | ------ | ----- |
| organization_id      | Organization identifier (internally unique)                | int    | 12    |
| organization_account | GitHub organization account name             | string | govuk |
| repository_count     | Number of target repositories                   | int    | 53    |
| country_code         | Country code (ISO 3166-1 alpha-2 format)             | string | GB    |


### 2. github_org_stats_firstcommit.csv (Snapshot A)

| Column Name       | Description                              | Type              | Example              |
| ----------------- | ---------------------------------------- | ----------------- | -------------------- |
| organization_id   | Organization ID (corresponds to master)  | int               | 12                   |
| repository        | Repository name                          | string            | govuk-frontend       |
| star              | Number of stars (at Snapshot A)          | int               | 1420                 |
| fork              | Number of forks (at Snapshot A)          | int               | 205                  |
| branch            | Number of branches (at Snapshot A)       | int               | 8                    |
| people            | Number of contributors (at Snapshot A)   | int               | 24                   |
| first_commit_date | Date and time of the first commit        | string (ISO 8601) | 2015-09-01T09:30:00Z |
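The `first_commit_date` values are ISO 8601 strings with a trailing `Z`. A minimal sketch of parsing them into timezone-aware datetimes and deriving a repository age follows. The function names and the reference date `as_of` are assumptions chosen for illustration; the files themselves do not record a per-row acquisition date in the columns listed here.

```python
from datetime import datetime, timezone


def parse_first_commit(value):
    """Parse an ISO 8601 UTC timestamp such as 2015-09-01T09:30:00Z.

    The trailing "Z" is rewritten to "+00:00" so that this also works on
    Python versions older than 3.11, where fromisoformat rejects "Z".
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def repository_age_days(first_commit_date, as_of):
    """Whole days elapsed between the first commit and a reference time."""
    return (as_of - parse_first_commit(first_commit_date)).days


first_commit = parse_first_commit("2015-09-01T09:30:00Z")

# Assumed reference date for illustration: the first day of the Snapshot A period.
as_of = datetime(2025, 9, 30, tzinfo=timezone.utc)
age = repository_age_days("2015-09-01T09:30:00Z", as_of)
print(first_commit.isoformat(), age)
```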


### 3. github_org_stats_pr.csv (Snapshot B)

| Column Name     | Description                                   | Type   | Example        |
| --------------- | --------------------------------------------- | ------ | -------------- |
| organization_id | Organization ID (corresponds to master)       | int    | 12             |
| repository      | Repository name                               | string | govuk-frontend |
| star            | Number of stars (at Snapshot B)               | int    | 1456           |
| fork            | Number of forks (at Snapshot B)               | int    | 210            |
| branch          | Number of branches (at Snapshot B)            | int    | 8              |
| people          | Number of commit contributors (at Snapshot B) | int    | 24             |
| issue           | Total number of issues (at Snapshot B)        | int    | 158            |
| pull_request    | Total number of pull requests (at Snapshot B) | int    | 233            |
| contributor     | Number of contributors (at Snapshot B)        | int    | 45             |


## Data Creation Method

* **Target**: Official GitHub organization accounts of national governments and administrative agencies
* **Acquisition Method**: Collected repository metadata per organization using the GitHub API
* **Acquisition Dates**:
  * Snapshot A (`github_org_stats_firstcommit.csv`): September 30–October 3, 2025, and December 17–18, 2025
  * Snapshot B (`github_org_stats_pr.csv`): September 16–24, 2025, and December 17–18, 2025
* **Target Countries**: Japan, Estonia, Singapore, Germany, France, United States, United Kingdom
* **Purpose**: To organize government OSS publication activities in a format enabling temporal and international comparisons


## Primary Key and Relation Structure

| File                             | Primary Key                       | Foreign Key                             |
| -------------------------------- | --------------------------------- | --------------------------------------- |
| organization_master.csv          | `organization_id`                 | -                                       |
| github_org_stats_firstcommit.csv | (`organization_id`, `repository`) | `organization_id` → organization_master |
| github_org_stats_pr.csv          | (`organization_id`, `repository`) | `organization_id` → organization_master |
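The key structure above can be checked before analysis. The sketch below looks for duplicate (`organization_id`, `repository`) pairs and for statistics rows whose `organization_id` is missing from the master. It is an illustrative check rather than part of the published data pipeline: the helper names are hypothetical, only a subset of columns is shown, and the sample rows deliberately contain one made-up duplicate and one made-up orphan.

```python
import csv
import io
from collections import Counter

# Illustrative data only; the duplicate row and organization_id 77 are made up
# so that both checks have something to report.
MASTER_CSV = """organization_id,organization_account,repository_count,country_code
12,govuk,53,GB
"""

STATS_CSV = """organization_id,repository,star
12,govuk-frontend,1420
12,govuk-frontend,1420
77,example-orphan-repo,3
"""


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts (assumed layout)."""
    return list(csv.DictReader(io.StringIO(text)))


def duplicate_keys(stats_rows):
    """Return composite keys that occur more than once (primary key violations)."""
    counts = Counter((r["organization_id"], r["repository"]) for r in stats_rows)
    return sorted(key for key, n in counts.items() if n > 1)


def orphan_org_ids(master_rows, stats_rows):
    """Return organization_id values with no matching master row (foreign key violations)."""
    known = {r["organization_id"] for r in master_rows}
    return sorted({r["organization_id"] for r in stats_rows} - known)


master = read_rows(MASTER_CSV)
stats = read_rows(STATS_CSV)
duplicates = duplicate_keys(stats)
orphans = orphan_org_ids(master, stats)
print(duplicates, orphans)
```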


## Analysis Considerations

* The two snapshots were captured over different date ranges. Discrepancies between them may arise from API response timing or from repository updates made between acquisitions.
* Deleted or private repositories are excluded.
* All dates and times are UTC.
* When comparing the two snapshots, it is recommended to **take the overlaps and differences in the acquisition periods into account**.
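One way to act on these considerations is to compare the snapshots only on repositories present in both, and to list the rest separately. The sketch below does this for the `star` column. The helper names are hypothetical, the `govuk-frontend` values reuse the examples from this README, and the other rows are made up. Because the acquisition periods overlap and the column definitions list no per-row acquisition date, the sign of a difference should not be read as growth over time without further checks.

```python
import csv
import io

# Illustrative data only; the "example-only-in-*" rows are made up.
SNAPSHOT_A_CSV = """organization_id,repository,star
12,govuk-frontend,1420
12,example-only-in-a,3
"""

SNAPSHOT_B_CSV = """organization_id,repository,star
12,govuk-frontend,1456
12,example-only-in-b,7
"""


def index_by_key(text):
    """Map (organization_id, repository) to its row for one snapshot."""
    rows = csv.DictReader(io.StringIO(text))
    return {(r["organization_id"], r["repository"]): r for r in rows}


def compare_snapshots(a_text, b_text):
    """Return star differences (B minus A) for shared keys, plus unmatched keys."""
    a = index_by_key(a_text)
    b = index_by_key(b_text)
    shared = sorted(a.keys() & b.keys())
    deltas = {key: int(b[key]["star"]) - int(a[key]["star"]) for key in shared}
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return deltas, only_a, only_b


deltas, only_a, only_b = compare_snapshots(SNAPSHOT_A_CSV, SNAPSHOT_B_CSV)
print(deltas, only_a, only_b)
```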


## Update History & Version Information

| Item    | Content                                                                                                                 |
| ------- | ----------------------------------------------------------------------------------------------------------------------- |
| Data Collection Period | - Snapshot A (firstcommit.csv): 2025/9/30–10/3, 2025/12/17–12/18<br>- Snapshot B (pr.csv): 2025/9/16–9/24, 2025/12/17–12/18 |
| Target Countries     | Japan, Estonia, Singapore, Germany, France, United States, United Kingdom                                                     |
| Data Release Date  | January 28, 2026                                                                                                                |
| Update Schedule    | To be determined                                                                                                         |


## License

This dataset is provided under the [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/deed.ja).
It may be freely used for commercial or non-commercial purposes provided the source is clearly attributed.

Citation Example:

> Information-technology Promotion Agency, Japan (IPA) “International Comparative Study Report on Open Source Software Publication Activities by Governments: Quantitative Analysis of Government Agency Repositories on GitHub” Analysis Data (2026)


## Disclaimer

IPA makes no warranties whatsoever regarding the usefulness, accuracy, non-infringement of intellectual property rights, or any other aspect of the content of this dataset.


## Contact

Information-technology Promotion Agency, Japan (IPA)

Digital Infrastructure Center

Email: disc-info@ipa.go.jp
