Storage options
Many storage options are available for research data. This section describes the options available at the University of Washington, along with some other options that may be useful.
Options comparison
The following table compares the different options for storing research data:
Feature | UW Psych Dept | U Drive | UW LOLO | UW SFS | UW OneDrive | UW Google Drive | RedCap | ResearchWorks | Amazon S3 | Azure Blobs | Dropbox | OSF | GitHub |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Suitability | General use at UW Psych | Interactive use at UW | Long-term storage, archival | Interactive use and collaboration at UW | Interactive use and collaboration at UW | Interactive use and collaboration at UW | Research data | Research data publication at UW | Programming, sharing, archival | Programming, sharing, archival | Interactive use and collaboration | Research data, collaboration, publishing | Code, software containers, collaboration |
UW support | 💲discount 5% | 💲discount ~10% | |||||||||||
Storage cost | Free up to 2 TiB; 10 TiB free on request; more available for purchase (no recurring costs) | Free up to 50 GiB, then $123/TiB/month ($0.12/GiB) | $3.45/TiB/month | $256/TiB/month | Free, 5 TiB limit | Free, 100 GiB limit | ~$25/month for 1 TiB hot data, ~$5/month for cold (S3 Glacier Instant Retrieval) | ~$25/month for 1 TiB hot data, ~$5/month for cold; ~$30/TiB for cold egress | Subscription ($12/month for 2 TiB) | Free | Free for most uses | ||
Off-campus availability | VPN | VPN | VPN | VPN | |||||||||
Automatic backups | Limited | Limited | |||||||||||
Versioning | |||||||||||||
Platforms | |||||||||||||
Encryption in transit | |||||||||||||
Encryption at rest | |||||||||||||
FERPA | |||||||||||||
HIPAA | On request | ||||||||||||
Local filesystem | |||||||||||||
Sharing | UW only | UW only | UW only | UW only | |||||||||
Available on Klone | w/rclone | w/rclone | w/rclone | w/rclone | w/rclone | w/rclone | w/rclone |
UW-supported options
UW provides several different storage options for faculty, staff, and students. Additionally, there are storage options that are not directly supported by UW but are available to UW users through UW’s enterprise agreements with the service providers, sometimes at a discount.
The following sections describe some of the most common options. UW IT maintains a page that provides an overview of UW’s online storage options, as well as a comparison of file service options. The UW IT Service Catalog has a more complete list of UW IT services.
The UW Libraries Research Data Services also maintains a page with information about storage options.
Departmental storage
The Department of Psychology has a server that can be used for storing research data. This server is located in Guthrie Hall and is managed by the department’s IT staff. The server is not intended for sharing data with collaborators outside of the University of Washington. Contact the department’s IT staff for more information.
Features and options
- 2 TiB of storage per lab (up to 10 TiB available on request; for more, contact dougkalk@uw.edu)
- 1 Gigabit Ethernet connection to campus network
- Accessible on Klone via rclone’s smb backend
- Accessible off-campus via VPN
- Weekly backups (twice a week on request) to a backup server
- Monthly backups to network-attached storage (NAS)
- Storage is not encrypted
- Shared access typically by sharing a single username and password per lab
- Alternative configurations are possible on request (e.g., read-only accounts, restricted access to specific directories)
- Mountable as a drive on Windows, Mac, and Linux systems
For more information about the server’s configuration and options, contact the department’s IT staff at dougkalk@uw.edu.
Transfer speed to the server is limited by the network connection and by the server’s disk drives. The server is connected to the campus network via a 1 Gigabit Ethernet connection. The maximum write speed is around 70-100 MiB/s.
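To put the write speed in perspective, the following Python sketch estimates how long a transfer would take at a sustained rate. It assumes the quoted 70-100 MiB/s holds for the entire transfer, which real transfers may not achieve; the function name is hypothetical.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def transfer_hours(size_bytes: int, speed_mib_per_s: float) -> float:
    """Return the hours needed to move size_bytes at a sustained speed."""
    seconds = size_bytes / (speed_mib_per_s * MIB)
    return seconds / 3600


# Estimate the time to write 1 TiB at both ends of the quoted range.
for speed in (70, 100):
    print(f"1 TiB at {speed} MiB/s: {transfer_hours(TIB, speed):.1f} h")
```

Under these assumptions, writing 1 TiB takes roughly 3 to 4 hours, so a lab filling its full allocation should plan for transfers that run overnight.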
Pricing
Free for departmental labs up to 10 TiB. More may be available on request.
Eligibility
- Department faculty and staff
UW LOLO
The UW LOLO Data Archive provides long-term tape-backed archival storage for users at the university. It is only intended for use by UW faculty, staff, and affiliated organizations. It is suitable for data that is not actively being worked on. The Hyak documentation has additional information about the LOLO Data Archive.
Features and options
- All files immediately and directly accessible via SSH protocol
- Support for up to 1,000 files per TB of data stored
- Fast uploads and downloads for large (≥100 GB) files
- 10 Gb/s network connection to campus, the internet, and Hyak
- Two copies of all files are preserved, each in a separate data center, each in a different seismic zone
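Because the archive supports up to 1,000 files per TB stored, it can help to check a directory's file count against its size before uploading. The following Python sketch shows one way to do that. It assumes that "TB" means 10^12 bytes and that the limit scales linearly with the amount stored; both function names are hypothetical.

```python
import os

TB = 10 ** 12          # assumption: "TB" in the limit means 10**12 bytes
FILES_PER_TB = 1000    # from the feature list above


def summarize(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for regular files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so that link targets are not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                count += 1
                total += os.path.getsize(path)
    return count, total


def within_file_limit(file_count: int, total_bytes: int) -> bool:
    """True if file_count stays within 1,000 files per TB stored."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return file_count * TB <= FILES_PER_TB * total_bytes
```

If the check fails, one common approach is to bundle many small files into a single archive (for example, a tar file) before uploading.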
Pricing
The LOLO Data Archive is priced at $3.45/TiB/month, with a minimum purchase of 1 TiB. The LOLO Data Archive is billed to a UW Workday worktag.
Eligibility
- UW faculty, staff, and affiliated organizations
- UW Workday worktag required
UW U Drive
The UW U Drive provides a network file system that can be mounted as a drive on Windows, Mac, and Linux systems on campus or by VPN.
Features and options
- CIFS/SMB access from UW campus subnets. (Can also be accessed via VPN Service, which provides remote systems with a UW campus IP address.)
SFTP access to U Drive will no longer be available after March 20, 2024.
Pricing
Free up to 50 GiB. $0.12/GiB/month for additional storage.
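The tiered pricing can be sketched in Python as follows. This assumes that only storage above the free 50 GiB is billed, and that billing is linear per GiB with no rounding; the function name is hypothetical.

```python
FREE_GIB = 50                 # free allocation, from the pricing above
PRICE_PER_GIB_MONTH = 0.12    # USD per GiB per month beyond the free allocation


def u_drive_monthly_cost(stored_gib: float) -> float:
    """Return the estimated monthly cost in USD for stored_gib of data."""
    billable_gib = max(0.0, stored_gib - FREE_GIB)
    return billable_gib * PRICE_PER_GIB_MONTH


# Compare a small allocation with a full TiB (1024 GiB).
for gib in (50, 200, 1024):
    print(f"{gib} GiB: ${u_drive_monthly_cost(gib):.2f}/month")
```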
Eligibility
UW faculty; UW staff; UW students; UW students in residence halls; UW researchers
UW OneDrive for Business
UW provides access to the Microsoft OneDrive for Business cloud file syncing service as part of its Office 365 subscription.
Features and options
- 5 TiB of storage
- 250 GiB maximum file size
- HIPAA and FERPA compliance
- Sharing with UW users
- Desktop, browser, and mobile access
- SharePoint Online integration
- Accessible on Klone via rclone
OneDrive has technical limitations that may cause problems when using it with some applications (e.g., FreeSurfer). Caution is advised when working with files stored in OneDrive. The support page has more information about OneDrive’s limitations.
Pricing
Free for eligible UW users.
Eligibility
UW faculty; UW staff; UW students; UW researchers; UW clinicians; For Shared UW NetIDs and Sponsored & affiliate UW NetIDs, accounts can be provisioned by UW employees
UW Google Drive
UW Google Drive provides Google Drive file storage and sync for users at UW (this is not the same as the Google Drive service provided by Google).
Features and options
UW Google Drive is not HIPAA compliant.
ResearchWorks Archive
ResearchWorks Archive is the University of Washington’s digital repository (also known as “institutional repository”) for disseminating scholarly work. More information about ResearchWorks can be found on the Scholarly Publishing Services page.
Link: ResearchWorks Archive
RedCap
RedCap is a web-based application for building and managing online surveys and databases. According to the RedCap website:
Research Electronic Data Capture (REDCap) is a rapidly evolving web tool developed by researchers for researchers in the translational domain.
REDCap features a high degree of customizability for your forms and advanced user right control. It also features free, unlimited survey functionality, a sophisticated export module with support for all the popular statistical programs, and supports HIPAA compliance.
Cloud storage
Amazon Web Services
Amazon Web Services (AWS) is a cloud computing platform that provides a number of different storage services, including Amazon S3, Amazon EBS, Amazon EFS, and Amazon Glacier. Amazon S3 is comparable to Azure Blobs. Amazon EBS is comparable to Azure managed disks. Amazon EFS is comparable to Azure Files. Amazon Glacier is comparable to Azure Archive Storage.
Features and options
- Data Egress Waiver, which effectively eliminates the standard charges for moving data out of the AWS Service.
- HIPAA Eligible Account, requires special request and approval, and is subject to important operational considerations to meet compliance requirements of HIPAA, the UW BAA, and UW Medicine Compliance Policies.
- Accessible on Klone via rclone
Pricing
The approximate cost of storing 1 TiB of data per month is:
- ~$25 for “hot” data accessed frequently (S3 Standard)
- ~$5 for “cold” data accessed infrequently (S3 Glacier Instant Retrieval)
The Data Egress Waiver eliminates the standard charges for moving data out of the AWS Service.
UW IT has a subscription service for AWS that provides a discount of 5% and a waiver for data egress charges. The subscription is covered by UW’s HIPAA BAA and other enterprise contracts. A UW Workday worktag is required to use the service. For more information, see the UW IT AWS page.
For detailed pricing information, see the AWS pricing calculator.
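For a rough estimate, the following Python sketch combines the approximate per-TiB rates above with the 5% UW discount. It assumes that the discount applies uniformly to storage charges, and it ignores request, retrieval, and other fees. The function name is hypothetical, and the AWS pricing calculator remains the authoritative source.

```python
# Approximate list prices in USD per TiB per month, from the figures above.
RATES_PER_TIB_MONTH = {"hot": 25.0, "cold": 5.0}
UW_DISCOUNT = 0.05  # assumption: applied uniformly to storage charges


def aws_monthly_cost(hot_tib: float, cold_tib: float,
                     discount: float = UW_DISCOUNT) -> float:
    """Return the estimated monthly storage cost in USD after the discount."""
    list_price = (hot_tib * RATES_PER_TIB_MONTH["hot"]
                  + cold_tib * RATES_PER_TIB_MONTH["cold"])
    return list_price * (1 - discount)


# Example: 2 TiB of active data plus 10 TiB of rarely accessed data.
print(f"${aws_monthly_cost(2, 10):.2f}/month")
```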
Eligibility
UW faculty; UW staff; UW researchers; UW clinicians; UW academic units; UW administrative units; UW affiliated organizations; UW Medical Center; Harborview Medical Center; Any group with an approved UW blanket PO. NOTE: UW students are not eligible for this service, except when working within the scope of UW employment, e.g., as an RA, TA or GSA. Known Prerequisites: This service requires a valid UW budget number.
Azure
Microsoft’s Azure cloud platform includes a number of storage options, including Azure Blobs and Azure Files.
Azure Blobs is comparable to Amazon S3. It is designed to store large amounts of unstructured data but is not designed to be used as a file system.
Features and options
Pricing
The approximate cost of storing 1 TiB of data per month is:
- ~$25 for “hot” data accessed frequently (Azure Blobs Hot Tier)
- ~$5 for “cold” data accessed infrequently (Azure Blobs Cold Tier)
UW IT has a subscription service for Azure that provides a discount of roughly 10% on Azure usage charges and allows charges to be paid with a UW Workday worktag. A UW Workday worktag is required to use the service. For more information, see the UW IT Azure Subscription Service.
For detailed pricing information, see the Azure pricing calculator.
Eligibility
UW faculty; UW staff; UW researchers; UW clinicians; UW academic units; UW administrative units; UW affiliated organizations; UW Medical Center; Harborview Medical Center; Any group with an approved UW blanket PO. NOTE: UW students are not eligible for this service, except when working within the scope of UW employment, e.g., as an RA, TA or GSA. Known Prerequisites: This service requires a valid UW budget number.
UW-IT’s Azure Subscription service allows UW units that wish to create and manage their own Azure subscription to receive a discount (~10%) on Azure usage charges and to pay those with a UW Workday worktag.
Other options
Google Cloud Storage
Google Cloud Storage is a cloud object storage service that is part of the Google Cloud Platform. It resembles the offerings by AWS and Azure. It is distinct from UW Google Drive and Google Drive. Google Cloud Storage can be accessed on Klone via rclone.
Google Cloud also offers block storage, file storage, and archival storage services. See the Google Cloud documentation for more information.
Open Science Framework
Open Science Framework (OSF) is a web-based platform for managing research projects. OSF can be used to store research data and their metadata. It is best suited for collaborating with other researchers and sharing research data for papers. Storage addons are available to connect services such as Amazon S3, Dropbox, GitHub, and OneDrive to OSF.
Dropbox
Dropbox is a subscription-based cloud file sync service. The University of Washington does not support Dropbox. Dropbox is not suited for storing sensitive data. However, Dropbox can be useful for storing non-sensitive data that is actively being worked on, and it provides convenient versioning and recovery options. Dropbox can be accessed on Klone via rclone.
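As an illustration of rclone access from a cluster such as Klone, the following Python sketch builds an `rclone copy` command and can optionally run it. It assumes that rclone is installed and that a remote has already been set up with `rclone config`. The remote name `mydropbox`, the paths, and both function names are hypothetical.

```python
import shlex
import subprocess


def rclone_copy_command(source: str, remote: str, remote_path: str,
                        dry_run: bool = True) -> list[str]:
    """Build the argument list for copying source to remote:remote_path."""
    cmd = ["rclone", "copy", source, f"{remote}:{remote_path}", "--progress"]
    if dry_run:
        # --dry-run reports what would be copied without transferring anything.
        cmd.append("--dry-run")
    return cmd


def run_copy(cmd: list[str]) -> None:
    """Run the command, raising CalledProcessError if rclone fails."""
    subprocess.run(cmd, check=True)


# "mydropbox" is a hypothetical remote name created with `rclone config`.
cmd = rclone_copy_command("/path/to/local/data", "mydropbox", "project/data")
print(shlex.join(cmd))
```

Starting with a dry run is a cautious default: the printed command can be inspected first, then rebuilt with `dry_run=False` and passed to `run_copy` once the remote is confirmed to work.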
GitHub
GitHub is a web-based hosting service for version control using Git. Files up to 100 MiB each can be stored in a GitHub repository. The total size of a repository should be under 5 GiB.
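Before committing data to a repository, it can be useful to check it against these limits. The following Python sketch scans a working directory for files larger than 100 MiB and compares the total size against 5 GiB. It assumes that the `.git` directory should be skipped and that the limits are measured in binary units; the function name is hypothetical.

```python
import os

MIB = 1024 ** 2
GIB = 1024 ** 3


def check_repo_sizes(root: str, max_file: int = 100 * MIB,
                     max_total: int = 5 * GIB):
    """Return (oversized_files, total_bytes, total_ok) for files under root."""
    oversized = []
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into Git's own data.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            total += size
            if size > max_file:
                oversized.append(os.path.relpath(path, root))
    return sorted(oversized), total, total <= max_total
```

Any files the check reports as oversized are candidates for one of the alternatives described in the following sections, such as GitHub Releases or Large File Storage.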
GitHub repositories can be public or private. Private repositories are accessible only to collaborators who have GitHub accounts and have been granted access to the repository. GitHub repositories can be accessed from the GitHub website or the GitHub Desktop application.
GitHub Releases
GitHub Releases can be used to store files up to 2 GiB in size each.
GitHub Packages
GitHub Packages can be used to publish container images to the GitHub Container Registry. For public repositories, there are no charges or storage limits. For private repositories, storage limits and charges apply. A GitHub account is required to access GitHub Packages.
Although GitHub Packages is not designed to store research data, it can be used to store research data in the form of a container containing an archive of research data (similar to a ZIP file). The data cannot be viewed or modified while stored in GitHub Packages, but can be downloaded and extracted from the container. Any changes to the data require creating a new image and uploading it to GitHub Packages.
Public packages can be accessed by anyone and should not be used for sensitive data. However, it may be possible to encrypt data before storing it in the image or to deploy an encrypted container.
Large File Storage
GitHub has a Large File Storage (LFS) extension that can be used to store large files in a GitHub repository.
According to GitHub’s documentation, the maximum file sizes for files stored with LFS are:
Plan | Maximum file size |
---|---|
GitHub Free | 2 GiB |
GitHub Pro | 2 GiB |
GitHub Team | 4 GiB |
GitHub Enterprise Cloud | 5 GiB |