Skip to main content

Understanding Amazon S3 Files: Bridging Object and File Storage

· 10 min read
Nando Lutgerink
AWS Authorized Instructor, AI/ML Advisor & Entertrainer

Amazon S3 Files represents a groundbreaking advancement in cloud storage technology, seamlessly integrating object storage with file system capabilities. This innovation addresses a critical challenge for many organizations: efficiently accessing and managing vast datasets stored in Amazon S3 while leveraging familiar file system interfaces and tools.

Introduction to Amazon S3 Files

Amazon S3, originally designed as an object storage service, has long been the cornerstone of data storage solutions for businesses of all sizes. However, many applications and workflows are built around the concept of a file system—a structure where data is organized into files and folders that can be easily navigated and manipulated. Amazon S3 Files bridges this gap, allowing users to interact with their S3 data as if it were a traditional file system.

With S3 Files, organizations can:

  • Access S3 data directly as files: Eliminate the need to copy data between object storage and file systems.
  • Leverage existing tools: Use familiar file system tools and applications without modification.
  • Achieve high performance: Enjoy low-latency access and high throughput for demanding workloads.
  • Maintain scalability and durability: Benefit from S3's inherent scalability and durability without compromise.

Key Benefits of S3 Files

1. Turn Your S3 Bucket into a Shared File System

S3 Files transforms an S3 bucket—a fundamental storage unit in Amazon S3—into a shared file system accessible by multiple compute resources. This eliminates data silos and enables collaborative workflows across teams and applications.

2. Run High-Performance File Workloads with Your S3 Data

S3 Files is designed to handle demanding file-based workloads, such as processing large datasets, running machine learning training, and executing data-intensive applications. It delivers consistent, high-performance access to your data, ensuring optimal performance for critical tasks.

3. Store Your Data Once and Access It Everywhere

One of the most significant advantages of S3 Files is its ability to provide universal access to your data. Whether your compute resources are running on Amazon EC2 instances, in containers, or as serverless functions, they can all access the same S3 data through a unified file system interface. This eliminates the need for data duplication and simplifies data management.

4. Scale Elastically and Pay Only for What You Use

S3 Files inherits the scalability and cost-effectiveness of Amazon S3. As your workload demands increase, S3 Files automatically scales to meet those demands. You only pay for the storage and compute resources you actually use, ensuring optimal cost efficiency.

Performance Highlights

  • Up to 90% Lower Costs: By eliminating the need for data duplication and synchronization between object storage and separate file systems, S3 Files can significantly reduce storage costs.
  • 10M+ File System IOPS per Bucket: S3 Files supports extremely high levels of input/output operations per second (IOPS), enabling responsive performance for even the most demanding applications.
  • 4TB/s+ Aggregate Read Throughput: S3 Files can deliver multi-terabytes per second of read throughput, making it suitable for large-scale data processing and analytics.
  • 25K Concurrent Compute Resources: Multiple compute resources can simultaneously access the same S3 file system, facilitating collaborative workflows and parallel processing.

How S3 Files Works

S3 Files functions similarly to a traditional high-performance file system but is built on top of Amazon S3. Here's a simplified explanation of how it works:

  1. Mapping S3 to File System: S3 Files creates a mapping between files and folders in the file system and objects in your S3 bucket. This means that when you access a file through the file system, S3 Files translates that request into an operation on the corresponding S3 object.

  2. Caching for Performance: To achieve high performance, S3 Files uses caching mechanisms. Frequently accessed files and metadata are cached in high-performance storage, reducing latency and improving responsiveness. Less frequently accessed data is retrieved directly from S3, ensuring efficient use of resources.

  3. Lazy Loading: When you read files, S3 Files employs a "lazy loading" approach. Only the portions of file metadata and content that are actively needed are loaded into high-performance storage. This minimizes resource usage and maximizes efficiency.

  4. Direct Writes to S3: When you write data, S3 Files ensures that your changes are synchronously written to both the high-performance cache and your S3 bucket. This guarantees data consistency and durability.

  5. Automatic Cleanup: S3 Files automatically manages its cache. Data that hasn't been accessed within a configurable time window (ranging from 1 to 365 days, with a default of 30 days) is automatically removed from the cache. This ensures that you only pay for the storage you actively use, while your primary data remains safely stored in S3.

Features and Capabilities

S3 Files offers a rich set of features designed to meet the diverse needs of modern data workloads:

  • Native File Access to Entire S3 Dataset or Prefixes: You can access your entire S3 dataset or specific prefixes (folders) as a file system, providing flexibility in how you organize and access your data.
  • Built for Collaboration Across Compute Resources: S3 Files supports collaboration by enabling multiple compute resources to access and work with the same data simultaneously. This is particularly valuable for team-based projects and distributed computing workloads.
  • Cost-Optimized Performance: S3 Files delivers high performance while optimizing costs through intelligent caching and direct access to S3. You benefit from the scalability and cost-effectiveness of S3 without sacrificing performance.
  • Object and File Storage Without Compromise: S3 Files seamlessly integrates object storage and file system capabilities, eliminating the trade-offs typically associated with choosing between these two storage models.

Applications Across Industries

S3 Files has broad applicability across various industries, enabling organizations to unlock the full potential of their S3 data:

Life Sciences and Healthcare

  • Processing Sequencing Data: Life sciences organizations can process petabytes of sequencing data stored in S3 using bioinformatics pipelines that require file system operations.
  • Storing Medical Images: Healthcare providers can store medical images in S3 and provide file system access for diagnostic workstations and PACS systems.
  • Collaborative Research: Research organizations can enable collaborative analysis of clinical trial data while maintaining S3 as the single source of truth, ensuring complete audit trails and regulatory compliance.

Financial Services

  • Real-Time Data Analysis: Financial institutions can analyze vast datasets in real-time, combining historical market data with live feeds to develop and refine trading algorithms.
  • High-Performance Computing: Quantitative investment managers can run complex simulations and backtests directly on S3 data, accelerating research cycles and improving decision-making.

Media and Entertainment

  • Content Processing: Media companies can process large media files, such as videos and images, directly from S3 using file-based workflows.
  • Collaborative Workflows: Teams can collaborate on media projects by accessing shared datasets stored in S3, streamlining production pipelines and reducing bottlenecks.

Manufacturing and Engineering

  • Simulation and Modeling: Engineering teams can run simulations and modeling workloads directly on S3 data, enabling faster iteration and more efficient product development.
  • Collaboration Across Teams: Distributed teams can collaborate on complex projects by accessing shared datasets stored in S3, facilitating seamless communication and data exchange.

Real-World Use Cases and Customer Success Stories

Bayer: Streamlining Research and Development

Bayer, a global leader in healthcare and nutrition, leverages S3 Files to simplify its research workflows. Previously, accessing large datasets stored in S3 required downloading entire datasets to local storage, creating delays and increasing infrastructure overhead. With S3 Files, Bayer's data scientists can mount S3 data directly as a file system, enabling them to open datasets, run analyses, and collaborate on shared results without waiting for downloads or managing data movement. This shift has streamlined R&D operations and reduced infrastructure overhead, allowing Bayer's teams to focus on research and innovation.

Deloitte: Enabling Agentic Architectures

Deloitte, a global leader in professional services, utilizes S3 Files to build AI agent solutions for clients. S3 Files provides a shared workspace where agents can retain context, exchange intermediate results, and organize across complex analytics pipelines, all using standard file operations. This capability simplifies multi-agent design, enhances automation, and accelerates delivery, enabling Deloitte to deliver advanced AI solutions to its clients.

Cloudsmith: Securing the Software Supply Chain

Cloudsmith, a company that secures the AI-powered software supply chain, relies on S3 and file systems to handle complex package processing and synchronization workflows. S3 Files eliminates the need for custom synchronization logic, simplifying Cloudsmith's architecture, reducing package time-to-availability, and preserving compatibility with existing file-based tooling and artifact processing workflows. This enables Cloudsmith to secure its customers' software supply chains at scale.

Presidio: Advancing Migration and Modernization

Presidio, an AWS Premier Consulting Partner, views S3 Files as a key enabler for migration and modernization projects. S3 Files allows customers to access S3 data through familiar file interfaces while preserving S3's durability and scalability. This provides a flexible path to modernizing legacy applications without disruptive rewrites or redundant storage, enabling organizations to advance generative AI, advanced analytics, and agentic use cases.

Snorkel AI: Building Frontier Datasets

Snorkel AI, a frontier AI data lab, utilizes S3 Files to build datasets for high-performing AI systems. S3 Files enables Snorkel AI's teams, agents, and evaluators to access data in S3 through a file interface without copying data between systems or waiting for sync jobs. This is particularly valuable for reinforcement learning and agentic workloads, where many environments and simulations run in parallel, supporting shared workspaces and executing long-horizon task sequences at scale.

Torc Robotics: Accelerating Autonomous Vehicle Development

Torc Robotics, a leader in autonomous vehicle software, leverages S3 Files to enable its data scientists to access machine learning training data directly in S3 using standard file operations. This simplifies data access and processing, reducing the time and complexity in building Torc's Physical AI pipeline. S3 Files enables Torc to deliver its AV 3.0 TorcDrive verifiable AI stack, data loop, and world simulation for training and validation on the path to safe, scalable autonomous trucking.

Qube Research & Technologies (QRT): Enhancing Quantitative Research

QRT, a global quantitative and systematic investment manager, relies on S3 Files to accelerate its research cycle. QRT's researchers can now work with data directly in S3 from their workstations, running backtests, analyzing results, and iterating on strategies across larger datasets without copying data or managing ETL pipelines. This collapses a multi-step workflow into a seamless research loop, resulting in faster time-to-insight, fewer redundant storage costs, and more engineering effort focused on building better systematic strategies.

Summary and Reflection

Amazon S3 Files represents a significant leap forward in cloud storage technology. By bridging the gap between object storage and file systems, S3 Files empowers organizations to unlock the full potential of their S3 data. Its ability to provide high-performance, low-latency access to data while maintaining the scalability, durability, and cost-effectiveness of S3 makes it an invaluable tool for a wide range of applications and industries.

As you explore the possibilities of S3 Files, consider how it can streamline your data workflows, enhance collaboration, and accelerate your innovation. Whether you are processing large datasets, running machine learning training, or building AI agents, S3 Files offers a powerful solution to help you achieve your goals efficiently and effectively.