From the course: AWS Certified Developer - Associate (DVA-C02) Cert Prep

Amazon S3 overview

- [Speaker] Amazon S3 is an object storage service in AWS. S3 is actually an acronym for Simple Storage Service. It is a highly durable, highly available, and highly scalable service with many available features that you can use. This storage type is perfect for storing static data that does not change frequently. It can make your data available via the internet, so it can be accessed anywhere and not just within your virtual private cloud. In Amazon S3, you will mainly work with objects and buckets. An object is just a regular file with corresponding metadata, while a bucket is simply a resource that acts as a container for your objects. The metadata of the object is basically a set of name-value pairs that describes what the file is. Once the object has been uploaded to the bucket, its metadata becomes permanent and cannot be modified. This object can be downloaded by multiple EC2 instances or by millions of users worldwide. You can store a virtually unlimited number of files in your bucket, such as text files, office documents, videos, database backups, snapshots, and many other data types. You can upload objects to an S3 bucket that you own or to buckets owned by others. There's also a certain convention on how you can name your S3 bucket. An Amazon S3 bucket name is globally unique, and its namespace is shared by each and every AWS account around the world. For example, if you already created an S3 bucket named tutorialsdojo, no other person on the planet can create a bucket with that same name. So if another person tries to create a new bucket called tutorialsdojo, then that request will fail. You can organize your objects by placing them inside a folder. However, this folder in S3 is different from a folder in a traditional file system. Amazon S3 has a flat structure, which is quite different from the hierarchical structure that you would see in a file system like Amazon EFS. This folder is technically a prefix that is shared by your objects. If an object has a trailing forward slash in its key name, then it is considered a folder in S3. Again, this is done just for the sake of organizational simplicity. The folder concept in S3 is only meant for grouping objects and not for implementing a file hierarchy. For example, you can create a folder named tutorialsdojo and store a new object called aws.jpeg in it. The result will be an object with a key name of tutorialsdojo/aws.jpeg, where tutorialsdojo/ is the prefix; the short code sketch after this paragraph shows this in practice. Take note that Amazon S3 does not support the Portable Operating System Interface, or POSIX. This means that it doesn't provide concurrent access to the same file or directory from thousands of compute instances. It's missing certain capabilities such as file system access semantics and file locking. If you need this functionality, you have to use Amazon EFS or Amazon FSx for Lustre instead. Your S3 bucket is a regional resource that uses the available data centers of the AWS Region that you choose. The bucket is not hosted in your Amazon VPC or in a single Availability Zone. By default, Amazon S3 automatically replicates your objects across multiple Availability Zones of an AWS Region to ensure high data durability and availability. So even if an entire data center fails, copies of your files remain intact in the others. That level of replication provides high data durability for your files against unwanted data loss. This is the underlying physical architecture of how Amazon S3 can provide data durability and high availability.
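To make the key-prefix idea concrete, here is a minimal sketch using the AWS SDK for Python (boto3). The bucket name tutorialsdojo and the key tutorialsdojo/aws.jpeg come from the example above; the local file name and the metadata values are illustrative assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Upload a file; the "folder" is nothing more than the tutorialsdojo/ prefix
# embedded in the object's key. S3 stores the key as one flat string.
with open("aws.jpeg", "rb") as f:  # assumes a local file named aws.jpeg
    s3.put_object(
        Bucket="tutorialsdojo",        # bucket names are globally unique
        Key="tutorialsdojo/aws.jpeg",  # the slash is just part of the key
        Body=f,
        Metadata={"category": "logo"},  # user-defined name-value pairs
    )

# Listing by prefix is what makes the flat namespace look like folders.
response = s3.list_objects_v2(Bucket="tutorialsdojo", Prefix="tutorialsdojo/")
for obj in response.get("Contents", []):
    print(obj["Key"])  # prints tutorialsdojo/aws.jpeg
```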
Amazon S3 is designed to provide 99.99% availability over a given year and 99.999999999%, or 11 nines, of durability. Okay, I know I sound funny saying a lot of nines here, but what this percentage basically tells you is how unlikely it is for your data to be lost in a given year. In data storage terms, durability is the probability that an object remains intact and accessible after a period of one year. This is different from the concept of data availability, which refers to the accessibility of your data at any given point in time. A durability of 100% means that there is zero probability of an object being lost in a given year, so if you have 100 objects, none of them will be lost. If a system has 99% durability, then there's a 1% chance of data loss, which affects one out of 100 objects. By the same logic, if a storage system has 99.99% durability, then you only have a 0.01% probability of data loss. To put it in perspective, if you store 10,000 objects in a bucket with 11 nines of durability, then you can expect to lose a single object only once every 10 million years or so; the first sketch after this paragraph works through those numbers. I know that sounds exaggerated, but that's really how durable your data will be in Amazon S3. There are different storage classes that you can choose from in Amazon S3 to place your objects. You can use the S3 Standard storage class to store your frequently accessed data; the S3 Intelligent-Tiering class for storing data with changing or unknown access patterns; S3 Standard-Infrequent Access and S3 One Zone-Infrequent Access for storing long-lived yet less frequently accessed data; and for low-cost, long-term storage and data archiving, you can use Amazon S3 Glacier or Amazon S3 Glacier Deep Archive. Amazon S3 also has a feature called a lifecycle policy that automatically transitions or moves your data from one storage class to another; the second sketch after this paragraph shows what such a configuration can look like. Amazon S3 has a website hosting feature that you can use. You can host a static website by uploading HTML pages, downloadable packages, images, media files, or other client-side scripts to your bucket. This is perfect if you want a cost-effective and serverless solution to launch your website. However, you cannot run server-side scripts in Amazon S3, such as PHP, JSP, or ASP.NET, since it does not support server-side scripting. What's unique about this storage is the way you access its data. Unlike block or file storage systems, the objects in Amazon S3 are retrieved from your bucket via a REST API call. This is somewhat similar to the process of downloading a file from an FTP server or a content repository. You do not have to attach any S3 storage device to your EC2 instance. This is because Amazon S3 is actually a web service that is not confined to a single storage device. It does not reside in your Amazon VPC or in any Availability Zone, which is quite different from how an EBS volume or an EFS file system works. Amazon S3 is a regional resource that runs on the AWS Cloud network. By default, the traffic between your EC2 instance and an S3 bucket passes through the public internet and not just within your VPC. Amazon S3 has a different storage architecture than block storage and file storage types. It is a unique data storage system that manages your files or data as objects in a flat structure. Every object includes a globally unique identifier, the actual data, and its custom metadata. Your S3 storage can be scaled and replicated more extensively because it is not restricted by a hierarchical file structure.
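Here is the durability arithmetic written out as a quick calculation; the figures are the ones quoted above.

```python
# 11 nines of durability: the chance of losing any one object in a year
# is 1 - 0.99999999999, i.e. one in 100 billion.
annual_loss_probability = 1e-11
objects_stored = 10_000

expected_losses_per_year = objects_stored * annual_loss_probability  # 1e-07
years_per_lost_object = 1 / expected_losses_per_year                 # 1e+07

print(f"One object lost roughly every {years_per_lost_object:,.0f} years")
# -> One object lost roughly every 10,000,000 years
```

And here is a hedged sketch of the lifecycle policy idea, again with boto3. The bucket name reuses the earlier example, while the rule ID, the backups/ prefix, the day counts, and the target storage classes are all illustrative assumptions, not values from the course.

```python
import boto3

s3 = boto3.client("s3")

# Move objects under backups/ to colder storage classes as they age.
s3.put_bucket_lifecycle_configuration(
    Bucket="tutorialsdojo",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-backups",       # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},  # hypothetical prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                    {"Days": 90, "StorageClass": "GLACIER"},      # archive after 90 days
                ],
            }
        ]
    },
)
```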
In addition, it is not dependent on the operating system of the compute instance, since you don't have to manually attach the S3 bucket to your Amazon EC2 instance like you would an EBS volume. You just upload or fetch objects using RESTful web APIs, such as the PUT HTTP method for uploading data and the GET HTTP method for downloading objects. Amazon S3 also offers different security, versioning, and replication capabilities to manage your objects effectively. You can set up S3 Versioning and multi-factor authentication (MFA) delete to prevent accidental data deletion in S3; the sketch after this paragraph shows versioning and a GET request in action. Network access to your S3 buckets and objects can be controlled using Amazon S3 access control lists. You can also create an S3 bucket policy for controlling external access to your bucket. With its Cross-Region Replication feature, your objects can be replicated to a different AWS Region automatically. Transferring data to and from your S3 bucket can also be improved by using Amazon S3 Transfer Acceleration and the S3 multipart upload options.
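To tie the PUT/GET model and the versioning feature together, here is a minimal boto3 sketch. The bucket and key names reuse the earlier example, and the presigned-URL expiry of one hour is an illustrative assumption.

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning so overwrites and deletes keep recoverable versions.
s3.put_bucket_versioning(
    Bucket="tutorialsdojo",
    VersioningConfiguration={"Status": "Enabled"},
)

# Download an object; under the hood this is an HTTP GET against the REST API.
obj = s3.get_object(Bucket="tutorialsdojo", Key="tutorialsdojo/aws.jpeg")
data = obj["Body"].read()

# Create a time-limited HTTPS link so others can GET the object without
# AWS credentials of their own.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "tutorialsdojo", "Key": "tutorialsdojo/aws.jpeg"},
    ExpiresIn=3600,  # valid for one hour (an illustrative choice)
)
print(url)
```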
