In the demo at the end of the previous section, I copied the ingested and transformed files out of the Compute Engine instance into Cloud Storage. This way, we could stop our Compute Engine instance and retain access to the data. While we've discussed the demand for a great amount of computing power for today's big data and ML jobs, we still need a place to store all the data that's generated. As with the demo, this needs to be separate from the Compute instances so that we can analyze it, transform it, and feed it into our models. This is one major way that cloud computing differs from desktop computing: compute and storage are independent. You don't want to think of the disks attached to a Compute instance as the limit on how much data you can process and store. Getting your data into your solution and transforming it for your purposes should be your first priority.

In the roles and team structure discussion later in this module, I'll talk about the need for data engineers to build data pipelines before you can start building machine learning models from that data. Even when the pipeline is built, the job isn't done: once the data is in your system, data engineers are still needed to replicate the data, back it up, scale it, and remove it when it's no longer needed, all at scale. Instead of managing storage infrastructure themselves, data engineers can use Google Cloud Storage, a durable, globally accessible object store. Creating an elastic storage bucket is as simple as using the web UI at console.cloud.google.com, or the gsutil command-line utility. I'll sketch some example commands in a moment.

As an additional level of flexibility, a data engineer can choose what type of storage class they want for their data. There are four storage classes to choose from based on your data needs: Standard Storage, Nearline Storage, Coldline Storage, and Archive Storage. Regardless of which one you choose, all classes have multi-region, dual-region, and region location options. They differ in cost: Standard Storage is the least expensive for data you access frequently, while Archive Storage is the least expensive to store, at the price of retrieval fees. A good trade-off, for example, is to use Nearline Storage for data that you might access only about once a month. For data analysis workloads, it's common to use a Standard Storage bucket within a region for staging your data. Why do I say within a region? That's because you need the data to be available to your data processing compute resources, and these will often be within a single region. Co-locating your resources this way maximizes the performance of data-intensive computations and can reduce network charges.

Cloud Storage buckets are an example of a cloud resource. Let's cover some of the account management logistics that you need in order to use cloud resources. Starting from the most granular level, resources like your Cloud Storage bucket or Compute Engine instance belong to specific projects. Bucket names have to be globally unique, and GCP assigns your project an ID that's globally unique too, so you can use that project ID as part of a unique name for your bucket. But what's a project? A project is the base-level organizing entity for creating and using resources and services, and for managing billing, APIs, and permissions. Zones and regions physically organize the GCP resources you use, whereas projects logically organize them. Projects can be created, managed, deleted, and even recovered from accidental deletions.
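Here's the sketch I promised of creating a bucket from the command line. The region, project ID, and bucket names below are hypothetical placeholders, not values from the demo; substitute your own.

  # Print the active project ID (globally unique); we reuse it as a prefix
  # so the bucket names are globally unique too.
  gcloud config get-value project    # e.g. my-demo-project (hypothetical)

  # Create a regional Standard Storage bucket for staging, co-located with
  # compute resources in us-central1: -l sets the location, -c the class.
  gsutil mb -l us-central1 -c standard gs://my-demo-project-staging

  # For data you expect to touch only about once a month, Nearline is a
  # reasonable trade-off.
  gsutil mb -l us-central1 -c nearline gs://my-demo-project-monthly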
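Projects themselves can also be managed from the command line. A quick sketch, again with a made-up project ID:

  gcloud projects create my-demo-project      # create a new project
  gcloud projects delete my-demo-project      # shut it down (recoverable for a limited window)
  gcloud projects undelete my-demo-project    # recover it from accidental deletion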
Folders are another logical grouping you can have for collections of projects. Having an organization is required to use folders. But what's an organization? The organization is the root node of the entire GCP hierarchy. While it's not required, an organization is quite useful because it allows you to set policies that apply throughout your enterprise, to all the projects and all the folders created in it. Cloud Identity and Access Management, also called IAM, lets you fine-tune access control over all the GCP resources you use. You define IAM policies that control user access to resources. Remember, if you want to use folders, you must have an organization.

Now that you have a Cloud Storage bucket created, how do you get your data onto the Cloud and work with the data once it's there in the bucket? In the demo, I used gsutil commands. Specifically, we can use cp for copy, and specify a target bucket location. If you spin up a Compute Engine instance, the gsutil command-line tool is already available, so you can run gsutil cp right away. On your laptop, you can download the Google Cloud SDK to get gsutil. Gsutil uses a familiar Unix command-line syntax, so if you know how to use the cp command on Unix, you know how to use gsutil cp.
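As a sketch of that syntax, reusing the hypothetical staging bucket from earlier (the file names are made up too):

  # Copy a local file up into the bucket, then copy it back down.
  gsutil cp ingested_data.csv gs://my-demo-project-staging/raw/
  gsutil cp gs://my-demo-project-staging/raw/ingested_data.csv .

  # Recursive copies and wildcards work much like Unix cp.
  gsutil cp -r ./transformed gs://my-demo-project-staging/transformed/

  # The IAM policies mentioned above can also be applied at the bucket level,
  # for example to grant a teammate read access (hypothetical address).
  gsutil iam ch user:teammate@example.com:objectViewer gs://my-demo-project-staging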