In addition to very fast query execution, BigQuery also manages the storage and the metadata for your datasets. As we mentioned earlier, BigQuery can ingest datasets from a variety of different formats. Once inside BigQuery's native storage, your data is fully managed by the BigQuery team here at Google: it's automatically replicated, backed up, and set up to auto-scale for your query needs. You can even recover recently deleted tables within a certain time window.

You also have the option of querying external data sources directly, as we talked about, which bypasses BigQuery's managed storage. For example, if you have a raw CSV file in Google Cloud Storage or a Google Sheet, you can write a query against it without having to ingest the data into BigQuery first. A common use case for this is an external data source that is relatively small but constantly changes, like, say, a price list for commodities that another team maintains and continually updates (there's a quick sketch of defining one of these external tables at the end of this section). But there are a few reasons why you may want to think twice before doing this. The first is that data consistency from that external, or federated, data source is not guaranteed. If the external data source changes mid-flight, BigQuery doesn't guarantee that those updates are actually captured. If that concerns you, consider building a streaming data pipeline into BigQuery with Cloud Dataflow, which is the topic we'll cover in the next module.

In addition to ingesting datasets as a batch, like uploading a CSV, you can also stream records into BigQuery via the API. Note that there are a few quota restrictions you should be aware of: the maximum row size for a streaming insert is one megabyte, and the maximum throughput is 100,000 records per second, per project. If you need higher throughput, say on the order of millions of records per second, for use cases like application logging or real-time event tracking, consider using Cloud Bigtable as a data sink instead. Before you start streaming thousands of records into BigQuery using the API, consider the other options you have for your streaming solution.

If you have data that needs to be transformed or aggregated mid-flight into table-and-row format, or joined against other data sources as side inputs midstream, or if you only want a window or segment of that data, you should really consider using Cloud Dataflow for your streaming data pipeline. We'll cover working with that solution in the very next module.

Lastly, if you're familiar with database design, you've likely heard of the concept of normalization, which is the act of breaking up one huge table into component child tables so you're storing each fact in one place and not repeating yourself across records. Take this taxi company, for example. You could track payments, timestamps of events, and the geographic latitudes and longitudes of pickups and drop-offs, all in separate tables, as you see here. What would you need to do to bring all these data sources together for reporting?
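To make that concrete, here is a rough sketch of the kind of query a normalized layout like that pushes you toward. The dataset, table, and column names here are hypothetical, made up just for this illustration:

```sql
-- Hypothetical normalized layout: a trips fact table plus separate
-- child tables for payments, trip events, and pickup/drop-off locations.
SELECT
  t.trip_id,
  p.payment_amount,
  e.event_status,
  e.event_time,
  pickup.latitude   AS pickup_lat,
  pickup.longitude  AS pickup_lon,
  dropoff.latitude  AS dropoff_lat,
  dropoff.longitude AS dropoff_lon
FROM taxi.trips AS t
JOIN taxi.payments AS p
  ON p.trip_id = t.trip_id
JOIN taxi.trip_events AS e
  ON e.trip_id = t.trip_id
JOIN taxi.locations AS pickup
  ON pickup.location_id = t.pickup_location_id
JOIN taxi.locations AS dropoff
  ON dropoff.location_id = t.dropoff_location_id;
```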
If you said do one massive SQL join, [LAUGH] that's absolutely right. But while breaking apart database tables into silos is common practice for relational databases like MySQL or SQL Server, for data warehousing and reporting, let me show you a cool new way to structure your reporting tables.

Take a look at the table at the top. What do you notice that you haven't seen before? First, you have just one row of data, but it looks like you have four. What's going on? Well, that event.time field is actually an ARRAY data type. You can have multiple values, in this case timestamps, for a single taxi booking in that single row. Likewise, there's a corresponding array of status values for those timestamps. So already, in one table, you can get the high-level total number of orders if you wanted, but you could also get the number of completed orders, without having to do any of those complex joins.

The second insight is a bit more hidden: some of the column names look almost like they share a family name, such as event.status and event.time, or destination.latitude and destination.longitude. These grouped fields are part of a SQL data type known as a STRUCT. If you were to inspect the schema for this table, you'd notice that the event field's data type is RECORD, which indicates that it's a STRUCT. You can think of a STRUCT as essentially a collection of other fields, kind of like a table. From a reporting standpoint, you can have many nested STRUCTs within a single table, which is conceptually like having many other tables pre-joined into one big table. And pre-joined means faster queries on larger datasets and no more of those complex 15-table joins.

Now, a huge benefit of doing it this way is that you have a single table with all the fields in a single place for you to analyze. You don't need to worry about join keys or differing levels of table granularity anymore.
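To give you a feel for what that looks like in practice, here's a minimal sketch of a query over a table like the one above. The table name and field names are hypothetical, but the pattern of unnesting a repeated STRUCT is the real BigQuery syntax:

```sql
-- Hypothetical bookings table where `event` is a repeated STRUCT
-- (ARRAY<STRUCT<status STRING, time TIMESTAMP>>), as described above.
SELECT
  COUNT(DISTINCT b.booking_id) AS total_bookings,
  COUNT(DISTINCT IF(e.status = 'completed', b.booking_id, NULL)) AS completed_bookings
FROM taxi.bookings AS b,
  UNNEST(b.event) AS e;  -- flattens the nested events; no joins to other tables needed
```

Because the events travel with the booking row, there are no join keys to manage; you just unnest the fields you need.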
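And going back to the external data source example from earlier, here's a minimal sketch of what pointing BigQuery at a CSV file in Cloud Storage can look like. The dataset, table, bucket path, and columns are all hypothetical; the point is that the query at the end reads the file directly at query time, without ingesting it into managed storage first:

```sql
-- Hypothetical external table over a CSV price list that another team
-- maintains in Cloud Storage; BigQuery reads the file at query time.
CREATE OR REPLACE EXTERNAL TABLE pricing.commodity_prices (
  commodity  STRING,
  price      NUMERIC,
  updated_at TIMESTAMP
)
OPTIONS (
  format = 'CSV',
  uris = ['gs://your-bucket/commodity_prices.csv'],
  skip_leading_rows = 1
);

-- Query it like any other table. Remember the caveat from earlier:
-- if the underlying file changes mid-flight, consistency isn't guaranteed.
SELECT commodity, price
FROM pricing.commodity_prices
ORDER BY commodity;
```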