Skip to main content

Google BigQuery

 

  • A fully managed data warehouse where you can feed petabyte-scale data sets and run SQL-like queries.

Features

  • Cloud BigQuery is a serverless data warehousing technology.
  • It provides integration with the Apache big data ecosystem allowing Hadoop/Spark and Beam workloads to read or write data directly from BigQuery using Storage API.
  • BigQuery supports a standard SQL dialect that is ANSI:2011 compliant, which reduces the need for code rewrites.
  • Automatically replicates data and keeps a seven-day history of changes which facilitates restoration and data comparison from different times.

Loading data into BigQuery

You must first load your data into BigQuery before you can run queries. To do this you can:

  • Load a set of data records from Cloud Storage or from a local file. The records can be in Avro, CSV, JSON (newline delimited only), ORC, or Parquet format.
  • Export data from Datastore or Firestore and load the exported data into BigQuery.

  • Load data from other Google services, such as
    • Google Ad Manager
    • Google Ads
    • Google Play
    • Cloud Storage
    • Youtube Channel Reports
    • Youtube Content Owner reports
  • Stream data one record at a time using streaming inserts.
  • Write data from a Dataflow pipeline to BigQuery.
  • Use DML statements to perform bulk inserts. Note that BigQuery charges for DML queries. See Data Manipulation Language pricing.

Querying from external data sources

  • BigQuery offers support for querying data directly from:
    • Cloud BigTable
    • Cloud Storage
    • Cloud SQL
  • Supported formats are:
    • Avro
    • CSV
    • JSON (newline delimited only)
    • ORC
    • Parquet
  • To query data on external sources, you have to create external table definition file that contains the schema definition and metadata.

Google BigQuery Monitoring

  • BigQuery creates log entries for actions such as creating or deleting a table, purchasing slots, or running a load job.

Google BigQuery Pricing

  • On-demand pricing lets you pay only for the storage and compute that you use.
  • Flat-rate pricing with reservations enables high-volume users to choose price for workloads that are predictable.
  • To estimate query costs, it is best practice to acquire the estimated bytes read by using the query validator in Cloud Console or submitting a query job using the API with the dryRun parameter. Use this information in Pricing Calculator to calculate the query cost.

Comments

Popular posts from this blog

Google Cloud Pub/Sub

  Cloud Pub/Sub is a fully-managed real-time messaging service for event driven systems that allows you to send and receive messages between independent applications. Features Capable of global message routing to simplify multi-region systems. Synchronous, cross-zone message replication and per-message receipt tracking ensure at-least-once delivery at any scale. Pub/Sub delivers each message at least once, so the Pub/Sub service might redeliver messages. You can declare independent quota and billing for publishers and subscribers. Cloud Pub/Sub doesn’t have shards or partitions. You just need to set your quota, publish, and consume. Key Concepts Topic It is a named resource to which publishers send messages. Subscription Is a named resource representing the stream of messages from a specific topic, to be sent to the subscribing application. Message The combination of data and attributes that a publisher sends to a topic and is eventually sent to subscribers. Message attribute A key...

Google Cloud Dataprep

  Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning. Features You can transform structured or unstructured datasets of any size — megabytes to petabytes — with equal ease and simplicity. Cloud Dataproc can transform datasets stored in CSV, JSON, or relational table formats. You can process data stored in Cloud Storage, BigQuery, or from your desktop, then export the refined data to BigQuery or Cloud Storage for storage, analysis, visualization, or machine learning. Uses a proprietary algorithm that interprets the data transformation intent of a user’s data selection. You can leverage hundreds of transformation functions readily available to turn your data into the asset you want. Cloud Dataprep enables users to collaborate on similar flow objects in real-time or to create copies for other team members to use for independent tasks. Explore your data ...

Google Cloud Identity and Access Management

  Create and manage permissions for your Google Cloud resources with Identity Access Management (IAM). Provides a unified view into your organization’s security policy with built-in auditing to ease compliance purposes. Features Lets you authorize who can take specific actions on resources to give you full control and visibility on your Google Cloud services centrally. Permissions are represented in the form of  service.resource.verb Can map job functions into groups and roles. With IAM, users only get access to what they need to get the job done. Cloud IAM enables you to grant access to cloud resources at fine-grained levels, well beyond project-level access. You can leverage Cloud Identity to easily create or sync user accounts across applications and projects. IAM lets you set policies at the following levels of the resource hierarchy: Organization level The organization resource represents your company. IAM roles granted at this level are inherited by all resources under t...