Eighty-nine percent of organizations agree that the rate of change has accelerated in the past two years, and it’s not likely to slow down anytime soon. In an age of relentless change, companies must have a firm grasp on the data and insights that can help them evolve their customer, workforce, and operational strategies accordingly. At its core, this requires a readily scalable approach to data architecture, best enabled by public cloud architectures. Yet moving data storage from on-premises systems to the public cloud is a tall order. It can be challenging to pick the public cloud platform that best meets your business needs and makes the most of your workforce’s existing skill sets. An expertise-guided approach to this selection is the best way forward. In this blog series, we distill the data pipeline architectures of the top cloud platforms available today. From there, we equip you with some practical criteria to inform the right selection for your organization.
In the previous blog, we discussed how Amazon Web Services provides an extensive tech stack that lets the user replicate almost all the features of an on-premises ETL architecture. Now, we’ll review the key elements of Google Cloud Platform (GCP) and equip you with a few practical considerations for determining whether GCP is the best option for your organization’s strategic objectives.
Google makes it easy to transform your enterprise architecture, either partially or entirely, into a cloud architecture encompassing data storage, processing, and reporting.
Scaling Compute Power and Memory
Google’s Cloud Data Fusion (CDF) enables you to build and manage ETL pipelines. Best of all, its features help you flex to business requirements while staying cost-efficient. CDF is based on the open-source CDAP framework, which acts as an abstraction layer on top of Google Cloud services.
Google CDF can ingest data from a range of sources, including Google Cloud Storage, on-premises databases such as Oracle, and cloud applications such as Salesforce. From there, the data can be loaded into BigQuery or other analytics services for further transformation and delivery.
CDF allows the user to create different compute profiles for running data pipelines and integrations. For example, the user can create one profile with more resources and compute power for a large-scale data migration, and a separate, smaller profile for daily runs.
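To make the two-profile idea concrete, here is a minimal Python sketch of choosing between a heavy and a light profile based on run size. It does not use the Cloud Data Fusion API; profiles are actually defined inside CDF. The `ComputeProfile` class, the profile names, the cluster sizes, and the 500 GB cut-over point are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeProfile:
    """Hypothetical stand-in for a CDF compute profile (not the CDF API)."""
    name: str
    workers: int
    vcpus_per_worker: int
    memory_gb_per_worker: int

    @property
    def total_vcpus(self) -> int:
        return self.workers * self.vcpus_per_worker

    @property
    def total_memory_gb(self) -> int:
        return self.workers * self.memory_gb_per_worker


# Illustrative sizes only; real profiles are configured in Cloud Data Fusion.
BULK_MIGRATION = ComputeProfile("bulk-migration", workers=20, vcpus_per_worker=8, memory_gb_per_worker=32)
DAILY_RUN = ComputeProfile("daily-run", workers=2, vcpus_per_worker=4, memory_gb_per_worker=16)

# Hypothetical threshold above which a run is treated as a large-scale migration.
LARGE_RUN_THRESHOLD_GB = 500


def choose_profile(input_size_gb: float) -> ComputeProfile:
    """Use the heavier (costlier) profile only when the run is large enough to need it."""
    if input_size_gb < 0:
        raise ValueError("input size cannot be negative")
    return BULK_MIGRATION if input_size_gb >= LARGE_RUN_THRESHOLD_GB else DAILY_RUN


if __name__ == "__main__":
    for size in (12, 2_000):
        profile = choose_profile(size)
        print(f"{size} GB -> {profile.name} "
              f"({profile.total_vcpus} vCPUs, {profile.total_memory_gb} GB RAM)")
```

Keeping the small profile as the default means routine daily runs never pay for the migration-sized cluster.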
BigQuery can query vast amounts of data in seconds. Third-party datasets, such as NOAA weather data, are publicly available in BigQuery and can be used to augment the user’s analysis.
Storage
GCP provides a variety of customizable data storage options, including Google Cloud Storage, Cloud Spanner, BigQuery, and Cloud Datastore. Google Cloud is an ideal choice for users who need to pull data from multiple sources, because it can store objects as well as files. It also lets the user assign storage classes based on how often data is accessed. Standard storage (which replaced the earlier Multi-Regional and Regional classes) suits frequently accessed data, while Nearline, Coldline, or Archive suit infrequently accessed data. Objects can be moved between storage classes through the Cloud Storage API, or automatically with Object Lifecycle Management rules.
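As a rough illustration of matching storage classes to access frequency, the Python sketch below maps a typical interval between accesses to a class. It uses the 30, 90, and 365 day thresholds that correspond to the minimum storage durations Google documents for Nearline, Coldline, and Archive. It also builds age-based `SetStorageClass` rules in the general shape of a Cloud Storage lifecycle configuration: a `rule` list of `action` and `condition` pairs. Treat the thresholds as a rule of thumb rather than a pricing recommendation. The helper names are hypothetical, and the code only builds data; it makes no API calls.

```python
import json

# Storage classes from hottest to coldest, each paired with the shortest
# typical access interval (in days) for which that class is a reasonable fit.
# Thresholds follow the documented minimum storage durations (30/90/365 days).
TIERS = [
    ("STANDARD", 0),
    ("NEARLINE", 30),
    ("COLDLINE", 90),
    ("ARCHIVE", 365),
]


def suggest_storage_class(days_between_accesses: float) -> str:
    """Return the coldest class whose threshold the access interval meets."""
    if days_between_accesses < 0:
        raise ValueError("interval cannot be negative")
    chosen = TIERS[0][0]
    for storage_class, min_days in TIERS:
        if days_between_accesses >= min_days:
            chosen = storage_class
    return chosen


def lifecycle_rules() -> dict:
    """Build age-based rules that step objects down one tier at a time."""
    rules = []
    for i, (storage_class, min_days) in enumerate(TIERS[1:], start=1):
        # Only objects still in a warmer class are moved by this rule.
        warmer = [name for name, _ in TIERS[:i]]
        rules.append({
            "action": {"type": "SetStorageClass", "storageClass": storage_class},
            "condition": {"age": min_days, "matchesStorageClass": warmer},
        })
    return {"rule": rules}


if __name__ == "__main__":
    print(suggest_storage_class(7))
    print(suggest_storage_class(45))
    print(json.dumps(lifecycle_rules(), indent=2))
```

A JSON document of this shape can serve as a bucket’s lifecycle configuration. Cloud Storage then applies the transitions as objects age, with no manual moves required.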
Networking
At the time of writing, Google had 22 cloud regions globally, ensuring that data is readily available and delivered quickly within a secure network. Through its Network Service Tiers, Google offers a tiered approach to optimizing the cloud network for an organization’s business requirements.
- Premium Tier: The premium tier offers a low-latency, highly reliable network because traffic travels over Google's global private network. Google uses global load balancing to provide the user with a single anycast virtual IP for multiple regions.
- Standard Tier: Compared to the premium tier, the standard tier offers a lower-performance network. Even so, its latency and overall quality are comparable to those of typical transit ISPs. The major difference between the two tiers is that the standard tier uses region-level IPs, whereas the premium tier offers a single global IP.
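The practical consequence of the IP difference shows up when you serve users in several regions. The Python sketch below models it conceptually and does not call any Google Cloud API. Under Premium, one anycast address fronts every region. Under Standard, each region needs its own address, which clients or DNS must choose between. The tier strings match GCP’s tier names, but `Frontend`, `plan_frontends`, and the region list are hypothetical. The addresses come from the RFC 5737 documentation range and are placeholders.

```python
from dataclasses import dataclass

# RFC 5737 documentation range; placeholder addresses, not real allocations.
EXAMPLE_PREFIX = "203.0.113."


@dataclass(frozen=True)
class Frontend:
    tier: str   # "PREMIUM" or "STANDARD"
    scope: str  # "global" for Premium, otherwise the region served
    ip: str


def plan_frontends(tier: str, regions: list[str]) -> list[Frontend]:
    """Return the external frontends needed to serve users in `regions`."""
    if not regions:
        raise ValueError("at least one region is required")
    if tier == "PREMIUM":
        # One anycast IP is advertised worldwide; Google's network carries each
        # request to a healthy backend, typically in the region nearest the user.
        return [Frontend(tier, "global", EXAMPLE_PREFIX + "10")]
    if tier == "STANDARD":
        # Each region gets its own IP; clients (or DNS) must pick the right one.
        return [Frontend(tier, region, f"{EXAMPLE_PREFIX}{20 + i}")
                for i, region in enumerate(regions)]
    raise ValueError(f"unknown tier: {tier}")


if __name__ == "__main__":
    regions = ["us-central1", "europe-west1", "asia-east1"]
    for tier in ("PREMIUM", "STANDARD"):
        frontends = plan_frontends(tier, regions)
        print(tier, [(f.scope, f.ip) for f in frontends])
```

The number of frontends returned is a rough proxy for the routing logic you would otherwise manage yourself under the standard tier.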
Below are a few key diagnostic criteria you can apply to check if GCP may be the best choice for your business:
- You need rapid scalability and flexibility with all services on the same network. Google’s suite of services, including Cloud Data Fusion, Cloud Storage, and BigQuery, allows the user to scale up and down as needed, throttle compute power, and speed up the delivery of data.
- You’d like to move off traditional SQL databases while minimizing business disruption. BigQuery supports a standard SQL dialect, so users can apply their existing SQL skills as they transition to BigQuery.
- Your team already has expertise with Google products. You can leverage this expertise and continue to build on it.
In the next blog of our series, we’ll explore the capabilities of Microsoft Azure, and equip you with some helpful considerations to evaluate whether it may be the right choice for your business.