The continued presence of on-premises data warehouses represents a growing burden: they require complex integrations spanning both code and infrastructure. When most of a company’s assets are in the cloud, establishing and maintaining cloud-based data warehousing processes is vital to the business's success.
Building a data warehouse in the cloud with Snowflake lets companies of all sizes modernize their data warehousing.
[Image: Model of a Snowflake data warehouse]
What is Snowflake?
Snowflake is a parallel processing data platform that runs in the cloud. In the context of this article, Snowflake hosts the data warehouse and all related data warehousing activities.
Benefits of Snowflake
Snowflake and data lakes
Snowflake offers a variety of ways to connect to raw data files, whether they sit in local storage or in external cloud storage services.
Using Snowflake stages, a business can seamlessly load data into Snowflake tables with the COPY INTO command. COPY INTO offers a variety of customizable options, such as pattern matching, specific file formats and incremental loading, that allow a business to easily move data from raw files into staging tables for use in the data warehouse.
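As a minimal sketch of this pattern, the statements below create an external stage and load matching files into a staging table. The bucket, storage integration, table and stage names are illustrative assumptions, not part of the original article.

```sql
-- Hypothetical external stage over a cloud storage bucket
-- (bucket URL, integration and names are illustrative).
CREATE STAGE raw_orders_stage
  URL = 's3://example-bucket/orders/'
  STORAGE_INTEGRATION = example_s3_int
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Load only files matching a pattern into a staging table.
-- COPY INTO tracks load history, so re-running it will not
-- reload files it has already ingested (incremental loading).
COPY INTO staging.orders
  FROM @raw_orders_stage
  PATTERN = '.*orders_.*[.]csv'
  ON_ERROR = 'SKIP_FILE';
```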
Automating data loads using Snowflake
When looking to automate data loads into Snowflake, there are two viable options:
Stored procedures executed on a schedule
Snowflake pipes (Snowpipe)
A cloud data warehouse in Snowflake will likely use a combination of the two, running stored procedures to load files once a day and Snowpipe to ingest data as it lands in the external stage (for example, streaming data), reducing overall load times into the associated staging tables.
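A Snowpipe definition wraps a COPY INTO statement so it runs automatically as files arrive. The sketch below assumes the same illustrative stage and staging table as above; with AUTO_INGEST, the cloud provider's event notifications must also be configured to point at the pipe.

```sql
-- Hypothetical pipe that ingests files as they land in the
-- external stage (stage and table names are illustrative).
CREATE PIPE staging.orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO staging.orders
  FROM @raw_orders_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```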
Snowflake’s stream feature tracks new (and deleted) rows on the table it is created on. While the COPY INTO command only uploads new files, the associated stream can then identify the exact changes that occurred.
Streams behave much like tables and can be queried as such, which means the data warehouse processes can use them to incrementally update the data in the dimensional model. Once a Data Manipulation Language (DML) statement consumes a stream, its rows are removed from the stream.
Incorporating streams in the data warehouse is essential for having a fast-performing extract, transform and load (ETL) cycle when dealing with large volumes of data.
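The stream pattern described above can be sketched as follows. The table, stream and column names are illustrative assumptions; the key point is that the MERGE consumes the stream, advancing its offset so the same changes are not processed again on the next ETL cycle.

```sql
-- Hypothetical stream tracking changes on a staging table.
CREATE STREAM staging.orders_stream ON TABLE staging.orders;

-- Apply only the changed rows to the dimensional model.
-- Using the stream in a DML statement consumes its rows.
MERGE INTO warehouse.dim_orders d
USING staging.orders_stream s
  ON d.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET d.status = s.status
WHEN NOT MATCHED THEN
  INSERT (order_id, status) VALUES (s.order_id, s.status);
```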
What is a task?
A task in Snowflake is a defined SQL statement or a call to a stored procedure that can be set to execute at specific times or after other tasks complete.
By using tasks, a business can create directed acyclic graphs (DAGs) of the SQL statements needed to load data into the staging tables, transform it as needed and update the tables in the data warehouse.
Tasks offer a variety of settings to align with the needs of the data warehouse, such as execution schedules, the warehouse used for compute and dependencies on other tasks.
Tasks are the backbone of building the data warehouse in Snowflake, enabling fully automated processes.
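A small task DAG of the kind described above might look like the following. The task, warehouse and procedure names are illustrative assumptions: a root task runs on a cron schedule, and a child task runs after it, mirroring the load-then-transform flow.

```sql
-- Hypothetical root task on a daily schedule (names illustrative).
CREATE TASK load_staging
  WAREHOUSE = etl_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'  -- daily at 02:00 UTC
AS
  CALL staging.load_daily_files();

-- Child task forming a DAG: runs after the root task succeeds.
CREATE TASK update_dimensions
  WAREHOUSE = etl_wh
  AFTER load_staging
AS
  CALL warehouse.update_dim_tables();

-- Tasks are created suspended; resume children before the root.
ALTER TASK update_dimensions RESUME;
ALTER TASK load_staging RESUME;
```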
A well-constructed data warehouse is a key asset to the data-driven business. Building a data warehouse in the cloud using Snowflake enables businesses to reap the benefits of proven methodologies while scaling for continued growth. Utilizing the techniques described in this article will allow a business to support and enhance analytics as they shift to using Snowflake.
As a Snowflake Select Partner, Baker Tilly Digital helps businesses optimize the performance, security, shareability and scale of their cloud data warehouse. Our practice professionals work with you, providing informed perspectives, to uncover and solve the digital challenges your business is facing, so you can work to build more meaningful relationships with your next generation of customers.
Interested in learning more?