20 Snowflake Interview Questions - Beginner to Advanced Level Questions

Preparing for Data Engineering interviews? Practice with these Snowflake interview questions and see if you get the right answers.


These commonly asked questions range from entry-level Snowflake questions to case questions that make you think through whole systems and how Snowflake fits in.

Snowflake Interview Questions for Beginners

Snowflake Interview Question #1

Question: Why use Snowflake? Answer: We use Snowflake because it separates compute from storage, so each can be scaled independently of the other. Added benefits of Snowflake include: data sharing, the Snowflake Marketplace, SnowSQL, multi-cloud support, and granular, role-based access control.

Snowflake Interview Question #2

Question: What programming languages are supported by Snowflake? Answer: The following programming languages are supported by Snowflake:

  1. Java
  2. Go
  3. PHP
  4. Python
  5. .NET
  6. C
  7. JavaScript (Node.js)

Snowflake Interview Question #3

Question: How do you access Snowflake's data warehouse? Answer: There are several ways to access Snowflake’s data warehouse:

  • Using the Python connector (the snowflake.connector package)
  • Using the Snowflake web UI at https://app.snowflake.com/
  • Using the JDBC (Java Database Connectivity) driver provided by Snowflake
  • Using Snowflake’s ODBC driver with an ODBC client application
  • Using SnowSQL, e.g. $ snowsql -c my_snowflake_connection

Snowflake Interview Question #4

Question: What is Snowflake Caching? Answer: When you run a query, Snowflake caches the results and the underlying data, so that when you re-run the query you get results faster: they are served from the result cache, local disk cache, or remote disk instead of being recomputed from scratch in your data warehouse.

In general there are three types of cache in Snowflake:

  1. Remote disk cache - the long-term storage layer that holds the full data set.
  2. Local disk cache - data cached in the memory and on the SSDs of the virtual warehouse nodes.
  3. Result cache - the results of every query run in the last 24 hours, held in the cloud services layer.
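To see the caches in action, you can toggle result caching per session with Snowflake's USE_CACHED_RESULT parameter (the orders table below is hypothetical):

```sql
-- Disable the result cache to benchmark a query "cold"
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- First run: scans the table via the warehouse (warms the local disk cache)
SELECT COUNT(*) FROM orders;

-- Re-enable result caching; an identical query within 24 hours
-- can now be answered from the result cache without using the warehouse
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
SELECT COUNT(*) FROM orders;
```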

Snowflake Interview Question #5

Question: What is Time Travel? Answer: Time Travel allows us to access data that has been changed or deleted within a defined retention period. Time Travel is often used to restore dropped tables and schemas, to duplicate and back up historical data, or to analyze data that was changed or deleted.

Specifically, you can do the following using Time Travel:

  • Query historical data for a defined period of time.
  • Clone historical data for a defined period of time.
  • Restore dropped tables, schemas, or databases within the retention period.
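The three bullets above map directly onto Snowflake's AT/BEFORE, CLONE, and UNDROP syntax. A short sketch, with a hypothetical billing table:

```sql
-- Query the table as it looked one hour ago
SELECT * FROM billing AT(OFFSET => -3600);

-- Clone the table as of a specific past point in time
CREATE TABLE billing_backup CLONE billing
  AT(TIMESTAMP => '2024-01-15 08:00:00'::TIMESTAMP_LTZ);

-- Restore a dropped table within the retention period
DROP TABLE billing;
UNDROP TABLE billing;
```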

Snowflake Interview Question #6

Question: Why do we use Fail-safe? Answer: Fail-safe lets Snowflake recover modified or deleted data during the 7 days after Time Travel’s retention period ends. It is a last resort, used only when all other data recovery options have been exhausted. Worth noting: Fail-safe recovery is performed by Snowflake itself and can take several days.

Snowflake Interview Question #7

Question: Explain Data Shares in Snowflake. Answer: Data Shares are Snowflake objects that contain the privileges granting access to a database and to specific objects (schemas, tables, views) within it, along with the consumer accounts those objects are shared with.

In Snowflake, it is worth noting that no actual data is duplicated upon sharing.
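A minimal sketch of both sides of a share, with hypothetical database, share, and account names:

```sql
-- Provider side: create a share, grant access, add a consumer account
CREATE SHARE pricing_share;
GRANT USAGE ON DATABASE sales_db TO SHARE pricing_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE pricing_share;
GRANT SELECT ON TABLE sales_db.public.pricing TO SHARE pricing_share;
ALTER SHARE pricing_share ADD ACCOUNTS = consumer_account;

-- Consumer side: create a read-only database from the share
-- (no data is copied; the consumer reads the provider's storage)
CREATE DATABASE shared_pricing FROM SHARE provider_account.pricing_share;
```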

Snowflake Interview Question #8

Question: What is Snowpipe? Answer: Snowpipe is Snowflake’s event-based, continuous data ingestion service. A load can be triggered either by a cloud storage event notification or by a call to Snowpipe’s insertFiles REST API.
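A pipe is just a named COPY statement that Snowpipe runs whenever new files arrive. A sketch with hypothetical stage, table, and pipe names:

```sql
-- Auto-ingest files as they land in an external stage
CREATE PIPE raw_events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_events
  FROM @s3_events_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- Check the pipe's state and backlog
SELECT SYSTEM$PIPE_STATUS('raw_events_pipe');
```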

Snowflake Interview Questions (Advanced)

Snowflake Interview Question #9

Question: What are the advantages of using Snowpipe? Answer: The top advantages of Snowpipe are:

  1. It is very easy to use. Point Snowpipe at your cloud storage and ingestion works out of the box.
  2. It is cost-effective, as you pay only for the serverless compute actually used, billed per second.
  3. You don’t have to over-provision compute to make sure everything holds up under a high load.
  4. It provides near-real-time availability of newly landed data.
  5. You can manage it through a range of tools, including SDKs and REST APIs.

Snowflake Interview Question #10

Question: Is Snowflake OLAP or OLTP? Answer: Snowflake is designed as an OLAP database. Because singleton operations such as UPDATE and DELETE are comparatively expensive in its columnar storage, it is not a good choice as an OLTP platform.

Snowflake Interview Question #11

Question: Can you explain Snowflake’s architecture? Answer: Snowflake’s architecture has three layers:

  1. Cloud services - this layer contains authentication and access control, the infrastructure manager, the query optimizer, the metadata manager, and security services. Together they coordinate everything from authenticating an incoming request to dispatching a query.
  2. Query processing - this layer contains the virtual warehouses where query execution happens. Each virtual warehouse has independent compute resources, so workloads do not interfere with one another.
  3. Database storage - the layer where Snowflake stores data in a compressed, optimized columnar format. Snowflake manages every aspect of this storage: metadata, file sizes, compression, and other data organization details.

Snowflake has a custom architecture: a new SQL query engine that is not built on top of Hadoop, Spark, or any other existing big data infrastructure. At its core, Snowflake is a hybrid of shared-nothing (every node has its own processor, memory, and disk) and shared-disk (a central data repository accessible from all nodes) architectures.

Snowflake Interview Question #12

Question: How is Snowflake different from AWS? Answer: AWS is a broad cloud platform offering far more services than a data warehouse. Snowflake, by contrast, focuses on solving data warehousing really well for enterprises that want to control costs while storing massive amounts of data.

Snowflake Interview Question #13

Question: Can you tell us the difference between Snowflake and Redshift? Answer: While AWS Redshift has long promoted itself as a managed service that makes data warehousing simple, it won’t work well if, say, you need to clone 10 TB of data in less than 5 minutes. Snowflake’s separation of compute and storage, together with zero-copy cloning, is the differentiator here.

In the case of AWS Redshift, a scenario such as the one mentioned above would require over-provisioning.

Snowflake Interview Question #14

Question: What are Stages in Snowflake? Explain them. Answer: Locations that store data in Snowflake are called Stages. They are external Stages when the data lives in external cloud storage such as AWS S3 or Google Cloud Storage, and internal Stages when the locations are inside Snowflake.

Further, there are three different types of internal Stages: internal named stages, table stages, and user stages.
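The stage types can be sketched as follows; the file paths, table, and bucket names are hypothetical, and PUT commands are run from a client such as SnowSQL:

```sql
-- Internal named stage
CREATE STAGE my_stage;
PUT file:///tmp/data.csv @my_stage;

-- Table stage: every table has one, referenced with @%
PUT file:///tmp/data.csv @%my_table;

-- User stage: every user has one, referenced with @~
PUT file:///tmp/data.csv @~;

-- External stage pointing at cloud storage
CREATE STAGE s3_stage
  URL = 's3://my-bucket/path/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');
```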

Snowflake interview questions for experienced Data Engineers

Snowflake Interview Question #15

Question: How can we execute a Snowflake stored procedure? Answer: To execute a stored procedure, we use the CALL statement.

Let’s look at an example. Our goal is to call a procedure, and we have the following two tables shared between accounts: table 1: Billing and table 2: Pricing. Note that we can’t call a procedure through a plain variable, i.e. something like set newVar = (select procedure_1 from procedures); call $newVar; will not work.

Instead, we can execute our stored procedure by using identifier(): set pr = 'Pull_Pricing'; call identifier($pr)();
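Putting it together: a trivial Snowflake Scripting procedure (the name and body are hypothetical) and the two ways to call it:

```sql
CREATE OR REPLACE PROCEDURE Pull_Pricing()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  RETURN 'pricing pulled';
END;
$$;

-- Direct call
CALL Pull_Pricing();

-- Indirect call through a session variable and IDENTIFIER()
SET pr = 'Pull_Pricing';
CALL IDENTIFIER($pr)();
```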

Snowflake Interview Question #16

Question: What is a Materialized view? Answer: Materialized views in Snowflake are pre-computed result sets for a specific query, stored so that subsequent reads are fast.

Creating a materialized view follows a simple syntax. Here’s an example: create materialized view my_view comment='Materialized view for pricing' as select price_nov, discount_3 from Pricing;

Snowflake Interview Question #17

Question: Explain Snowflake security. Answer: Broadly, Snowflake comes with a diverse set of security features that include:

  1. Out-of-the-box network access control: network policies restrict which IP addresses can access the account.
  2. Monitoring login events. We can specifically look into unsuccessful logins via the IS_SUCCESS column of LOGIN_HISTORY.
  3. Monitoring and logging failed query attempts using QUERY_HISTORY_BY_USER or QUERY_HISTORY.
  4. Role-based access control, which makes managing access at scale for users, groups, and roles straightforward.
  5. Secure views, which let us implement column-level access control for Snowflake users. They tend to become harder to maintain as their number grows.
  6. Identity management features that can leverage internal and external identity providers for user provisioning.
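Points 1 through 3 above can be sketched in SQL; the policy name, IP range, and user name are hypothetical:

```sql
-- 1. Restrict account access to an allowed IP range via a network policy
CREATE NETWORK POLICY office_only ALLOWED_IP_LIST = ('203.0.113.0/24');
ALTER ACCOUNT SET NETWORK_POLICY = office_only;

-- 2. Failed login attempts over the past week (IS_SUCCESS column)
SELECT event_timestamp, user_name, client_ip
FROM TABLE(INFORMATION_SCHEMA.LOGIN_HISTORY(
  TIME_RANGE_START => DATEADD('day', -7, CURRENT_TIMESTAMP())))
WHERE is_success = 'NO';

-- 3. Failed queries for a given user
SELECT query_text, error_message
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(USER_NAME => 'ANALYST1'))
WHERE execution_status LIKE 'FAILED%';
```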

Snowflake Interview Question #18

Question: Is Snowflake an ETL tool? Answer: No, Snowflake is a data warehouse, not an ETL tool. With Snowflake, you use a separate tool (e.g., Apache Airflow) to orchestrate Extract, Transform, and Load processes.

Question: Which ETL tools are compatible with Snowflake? Answer: There are multiple ETL tools that are compatible with Snowflake:

  1. Matillion
  2. Coalesce.io
  3. Airbyte
  4. Airflow
  5. Blendo
  6. Stitch
  7. Integrate.io

Snowflake Interview Question #19

Question: How would you design a data warehouse for an eCommerce website? Answer: To build out a data warehouse for an eCommerce website, we have to think about the following first:

  1. The data model for the website.
  2. Availability of engineering resources.
  3. Scalability of the database- there would be >500,000 rows.
  4. Orchestrator selection.
  5. The velocity of incoming eCommerce data.
  6. How the data will be organized into schemas.

With those factors in mind, we can sketch an eCommerce data warehouse design using Snowflake where:

  1. We select a structured relational data model. If the incoming data is already structured, a relational model is a good fit.
  2. As we are dealing with a large number of rows, we need a distributed system that’s horizontally scalable.
  3. If there are few engineering resources available, Snowflake is a strong fit, as it needs minimal operational effort to stand up a data warehouse.
  4. We can use tools like Apache Airflow on Kubernetes to orchestrate and scale our pipelines.
  5. As the data comes in on an hourly basis, we can partition (cluster) our tables by the hour.
  6. Lastly, we organize our data into schemas to finish the eCommerce warehouse design.
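The design steps above could start from a layout like this; all warehouse, database, and schema names are hypothetical choices, not prescribed ones:

```sql
-- One warehouse per workload, sized independently
CREATE WAREHOUSE load_wh WAREHOUSE_SIZE = 'SMALL'  AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;
CREATE WAREHOUSE bi_wh   WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;

-- One database, with schemas per pipeline layer
CREATE DATABASE ecommerce;
CREATE SCHEMA ecommerce.raw;      -- hourly landed data
CREATE SCHEMA ecommerce.staging;  -- cleaned / conformed data
CREATE SCHEMA ecommerce.marts;    -- reporting tables
```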

Snowflake Interview Question #20

Question: How to optimize or improve query performance? Answer: Improving query performance would be highly dependent on what type of database we are using. Let’s assume that we are using Snowflake as our data warehouse. We can do the following to optimize our query performance:

  1. Identify lazy spooling (duplicate aggregation). It is usually unintentional and bad for query performance; pre-calculate the aggregation instead.
  2. Combine and consolidate subqueries if you have too many, as they degrade performance.
  3. In engines that support them, use proper indexes to resolve expensive hash matches; Snowflake itself relies on clustering rather than indexes.
  4. Avoid creating redundant views; update existing views with new columns instead.
  5. Use proper clustering.
  6. Leverage micro partition pruning.
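Points 5 and 6 above go together in Snowflake: a clustering key keeps related rows in the same micro-partitions, so filters on that key prune partitions. A sketch with hypothetical table and column names:

```sql
-- Define a clustering key on a large, frequently filtered table
ALTER TABLE orders CLUSTER BY (order_date);

-- Inspect how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');

-- Filtering on the clustering key lets Snowflake prune micro-partitions
SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
```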

Copyright ©2024 Workstory Inc.
