Preparing for Data Engineering Interviews? Practice with these Snowflake Interview questions and see if you got the right answer.
These are commonly asked questions that range from entry-level Snowflake questions to Snowflake cases that make you think through systems and how Snowflake would fit in.
Question: Why use Snowflake? Answer: We use Snowflake because it separates compute from scalability, storage and independence of your data warehouses. Added benefits of Snowflake include: data sharing, marketplace, snowSQL, multi-cloud architecture, and permission levels.
Question: What programming languages are supported by Snowflake? Answer: The following programming languages are supported by Snowflake:
Question: How to access Snowflake's data warehouse? Answer: There are several ways to access Snowflake’s data warehouse:
Question: What is Snowflake Catching? Answer: When you run a data query, Snowflake stores the data (caches) so that when you re-run the query, you’ll get the results faster as they are being pulled either from result cache, local disk cache or remote disk cache instead of your data warehouse.
In general there are three types of cache in Snowflake:
Question: What is Time Travel? Answer: Time Travel allows us to access data that has been changed (deleted or simply modified) for a defined period of time. Time Travel is often used to restore tables, schemas, etc, duplicate and backup historical data or analyze data that was changed or deleted.
Specifically, you can do the following using Time Travel:
Question: Why do we use Fail-safe? Answer: We use Fail-safe in Snowflake to recover modified or deleted data 7 days after Time Travel’s retention period. Generally, we don’t use it unless all other data recovery options have been tried and tested. Worth noting that, Fail-safe takes a couple of days to recover data.
Question: Explain Data Shares in Snowflake? Answer: Data Shares are Snowflake objects that contain: permissions to access database(s) and specific object(s) with the database, object schema, and the accounts to share the object with.
In Snowflake, it is worth noting that no actual data is duplicated upon sharing.
Question: What is Snowpipe? Answer: Snowpipe is an event-based data ingestion tool that supports cloud storage event notification or Snowpipe insertFiles REST API to trigger a data loading event.
Question: What are the advantages of using Snowpipe? Answer: The top advantages of Snowpipe are:
Question: Is Snowflake OLAP or OLTP? Answer: Snowflake uses OLAP as the base for its database schema. When it comes to OLTP, as Snowflake doesn’t work well with UPDATE, DELETE, etc operations (singleton), it shouldn't be the best choice as an OLTP platform.
Question: Can you explain Snowflake’s architecture? Answer: Snowflake’s architecture has three layers:
Snowflake has a custom architecture that does not use a revolutionary SQL query engine and isn’t based on top of Hadoop, Spark, or any other existing big data infrastructure. At its core, Snowflake is a hybrid architecture of shared-nothing(every node has a processor, memory, and disk) and shared-disk architectures(disks are accessible from all Snowflake nodes).
Question: How is Snowflake different from AWS? Answer: AWS is huge and offers a diverse set of services that are more than a data warehouse. However, Snowflake solves the problem of data warehousing really well for enterprises that want to save while storing massive amounts of data.
Question: Can you tell us what is the difference between Snowflake and Redshift? Answer: While AWS Redshift has always promoted itself as an Infrastructure as a Service platform that can make data warehousing really simple for users, Redshift won’t work well for you, if, let’s say if you were to clone 10 TB of data in less than 5 minutes. Snowflake’s capabilities to separate compute and storage would appear to be a differentiator here.
In the case of AWS Redshift, a scenario such as the one mentioned above would require over-provisioning.
Question: What are Stages in Snowflake? Explain them. Locations that store data in Snowflake are called Stages. They are called external Stages if the data is stored in external cloud storage like AWS S3 or Google Cloud. We call them internal Stages when these locations are inside Snowflake.
Further, there are 3 different types of internal Stages - internal named stages, table stages, and user stages.
Question: How can we execute Snowflake procedure? Answer: To execute a stored procedure, we use the call statement.
Let’s take a look at it with an example. Our goal is to call a procedure and we have the following two tables shared between accounts: table 1: Billing and table 2: Pricing. We can’t do something like set newVar = (select procedure_1 from procedures); call $newVar;
We can execute our stored procedure by using identifier() instead: Set pr = “Pull_Pricing”; Call identifier($pr)(()
Question: What is Materialized view? Answer: Materialized views in Snowflake are query-specific per-computed data set, that are stored for quick data operations.
Creating these materialized views follows a relatively simpler syntax. Here’s an example of creating a materialized view: create materialized view my_view comment='Materialized view for pricing' as select price_nov, discount_3 from Pricing;
Question: Explain Snowflake security. Answer: Broadly, Snowflake comes with a diverse set of security features that include:
Question: Is Snowflake an ETL tool? Answer: No, Snowflake is a data warehouse. It is not an ETL tool. With Snowflake you can use an ETL tool(e.g Apache Airflow) to apply Extract, transform and load processes.
Question: Which ETL tools are compatible with Snowflake? Answer: There are multiple ETL tools that are compatible with Snowflake:
Question: Explain horizontal and vertical scaling. Answer: How would you design a data warehouse for an eCommerce website? To build out a data warehouse for an eCommerce website, we have to think about the following first:
With those factors in mind, we can easily think of an eCommerce data warehouse design using a Snowflake where:
Question: How to optimize or improve query performance? Answer: Improving query performance would be highly dependent on what type of database we are using. Let’s assume that we are using Snowflake as our data warehouse. We can do the following to optimize our query performance:
Copyright ©2024 Workstory Inc.