Snowflake: ...the 20 statements limit.
The Problem
Snowflake looks and acts like a typical RDBMS. This unfortunately leads many developers (including yours truly) to make bad choices in the name of expediency. Snowflake has always been clear that you shouldn't use it for OLTP applications. Even so, "it's just a little API" or "it's just a few log records" thinking has forced me (and my team) to learn and relearn this lesson many times over the past two years.
A number of these solutions worked great in development and QA environments, or even in production for some weeks. Then, suddenly, disaster ensued.
Snowflake enforces a limit on how many statements can queue up waiting for a lock on a table. If you have a process that performs many concurrent INSERT/UPDATE/DELETE statements against the same table, you will see the performance of the DML statements degrade, the warehouse will show queuing, and then you'll see error messages like these:
snowflake.connector.errors.ProgrammingError: 000625 (57014): Statement '01b6c04c-0108-5374-0008-50d700010052' has locked table 'LINEITEM' in transaction 1725239781553000000 and this lock has not yet been released.
Your statement '01b6c04c-0108-5374-0008-50d700010056' was aborted because the number of waiters for this lock exceeds the 20 statements limit.
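In application code the failure surfaces as a ProgrammingError from the connector. Here is a minimal sketch of recognizing it, assuming the numeric code from the message above (000625) is exposed on the exception's errno attribute:
from snowflake.connector.errors import ProgrammingError

LOCK_WAITERS_ERRNO = 625  # "number of waiters for this lock exceeds the 20 statements limit"

def is_lock_queue_full(exc: Exception) -> bool:
    # True only for the 000625 lock-waiter error, not for other statement failures.
    return isinstance(exc, ProgrammingError) and exc.errno == LOCK_WAITERS_ERRNO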
Snowflake doesn't explicitly call this limit out anywhere, but you will see many references to it in the Knowledge Base.
Snowflake Setup
It's easier to demonstrate the problem if you have some data already in the target table.
CREATE DATABASE BLOG;
CREATE SCHEMA CONCUR_WRITE;
CREATE TABLE LINEITEM AS SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.LINEITEM;
Example Code
Here is some example code which demonstrates the problem:
from multiprocessing import Process
import snowflake.connector
import getpass
import time

def query_snowflake(password):
    con = snowflake.connector.connect(
        user='testuser',              # Snowflake user
        password=password,            # Script will prompt you
        account='seuibhi-ipb66800'    # Snowflake account ID
    )
    cur = con.cursor()
    before = time.perf_counter()
    cur.execute(
        "INSERT INTO BLOG.CONCUR_WRITE.LINEITEM(L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY) "
        "VALUES(%(col1)s, %(col2)s, %(col3)s, %(col4)s, %(col5)s)", {
            'col1': 6000000,
            'col2': 32255,
            'col3': 2256,
            'col4': 1,
            'col5': 5.00,
        })
    after = time.perf_counter()
    cur.close()
    con.close()
    print(f"Statement duration: {after - before} seconds.")

if __name__ == '__main__':
    pw = getpass.getpass()
    for x in range(1, 35):  # modify this to change concurrency
        p = Process(target=query_snowflake, args=[pw])
        p.start()
Output
As the number of concurrent writes increases above 20, you're more likely to encounter the error. In this run with 25 concurrent connections, you can see Snowflake starting to bog down writing the micro-partitions, but it doesn't get so far behind that it throws the error:
Concurrencies of 40 or more (this example is from 100) reliably generate the error:
We can see the queuing (and errors) in Snowflake, in the warehouse tab:
What now?
Warehouse Sizing, Clustering
Increasing the size of the warehouse can help, because each transaction is processed faster and the lock queue drains sooner. Once more than 20 statements are waiting on the lock, though, it will still fail. Enabling clustering doesn't solve the problem, either.
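For reference, both mitigations are single statements. This is only a sketch: the warehouse name (COMPUTE_WH) and the clustering key are placeholders I picked for illustration, and neither change lifts the 20-waiter cap:
import getpass
import snowflake.connector

con = snowflake.connector.connect(
    user='testuser',
    password=getpass.getpass(),
    account='seuibhi-ipb66800'
)
cur = con.cursor()

# A larger warehouse finishes each DML sooner, so the lock queue drains faster,
# but the limit of 20 waiters itself is unchanged.
cur.execute("ALTER WAREHOUSE COMPUTE_WH SET WAREHOUSE_SIZE = 'LARGE'")

# Clustering reorganizes micro-partitions to help query pruning; it does not
# change how writes take the table lock.
cur.execute("ALTER TABLE BLOG.CONCUR_WRITE.LINEITEM CLUSTER BY (L_ORDERKEY)")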
Unistore
Snowflake Unistore is now in public preview and promises improved write performance. However, there are substantial limitations, including a 500 GB maximum, 200 writes/s, and 100 total databases per account. Unistore is not supported in trial accounts; you need a paid account to test it.
Batch Writes
The easiest band-aid for most of the solutions where I've hit this problem has been to batch the updates. Streaming the updates into an SQS queue or into files in S3, then writing them to Snowflake in a batch, reduces the DML footprint substantially.
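As a rough sketch of what batching looks like with the connector (same table and account as the example above), executemany sends the buffered rows through one call instead of dozens of competing single-row INSERTs:
import getpass
import snowflake.connector

def flush_batch(rows):
    # Write a buffered batch of line items with one executemany call
    # instead of one INSERT statement per record.
    con = snowflake.connector.connect(
        user='testuser',
        password=getpass.getpass(),
        account='seuibhi-ipb66800'
    )
    try:
        con.cursor().executemany(
            "INSERT INTO BLOG.CONCUR_WRITE.LINEITEM(L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY) "
            "VALUES(%s, %s, %s, %s, %s)",
            rows)
    finally:
        con.close()

# Rows accumulated from a queue or S3 drop; one batch means one writer contending for the table lock.
flush_batch([
    (6000001, 32255, 2256, 1, 5.00),
    (6000001, 32256, 2257, 2, 3.00),
])
For larger volumes, staging the files and loading them with COPY INTO avoids per-row DML entirely.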
Just... Don't.
My takeaways are these:
Don't build OLTP apps on Snowflake, no matter how small or trivial. Use a purpose-built database that scales (e.g. DynamoDB).
Don't write log messages to Snowflake. Write them to S3 or a stream.
Watch your exception handling. If you just keep retrying failed writes forever, you can create a cascade of failures; cap the retries and back off, as in the sketch below.
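A minimal sketch of that last point, assuming (as in the error above) that the lock error surfaces as a ProgrammingError with errno 625; the retry cap and backoff values are arbitrary:
import random
import time

from snowflake.connector.errors import ProgrammingError

LOCK_WAITERS_ERRNO = 625   # the 000625 "exceeds the 20 statements limit" error
MAX_RETRIES = 3            # arbitrary cap: give up rather than add more waiters to the lock queue

def execute_with_backoff(cur, sql, params=None):
    # Retry a DML a bounded number of times so a stuck lock doesn't snowball into a cascade.
    for attempt in range(MAX_RETRIES + 1):
        try:
            return cur.execute(sql, params)
        except ProgrammingError as exc:
            if exc.errno != LOCK_WAITERS_ERRNO or attempt == MAX_RETRIES:
                raise
            # Exponential backoff with jitter; immediate, unbounded retries are exactly
            # what keeps the lock queue full.
            time.sleep((2 ** attempt) + random.random())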
GitLab Repository: