DuckDB’s Post

View organization page for DuckDB, graphic

30,533 followers

We started a series of blog posts called “DuckDB tricks”. In these posts, we will present some useful shorthands, hacks, and design patterns for DuckDB. In the first part of the series, Gábor Szárnyas explains some techniques for handling floats, fine-tuning the CSV reader, and shuffling data sets. If you have a favorite DuckDB trick, let us know in the comments! https://lnkd.in/dwXd2b_s

  • No alternative text description for this image
🌻Lien Bosmans

Data Enthusiast (📍 Leuven)

2mo

My favourite trick is adding extra ()'s to do an ASOF JOIN on multiple key columns together with a timestamp column 😄 "orders LEFT ASOF JOIN prices ON (orders.product_id = prices.product_id AND orders.store_id = prices.store_id) AND orders.placed_at >= prices.updated_at"

Christophe O.

Staff Software Engineer at Teads

2mo

Thanks for this nice article! CSV support on DuckDB is impressive. I'd be interested to see more content regarding repeated structure management as I struggled to achieve few functions. For instance, I'd like thank (again) Alex Monahan for helping me achieving this "simple" UDF to check if an input is matching any of a list of patterns: ``` CREATE TEMPORARY FUNCTION isStrInAllowListPatterns(str, allowList) AS (   exists(with list_as_table as (     select unnest(allowList) as a   )   select * from list_as_table   where str like a) ); SELECT isStrInAllowListPatterns('abcd', ['ab', 'ad', '%bc%']); ``` There's also the fairly hard UNNEST syntax to read repetitive nested structures such as: ``` WITH fixture AS (  SELECT 1 AS id, 'snow duck' AS name,  [{'name': 'is_duck', 'value': true}, {'name': 'is_human', 'value': false}] AS data ) SELECT f.name, is_duck.value AS is_duck FROM fixture f LEFT JOIN (SELECT data.* FROM (SELECT data FROM UNNEST(f.data) d WHERE d.data.name = 'is_duck') LIMIT 1) AS is_duck ON TRUE; ``` where it requires a double SELECT to extract the columns or that it requires to access d.data.name (compared to BigQuery syntax that would use `d.name` directly). So I hope to see more content on that topic! 🙌

William O'Leary

Data Engineer @ City of New York

2mo

Fuzzy joins using the text similarity algorithms (jaro) included in DuckDB.

Like
Reply
See more comments

To view or add a comment, sign in

Explore topics