We started a series of blog posts called “DuckDB tricks”. In these posts, we will present some useful shorthands, hacks, and design patterns for DuckDB. In the first part of the series, Gábor Szárnyas explains some techniques for handling floats, fine-tuning the CSV reader, and shuffling data sets. If you have a favorite DuckDB trick, let us know in the comments! https://lnkd.in/dwXd2b_s
Thanks for this nice article! CSV support on DuckDB is impressive. I'd be interested to see more content regarding repeated structure management as I struggled to achieve few functions. For instance, I'd like thank (again) Alex Monahan for helping me achieving this "simple" UDF to check if an input is matching any of a list of patterns: ``` CREATE TEMPORARY FUNCTION isStrInAllowListPatterns(str, allowList) AS ( exists(with list_as_table as ( select unnest(allowList) as a ) select * from list_as_table where str like a) ); SELECT isStrInAllowListPatterns('abcd', ['ab', 'ad', '%bc%']); ``` There's also the fairly hard UNNEST syntax to read repetitive nested structures such as: ``` WITH fixture AS ( SELECT 1 AS id, 'snow duck' AS name, [{'name': 'is_duck', 'value': true}, {'name': 'is_human', 'value': false}] AS data ) SELECT f.name, is_duck.value AS is_duck FROM fixture f LEFT JOIN (SELECT data.* FROM (SELECT data FROM UNNEST(f.data) d WHERE d.data.name = 'is_duck') LIMIT 1) AS is_duck ON TRUE; ``` where it requires a double SELECT to extract the columns or that it requires to access d.data.name (compared to BigQuery syntax that would use `d.name` directly). So I hope to see more content on that topic! 🙌
Fuzzy joins using the text similarity algorithms (jaro) included in DuckDB.
Data Enthusiast (📍 Leuven)
2moMy favourite trick is adding extra ()'s to do an ASOF JOIN on multiple key columns together with a timestamp column 😄 "orders LEFT ASOF JOIN prices ON (orders.product_id = prices.product_id AND orders.store_id = prices.store_id) AND orders.placed_at >= prices.updated_at"