Very proud to announce our Dataset v2.0! When we first set out to build a universal data format, I wasn't sure if it would ever work. There is still work to be done, but by building on top of Zarr (https://lnkd.in/ee7vukjf) we're standing on the shoulders of giants. Come build with us! https://lnkd.in/eNA8fPhX

And a big shout-out to Leash Bio, especially Andrew Blevins and Ian Quigley, PhD, for their trust and patience! With 100M+ rows and 120GB+, their awesome BELKA dataset was the perfect resource to test our assumptions and implementation. Check it out on Polaris now: https://lnkd.in/eFg3JyEN
Curious about the work that went into making BELKA accessible through Polaris? Learn more about Dataset v2.0 and how we’re helping scientists focus on research, not on data wrangling.

Read the blog: https://lnkd.in/gh4rGUXy
Access BELKA: https://lnkd.in/gfY2ZYvs

Drug discovery datasets come in different modalities and sizes. For example, BELKA contains ~120GB of data, and many participants struggled to work with a dataset that large. And that’s not even the largest dataset out there: Recursion’s rxrx.ai (open-source phenomics datasets) has ~80TB of data (stay tuned 👀).

With the Polaris Hub, we sought to design a universal data format for ML scientists in drug discovery. Whether you’re working with phenomics, small molecules, or protein structures, you shouldn’t have to spend time learning domain-specific file formats, APIs, and software tools just to run some ML experiments.