At Mesolitica, we aim to build a better future for Malaysia, and we hope to collaborate with other Malaysian entities to help make Malaysia an AI powerhouse. Here is a quick recap of the open-source tech Mesolitica has:

1. Continuously gathering Malaysian-context pretraining data, up to 200B tokens, https://lnkd.in/gZ6Y5UY7
2. Pretraining from scratch with multi-node training on bare metal or Kubernetes. We have scaled up to 10 DGX nodes with 8x A100 each, and we are comfortable with both SLURM and Ray, though we prefer Ray, https://lnkd.in/gPA8zHH3
3. Generating massive synthetic instruction-finetuning datasets, https://lnkd.in/gaNSP7Kg
4. Building multimodal datasets: Visual QA, Audio QA, Visual-Visual QA and Visual-Audio QA, https://lnkd.in/gf_7wafH
5. Building multimodal models, https://lnkd.in/gubWRYXj
6. Built your own architecture and need to serve it concurrently? We have experience building continuous batching, and we also support vLLM development, https://lnkd.in/gJzZejCR
7. We support static cache for encoder-decoder models in HuggingFace Transformers, unlocking up to 2x inference speed, https://lnkd.in/grQBC2wy
8. Want infinite context length for both training and inference? We have experience with context parallelism and are currently developing it for vLLM, https://lnkd.in/g-M5Zytq
9. Building a massive pseudo-labelled speech recognition dataset with timestamps, https://lnkd.in/gF74f48v
10. Want to serve real-time, interruptible speech-to-speech like GPT-4o? We use WebSocket with a gRPC backend for better streaming, https://lnkd.in/gSFJ3QBx
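Item 6 mentions continuous batching. As a rough illustration of the general idea (iteration-level scheduling), here is a minimal Python simulation. It assumes each request simply needs a known number of decode steps (at least one), and it appends a placeholder token where a real server would run a model forward pass. `Request`, `continuous_batching` and `static_batching_steps` are hypothetical names for this sketch, not Mesolitica's or vLLM's code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request: an id and how many tokens it still needs (assumed >= 1).
    rid: str
    remaining: int
    output: list = field(default_factory=list)


def continuous_batching(requests, max_batch_size):
    """Simulate iteration-level scheduling: after every decode step, finished
    sequences leave the batch and waiting requests take their free slots."""
    waiting = deque(requests)
    running = []
    finish_order = []
    step = 0
    while waiting or running:
        # Admit waiting requests into any free slots before the next decode step.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode step for every running sequence (stand-in for a forward pass).
        for req in running:
            req.output.append(f"tok{step}")
            req.remaining -= 1
        # Record finished sequences and evict them immediately.
        for req in running:
            if req.remaining == 0:
                finish_order.append((req.rid, step))
        running = [r for r in running if r.remaining > 0]
        step += 1
    return finish_order, step


def static_batching_steps(lengths, max_batch_size):
    # For comparison: each fixed batch runs until its longest sequence finishes.
    return sum(
        max(lengths[i:i + max_batch_size])
        for i in range(0, len(lengths), max_batch_size)
    )
```

With four requests needing 3, 1, 2 and 1 tokens and a batch size of 2, the continuous scheduler finishes in 4 steps because request C takes B's slot right after B finishes, while static batching needs 5 steps since the first batch waits for its longest member.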
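Item 8 mentions context parallelism. One core trick behind sequence-sharded attention (as in ring attention) is that each worker can compute attention over its own chunk of keys and values, and the partial results can then be merged exactly with the log-sum-exp (online softmax) rescaling. The sketch below is a single-process, pure-Python simulation of that merge for one query vector, assuming standard scaled dot-product attention; the function names are hypothetical, it does not model communication between devices, and it is not Mesolitica's vLLM implementation.

```python
import math


def attention_chunk(q, keys, values):
    """Attention of one query over one chunk of keys/values.
    Returns the unnormalized output, the chunk's max score and its sum of exp weights."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # subtract max for numerical stability
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return out, m, sum(weights)


def merge(partials):
    """Combine per-chunk partial results exactly by rescaling to a global max."""
    global_max = max(m for _, m, _ in partials)
    num = [0.0] * len(partials[0][0])
    den = 0.0
    for out, m, s in partials:
        c = math.exp(m - global_max)  # correction factor for this chunk
        num = [n + c * o for n, o in zip(num, out)]
        den += c * s
    return [n / den for n in num]


def full_attention(q, keys, values):
    # Reference: attention over the whole sequence on a single "device".
    out, _, s = attention_chunk(q, keys, values)
    return [o / s for o in out]


def context_parallel_attention(q, keys, values, num_workers):
    # Split the key/value sequence into contiguous chunks, one per simulated worker.
    chunk = math.ceil(len(keys) / num_workers)
    partials = [
        attention_chunk(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(keys), chunk)
    ]
    return merge(partials)
```

Because the merge only needs each chunk's output, max score and weight sum, no worker ever has to hold the full sequence, which is what lets context length grow with the number of workers.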