Mesolitica

IT Services and IT Consulting

Kuala Lumpur, Federal Territory of Kuala Lumpur 839 followers

We develop Multimodality Artificial Intelligence for South East Asia.

About us

We develop Multimodality Artificial Intelligence for South East Asia.

Industry
IT Services and IT Consulting
Company size
2-10 employees
Headquarters
Kuala Lumpur, Federal Territory of Kuala Lumpur
Type
Privately Held
Founded
2018

Locations

  • Primary

    Jalan Bukit Bintang

    Level 6, Fahrenheit 88 Office Tower

    Kuala Lumpur, Federal Territory of Kuala Lumpur 551100, MY

Updates

  • Mesolitica

    839 followers

    Great work! Let us know if you need access to better GPU!

    Taufeeq H.

    DevOps | Network engineer | Occasionally dabbles in AI

    About a week ago I set up a Frankenstein 🧟♂️ of a Kubernetes cluster (AMD, Intel, ARM) for my homelab. Since then I've added Nvidia into the mix. Being GPU-poor has its challenges, but since my old laptop node in the cluster already had an Nvidia GPU, why not use it? An Nvidia GTX 1050 Mobile with 4 GB of VRAM is good enough for small models. I'm Malaysian, so naturally I wanted to try out some of Mesolitica's small Malaysian LLMs. Using Open-WebUI for a ChatGPT-like interface and the NVIDIA GPU Operator to enable GPU processing in my Kubernetes cluster, I've managed to run Mesolitica's malaysian-Llama-3.2-3B-Instruct model, outputting at a nice rate of 23.56 tokens/s. Pretty fast considering it's a 7-year-old laptop. It was quite some work getting this all to run in my Frankenstein of a cluster, and the troubleshooting it took to get the GPU working was quite the learning experience. Thanks to Mesolitica for the models on Hugging Face. https://lnkd.in/gQxi4Pm2 My Frankenstein cluster; I'll use whatever I can find. https://lnkd.in/gwf3Cnk3 Anybody willing to pass me a high-memory GPU to play with? 😂 I would love to explore how one scales up LLM serving to multiple users. #malaysian #llm #ai #gpupoor #kubernetes #openwebui #nvidia #grafana

  • Mesolitica reposted this

    Husein Zolkepli

    I build Malaysian Multimodality AI

    At Mesolitica, we aim to build a better future for Malaysia, and we hope to collaborate with other Malaysian entities to make Malaysia as an AI powerhouse achievable. A quick recap of the open-source tech Mesolitica has:

    1. We continuously gather pretraining data with Malaysian context, up to 200B tokens, https://lnkd.in/gZ6Y5UY7
    2. Pretraining from scratch with multi-node training on bare metal or Kubernetes. We have run up to 10 DGX nodes with 8 A100 GPUs each. We are comfortable with both SLURM and Ray, but we prefer Ray, https://lnkd.in/gPA8zHH3
    3. Generating massive synthetic instruction-finetuning datasets, https://lnkd.in/gaNSP7Kg
    4. Building multimodality datasets: we have Visual QA, Audio QA, Visual-Visual QA and Visual-Audio QA, https://lnkd.in/gf_7wafH
    5. Building multimodality models, https://lnkd.in/gubWRYXj
    6. Built your own architecture and need to serve concurrent requests? We have experience building continuous batching, and we also support vLLM development, https://lnkd.in/gJzZejCR
    7. We support static cache for encoder-decoder models in HuggingFace Transformers, unlocking up to 2x inference speed, https://lnkd.in/grQBC2wy
    8. Want infinite context length for both training and inference? We know context parallelism and are currently developing this parallelism for vLLM, https://lnkd.in/g-M5Zytq
    9. Building a massive pseudolabeled speech recognition dataset with timestamps, https://lnkd.in/gF74f48v
    10. Want to serve real-time, interruptible speech-to-speech like GPT-4o? We use WebSocket with a gRPC backend for better streaming, https://lnkd.in/gSFJ3QBx

    Malaysian pretraining dataset - a mesolitica Collection

    huggingface.co

  • Mesolitica reposted this

    Husein Zolkepli

    I build Malaysian Multimodality AI

    Mesolitica filtered the 15T-token FineWeb dataset from HuggingFace using simple Malaysian keywords. After filtering, we obtained up to 174B tokens! https://lnkd.in/gXNyKmZ7

    How did we do it?

    1. We filtered rows using the keywords {'malay', 'malaysia', 'melayu', 'bursa', 'ringgit'} on an r5.16xlarge EC2 instance for 7 days.
    2. We calculated the total tokens using tiktoken.encoding_for_model("gpt2") on a c7a.24xlarge EC2 instance for 1 hour.

    Why did we do it? So that anybody can use this filtered corpus to pretrain, continue pretraining, or generate synthetic datasets for their own use cases on 100% Malaysian contexts.
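The two steps described in the post (keyword filtering, then token counting) can be sketched roughly as follows. This is only an illustration and not Mesolitica's actual pipeline: the post does not say how rows were matched, so case-insensitive substring matching is an assumption, and the function names `filter_rows` and `count_tokens` are hypothetical. A whitespace split stands in for the tokenizer so the sketch runs without extra packages.

```python
import re
from typing import Callable, Iterable, Iterator

# Keywords quoted in the post.
KEYWORDS = {"malay", "malaysia", "melayu", "bursa", "ringgit"}

# Assumption: case-insensitive substring match. The post does not say
# whether matching was done on substrings or on whole words.
_PATTERN = re.compile("|".join(re.escape(k) for k in sorted(KEYWORDS)), re.IGNORECASE)


def filter_rows(rows: Iterable[str]) -> Iterator[str]:
    """Yield only the rows that mention at least one keyword."""
    for text in rows:
        if _PATTERN.search(text):
            yield text


def count_tokens(rows: Iterable[str], encode: Callable[[str], list]) -> int:
    """Sum token counts over rows using any encode(text) -> tokens function."""
    return sum(len(encode(text)) for text in rows)


sample = [
    "Bursa Malaysia closed higher today.",
    "The weather in Paris is mild.",
    "Harga dalam RINGGIT naik sedikit.",
]
kept = list(filter_rows(sample))

# Stand-in tokenizer (whitespace split) so the sketch has no dependencies.
total = count_tokens(kept, str.split)
print(len(kept), total)
```

To count with the tokenizer named in the post, one could pass `tiktoken.encoding_for_model("gpt2").encode` as `encode`, which requires the `tiktoken` package to be installed.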

    mesolitica/fineweb-filter-malaysian-context · Datasets at Hugging Face

    huggingface.co

  • Mesolitica reposted this

    MIMOS Berhad

    13,466 followers

    What an incredible day at MIMOS! We were honored to host a seminar on the latest developments and innovations in Artificial Intelligence (AI). Our goal: to foster collaboration among industry players, research institutions, and the government, accelerating AI adoption and integration. The event was a fantastic platform for networking, exchanging ideas, forging partnerships, and exploring new opportunities. We were honored to have our Minister, YB Tuan Chang Lih Kang, express MOSTI's unwavering commitment to supporting AI initiatives. He emphasized MOSTI's dedication to ensuring the necessary infrastructure, policies, and incentives are in place to promote AI development and adoption. Our keynote speaker, Dato KS Pua, captivated the audience with insights on 'aiDAPTIV+ as a Solution for AI'. It was eye-opening to see how AI is transforming industries and improving our daily lives. Dr. Hon shared numerous AI initiatives from MIMOS in agriculture, machine vision, and police lockup management. Our partner, Mr. Khalil Nooh, Co-Founder & CEO of Mesolitica, wowed us with their work on a Multi-lingual Chat Language Model. Together, we are paving the way for a smarter, more connected future! Chang Lih Kang @officialmosti #AI #Innovation #Collaboration #Technology #MIMOS #MOSTI #aiDAPTIV #Mesolitica

  • Mesolitica reposted this

    Mesolitica

    839 followers

    We released the Speech API! End-to-end streamable speech-to-text and speech translation, including speaker diarization! https://lnkd.in/gVk8XDmx

    1. Less hallucination compared to OpenAI Whisper Large v3. When given music or unclear audio, Whisper Large tends to generate repetitive text.
    2. Better speech translation: we optimized for MS and EN as target languages.
    3. Competitive WER benchmark, just slightly better on the Malay and Manglish test sets.
    4. You can play around with the Speech API in the Speech Playground: speaker diarization, multiple speakers, multiple models, everything is there! It is also compatible with the OpenAI Speech API but, sadly, OpenAI does not support streaming, so we included an example of how to use the aiohttp library to do streaming.
    5. We also provide a simple UI to upload audio or provide a YouTube URL to transcribe, after which you can choose to download the result as SRT or TXT. Uploaded audio is limited to 100 MB, but YouTube videos of arbitrary length can be processed.
    6. We released 2 models, Base and Small. Base is RM2 / hour and Small is RM1 / hour, sharing credits with MaLLaM 🌙.
    7. If you are interested in enterprise deployment, such as on a private network or on-premise, and want to finetune the models on your own dataset, feel free to catch up with us!
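As a small illustration of the SRT download option mentioned in the post, the sketch below turns timestamped transcript segments into SRT text. It is not the Speech API's own code or response format: the `(start_seconds, end_seconds, text)` segment shape and the names `srt_timestamp` and `to_srt` are assumptions made for this example.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from hypothetical (start, end, text) segments."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Each block ends with a newline, so joining with "\n" leaves the
    # blank line that separates SRT cues.
    return "\n".join(blocks)


segments = [
    (0.0, 2.5, "Selamat pagi."),
    (2.5, 61.04, "Good morning, everyone."),
]
srt_text = to_srt(segments)
print(srt_text)
```

Cue numbers start at 1 and each cue is followed by a blank line, which is the layout subtitle players generally expect.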

  • Mesolitica

    839 followers

    We released the Retrieval API, an end-to-end multilingual Malaysian retrieval engine with 8k context length, and faster! https://lnkd.in/gpTbpgPD

    1. Lower latency compared to OpenAI API endpoints: the Mesolitica API achieved 200 ms on average, while OpenAI took 1.1 seconds.
    2. Better embedding accuracy based on Recall@top-5 for the benchmarks provided, 17% better on average compared to ada-002.
    3. If you add the Reranker for top-20 post-sorting, recall improves by 10% on average!
    4. You can play around with the embedding API inside the Retrieval Playground, which now includes a simple 2D visualization.
    5. Super cheap pricing: RM1 / 1M tokens, sharing credits with MaLLaM 🌙.
    6. The Embedding API is compatible with the OpenAI library: simply change `base_url` and you are good to go. For the Reranker API, you can use any HTTP request library.
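For readers unfamiliar with the recall metric quoted in the post, here is a minimal sketch of one common way to compute recall at k. The post does not define its exact variant or benchmarks, so the per-query definition below (fraction of relevant documents found in the top k, averaged over queries), the function name `recall_at_k`, and the toy data are all assumptions.

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean over queries of |relevant docs in top k| / |relevant docs|."""
    scores = []
    for query, gold in relevant.items():
        if not gold:
            continue  # skip queries with no relevant documents
        top_k = set(ranked.get(query, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: document IDs ranked by a hypothetical retriever.
ranked = {
    "q1": ["d3", "d1", "d7", "d2", "d9", "d4"],
    "q2": ["d5", "d8", "d6", "d0", "d2", "d1"],
}
relevant = {"q1": {"d1", "d4"}, "q2": {"d5"}}

print(recall_at_k(ranked, relevant, k=5))
print(recall_at_k(ranked, relevant, k=6))
```

In this toy data the second relevant document for `q1` sits at rank 6, so it is missed at k=5 and found at k=6. A reranker that reorders a larger candidate list, such as the top 20, can raise recall at a small k in the same way by moving relevant documents up.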

  • Mesolitica

    839 followers

    Thank you!

    Nuriman Quddus

    Software Engineer, Full-Stack

    I just released the Malaysia Large Language Model (MaLLaM) NPM library 🌙! Credits to Mesolitica for this amazing API, by the way! This library is a wrapper around the MaLLaM API 🌙 for JavaScript users, which lets them use the MaLLaM API 🌙 from JavaScript. What does the MaLLaM API 🌙 do? It is basically like the well-known ChatGPT, but it understands and answers in Malaysian context (correct me if I'm wrong). We can prompt it in Malay, Manglish, English, Jawi and also Chinese. The MaLLaM NPM 🌙 library reached 900 downloads in just 2 days! Below is example usage for this library, together with optional custom parameters. Go give it a try in your Node.js app: https://lnkd.in/g7b9Ngeb
