Mesolitica


IT Services and IT Consulting

Kuala Lumpur, Federal Territory of Kuala Lumpur · 773 followers

We develop multimodal artificial intelligence for Southeast Asia.

About us


Industry
IT Services and IT Consulting
Company size
2-10 employees
Headquarters
Kuala Lumpur, Federal Territory of Kuala Lumpur
Type
Privately held
Founded
2018

Locations

  • Primary

    Jalan Bukit Bintang

    Level 6, Fahrenheit 88 Office Tower

    Kuala Lumpur, Federal Territory of Kuala Lumpur 55100, MY



Updates

  • Mesolitica reposted this

    Husein Zolkepli

    I maintain kids.

    At Mesolitica, we aim to build a better future for Malaysia and hope to collaborate with other Malaysian entities to make Malaysia an AI powerhouse. Here is a quick recap of the open-source tech Mesolitica has:

    1. Continuously gathering Malaysian-context pretraining data, up to 200B tokens, https://lnkd.in/gZ6Y5UY7
    2. Pretraining from scratch with multi-node training on bare metal or Kubernetes; we have trained on up to ten 8×A100 DGX nodes. We are comfortable with both SLURM and Ray, but we prefer Ray, https://lnkd.in/gPA8zHH3
    3. Generating massive synthetic instruction-finetuning datasets, https://lnkd.in/gaNSP7Kg
    4. Building multimodal datasets: Visual QA, Audio QA, Visual-Visual QA and Visual-Audio QA, https://lnkd.in/gf_7wafH
    5. Building multimodal models, https://lnkd.in/gubWRYXj
    6. Built your own architecture and need to serve concurrent requests? We have experience building continuous batching, and we also support vLLM development, https://lnkd.in/gJzZejCR
    7. We support a static cache for Encoder-Decoder models in HuggingFace Transformers, unlocking up to 2x inference speed, https://lnkd.in/grQBC2wy
    8. Want infinite context length for both training and inference? We know context parallelism and are currently developing it for vLLM, https://lnkd.in/g-M5Zytq
    9. Building a massive pseudolabeled speech recognition dataset with timestamps, https://lnkd.in/gF74f48v
    10. Want to serve real-time, interruptible speech-to-speech like GPT-4o? We use WebSocket with a gRPC backend for better streaming, https://lnkd.in/gSFJ3QBx
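Item 6 mentions continuous batching. The sketch below is a toy simulation of the scheduling idea only, not Mesolitica's or vLLM's implementation: after every decode step, finished sequences free their slot and waiting requests are admitted immediately, instead of the whole batch draining first. It assumes each request's output length is known up front (the `tokens_to_generate` field and the function name are hypothetical); a real server only learns this when a sequence emits an end-of-sequence token.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    rid: str
    # Hypothetical: output length known in advance, for simulation only.
    tokens_to_generate: int


def simulate_continuous_batching(requests: List[Request], max_batch_size: int) -> Dict[str, int]:
    """Return {request_id: decode step at which the request finished}."""
    waiting = deque(requests)
    running: Dict[str, int] = {}  # rid -> tokens still to generate
    finished_at: Dict[str, int] = {}
    step = 0
    while waiting or running:
        # Iteration-level scheduling: refill any free slots before each step.
        while waiting and len(running) < max_batch_size:
            req = waiting.popleft()
            running[req.rid] = req.tokens_to_generate
        step += 1
        # One forward pass decodes one token for every running sequence.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] <= 0:
                # The slot is released right away for the next waiting request.
                del running[rid]
                finished_at[rid] = step
    return finished_at
```

With requests A (3 tokens), B (1) and C (2) and two slots, C is admitted at step 2 as soon as B finishes, and completes at step 3. With static batching, C would wait until the A/B batch drained at step 3 and complete at step 5.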

    Malaysian pretraining dataset - a mesolitica Collection

    huggingface.co

  • Mesolitica reposted this

    Husein Zolkepli

    I maintain kids.

    Mesolitica filtered the 15T-token FineWeb dataset from HuggingFace using simple Malaysian keywords. After filtering, we obtained up to 174B tokens! https://lnkd.in/gXNyKmZ7

    How did we do it?
    1. We filtered rows using the keywords {'malay', 'malaysia', 'melayu', 'bursa', 'ringgit'} on an r5.16xlarge EC2 instance for 7 days.
    2. We counted the total tokens using tiktoken.encoding_for_model("gpt2") on a c7a.24xlarge EC2 instance for 1 hour.

    Why did we do it? So that anybody can use this filtered corpus to pretrain, continue pretraining, or generate synthetic datasets for their own use cases in 100% Malaysian contexts.
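A minimal sketch of the two steps above in plain Python. The keyword set comes from the post; the matching rule (case-insensitive substring) is an assumption, since the post only says "simple Malaysian keywords". The token counter is passed in as a function so the filter can be run without downloading a tokenizer. The function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# Keyword set from the post.
KEYWORDS = {"malay", "malaysia", "melayu", "bursa", "ringgit"}


def filter_rows(texts: Iterable[str]) -> Iterator[str]:
    """Yield rows that contain at least one keyword.

    Assumption: case-insensitive substring match, not whole-word match.
    """
    for text in texts:
        lowered = text.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            yield text


def total_tokens(texts: Iterable[str], count: Callable[[str], int]) -> int:
    """Sum per-row token counts using any tokenizer-backed length function."""
    return sum(count(text) for text in texts)


rows = ["Harga ringgit naik hari ini", "The weather in Paris", "Visit MALAYSIA 2026"]
kept = list(filter_rows(rows))
```

To reproduce the counting step with tiktoken, one could use `enc = tiktoken.encoding_for_model("gpt2")` and pass `lambda s: len(enc.encode(s, disallowed_special=()))`; disabling the special-token check avoids errors on web text that happens to contain strings like `<|endoftext|>`. Substring matching is cheap but imprecise: "Himalayan" contains "malay" and would be kept.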

    mesolitica/fineweb-filter-malaysian-context · Datasets at Hugging Face

    huggingface.co

  • Mesolitica reposted this

    MIMOS Berhad

    12,779 followers

    What an incredible day at MIMOS! We were honored to host a seminar on the latest developments and innovations in Artificial Intelligence (AI). Our goal: to foster collaboration among industry players, research institutions, and the government, accelerating AI adoption and integration. The event was a fantastic platform for networking, exchanging ideas, forging partnerships, and exploring new opportunities. We were honored to have our Minister, YB Tuan Chang Lih Kang, express MOSTI's unwavering commitment to supporting AI initiatives. He emphasized MOSTI's dedication to ensuring the necessary infrastructure, policies, and incentives are in place to promote AI development and adoption. Our keynote speaker, Dato KS Pua, captivated the audience with insights on 'aiDAPTIV+ as a Solution for AI'. It was eye-opening to see how AI is transforming industries and improving our daily lives. Dr. Hon shared numerous AI initiatives from MIMOS in agriculture, machine vision, and police lockup management. Our partner, Mr. Khalil Nooh, Co-Founder & CEO of Mesolitica, wowed us with their work on a Multi-lingual Chat Language Model. Together, we are paving the way for a smarter, more connected future! Chang Lih Kang @officialmosti #AI #Innovation #Collaboration #Technology #MIMOS #MOSTI #aiDAPTIV #Mesolitica

  • Mesolitica reposted this

    Mesolitica

    773 followers

    We released the Speech API! End-to-end streamable speech-to-text and speech translation, including speaker diarization! https://lnkd.in/gVk8XDmx

    1. Less hallucination than OpenAI Whisper Large v3: when given music or unclear audio, Whisper Large tends to generate repetitive text.
    2. Better speech translation: we optimized for MS and EN as target languages.
    3. Competitive WER benchmarks, slightly better on the Malay and Manglish test sets.
    4. You can play around with the Speech API in the Speech Playground: speaker diarization, multiple speakers, multiple models, everything is there! It is also compatible with the OpenAI Speech API. Sadly, OpenAI does not support streaming, so we included an example of how to stream using the aiohttp library.
    5. We also provide a simple UI to upload audio or provide a YouTube URL to transcribe; afterwards you can download the result as SRT or TXT. Uploaded audio is limited to 100 MB, but YouTube videos of arbitrary length can be processed.
    6. We released 2 models, Base and Small. Base is RM2/hour and Small is RM1/hour, sharing credits with MaLLaM 🌙.
    7. If you are interested in enterprise deployment, such as a private network or on-premise, and want to finetune the models on your own dataset, feel free to get in touch with us!
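Item 5 mentions downloading transcripts as SRT. The post does not describe the API's response schema, so this sketch assumes timestamped segments of the form `(start_seconds, end_seconds, text)` and shows how they map onto numbered SRT cues. The function names are hypothetical.

```python
from typing import List, Tuple

# Assumed segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp format SRT uses."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render segments as SRT: index, time range, text, then a blank line."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)


srt = segments_to_srt([(0.0, 2.5, "Selamat pagi"), (2.5, 65.25, "Hello")])
```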

  • Mesolitica

    773 followers

    We released the Retrieval API, an end-to-end multilingual Malaysian retrieval engine with 8k context length, and it is faster! https://lnkd.in/gpTbpgPD

    1. Lower latency than OpenAI API endpoints: the Mesolitica API averaged 200 ms, while OpenAI averaged 1.1 seconds.
    2. Better embedding accuracy based on Recall@k (k = 5) on the benchmarks provided: 17% better on average than ada-002.
    3. Adding the Reranker to re-sort the top-20 results improves recall by 10% on average!
    4. You can play around with the embedding API in the Retrieval Playground, which includes a simple 2D visualization.
    5. Super cheap pricing: RM1 per 1M tokens, sharing credits with MaLLaM 🌙.
    6. The Embedding API is compatible with the OpenAI library; simply change `base_url` and you are good to go. For the Reranker API, you can use any HTTP request library.
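Items 2 and 3 refer to Recall@5 and to reranking the top-20 candidates. As an illustration of those two measurements (not the benchmark code behind the reported numbers), this sketch computes recall@k as the fraction of relevant documents found in the top k results, and re-sorts only the first n candidates by a reranker score. The `score` function stands in for a call to the Reranker API and, like the helper names, is hypothetical; some benchmarks instead define recall@k as a per-query hit rate.

```python
from typing import Callable, List, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant document IDs that appear in the first k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def rerank_top_n(retrieved: Sequence[str], score: Callable[[str], float], n: int = 20) -> List[str]:
    """Re-sort only the first n candidates by reranker score (highest first).

    Candidates beyond position n keep their original embedding-search order.
    """
    head = sorted(retrieved[:n], key=score, reverse=True)
    return head + list(retrieved[n:])


retrieved = ["d1", "d2", "d3", "d4", "d5", "d6"]  # embedding-search order
relevant = {"d2", "d6"}
scores = {"d6": 0.9, "d2": 0.8}  # hypothetical reranker scores
reranked = rerank_top_n(retrieved, lambda d: scores.get(d, 0.1), n=20)
```

Here d6 sits at rank 6, so recall@5 is 0.5 before reranking; after the reranker moves it to the top, recall@5 becomes 1.0.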

  • Mesolitica

    773 followers

    Thank you!

    Nuriman Quddus

    Software Engineer, Full-Stack

    I just released the Malaysia Large Language Model (MaLLaM) NPM library 🌙! Credit to Mesolitica for this amazing API! The library is a wrapper around the MaLLaM API 🌙 that lets JavaScript users call it from a JavaScript context. What does the MaLLaM API 🌙 do? It is basically like the well-known ChatGPT, but it understands and answers in a Malaysian context (correct me if I'm wrong). You can prompt it in Malay, Manglish, English, Jawi and also Chinese. The MaLLaM NPM library 🌙 reached 900 downloads in just 2 days! Below is example usage of the library, together with optional custom parameters. Give it a try in your Node.js app: https://lnkd.in/g7b9Ngeb

  • Mesolitica reposted this

    Husein Zolkepli

    I maintain kids.

    Hi everyone, Mesolitica launched the MaLLaM 🌙 (Malaysia Large Language Model) API! A multilingual Malaysian chat language model with 32k context length, Malaysian-centric and private, https://lnkd.in/gy9cp9mr

    1. Natively understands Standard Malay, local Malay, Jawi, Standard English, Manglish, Mandarin and Indonesian as input. Standard English and Manglish inputs always get a reply in English; all others always get a reply in Malay.
    2. Natively understands Retrieval-Augmented Generation (RAG). Just provide the documents as a prefix and you can start typical RAG QA sessions.
    3. Natively understands function calling: it can convert natural human text into function parameters and call the appropriate function, 100% end-to-end by the LLM.
    4. Better accuracy than Mesolitica Open Models and ChatGPT-3.5 on the Tatabahasa and PT3 benchmarks.
    5. You can play around with MaLLaM 🌙 in the Nous Playground. It supports function calls, multi-turn conversations and code export, and you can check the cookbooks to help your development. AI Safety is coming soon as a free-of-charge part of the API!
    6. The API is compatible with the Python OpenAI library, or with cURL if you prefer.
    7. Our pricing is competitive with other LLM APIs out there; existing users and new registrations get RM2.5 in free credit!
    8. You can top up your MaLLaM 🌙 tokens on the billing page; for now we only support Stripe. See picture 5.
    9. We are also open to self-hosted enterprise MaLLaM 🌙 in your private cloud or on-premise, with no internet access required at all. We collaborate with Nvidia APAC to include hardware support. Read more about our self-hosted enterprise package at https://lnkd.in/giPmVaEj
    10. If you are a researcher interested in studying MaLLaM 🌙 extensively, such as its alignment or accuracy, we can provide some free credits, which benefits both parties. Feel free to PM me.
    11. We have already started working on Code Interpreter, https://lnkd.in/gShK496m, an LLM agent that executes code on your behalf. It can be your data analyst or Python programming companion, and it will be released in the next version.
    12. Open source is still at our heart; it is necessary to upskill others. You can check out our massive collection of open Malaysian models and datasets at https://lnkd.in/eU9KYafq

    We look forward to your feedback and to the development of Malaysian Large Language Models for better tech!
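Item 3 describes function calling, and item 6 says the API works with the Python OpenAI library. Assuming (this is not confirmed by the post) that MaLLaM returns tool calls in OpenAI's Chat Completions shape, `{"id": ..., "type": "function", "function": {"name": ..., "arguments": "<JSON string>"}}`, the application-side dispatch could look like this sketch. The tool `celsius_to_fahrenheit` and the helper names are hypothetical; the returned `tool` messages would be appended to the conversation for the model's next turn.

```python
import json
from typing import Any, Callable, Dict, List


def celsius_to_fahrenheit(celsius: float) -> float:
    """Hypothetical tool exposed to the model."""
    return celsius * 9 / 5 + 32


# Registry mapping function names the model may emit to local callables.
TOOLS: Dict[str, Callable[..., Any]] = {"celsius_to_fahrenheit": celsius_to_fahrenheit}


def dispatch_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Execute OpenAI-style tool calls and build the `tool` reply messages."""
    replies = []
    for call in tool_calls:
        fn = call["function"]
        handler = TOOLS.get(fn["name"])
        if handler is None:
            output: Any = {"error": f"unknown function {fn['name']}"}
        else:
            # The model supplies arguments as a JSON-encoded string.
            output = handler(**json.loads(fn["arguments"]))
        replies.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)})
    return replies


example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "celsius_to_fahrenheit", "arguments": "{\"celsius\": 30}"},
}
replies = dispatch_tool_calls([example_call])
```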

    Mesolitica - MaLLaM 🌙

    mesolitica.com
