Nannostomus’ Post

🚀 1 Billion Records in 22 Languages: The ETL Project for India Voter Data

We converted 1 billion voter records, scattered across Indian electoral authorities in every state and administrative division, into a structured database. The data arrived in many forms, including PDFs and photos of handwritten forms, and in 22 different languages.

We cross-referenced names and addresses with data from India Post. We also transliterated non-English data into Roman characters, relying on the expertise of linguists and custom machine transliteration algorithms.

Read the full case study here (sample data included): https://lnkd.in/ekFfiNz8

#DataExtraction #WebScraping #Nannostomus #DataScience #BigData #ETL
