nachollorca/README.md

Hola! 👋

My friends call me Nacho (yep, like the tortilla chips 🌮) and I am a SWE/Data Scientist from Madrid, now based in Berlin. I am using this GitHub to put together some of the projects I can actually showcase, as a (slightly untidy) portfolio.

     

👩‍⚕️ Cross-Corpus NER with Biomedical German Corpora

I created harmonized versions of the four distributable, annotated clinical datasets in German. Using the harmonized corpora as a meta-dataset, I performed a total of 340 cross-corpus evaluation experiments to assess how well current pre-trained Transformers generalize across datasets. The paper was published at the 5th Clinical Natural Language Processing Workshop of the ACL. An extended version of the work with a substantially larger k-fold experimental design will come out shortly. Access to the datasets can be requested through their respective data use agreements (DUAs). Unfortunately, the code and model checkpoints are kept private due to GDPR concerns.

I participated in the development of xMEN, a Python package for cross-lingual (x) Medical Entity Normalization. I mainly took care of the PyPI integration, the command-line interface, unit tests and documentation. We first used the package to obtain SOTA results in the Disease Text Mining Shared Task from the Barcelona Supercomputing Center. We presented the results in this book for the International Conference of the Cross-Language Evaluation Forum for European Languages 2023. A complete specification can be read in this paper.

Promptbook is an open-source project that automatically provides a GUI for prompt-generating Python functions. Because it is built on top of Python type hints, you can store, launch, tune, reuse and share GPT prompts of all kinds through a convenient GUI. The code and contribution guide are open in the repo.
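To give an idea of why type hints are enough to build a form, here is a minimal sketch of reading a function's signature and turning each parameter into a field description. It is not Promptbook's actual code: `describe_fields` and `translate_prompt` are hypothetical names made up for this illustration, and only the standard library is used.

```python
import inspect
from typing import get_type_hints


def describe_fields(func):
    """Return one dict per parameter: name, type name, required flag, default.

    Hypothetical helper for illustration only, not part of Promptbook.
    """
    hints = get_type_hints(func)
    fields = []
    for name, param in inspect.signature(func).parameters.items():
        required = param.default is inspect.Parameter.empty
        hint = hints.get(name, str)  # assumption: untyped parameters are treated as text
        fields.append(
            {
                "name": name,
                "type": getattr(hint, "__name__", str(hint)),
                "required": required,
                "default": None if required else param.default,
            }
        )
    return fields


def translate_prompt(text: str, target_language: str = "German", formal: bool = False) -> str:
    """Example prompt-generating function (hypothetical)."""
    tone = "formal" if formal else "casual"
    return f"Translate the following text into {target_language} in a {tone} tone:\n{text}"


fields = describe_fields(translate_prompt)
for field in fields:
    print(field)
```

A GUI layer could then map `str` to a text box and `bool` to a checkbox, pre-filled with the defaults.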

This little proof of concept shows how combining different AI actors can be effective for building conversational language teachers. The app first lets you record your voice. It then processes your speech, shows you which grammatical or spelling mistakes you made, and answers back to keep the conversation going. You can also practice your listening skills by replaying the audio of the robot's answer! More technical specs can be found in the repo.

In 2022, Katrin Ortmann presented a new evaluation method for labeled spans that more accurately reflects true prediction quality. We thought the idea was fantastic, so I provided an implementation for Hugging Face Evaluate to democratize its use. It has already been of help in some workshops and shared tasks. Just follow the link to see the code!

🕵️ Multi-modal Face Anti-Spoofing

 

I developed an application for a kiosk manufacturing company that detects whether the face in front of the camera is real or fake (a print, a screen...). I trained a CNN separately on RGB, infrared and depth information and tested its performance on a self-recorded dataset, paying special attention to the effects of different room lighting conditions and camera positions. The depth model proved superior in most scenarios, notably obtaining an ACER score of 0.05 across all settings. A summary of the project development can be read here. The code unfortunately belongs to the company and cannot be shown.
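For readers unfamiliar with ACER (Average Classification Error Rate): it is the mean of APCER (attacks wrongly accepted as real) and BPCER (real faces wrongly rejected as attacks). Since the project code is private, the snippet below is only a minimal sketch of the metric, not the original implementation. The function name `acer` and the label convention (1 = real face, 0 = attack) are assumptions, and all attack types are pooled together rather than taking the worst case per attack type.

```python
def acer(labels, predictions):
    """Average Classification Error Rate for binary anti-spoofing output.

    Assumed convention: 1 = bona fide (real face), 0 = attack (print, screen...).
    """
    attack_preds = [p for y, p in zip(labels, predictions) if y == 0]
    bona_fide_preds = [p for y, p in zip(labels, predictions) if y == 1]
    if not attack_preds or not bona_fide_preds:
        raise ValueError("Need at least one attack and one bona fide sample.")

    # APCER: share of attack samples classified as bona fide.
    apcer = sum(p == 1 for p in attack_preds) / len(attack_preds)
    # BPCER: share of bona fide samples classified as attacks.
    bpcer = sum(p == 0 for p in bona_fide_preds) / len(bona_fide_preds)
    return (apcer + bpcer) / 2


# Toy data: 4 attacks (one slips through) and 4 real faces (all accepted).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 1, 1, 1, 1, 1]
print(acer(labels, predictions))  # APCER = 0.25, BPCER = 0.0 -> 0.125
```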

Musicator is my Python program with a Streamlit frontend that uses a neural network trained on the GTZAN dataset to guess the genre of a song among blues, classical, country, disco, hiphop, jazz, metal, pop, reggae and rock. The source code and a Jupyter Notebook with an extensive analysis of the problem can be found in this repo.

🟢 Open Source Contributions

I am always on the lookout to do my bit for the open-source efforts that I use frequently. So far, I am a top contributor to BigBIO and xMEN, I have made some improvements to scispaCy, and I have contributed various modules and models to Hugging Face.

An old computer-vision application I developed with a couple of classmates that uses CNNs to detect facial landmarks (points that define mouth, eyes, nose, eyebrows, etc.) for face recognition.

➗ Mathematical Foundations of ML Algorithms

My M.Sc. was very heavy on statistics. I have compiled, in Jupyter notebooks, some interesting from-scratch implementations of various Machine Learning methods. [🚧 under construction 🚧]

Others

Some other smaller or shared projects. [🚧 under construction 🚧]

  • 🦠 Epidemiological Data Analysis in R: I had access to a sweet German biomedical dataset, so here is a varied statistical analysis I performed to show my R skills.
  • 💲 Impact of Paid Subscriptions in Physician Platforms: in 2020, I investigated with a couple of classmates whether paying a subscription has a significant effect on the ratings of physicians on the platform Jameda.
  • 🗃️ ETL tools for Master Data Management: a brief literature survey of ETL and MDM with a fictional demonstration of some use cases.
