About Me
Welcome to my digital space! I'm an enthusiastic data scientist with a strong focus on NLP and a passion for leveraging data-driven insights to solve complex challenges. With a Master's degree in Computer Science and a track record of successful research projects, I thrive on exploring the intersections of language and technology.
Throughout my academic and professional path, I have built expertise in data science and gained practical experience with machine learning frameworks and tools. I have applied these skills to text classification, relation extraction, and data analysis across various industries, uncovering meaningful patterns in the data along the way.
Driven by curiosity and a commitment to continuous learning, I am constantly exploring innovative approaches and staying up-to-date with the latest advancements in NLP. I thrive in collaborative environments, where I can contribute my expertise and work alongside diverse teams to drive data-driven decision-making.
With a knack for problem-solving and a creative mindset, I am eager to tackle new challenges and contribute to cutting-edge projects. If you're interested in exploring the possibilities of data science and NLP, let's connect. Together, we can unravel the complexities of language and unlock the full potential of data-driven solutions.
Previous Work Experience
Machine Learning (NLP) Engineer
Text Classification using NLP Techniques
Jan 2023 - Present
- Applied NLP models to a dataset of 440k religious tags from the Google Maps API, improving sub-categorization and anomaly detection and reducing false positives by 15%. Project Website
- Conducted feature engineering and vector embedding of textual data within a 330k philanthropic dataset, enhancing the pipeline's efficiency and accuracy in categorizing entities based on their mission statements. Project Website
- Led a team in migrating datasets to a MySQL database, skillfully integrating them with Flask and JavaScript to develop a public resource library. Created SQL schemas and procedures to optimize data management and retrieval efficiency.
Delegates Relation Extraction: NLP-based approach
Sep 2022 - Jan 2023
- Collected and compiled a dataset of company acquisitions and mergers from 2013 to 2020 by scraping press news.
- Achieved 96% accuracy in identifying person names and job titles using NER and a rule-based approach.
- Future scope is to utilize unsupervised techniques to infer connections between person names and job titles.
Data Scientist
OpenSea policy change & Quora Monetization Study
Aug 2022 - Jan 2022
Data Scientist
Crypto Rewards in Fundraising: Evidence from Crypto Donations to Ukraine
May 2022 - July 2022
- Conducted data analysis of ETH and BTC cryptocurrency data and investigated the effect of crypto rewards.
- Analyzed an 812% increase in the ETH donation count compared to BTC and the consequent 30% drop in the average ETH donation amount.
Magnetization in Iron-based Compounds: A Machine Learning Model Analysis
Jan 2020 - July 2020
- Partnered with material physicists to develop ML models that assisted in the rapid pre-screening of 11k+ iron compounds in the search for a compound with strong magnetic properties.
- Reduced prediction time from days to a fraction of a second by implementing a random forest model that predicted the magnetic moment per atom and formation energy per atom of iron compounds, with correlation coefficients of 0.81 and 0.96 and RMSE of 0.232 and 0.261, respectively.
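The two metrics quoted above can be computed as in the following minimal sketch. It assumes the correlation coefficient is Pearson's r (the project description does not say which one was used), and the sample values are made up for illustration and are not project data.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical targets and predictions (not values from the project).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]

r_value = pearson_r(y_true, y_pred)
error = rmse(y_true, y_pred)
```

Reporting both is useful because they measure different things: a constant offset in the predictions leaves the correlation untouched but still shows up in the RMSE.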
Understanding magnetism in Manganese-based systems
Jun 2019 - July 2019
- Curated and analyzed a dataset to identify magnetic behavior patterns in manganese-based compounds.
- Found that 60% of 12k manganese compounds are ferromagnetic with Mono-, Tri- or Ortho- structures, which contradicts the belief that high-magnetization intermetallic manganese compounds have a cubic structure.
Blender Game Engine
Jun 2018 - July 2018
- Used BGE and Python to develop multiple interactive games.
- Performed scene rendering and model building using Blender Game Engine (BGE), enhancing visual quality and immersion in the game.
My Personal Projects
A few of my personal projects.
Unique Meal-plan
I applied word embedding techniques to a large recipe dataset, creating a dietary plan that substitutes allergenic ingredients. Using a specialized algorithm, I maintained nutritional value while replacing allergenic components and clustered ingredients into specific groups.
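One way to picture embedding-based substitution is a nearest-neighbor lookup by cosine similarity, sketched below. This is an illustration under assumptions, not the project's actual algorithm: the function names, the three-dimensional vectors, and the allergen list are all hypothetical, and the sketch does not cover the nutritional-value constraint.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def substitute(ingredient, embeddings, allergens):
    """Return the non-allergenic ingredient closest to `ingredient`."""
    target = embeddings[ingredient]
    candidates = [
        name for name in embeddings
        if name != ingredient and name not in allergens
    ]
    return max(candidates, key=lambda name: cosine(target, embeddings[name]))

# Hypothetical toy embeddings; real ones would come from a trained model.
toy_embeddings = {
    "peanut butter": [0.90, 0.10, 0.30],
    "almond butter": [0.88, 0.12, 0.31],
    "sunflower seed butter": [0.85, 0.15, 0.28],
    "olive oil": [0.10, 0.90, 0.20],
}
allergens = {"peanut butter", "almond butter"}

replacement = substitute("peanut butter", toy_embeddings, allergens)
```

With these toy vectors the lookup skips almond butter (also flagged as an allergen) and picks sunflower seed butter over olive oil.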
Graph Embedding
Represented our recipe dataset using graphs. A bipartite graph best represented the dataset, with row names (ingredients) as one set of nodes and column names (nutritional values) as the other set. A bidirectional edge indicates that the ingredient contains a particular nutritional value. Edge weights in the bipartite graph are the corresponding normalized nutritional values. To separate the nodes of the graph into the two sets, we used a DFS node-coloring algorithm.
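The node-coloring step can be sketched as a stack-based depth-first two-coloring. This is a minimal illustration under assumptions: the ingredient names, nutrient names, and weights below are made up, and the function name `two_color` is hypothetical.

```python
def two_color(adjacency):
    """Two-color an undirected graph with an iterative DFS.

    Returns a dict mapping node -> 0 or 1, or None if the graph
    is not bipartite (two adjacent nodes would share a color).
    """
    color = {}
    for start in adjacency:
        if start in color:
            continue  # already reached from an earlier component
        color[start] = 0
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in color:
                    color[neighbor] = 1 - color[node]
                    stack.append(neighbor)
                elif color[neighbor] == color[node]:
                    return None
    return color

# Hypothetical weighted edges: (ingredient, nutrient) -> normalized value.
weights = {
    ("oats", "fiber"): 0.8,
    ("oats", "protein"): 0.4,
    ("milk", "protein"): 0.6,
    ("milk", "calcium"): 0.9,
}

# Build an undirected adjacency map from the edge list.
adjacency = {}
for ingredient, nutrient in weights:
    adjacency.setdefault(ingredient, set()).add(nutrient)
    adjacency.setdefault(nutrient, set()).add(ingredient)

coloring = two_color(adjacency)
# Nodes sharing the color of a known ingredient form the ingredient set.
ingredient_nodes = {n for n, c in coloring.items() if c == coloring["oats"]}
nutrient_nodes = set(coloring) - ingredient_nodes
```

The same routine doubles as a bipartiteness check: it returns None as soon as two adjacent nodes would need the same color.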
Image Completion
Developed deep learning models to predict the complete human face when it is partially covered by a COVID-19 mask. Found that DCGANs outperformed CNN autoencoders and Vision Transformers in reconstructing the missing facial areas.
Sentiment Analysis
Implemented feature extraction techniques including BOW, N-Gram, TF-IDF, and Word2Vec on a restaurant review dataset. Conducted a comparative analysis using classification algorithms (Naive Bayes, Decision Tree, SVM, KNN, and Random Forest). Notably, N-Gram (n=2) achieved a 7% increase in classification accuracy compared to BOW, TF-IDF, and Word2Vec.
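The difference between bag-of-words and bigram features can be sketched with the standard library alone. This is a simplified illustration, assuming whitespace tokenization and a made-up review sentence; it is not the project's pipeline.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def bag_of_words(text):
    """Count individual tokens, ignoring their order."""
    return Counter(tokenize(text))

def ngrams(text, n=2):
    """Count contiguous token sequences of length n."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical review, not taken from the dataset.
review = "the food was not good"

bow_features = bag_of_words(review)
bigram_features = ngrams(review, n=2)
```

Here the bag-of-words counts contain "good" and "not" as unrelated tokens, while the bigram counts keep the pair ("not", "good") together. Preserving this kind of local word order is one plausible reason bigrams can help on review text, though the project description does not state the cause of the improvement.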
Exploratory Data Analysis
Used EDA in R to explore three datasets, uncovering insights into data distributions, patterns, and relationships. Statistical techniques and visualization tools helped identify trends, outliers, and potential data quality issues, which informed the subsequent analysis.
Personalized Fashion Recommendation
The H&M Group comprises numerous brands and online marketplaces, offering a wide range of products. To enhance the shopping experience and promote sustainability, we provide personalized fashion recommendations based on customer preferences and historical transaction data.