• Collected data from websites and used Beautiful Soup and regular expressions to extract key information.
• Led a text mining project using gigabytes of text data to predict rare financial outcomes. Used NLP tools such as NLTK, spaCy, and gensim to preprocess the data for keyword search and TF-IDF weighting.
• Improved data quality by handling imbalanced data and applying feature selection techniques.
• Developed supervised machine learning models, including SVM, logistic regression, naive Bayes, and random forest. Built a pipeline to streamline the modeling workflow and tuned hyperparameters, resulting in a well-received model with 80%+ accuracy and 70%+ recall.