Sheron Yang
Hi there!
I'm Sheron Yang
When you're reading this, I'm sipping coffee.
And maybe coding ;)

Hello! If you’ve clicked on this, we probably share a similar curiosity about the world — and I’m glad you’re here. I’m a sophomore at Wellesley College studying Computer Science and Mathematics, with a strong interest in coding, modeling, and developing quantitative trading strategies. I’m always excited to explore new ideas and collaborate on interesting projects, so feel free to reach out if you’d like to connect or work together.

LinkedIn
GitHub
Mail


Click here to see the source code of this site!


Research on ANN Filtered Vector Search
ML / DS - Affiliated with MIT CSAIL - 2025

Text Readability Prediction via Advanced NLP
ML / DS - Affiliated with Break Through Tech AI - 2025

Wellesley College Hackathon Official Site
Frontend - 2025

ADHD Diagnosis for Women
ML / DS - Kaggle "Women in Data Science" Hackathon Top 10% Students - 2025

EcoBear, a Web Widget
Frontend - Wellesley College Designathon Winner - 2024

China's Hog Futures Research
Paper - ICEMGD 7th - 2023

Subsidy Impact on China's EV Uptake
Paper - S.-T. Yau Science Award Nominated - 2022

The Impact of Subsidy on EV Adoption: Evidence from China

My team and I investigated the impact of subsidy policies on EV adoption in China using city-month data from 15 cities between 2016 and 2019, a period we chose for its stable, consumer-focused policies. To address endogeneity, we applied panel regression and instrumental variables (specifically the local-to-national subsidy ratio). Results show that a 10,000 yuan increase in subsidies boosts EV sales by 6–11%, controlling for GDP, income, and charging infrastructure. While subsidies significantly increased adoption, we also found that local industrial dynamics and consumers’ expectations of future policies influenced results. Because our data were limited in granularity and transparency, we recommend that future research explore interactions among multiple policy tools and variations in consumer responses.
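For readers new to instrumental variables, here is a minimal sketch of why an instrument helps when subsidies are endogenous. It is not the project's code or data: the numbers are synthetic, the function names (`simulate`, `ols_slope`, `iv_slope`) are made up for illustration, and it uses a single regressor with a single instrument, with no controls or city fixed effects.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)


def ols_slope(x, y):
    """Slope from regressing y on x (biased when x is endogenous)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


def simulate(n=20000, true_effect=0.5, seed=0):
    """Synthetic data (an assumption): a hidden shock moves subsidy and sales."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        instrument = rng.gauss(0, 1)  # shifts the subsidy, not sales directly
        shock = rng.gauss(0, 1)       # unobserved confounder
        subsidy = instrument + shock + rng.gauss(0, 1)
        sales = true_effect * subsidy + 2 * shock + rng.gauss(0, 1)
        z.append(instrument)
        x.append(subsidy)
        y.append(sales)
    return z, x, y


z, x, y = simulate()
print("OLS estimate:", round(ols_slope(x, y), 3))
print("IV estimate: ", round(iv_slope(z, x, y), 3))
```

With these made-up parameters, OLS overstates the effect because the hidden shock raises both subsidy and sales, while the IV estimate should land near the true value of 0.5.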


How Will the Hog Futures Smooth Price Fluctuations in China’s Pig Market

My team and I researched how hog futures could reduce price volatility in China’s pig market, which has long faced cyclical fluctuations. Using the herding effect and the cobweb model, we showed how irrational decisions and supply lags drive instability. We argued that hog futures provide essential functions (price discovery, hedging, vertical integration, and support for large-scale production) that can stabilize the market. Through theoretical analysis and a case study of Muyuan Food Co., we found that major producers had begun using futures to manage risk and inform production. However, high capital barriers and technical complexity limited participation by small-scale farmers. Although China’s hog futures market was still nascent, we concluded that it holds strong potential to enhance market stability, reduce volatility, and improve long-term risk management across the supply chain.
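The cobweb mechanism mentioned above can be sketched in a few lines. This is a textbook linear version with illustrative parameters, not the calibration from the paper: demand is `a - b * P_t`, and supply is `c + d * P_{t-1}` because producers commit to output based on last period's price. The function names are hypothetical.

```python
def cobweb_prices(a, b, c, d, p0, periods):
    """Simulate market-clearing prices when supply lags by one period.

    Demand:  Qd_t = a - b * P_t
    Supply:  Qs_t = c + d * P_{t-1}
    """
    prices = [p0]
    for _ in range(periods):
        supply = c + d * prices[-1]      # output decided from last price
        prices.append((a - supply) / b)  # price that clears this supply
    return prices


def equilibrium_price(a, b, c, d):
    """Price at which demand equals supply with no lag effects."""
    return (a - c) / (b + d)


# Supply less price-sensitive than demand (d < b): swings die out.
stable = cobweb_prices(a=100, b=2, c=10, d=1, p0=36, periods=8)
# Supply more price-sensitive than demand (d > b): swings grow.
unstable = cobweb_prices(a=100, b=2, c=10, d=3, p0=20, periods=8)

print([round(p, 2) for p in stable])
print([round(p, 2) for p in unstable])
```

In this linear setup, prices oscillate around the equilibrium and converge only when `d / b` is below 1, which is one way to see how lagged supply decisions can amplify price cycles.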


EcoBear, a Web Widget for Sustainability and Green Development

In a two-day hackathon, my friends and I built EcoBear, a web widget that bridges the gap between consumers’ sustainable intentions and their actual shopping behavior. Most shoppers wanted to buy green but lacked clear, trustworthy information and were overwhelmed by greenwashing and the pull of convenience. Our tool integrated directly into shopping websites, automatically scanning products and displaying instant sustainability ratings with color-coded labels: green for sustainable, red for unsustainable. Users could click to learn more about certifications and evaluation criteria.
To drive engagement, we gamified the experience with rewards and a friendly polar bear mascot. Eco-conscious users appreciated the badges and rewards, while casual shoppers responded well to clear alternatives and incentives. From our testing, we learned to make sustainability cues more prominent and reduce guilt-driven nudging.


Unraveling the Mysteries of the Female Brain: Sex Patterns in ADHD

As a participant in the WiDS Datathon 2025, I set out to uncover how biological sex influences the presentation of ADHD. My teammates and I analyzed neuroimaging (fMRI) data from over 1,000 pediatric subjects to predict biological sex and ADHD diagnosis. We engineered advanced statistical features and applied principal component analysis (PCA) for dimensionality reduction. Using Python and R, we built and evaluated predictive models, including convolutional neural networks (CNNs), random forests, and logistic regression, contributing to research into sex-specific neurobiological patterns associated with ADHD.
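As a rough illustration of the PCA step, the sketch below finds the first principal component of a tiny made-up dataset using power iteration on the covariance matrix. It is standard-library Python written for readability, not the team's pipeline, which would more plausibly rely on a library implementation; the function name and the toy data are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the top principal component.

    Uses power iteration on the sample covariance matrix. Assumes the
    data are not constant and the start vector is not orthogonal to the
    top eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the component: one feature per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data lying exactly on the line y = 2x.
direction, scores = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print([round(x, 4) for x in direction])
```

Each row is reduced from two features to a single score along the direction of greatest variance, which is the same idea applied to high-dimensional imaging features before modeling.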


WHACK 2025: Wellesley College Annual Hackathon's Official Website

I led a team of four to create a full hackathon platform from scratch — including the main site, past event archive, merch designs, and all visual assets. It simplified everything from registration to judging and ran smoothly with 100+ active users during the event.


Readability Assessment for Educational Texts Using Advanced NLP

Our team proposed a machine learning model that more accurately predicts the reading difficulty of educational texts than traditional formulas like Flesch-Kincaid or Lexile. These formulas rely on shallow features and often miss deeper semantic and syntactic complexity. Using datasets such as the CommonLit Readability Prize and CLEAR Corpus, we planned to combine baseline models (Ridge, Lasso) with advanced approaches like TF-IDF with LightGBM and BERT embeddings. Our model aimed to incorporate cohesion and syntactic features, evaluated using RMSE, to improve both accuracy and transparency in readability assessment for grades 3–12.
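To make the "shallow features" point concrete, here is a minimal sketch of the Flesch-Kincaid grade-level formula alongside the RMSE metric. The formula's coefficients are the standard published ones; the syllable counter is a crude vowel-group heuristic chosen for brevity (an assumption, not a linguistically accurate method), and none of this is the project's code.

```python
import math
import re


def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from sentence, word, and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


easy = flesch_kincaid_grade("The cat sat.")
hard = flesch_kincaid_grade(
    "Quantitative readability estimation necessitates sophisticated "
    "linguistic representations."
)
print(round(easy, 2), round(hard, 2))
```

The formula only sees sentence length and syllable counts, so two passages with the same counts but very different vocabulary or cohesion would receive the same score.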


Research on ANN Filtered Vector Search

My colleague and I at MIT CSAIL developed a scalable filtered ANN system that integrates graph-based indices and IVF structures to efficiently handle up to 100 million vectors and over 200,000 labels. Our system achieves 5× faster query speeds at 90% recall compared to ParlayIVF2 on SIFT100M and YFCC10M benchmarks, outperforming Filtered DiskANN, NHQ, and UNG. We optimized the C++ backend using SIMD filtering, cache-aware prefetching, flattened memory layouts, and bitset label masks, achieving near-linear scalability across 32 threads with dynamic scheduling and query reordering. I am currently leading independent research on memory-efficient vector quantization, integrating advanced algorithms like SymphonyQG and HNSW-Flash to further improve graph-based ANN performance.
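To show what "filtered" search and "recall" mean here, the sketch below runs an exact, brute-force filtered nearest-neighbor query using integer bitsets as label masks. It is deliberately not an ANN index and not the C++ system described above: the function names and toy vectors are hypothetical, and a brute-force pass like this is what an approximate index would be compared against when measuring recall.

```python
import heapq


def label_mask(labels):
    """Pack small non-negative integer label ids into one int bitset."""
    mask = 0
    for label in labels:
        mask |= 1 << label
    return mask


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_knn(vectors, masks, query, required_mask, k):
    """Exact top-k among vectors whose bitset has every required label."""
    candidates = (
        (squared_l2(vec, query), idx)
        for idx, (vec, mask) in enumerate(zip(vectors, masks))
        if (mask & required_mask) == required_mask
    )
    return [idx for _, idx in heapq.nsmallest(k, candidates)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that an approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.1, 0.1]]
masks = [label_mask([0]), label_mask([0, 1]), label_mask([1]), label_mask([1, 2])]

exact = filtered_knn(vectors, masks, [0.0, 0.0], label_mask([1]), k=2)
print(exact)
print(recall_at_k([3, 2], exact))
```

The single bitwise AND per candidate is the appeal of bitset masks: the label check stays cheap no matter how many labels a query requires, as long as the label ids fit the mask.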