Harold Ng

Data Scientist & ML Engineer

Get in Touch

About Me

Hi, I'm Harold Ng, a data scientist and machine learning enthusiast who believes in the transformative power of data and creativity. While my technical expertise and professional achievements shape much of what I do, I'm equally driven by a curiosity for innovative problem-solving and a passion for exploring diverse perspectives.

Beyond crunching numbers and optimizing models, I thrive on creative expression. Whether it's crafting compelling video content, designing AI-generated art, or building a faceless YouTube channel, I enjoy merging technical skills with storytelling to captivate audiences in new and exciting ways. My background in competitive swimming and team sports has also instilled a strong sense of discipline, adaptability, and resilience—qualities I bring to every challenge I take on.

In my downtime, you'll find me experimenting with new tools, exploring the latest advancements in AI, or hitting the basketball court. I believe that a well-rounded life enhances professional growth, and I'm always on the lookout for ways to connect seemingly unrelated interests into meaningful contributions.

Let's create something impactful—whether that's unlocking insights from data, exploring the creative potential of AI, or reimagining what's possible.

Technical Skills

Programming Languages

Python, SQL, R, C++, JavaScript, HTML, CSS3, MATLAB

Tools & Frameworks

Tableau, Matplotlib, Scikit-Learn, TensorFlow, Pandas, NumPy, SciPy, PostgreSQL, MongoDB, Snowflake, Jupyter Notebooks, Cassandra, Neo4j, Apache Spark, Apache Flink

Experience

Data Analytics Intern - CooperVision

San Ramon, CA | Jun 2024 - Nov 2024

  • Developed a Fourier transform-based model in Python (NumPy, Pandas) that removed seasonality from more than 523,000 order records to sharpen trend analysis, contributing to a 26.3% increase in operational efficiency and enabling real-time performance alerts for more than 115 sales representatives.
  • Built an automated data cleaning pipeline in Python, generating datasets that revealed true demand patterns and improved forecasting accuracy by 19.7%, enabling better decision-making across 5 departments.
  • Conducted sentiment analysis on 10,324 customer feedback entries using Python and NLP techniques, producing actionable insights that drove a 14.8% improvement in customer satisfaction and guided product development.
  • Used SQL and Tableau to clean, analyze, and visualize datasets of more than 1.2 million rows, improving data accessibility and supporting data-driven forecasting for cross-functional teams.

Education

University of California, Irvine

Master of Data Science

Graduation: Dec 2024

University of California, Los Angeles

B.S. in Applied Mathematics with Specialization in Computing

Graduation: Jun 2023

Research Projects

Sentiment-Enhanced Recipe Scoring System

Project Lead

  • Performed sentiment analysis and data augmentation on a dataset of 17,500+ recipe reviews using Python, achieving a 13.7% improvement in predictive accuracy for scoring and enhancing recommendation relevance.
  • Utilized VADER and TextBlob libraries to compute polarity and subjectivity scores for each review, analyzing over 50,000 individual sentiment attributes to refine user preference insights.
  • Implemented and optimized Multilayer Perceptron (MLP) Regressor and Gradient Boosting Regressor models, achieving an 18.4% reduction in mean squared error compared to baseline models, leveraging a dataset of nearly 18,000 reviews and over 50 engineered features.

Collaborative Analysis of Knowledge Base Transition Efficiency with Microsoft

Project Lead

  • Collaborated with Microsoft to evaluate the transition to a single-source knowledge base, conducting exploratory data analysis (EDA) on 60,000+ support case records using Python, SQL, and Tableau. Identified key metrics, including investigation time and clicks, and measured a 4.14% decrease in investigation time and a 65.16% reduction in clicks post-transition.
  • Developed and validated statistical models, including Generalized Linear Models (GLMs) and linear regression, to assess the significance of transitional periods. Incorporated organizational differences as predictors, explaining up to 46.8% of the variance in investigation time.
