Expertise

Statistical Inference & Probability

Statistical Inference

Related Coursework: DTSA 5001 – Data Science Foundations: Statistical Inference
Developed a strong foundation in probability theory essential for data science. Covered key concepts including permutations, combinations, conditional probability, Bayes’ theorem, expectation, and variance.

What I Know

- Fundamentals of probability theory and its application to data problems
- Joint, marginal, and conditional probability
- Permutations and combinations in probabilistic modeling

Project: Built simulation-based models in Python to estimate probabilities for scenarios such as card draws and A/B test outcomes, and compared the simulated frequencies against the theoretical values.
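
A minimal sketch of the simulation approach, assuming a standard 52-card deck and using "at least one ace in a five-card hand" as an illustrative event (not necessarily one from the original project). The exact probability comes from combinations, and the simulated frequency should converge toward it.

```python
import math
import random

def exact_p_at_least_one_ace(hand_size: int = 5) -> float:
    # P(at least one ace) = 1 - C(48, k) / C(52, k): complement of drawing only non-aces
    return 1 - math.comb(48, hand_size) / math.comb(52, hand_size)

def simulate_p_at_least_one_ace(trials: int = 100_000, hand_size: int = 5, seed: int = 0) -> float:
    rng = random.Random(seed)
    deck = ["A"] * 4 + ["x"] * 48   # 4 aces and 48 other cards
    hits = 0
    for _ in range(trials):
        hand = rng.sample(deck, hand_size)   # draw without replacement
        if "A" in hand:
            hits += 1
    return hits / trials

exact = exact_p_at_least_one_ace()
estimate = simulate_p_at_least_one_ace()
print(f"exact={exact:.4f}  simulated={estimate:.4f}")
```

With 100,000 trials the standard error of the estimate is roughly 0.0015, so the two numbers should agree to about two decimal places.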

Statistical Inference for Estimation

Related Coursework: DTSA 5002 – Statistical Inference for Estimation in Data Science
Focused on parameter estimation methods using real-world data.

What I Know

Maximum Likelihood Estimation (MLE) and method of moments
Bias, variance, and mean squared error as evaluation metrics
Confidence intervals for population parameters

Project: Estimated parameters of a normal distribution using MLE on financial market returns. Implemented the estimation process in R and visualized confidence intervals, comparing empirical and theoretical distributions.
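
The project itself was implemented in R; as a language-neutral sketch, the Python below shows the closed-form normal MLEs (the sample mean, and the variance with divisor n) and a large-sample 95% interval for the mean. The returns are simulated with `random.gauss`, not real market data, and the z-based interval is an assumption that is reasonable for large n (a t quantile is more exact for small samples).

```python
import math
import random
from statistics import NormalDist

def normal_mle(xs):
    """Closed-form MLEs for a normal sample: mean, and variance with divisor n."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n   # MLE is biased; n - 1 gives the unbiased estimator
    return mu, sigma2

def mean_ci(xs, level=0.95):
    """Large-sample (z-based) confidence interval for the mean."""
    n = len(xs)
    mu, sigma2 = normal_mle(xs)
    s = math.sqrt(sigma2 * n / (n - 1))          # unbiased sample standard deviation
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for level = 0.95
    half_width = z * s / math.sqrt(n)
    return mu - half_width, mu + half_width

rng = random.Random(42)
returns = [rng.gauss(0.0005, 0.01) for _ in range(1000)]   # simulated daily returns, not market data
mu_hat, var_hat = normal_mle(returns)
ci_low, ci_high = mean_ci(returns)
print(f"mu={mu_hat:.5f} sigma={math.sqrt(var_hat):.5f} 95% CI=({ci_low:.5f}, {ci_high:.5f})")
```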

Hypothesis Testing

Related Coursework: DTSA 5003 – Statistical Inference and Hypothesis Testing in Data Science Applications
Concentrated on statistical hypothesis testing for data-driven decision-making.

What I Know

One-sided and two-sided hypothesis tests
Uniformly Most Powerful (UMP) tests
p-values, Type I and Type II errors, and power analysis

Project: Conducted a hypothesis test comparing average Reddit sentiment scores for two time periods to identify shifts in community opinion on specific stocks. Applied both classical and simulation-based techniques to validate findings.
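
For the simulation-based side, one option is a permutation test: if the time period has no effect, the period labels are exchangeable, so shuffling them many times builds the null distribution of the difference in means. This is a sketch with hypothetical sentiment scores; on the classical side, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch's t-test on the same two samples.

```python
import random

def _mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns (observed_diff, p_value)."""
    rng = random.Random(seed)
    observed = _mean(a) - _mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)   # under H0 the period labels are exchangeable
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # add-one correction keeps the Monte Carlo p-value away from exactly zero
    return observed, (extreme + 1) / (n_perm + 1)

# hypothetical average sentiment scores for posts in two periods
before = [0.12, -0.05, 0.30, 0.08, 0.15, -0.10, 0.22, 0.05]
after = [0.35, 0.28, 0.41, 0.10, 0.33, 0.25, 0.47, 0.19]
diff, p_value = permutation_test(after, before)
print(f"difference={diff:.3f}  p={p_value:.4f}")
```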

Data Mining

The Data Mining Pipeline

Related Coursework: DTSA 5504/CSCA 5502 – Data Mining Foundations and Practice
Explored the complete data mining lifecycle, including data collection, cleaning, transformation, and storage, to prepare data for downstream analytics.

What I Know

Built end-to-end data pipelines for mining large datasets
Techniques for data preprocessing: handling missing values, normalization, transformation
Importance of scalability, data quality, and automation in pipeline design
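
Two of the preprocessing steps listed above, mean imputation for missing values and min-max normalization, can be sketched in plain Python. These are illustrative choices (median imputation or z-score scaling are common alternatives), and the sample column is hypothetical.

```python
from typing import List, Optional

def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

raw = [3.0, None, 7.0, 5.0, None]   # hypothetical column with missing entries
clean = min_max_scale(impute_mean(raw))
print(clean)
```

In a real pipeline the fill value and the min/max should be computed on training data only and then reused on new data, so that evaluation data does not leak into preprocessing.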

Project:
Created a data pipeline to ingest and process jiu-jitsu competition data from multiple sources (bjj.university, bjj.tips). Automated collection and transformation, then loaded data into BigQuery for analysis of technique frequency and athlete trends.

Data Mining Methods

Related Coursework: DTSA 5505/CSCA 5512 – Data Mining Methods
Studied core data mining algorithms and techniques, focusing on supervised and unsupervised learning.

What I Know

Applied classification, regression, and clustering techniques
Evaluated models using confusion matrices, ROC curves, and cross-validation
Gained hands-on experience with algorithms like KNN, decision trees, and K-means
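
An ROC curve is built by sweeping the decision threshold over the model's scores and recording the false positive rate and true positive rate at each step; the area under it (AUC) can then be taken with the trapezoidal rule. This dependency-free sketch shows the mechanics; in practice `sklearn.metrics.roc_curve` and `roc_auc_score` do the same job.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs from sweeping the threshold over each distinct score, high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts, auc(pts))
```

For this four-example toy input the AUC is 0.75: three of the four (positive, negative) pairs are ranked correctly.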

Project:
Built a classification model to predict Reddit sentiment (positive, neutral, negative) based on post content. Compared performance of logistic regression, decision trees, and random forests using scikit-learn.

End-to-End Data Mining Project

Related Coursework: DTSA 5506/CSCA 5522 – Data Mining Project
Applied end-to-end data mining techniques to a self-directed project, synthesizing pipeline and modeling skills.

What I Know

Full-cycle data mining: data collection, preprocessing, modeling, evaluation
Feature engineering, hyperparameter tuning, and cross-validation
Communicating insights effectively through visualization and reporting

Project:
Developed a time series forecasting model to predict short-term stock price movement using Reddit mention volume and sentiment. Used VADER for sentiment scoring, engineered lag features, and tested ARIMA and LSTM models to assess predictive performance.
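
Lag features give a model access to recent history (for example, yesterday's mention count) without looking into the future. A minimal sketch, assuming rows are already sorted by date; the field names and values are hypothetical, and with pandas the same thing is usually done with `Series.shift(k)`.

```python
def add_lag_features(rows, columns, lags):
    """Return copies of time-ordered rows with '<col>_lag<k>' fields; None where history is missing."""
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for col in columns:
            for k in lags:
                # only values from earlier rows are used, so no future information leaks in
                new_row[f"{col}_lag{k}"] = rows[i - k][col] if i - k >= 0 else None
        out.append(new_row)
    return out

# hypothetical daily aggregates for one ticker, already sorted by date
daily = [
    {"date": "2024-03-01", "mentions": 120, "sentiment": 0.10},
    {"date": "2024-03-02", "mentions": 95, "sentiment": -0.05},
    {"date": "2024-03-03", "mentions": 310, "sentiment": 0.42},
    {"date": "2024-03-04", "mentions": 280, "sentiment": 0.35},
]
features = add_lag_features(daily, ["mentions", "sentiment"], lags=[1, 2])
for row in features:
    print(row)
```

Rows whose lags are None are typically dropped before fitting ARIMA-style or neural models.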

Statistical Modeling

Modern Regression Analysis in R

Related Coursework: DTSA 5011 – Statistical Modeling for Data Science Applications
Focused on developing and interpreting linear regression models using R.

What I Know

Simple and multiple linear regression
Model assumptions, diagnostics, and residual analysis
Variable selection, multicollinearity, and interaction effects
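
Multiple linear regression estimates its coefficients by solving the normal equations (X^T X) beta = X^T y. The sketch below does this directly in Python with a small Gauss-Jordan solver, checked on synthetic data generated exactly from y = 1 + 2*x1 - 3*x2 (an assumption for illustration). R's `lm()` fits via a QR decomposition instead, which is better conditioned than forming X^T X when predictors are nearly collinear.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp; returns [b0, b1, ..., bp]."""
    Xd = [[1.0] + list(row) for row in X]   # prepend the intercept column
    p = len(Xd[0])
    xtx = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    return solve(xtx, xty)

# synthetic check: data generated exactly from y = 1 + 2*x1 - 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1 + 2 * a - 3 * b for a, b in X]
beta = ols(X, y)
print(beta)
```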

Project: Analyzed factors influencing user engagement on a content website by fitting multiple linear regression models. Assessed feature importance and multicollinearity, and visualized predictions using ggplot2 in R.

ANOVA and Experimental Design

Related Coursework: DTSA 5012 – ANOVA and Experimental Design
Covered statistical methods for comparing group means and designing experiments to assess treatment effects.

What I Know

One-way and two-way ANOVA
Randomized design principles
F-tests, post-hoc comparisons, and interpreting interaction effects
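
The one-way ANOVA F statistic compares between-group variability to within-group variability: F = (SS_between / (k - 1)) / (SS_within / (n - k)). This sketch computes it on a toy three-group example; turning F into a p-value needs the F distribution (for example `scipy.stats.f_oneway`, or `aov()` followed by `TukeyHSD()` in R for the post-hoc step).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n - k
    f_stat = (ss_between / df_b) / (ss_within / df_w)
    return f_stat, df_b, df_w

# toy data for three hypothetical content strategies (A, B, C)
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df_b, df_w)
```

Here SS_between = 54 and SS_within = 6, so F = (54 / 2) / (6 / 6) = 27 on (2, 6) degrees of freedom.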

Project: Designed and analyzed a simulated A/B/C test to evaluate content strategies. Used one-way ANOVA to detect significant differences in user behavior across groups and applied Tukey’s HSD for post-hoc analysis.

Generalized Linear Models and Nonparametric Regression

Related Coursework: DTSA 5013 – Generalized Linear Models and Nonparametric Regression
Expanded on classical linear models to handle non-normal data distributions and non-linear relationships.

What I Know

Generalized linear models (GLMs) for binary and count data
Link functions (logit, log, etc.) and distribution families
Intro to nonparametric regression (e.g., LOESS, splines)
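
Logistic regression is the GLM with a binomial family and logit link: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x))). R's `glm(..., family = binomial)` fits it by iteratively reweighted least squares; the dependency-free sketch below swaps in plain batch gradient ascent on the log-likelihood, which reaches the same maximum on well-behaved data but more slowly. The toy feature and labels are hypothetical.

```python
import math

def sigmoid(z):
    """Inverse of the logit link, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximize the Bernoulli log-likelihood by batch gradient ascent; returns [intercept, coef_1, ...]."""
    p = len(X[0]) + 1
    w = [0.0] * p
    for _ in range(epochs):
        grad = [0.0] * p
        for row, target in zip(X, y):
            xs = [1.0] + list(row)   # prepend the intercept term
            err = target - sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
            for j in range(p):
                grad[j] += err * xs[j]   # gradient of the log-likelihood: sum of (y - p) * x_j
        w = [wi + lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def predict_proba(w, row):
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))

# toy data: one feature (e.g. pages viewed) and a 0/1 conversion label
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
print(w, [round(predict_proba(w, row), 3) for row in X])
```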

Project: Built a logistic regression model (a GLM with a binomial family and logit link) to predict conversion on a marketing dataset from user attributes, and compared its performance with nonparametric smoothing techniques.

Machine Learning

Supervised Learning

Related Coursework: DTSA 5509/CSCA 5622 – Introduction to Machine Learning
Introduced core concepts and algorithms for supervised learning with a focus on classification and regression tasks.

What I Know

Algorithms: logistic regression, decision trees, random forests, support vector machines (SVMs)
Model training, validation, and evaluation using metrics like accuracy, precision, recall, and F1-score
Cross-validation, hyperparameter tuning, and overfitting prevention
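
Precision, recall, and F1 all come from the counts of true positives, false positives, and false negatives. A small sketch, with the convention (an assumption here, matching common library defaults) that an undefined ratio is reported as 0.0:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0   # harmonic mean
    return precision, recall, f1

# hypothetical labels: 1 = price moved up, 0 = it did not
print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```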

Project: Developed a classification model to predict stock price movement direction using Reddit sentiment scores. Compared logistic regression, decision trees, and random forest models to assess predictive power.

Unsupervised Learning

Related Coursework: DTSA 5510/CSCA 5632 – Unsupervised Algorithms in Machine Learning
Focused on uncovering structure in unlabeled data using clustering and dimensionality reduction techniques.

What I Know

K-means, DBSCAN, hierarchical clustering
Principal Component Analysis (PCA) and t-SNE for dimensionality reduction
Evaluating clusters using silhouette score and inertia
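
K-means (Lloyd's algorithm) alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points; inertia is the resulting within-cluster sum of squared distances. This 2-D sketch initializes from random data points rather than k-means++, which is a simplifying assumption.

```python
import random

def _sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def _assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: _sq_dist(p, centroids[c])) for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initialize from k distinct data points
    for _ in range(iters):
        labels = _assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break   # converged: assignments can no longer change
        centroids = new_centroids
    labels = _assign(points, centroids)   # final labels consistent with final centroids
    inertia = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, labels, inertia = kmeans(pts, k=2)
print(cents, labels, inertia)
```

Running this for several values of k and plotting inertia is the usual "elbow" check; silhouette score adds a measure of separation between clusters.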

Project: Used unsupervised learning to cluster Reddit posts based on word embeddings. Visualized topic clusters with PCA and t-SNE to explore emerging market themes and investor sentiment.

Deep Learning

Related Coursework: DTSA 5511/CSCA 5642 – Introduction to Deep Learning
Explored the fundamentals of neural networks and deep learning architectures.

What I Know

Feedforward neural networks, backpropagation, and activation functions
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)
Frameworks: TensorFlow and Keras

Project:
Trained a basic neural network on textual Reddit data to classify post sentiment. Experimented with architecture depth and dropout regularization, comparing performance against traditional models.
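
Dropout regularizes a network by randomly zeroing units during training. The common "inverted" form also rescales the surviving units by 1 / (1 - p_drop) so their expected value is unchanged and nothing needs to change at inference time; Keras's `Dropout` layer behaves this way. A pure-Python sketch of the mechanism on one layer's activations:

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop and rescale survivors by 1/(1 - p_drop)."""
    if not training or p_drop == 0.0:
        return list(activations)   # identity at inference time
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
print(dropout([0.2, 1.5, 0.7, 0.9], p_drop=0.5, rng=rng))
```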

Generative AI

Related Coursework: CSCA 5112 – Introduction to Generative AI
Provided a foundational overview of generative models and their applications in modern AI systems, including text, image, and audio generation.

What I Know

Core concepts: transformers, diffusion models, GANs, and autoregressive models
Use cases in text generation, image synthesis, and conversational AI
Ethical implications, risks, and biases in generative systems
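
The central operation inside a transformer is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V: each query produces a weighted average of the value vectors, weighted by how well it matches each key. This is a single-head sketch with plain lists as matrices; it deliberately omits the learned projections, masking, and multi-head splitting of a full transformer layer.

```python
import math

def softmax(xs):
    m = max(xs)   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for lists of query, key, and value vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)   # one weight per key, summing to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

print(attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

In an autoregressive model, a causal mask would additionally stop each position from attending to later positions.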

Project: Fine-tuned a GPT-based language model using custom text data to generate topic-specific summaries. Evaluated coherence, relevance, and bias, and explored prompt engineering techniques to guide outputs effectively.

Core Competencies in Data Science

Foundations of the Data Science Discipline

Related Coursework: DTSA 5301 – Data Science as a Field
Introduced the data science lifecycle, interdisciplinary nature of the field, and key roles and responsibilities in real-world data teams.

What I Know

Overview of the data science workflow: problem definition, data collection, analysis, and communication
Interplay between statistics, computer science, and domain expertise
Importance of reproducibility, documentation, and collaboration

Project: Created a case study presentation outlining how a data science team might approach optimizing conversion rates for a digital product using user behavior data and A/B testing.
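
For an A/B test on conversion rates, a standard analysis is the two-proportion z-test, which uses the pooled conversion rate under the null hypothesis of no difference. A minimal sketch; the counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates; returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)   # common rate assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

# hypothetical test: 200/2000 conversions in control vs 260/2000 in the variant
z, p = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z={z:.2f}  p={p:.4f}")
```

The normal approximation is reasonable when each group has at least a handful of conversions and non-conversions; for very small counts an exact test is safer.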

Cybersecurity for Data Science

Related Coursework: DTSA 5302 – Cybersecurity for Data Science
Examined best practices for maintaining data privacy, integrity, and security in data pipelines and analytics workflows.

What I Know

Secure handling of sensitive and personally identifiable information (PII)
Common data vulnerabilities and threat models
Regulatory frameworks (e.g., GDPR, HIPAA) relevant to data professionals

Project: Performed a privacy risk assessment of a mock data pipeline to identify weak points in data encryption, access control, and anonymization practices.
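
One common anonymization weak point is storing raw identifiers such as email addresses. A keyed hash (HMAC-SHA256) replaces them with stable pseudonyms that still support joins; unlike a plain unkeyed hash, it cannot be reversed by simply hashing a list of candidate emails without the key. This is an illustrative sketch: the key shown is a placeholder that would live in a secrets manager, and pseudonymized data can still count as personal data under GDPR.

```python
import hashlib
import hmac

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

KEY = b"placeholder-key-load-from-secret-store"   # hypothetical; never hard-code a real key
record = {"email": "alice@example.com", "sessions": 14}
safe_record = {**record, "email": pseudonymize(record["email"], KEY)}
print(safe_record)
```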

Ethical Issues in Data Science

Related Coursework: DTSA 5303 – Ethical Issues in Data Science
Explored the ethical responsibilities of data scientists and the impact of algorithmic decisions on individuals and society.

What I Know

Bias in data and algorithms
Fairness, accountability, and transparency in modeling
Ethical dilemmas in predictive analytics and surveillance

Project: Analyzed a real-world case of algorithmic bias (e.g., facial recognition or hiring algorithms) and proposed an ethical framework for improving transparency and fairness.
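
Fairness audits of decision systems such as hiring models often start by comparing selection rates across groups. This sketch computes two simple group-fairness measures: the demographic parity difference (highest minus lowest selection rate) and the impact ratio (lowest divided by highest), which the US "four-fifths" rule of thumb flags when it falls below 0.8. The predictions and group labels are hypothetical, and these metrics capture only one narrow notion of fairness.

```python
def selection_rates(predictions, groups):
    """Share of positive (1) predictions within each group."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(predictions, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    return rates

def fairness_gaps(predictions, groups):
    rates = selection_rates(predictions, groups)
    hi, lo = max(rates.values()), min(rates.values())
    return {"parity_difference": hi - lo, "impact_ratio": lo / hi if hi else 1.0}

# hypothetical screening decisions (1 = advanced) for applicants in groups "a" and "b"
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
print(selection_rates(preds, groups), fairness_gaps(preds, groups))
```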

Fundamentals of Data Visualization

Related Coursework: DTSA 5304 – Fundamentals of Data Visualization
Focused on effectively communicating data insights through clear, compelling visualizations.

What I Know

Principles of visual perception, design, and storytelling
Choosing the right chart type for the message
Tools: matplotlib, seaborn, ggplot2, and interactive dashboards built with Plotly and Dash

Project: Designed an interactive dashboard using Plotly and Dash to explore sentiment trends in Reddit posts over time, integrating filters and hover features to enhance user experience.

Data Management & Database Systems

Relational Database Design

Related Coursework: DTSA 5733 – Relational Database Design
Explored the fundamentals of designing scalable, efficient relational databases tailored for analytical workflows.

What I Know

Entity-relationship modeling and schema normalization
Primary/foreign keys, integrity constraints, and referential integrity
Trade-offs between normalization and performance in data warehousing

Project: Designed a normalized relational schema to support an analytics dashboard for tracking jiu-jitsu techniques and athlete performance over time. Integrated the design with PostgreSQL for live querying.
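
A normalized schema of this kind keeps athletes and techniques in their own tables and records each occurrence as a row that references both through foreign keys. The project used PostgreSQL; to keep the example self-contained, this sketch uses Python's built-in `sqlite3`, and the table and column names are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, whereas PostgreSQL enforces them by default.

```python
import sqlite3

SCHEMA = """
CREATE TABLE athlete (
    athlete_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE technique (
    technique_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE match_event (
    event_id     INTEGER PRIMARY KEY,
    athlete_id   INTEGER NOT NULL REFERENCES athlete(athlete_id),
    technique_id INTEGER NOT NULL REFERENCES technique(technique_id),
    event_date   TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO athlete VALUES (1, 'Athlete A')")
conn.execute("INSERT INTO technique VALUES (1, 'armbar')")
conn.execute("INSERT INTO match_event VALUES (1, 1, 1, '2024-01-15')")

# technique frequency per athlete, the kind of query the dashboard would run
print(conn.execute("""
    SELECT a.name, t.name, COUNT(*) AS uses
    FROM match_event e
    JOIN athlete a ON a.athlete_id = e.athlete_id
    JOIN technique t ON t.technique_id = e.technique_id
    GROUP BY a.name, t.name
""").fetchall())
```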

The Structured Query Language (SQL)

Related Coursework: DTSA 5734 – The Structured Query Language (SQL)
Gained proficiency in SQL for querying, manipulating, and aggregating structured data in relational databases.

What I Know

Advanced SQL queries: joins, subqueries, window functions, CTEs
Aggregation, filtering, and conditional logic for analytics
Performance tuning with indexing and query optimization

Project: Built complex queries to extract trends and insights from a Google BigQuery dataset consisting of stock market and Reddit mention data. Used CTEs and window functions to calculate moving averages and detect anomalies.
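
The query pattern described above, a CTE feeding window functions, can be sketched against an in-memory SQLite table instead of BigQuery (SQLite needs version 3.25 or newer for window functions). The table, tickers, and counts are hypothetical, and the spike rule (mentions more than double the average of the three preceding days) is an illustrative choice of anomaly detector.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (ticker TEXT, day TEXT, n INTEGER)")
conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", [
    ("GME", "2024-01-01", 10), ("GME", "2024-01-02", 20),
    ("GME", "2024-01-03", 30), ("GME", "2024-01-04", 100),
])

QUERY = """
WITH daily AS (                      -- CTE: one row per ticker and day
    SELECT ticker, day, SUM(n) AS mentions
    FROM mentions
    GROUP BY ticker, day
)
SELECT ticker, day, mentions,
       AVG(mentions) OVER (PARTITION BY ticker ORDER BY day
                           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3d,
       CASE WHEN mentions > 2 * AVG(mentions) OVER (PARTITION BY ticker ORDER BY day
                                                    ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
            THEN 1 ELSE 0 END AS spike
FROM daily
ORDER BY ticker, day
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the first day the preceding window is empty, so its average is NULL and the CASE falls through to 0.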

Advanced Topics and Future Trends in Database Technologies

Related Coursework: DTSA 5735 – Advanced Topics and Future Trends in Database Technologies (Elective)
Examined evolving technologies in data storage and management, including NoSQL databases and cloud-native solutions.

What I Know

Comparison of relational and non-relational (NoSQL) database models
Introduction to distributed databases and horizontal scaling
Trends in real-time data processing and cloud-based data architectures
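
Horizontal scaling requires deciding which node stores each key. One widely used technique, shown here as an illustrative choice rather than something named in the coursework, is consistent hashing: nodes are placed at many points ("virtual nodes") on a hash ring and each key goes to the next point clockwise, so adding a node only moves the keys that now land on its points. Node and key names are hypothetical.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to nodes so that adding or removing a node only moves a fraction of keys."""

    def __init__(self, nodes, vnodes=100):
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))   # several ring points per node
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._nodes = [n for _, n in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # first ring point clockwise from the key's hash, wrapping around at the end
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]

ring3 = ConsistentHashRing(["db1", "db2", "db3"])
ring4 = ConsistentHashRing(["db1", "db2", "db3", "db4"])
keys = [f"user{i}" for i in range(1000)]
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding db4")
```

With naive `hash(key) % n_nodes` partitioning, adding a fourth node would instead remap about three quarters of the keys.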

Project: Prototyped a hybrid data architecture using BigQuery and Firestore to manage both structured and semi-structured data. Designed the system to support future integration with real-time analytics tools.

Algorithms & Data Structures

Dynamic Programming and Greedy Algorithms

Related Coursework: CSCA 5414 / DTSA 5503 – Foundations of Data Structures and Algorithms
Introduced essential algorithmic strategies for solving complex optimization problems efficiently.

What I Know

Design and implementation of dynamic programming solutions
Greedy algorithms and their correctness analysis
Time and space complexity analysis using Big-O notation

Project: Implemented dynamic programming and greedy algorithms to solve a set of optimization challenges, including longest common subsequence and interval scheduling. Evaluated trade-offs between approaches and benchmarked performance in Python.
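
The two named problems show the contrast between the approaches. Longest common subsequence uses dynamic programming over an (m+1) x (n+1) table in O(mn) time, while interval scheduling is solved optimally by the greedy rule "always take the compatible interval that finishes earliest" in O(n log n) time. A compact sketch of both:

```python
def lcs_length(a: str, b: str) -> int:
    """Dynamic programming: dp[i][j] is the LCS length of a[:i] and b[:j]."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1          # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])   # drop a character from one side
    return dp[len(a)][len(b)]

def max_non_overlapping(intervals):
    """Greedy interval scheduling: repeatedly take the interval that finishes earliest."""
    chosen, last_end = [], float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:   # compatible with everything chosen so far
            chosen.append((start, end))
            last_end = end
    return chosen

print(lcs_length("ABCBDAB", "BDCABA"))
print(max_non_overlapping([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
                           (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]))
```

The greedy rule is provably optimal here by an exchange argument; the same "earliest finish" idea fails for weighted intervals, which need dynamic programming instead.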

Approximation Algorithms and Linear Programming

Related Coursework: CSCA 5424 – Approximation Algorithms and Linear Programming
Focused on near-optimal solutions for NP-hard problems and optimization using linear programming.

What I Know

Formulating linear programs and solving them with the simplex method; LP duality
Design of approximation algorithms with provable guarantees
Applications to scheduling, graph problems, and resource allocation
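
The ad-placement algorithm from the project is not described here, so as a stand-in this sketch shows a textbook approximation algorithm with a provable guarantee: the 2-approximation for minimum vertex cover. It greedily builds a maximal matching and takes both endpoints of every matched edge. Any valid cover must include at least one endpoint of each matched edge, and matched edges share no vertices, so the result is at most twice the optimum.

```python
def vertex_cover_2approx(edges):
    """Return a vertex cover at most twice the size of a minimum one."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge still uncovered: add it to the matching
            cover.update((u, v))
    return cover

path = [(1, 2), (2, 3), (3, 4)]
print(vertex_cover_2approx(path))   # optimum is {2, 3}; the approximation may return 4 vertices
```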

Project: Formulated a linear program to optimize ad placements on a content website and developed an approximation algorithm to handle scalability for larger input sets. Visualized solution efficiency across test scenarios.

Advanced Data Structures, RSA, and Quantum Algorithms

Related Coursework: CSCA 5434 – Advanced Data Structures, RSA and Quantum Algorithms
Explored advanced algorithmic techniques and emerging concepts in cryptography and quantum computation.

What I Know

Advanced data structures: tries, AVL trees, heaps, and hash maps
RSA encryption algorithm and number theory foundations
Introduction to quantum computing concepts and quantum search algorithms (e.g., Grover’s algorithm)

Project: Simulated RSA encryption and decryption using Python, applying modular arithmetic and prime generation. Also explored quantum algorithm principles through pseudocode implementation and comparison to classical search methods.
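
Textbook RSA reduces to modular arithmetic: choose primes p and q, set n = pq and phi = (p - 1)(q - 1), pick e coprime to phi, and take d as the inverse of e modulo phi; then c = m^e mod n and m = c^d mod n. This toy sketch uses the classic small primes 61 and 53 and no padding, so it only illustrates the math and is not secure.

```python
import math

def make_keys(p: int, q: int, e: int = 65537):
    """Return ((e, n), (d, n)) for toy primes p and q."""
    n, phi = p * q, (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime with phi(n)")
    d = pow(e, -1, phi)   # modular inverse (Python 3.8+)
    return (e, n), (d, n)

def encrypt(m: int, public_key) -> int:
    e, n = public_key
    return pow(m, e, n)   # fast modular exponentiation

def decrypt(c: int, private_key) -> int:
    d, n = private_key
    return pow(c, d, n)

public, private = make_keys(61, 53, e=17)
c = encrypt(65, public)
print(public, private, c, decrypt(c, private))
```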

Network Systems & Infrastructure

Network Systems Foundation

Related Coursework: CSCA 5063 – Network Systems Foundation
Introduced the principles and architecture of modern computer networks, focusing on how data is transmitted, routed, and secured across systems.

What I Know

OSI and TCP/IP networking models
IP addressing, subnetting, and routing protocols
Basics of network performance, congestion, and fault tolerance
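
Subnetting arithmetic can be checked with Python's standard `ipaddress` module. This sketch splits an example private /24 network into four /26 subnets and reports each netmask and usable host range (the network and address are illustrative).

```python
import ipaddress

network = ipaddress.ip_network("192.168.10.0/24")
subnets = list(network.subnets(new_prefix=26))   # split the /24 into four /26 blocks
for net in subnets:
    hosts = list(net.hosts())   # excludes the network and broadcast addresses
    print(net, "netmask", net.netmask, "usable hosts", len(hosts), hosts[0], "-", hosts[-1])

addr = ipaddress.ip_address("192.168.10.77")
print(addr, "is in", next(net for net in subnets if addr in net))
```

Each /26 has 2^(32-26) = 64 addresses, of which 62 are usable hosts.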

Project: Mapped and analyzed network traffic flows in a simulated enterprise environment using Wireshark. Evaluated protocol layers, packet loss, and latency to diagnose performance bottlenecks.

Linux Networking

Related Coursework: CSCA 5073 – Linux Networking
Hands-on exploration of network configuration, monitoring, and troubleshooting in Linux-based systems.

What I Know

Network interface configuration, firewall rules, and SSH tunneling
Tools such as netstat, ip, tcpdump, and iptables
Basics of shell scripting for network diagnostics and automation

Project: Configured a virtual Linux server to host and secure a basic web application. Set up port forwarding, implemented firewall rules with iptables, and created scripts to monitor uptime and network usage.
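
The project's monitoring used shell scripts; as a sketch in this page's example language, an uptime check can simply attempt a TCP connection to each service port with a timeout. The function names are hypothetical, and a scheduler such as cron would run the check periodically.

```python
import socket
import time

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:   # connection refused, host unreachable, or timed out
        return False

def check_once(targets):
    """Return a timestamped up/down status for each (host, port) target."""
    ts = time.strftime("%Y-%m-%dT%H:%M:%S")
    return {f"{h}:{p}": ("up" if is_port_open(h, p) else "down", ts) for h, p in targets}
```

For example, `check_once([("127.0.0.1", 80), ("127.0.0.1", 22)])` would report whether a local web server and SSH daemon are accepting connections.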

Statistical Learning

Regression and Classification

Related Coursework: DTSA 5020 – Regression and Classification
Focused on applying statistical models for supervised learning, emphasizing the mathematical underpinnings and interpretability of model output.

What I Know

Linear and logistic regression for prediction and classification
Model assessment with training/test splits, confusion matrices, and ROC curves
Bias-variance tradeoff and regularization techniques (Lasso, Ridge)
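
Lasso minimizes (1/2n)||y - Xb||^2 + lambda * ||b||_1, and coordinate descent solves it one coefficient at a time with a soft-thresholding update, which is what lets Lasso set coefficients exactly to zero. This sketch assumes mean-centered features and response so the intercept can be dropped; the toy design is orthogonal, in which case each Lasso coefficient is just the least-squares coefficient shrunk toward zero by lambda.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, iters=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (center X and y first)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(iters):
        for j in range(p):
            # correlation of feature j with the partial residual that leaves feature j out
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                      for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# toy centered data with orthogonal columns: y = 3*x1 + 0.5*x2
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.5, 2.5, -2.5, -3.5]
print(lasso_cd(X, y, lam=1.0))   # the weak x2 effect is shrunk all the way to zero
```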

Project: Built logistic regression and Lasso models to predict customer churn based on behavioral data. Evaluated accuracy and interpretability using coefficient analysis and cross-validation.

Resampling, Variable Selection, and Splines

Related Coursework: DTSA 5021 – Resampling, Selection and Splines
Explored advanced model tuning techniques and flexible function fitting for non-linear relationships.

What I Know

Cross-validation (k-fold, LOOCV), bootstrapping for variance estimation
Variable selection and shrinkage: stepwise selection, Lasso, and Ridge (which shrinks coefficients but does not set them to zero)
Polynomial regression and smoothing splines for modeling complex trends
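
Both resampling ideas above fit in a few lines. The bootstrap estimates a statistic's standard error by recomputing it on many samples drawn with replacement, and k-fold cross-validation partitions the row indices so that every row is held out exactly once. A minimal sketch on synthetic data:

```python
import random
import statistics

def bootstrap_se(data, stat=statistics.fmean, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` by resampling the data with replacement."""
    rng = random.Random(seed)
    replicates = [stat(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return statistics.stdev(replicates)

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1, split into k folds, and yield (train_idx, test_idx) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, folds[i]

data = [float(i) for i in range(100)]   # synthetic data
print(bootstrap_se(data))   # close to pstdev(data) / sqrt(100), about 2.89
for train, test in kfold_indices(10, 5):
    print(len(train), sorted(test))
```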

Project: Applied cross-validation and spline regression to model housing prices with non-linear features (e.g., square footage vs. price). Compared performance with standard linear models and visualized residual patterns.

Tree-Based Methods, SVM, and Unsupervised Learning

Related Coursework: DTSA 5022 – Trees, SVM, and Unsupervised Learning
Integrated both supervised and unsupervised learning techniques with a focus on model robustness and interpretability.

What I Know

Decision trees, random forests, and boosting methods
Support Vector Machines (SVM) with various kernels
K-means clustering and hierarchical clustering for unsupervised analysis
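
Decision trees (and the random forests and boosting methods built from them) grow by choosing, at each node, the split that most reduces impurity. This sketch shows the CART-style search on one numeric feature using Gini impurity, 1 minus the sum of squared class proportions. The feature values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum over classes of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, weighted child impurity) for the best split 'x <= threshold'."""
    best = (None, gini(ys))   # None means no split beats the parent node
    for t in sorted(set(xs))[:-1]:   # splitting at the max value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# hypothetical feature (e.g. count of positive words) and sentiment labels
xs = [1, 2, 3, 10, 11, 12]
ys = ["neg", "neg", "neg", "pos", "pos", "pos"]
print(best_split(xs, ys))
```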

Project: Compared decision trees, random forests, and SVMs for classifying sentiment in Reddit posts. Used unsupervised clustering to identify emerging topics and validate classification boundaries.

Software Architecture for Big Data

Fundamentals of Software Architecture for Big Data

Related Coursework: DTSA 5507 / CSCA 5008 – Fundamentals of Software Architecture for Big Data
Introduced foundational principles of software architecture with a focus on scalability, modularity, and performance in big data systems.

What I Know

Core architectural components: services, APIs, storage, and compute layers
Trade-offs in design: consistency vs. availability, batch vs. stream processing
Key concepts: fault tolerance, load balancing, and system resilience
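
A small, concrete fault-tolerance building block for an ingestion stage (for example, calls to an external API) is retrying transient failures with exponential backoff and jitter. This is a generic sketch; the function name and default limits are assumptions, and `sleep` and `rand` are injectable so the behavior can be tested without waiting.

```python
import random
import time

def call_with_retry(fn, max_attempts=5, base_delay=0.1, max_delay=5.0,
                    retry_on=(ConnectionError, TimeoutError),
                    sleep=time.sleep, rand=random.random):
    """Call fn(), retrying transient errors with capped exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise   # give up and surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))   # 0.1, 0.2, 0.4, ...
            sleep(delay * (0.5 + rand() / 2))   # jitter: 50-100% of the nominal delay
```

Jitter spreads out retries from many workers so they do not hit a recovering service in lockstep.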

Project: Designed a high-level architecture for a Reddit sentiment analysis pipeline. Mapped out ingestion, preprocessing, storage, and model inference stages with considerations for scalability and modular design.

Software Architecture Patterns for Big Data

Related Coursework: DTSA 5508 / CSCA 5018 – Software Architecture Patterns for Big Data
Explored common architectural patterns and paradigms used in large-scale data systems.

What I Know

Lambda and Kappa architectures for real-time and batch data processing
Microservices vs. monolithic architecture trade-offs
Event-driven, layered, and service-oriented patterns
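
In a Lambda architecture, an append-only master dataset feeds a batch layer that periodically recomputes views, a speed layer that incrementally covers events since the last batch run, and a serving layer that merges the two at query time. A Kappa architecture drops the batch layer and reprocesses by replaying the stream. This in-memory toy (counting mentions per ticker) shows only the data flow; it is single-threaded, so it ignores events that arrive while a batch is running, which a real system must handle.

```python
from collections import Counter

class LambdaCounter:
    """Toy Lambda architecture: recomputed batch view plus incremental speed view."""

    def __init__(self):
        self.master_log = []          # immutable, append-only record of all events
        self.batch_view = Counter()   # recomputed from the full log
        self.speed_view = Counter()   # covers events since the last batch run

    def ingest(self, ticker):
        self.master_log.append(ticker)
        self.speed_view[ticker] += 1   # low-latency incremental update

    def run_batch(self):
        self.batch_view = Counter(self.master_log)   # full recomputation from the log
        self.speed_view.clear()                       # those events are now in the batch view

    def query(self, ticker):
        # serving layer: merge the batch and real-time views
        return self.batch_view[ticker] + self.speed_view[ticker]

lc = LambdaCounter()
for t in ["GME", "GME", "GME"]:
    lc.ingest(t)
lc.run_batch()
lc.ingest("GME")
lc.ingest("AMC")
print(lc.query("GME"), lc.query("AMC"))
```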

Project: Prototyped a Lambda-style architecture using Cloud Functions and BigQuery to process and analyze high-volume Reddit data streams. Incorporated Pub/Sub messaging for real-time data handling.

Applications of Software Architecture for Big Data

Related Coursework: DTSA 5714 / CSCA 5028 – Applications of Software Architecture for Big Data
Applied architectural principles to real-world data problems, emphasizing system integration and end-to-end data flow.

What I Know

End-to-end design and deployment of scalable data pipelines
Integration of storage systems (e.g., BigQuery, Firestore) with application layers
Monitoring, logging, and performance optimization strategies
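
A lightweight starting point for monitoring is to log how long each pipeline stage takes and whether it failed, in a key=value format that log tools can parse. This decorator is a hypothetical sketch using only the standard `logging` module; production pipelines usually ship these records to a central log service.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("pipeline")

def timed(stage_name):
    """Decorator that logs the duration and outcome of a pipeline stage."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception:
                logger.exception("stage=%s status=error elapsed_s=%.3f",
                                 stage_name, time.perf_counter() - start)
                raise
            logger.info("stage=%s status=ok elapsed_s=%.3f",
                        stage_name, time.perf_counter() - start)
            return result
        return wrapper
    return decorator

@timed("transform")
def transform(rows):
    """Hypothetical stage: drop rows without a sentiment score."""
    return [r for r in rows if r.get("score") is not None]

transform([{"score": 0.4}, {"score": None}])
```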

Project: Developed a full-stack pipeline to analyze Reddit sentiment and stock market data. Integrated APIs, scheduled ETL jobs, and built dashboards to present insights, with an emphasis on maintainability and modularity in the system design.
