Lakshmi Bharathi
Data Scientist

Lakshmi Bharathi Ilangovan

About Me

Welcome!

I'm a Data Scientist with nearly 2 years of experience. I work with Python, SQL, Machine Learning, Generative AI, Deep Learning, PySpark, Big Data Engineering, and Cloud Technologies to build systems that process and analyze information at scale.

I'm interested in solving problems where data can make a real difference.

🎓 Rajalakshmi Engineering College, Chennai

Bachelor of Technology  |  Aug 2019 – May 2023  |  GPA: 8.99 / 10

Professional Experience

Data Scientist

Roni Analytics, Chennai

Mar 2025 – Mar 2026

  • Led a team building an AI-powered forecasting and trading system for prediction markets such as Polymarket and Kalshi. The system ingested real-time tweets, scored their sentiment with GPT-4o mini, and combined these scores with quantitative features (price momentum, volatility, liquidity) to train an XGBoost classifier on thousands of resolved markets. I applied Platt scaling to calibrate the probability outputs, derived an 81% trading threshold through P&L backtesting, and implemented 4 rule-based execution strategies that deliver automated trading signals with stop-loss and take-profit levels.
  • Architected a conversational AI agent for PandaTerminal, enabling real-time crypto market analysis through natural language by integrating 17 live data tools via MCP on AWS Bedrock AgentCore with Claude 3.7 Sonnet; enforced secure tool access via SigV4 authentication, implemented per-user memory persistence, RAG-based knowledge retrieval, and observability using Langfuse and OpenTelemetry.
  • Designed and maintained a blockchain data platform using Medallion Architecture. I built 15+ production PySpark ETL pipelines that process 20M+ records daily at 99% uptime and cut infrastructure costs by 50%. I also delivered 20+ custom metrics (balances, TVL, active wallets) and optimized 50+ Dune SQL queries using CTEs, window functions, and indexing, which cut query runtime by 50% and credit consumption by 40%.
  • Implemented a real-time volatility indicator that processes 500+ DEX pairs across 4+ blockchains and uses threshold-based signal detection to reach 80% buy/sell accuracy. I built a backtesting framework to validate its performance and deployed a Telegram bot that delivers the signals to 200+ subscribers.
  • Established an automated data quality monitoring system that validates TVL for 25+ DeFi protocols and OHLCV data across 3+ blockchains. Its anomaly detection reduced data quality issues by 98%, with alerts delivered via email and a Telegram bot.

Intern – Big Data & Data Engineering

LTIMindtree, Remote

May 2024 – Aug 2024

  • Completed a comprehensive Big Data Engineering program with hands-on training in PySpark, SQL for relational databases, data warehousing architecture, pandas, cloud computing fundamentals, and Power BI for business intelligence and data visualization.

Project Intern – Deep Learning

Indian Institute of Technology Madras

Jan 2023 – Aug 2023

  • Developed a CNN-based deep learning model for early-stage lung cancer detection on 5,000+ CT scans using transfer learning with VGG16. Fine-tuning with strategic layer unfreezing achieved 89.4% F1 score, 90.6% Recall, 88.2% Precision, and 93.7% AUC-ROC on a held-out test set, and reduced training time by 35%.
  • Applied advanced image preprocessing techniques, including resizing, normalization, contrast enhancement, noise reduction, and augmentation. I optimized the model through hyperparameter tuning, batch normalization, and adaptive learning-rate strategies. It outperformed the InceptionV3 baseline by 2.0% in F1 score and generalized better, an important step toward clinical deployment.

Research Intern – Cancer Genomics

Centre for Stem Cell and Cancer Genomics, AMI Bioscience, Coimbatore

Aug 2022 – Sep 2022

  • Analyzed 500+ genomic datasets using Python (pandas, Biopython) and bioinformatics tools (BLAST, GSEA), identifying 10+ potential therapeutic targets. I ran data preprocessing pipelines, differential expression analysis, and pathway enrichment analysis, and visualized 20+ genome pathways.
  • Collaborated with research teams and contributed to a peer-reviewed publication on cancer genomics and stem cell therapies.

Skills & Technologies

💻 Languages

Python, SQL (PostgreSQL, MySQL)

🤖 AI & LLM

LLM Integration, AI Agent Development, RAG (Retrieval-Augmented Generation), MCP (Model Context Protocol), AWS Bedrock, Prompt Engineering, Langfuse, OpenTelemetry

🧠 Machine Learning

Classification, Logistic Regression, Decision Tree, XGBoost, Deep Learning (CNNs), Transfer Learning, Hyperparameter Tuning, Feature Engineering, Imbalanced Data Handling, Model Evaluation & Monitoring

⚡ Big Data & Cloud

PySpark, Databricks, AWS (S3, Lambda, RDS, EC2, SNS), ETL/ELT Pipelines, Data Warehousing, Medallion Architecture, Delta Lake, Docker, Batch Processing, Partitioning, Caching, Git

📊 Data Science

Pandas, NumPy, Time Series Analysis, Statistical Analysis, A/B Testing, Hypothesis Testing, Anomaly Detection, API Integration, Data Validation

📈 Visualization

Matplotlib, Seaborn, Plotly, Power BI, Streamlit

Featured Projects

🤖 PandaTerminal – Conversational AI Agent

AWS Bedrock · Claude 3.7 Sonnet · MCP · RAG · Langfuse · OpenTelemetry

A production-grade AI agent that lets users query live crypto market data in plain English. It is backed by 17 live data integrations, retrieval-augmented knowledge, and per-user memory.

  • 17 live data tools via Model Context Protocol on AWS Bedrock AgentCore
  • Secured via AWS SigV4 authentication; per-user memory persistence
  • RAG-based knowledge retrieval for contextual market insights
  • Full observability pipeline with Langfuse and OpenTelemetry

📊 ML Prediction Market Analytics Platform

Python · XGBoost · Platt Scaling · GPT-4o mini · Polymarket/Kalshi APIs · Streamlit

An end-to-end platform that aggregates 500+ prediction markets and enriches them with live news and sentiment. ML models forecast the outcomes, and those forecasts drive 4 automated trading strategies.

  • 81% trading threshold derived via P&L backtesting across simulated trades
  • 4 quantitative strategies: arbitrage, momentum, tail risk, volume
  • Platt Scaling applied to calibrate XGBoost probability outputs
  • Streamlit dashboard with <5s signal refresh
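Platt scaling fits a one-dimensional logistic model, p = sigmoid(a·s + b), that maps a classifier's raw score s to a calibrated probability. Below is a minimal pure-Python sketch. It assumes raw scores and resolved outcomes from a held-out calibration set are available as plain lists. The function names, the gradient-descent fit, the toy data, and applying the 0.81 cutoff directly to the calibrated probability are all illustrative assumptions, not the platform's actual code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_platt(scores, labels, lr=0.1, epochs=5000):
    """Fit p = sigmoid(a * s + b) on held-out (score, label) pairs.

    Minimizes log loss with plain batch gradient descent, a simple stand-in
    for the Newton-style solver used in Platt's original method.
    """
    # Platt's smoothed targets keep the fit from chasing hard 0/1 labels.
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]

    a, b, n = 1.0, 0.0, len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, t in zip(scores, targets):
            err = sigmoid(a * s + b) - t  # derivative of log loss w.r.t. a*s + b
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_signals(scores, a, b, threshold=0.81):
    """Pair each calibrated probability with a flag: trade only at or above threshold."""
    return [(p, p >= threshold) for p in (sigmoid(a * s + b) for s in scores)]


# Toy held-out scores and resolved outcomes (hypothetical data).
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
for p, trade in calibrated_signals(scores, a, b):
    print(f"{p:.3f} {'TRADE' if trade else 'skip'}")
```

In practice the calibration set must be separate from the XGBoost training data. Otherwise the sigmoid learns the model's overconfidence on examples it has already seen.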

⛓️ Blockchain Data Analytics Platform

PySpark · Databricks · AWS · Medallion Architecture · Delta Lake

A scalable data platform that ingests raw on-chain events from Ethereum, Base, and Solana and transforms them into analytics-ready tables, handling 20M+ records daily.

  • 15+ production pipelines at 99% uptime, 50% cost reduction via Spark optimization
  • 20+ custom PySpark metrics for protocol-level analytics
  • 50+ Dune SQL queries optimized: 50% lower runtime, 40% fewer credits
  • Centralized multi-chain data warehouse with quality validation
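The platform itself is built on PySpark and Delta Lake. The plain-Python sketch below only illustrates the Medallion idea so that it runs anywhere. Raw events land unchanged in a bronze layer. A silver layer drops malformed rows, deduplicates, and normalizes. A gold layer aggregates into a metric such as daily active wallets. The sample events, field names, and cleaning rules are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

EVM_CHAINS = {"ethereum", "base"}  # hex addresses here are case-insensitive

# Bronze: raw events exactly as ingested, duplicates and malformed rows included.
bronze = [
    {"tx": "0xa1", "chain": "ethereum", "wallet": "0xW1", "ts": 1700000000},
    {"tx": "0xa1", "chain": "ethereum", "wallet": "0xW1", "ts": 1700000000},  # duplicate
    {"tx": "0xb2", "chain": "base", "wallet": "0xW2", "ts": 1700003600},
    {"tx": "0xc3", "chain": "solana", "wallet": None, "ts": 1700007200},  # malformed
    {"tx": "0xd4", "chain": "ethereum", "wallet": "0xw3", "ts": 1700090000},
]


def to_silver(rows):
    """Silver: drop malformed rows, deduplicate on (chain, tx), normalize fields."""
    seen, out = set(), []
    for r in rows:
        key = (r["chain"], r["tx"])
        if r["wallet"] is None or key in seen:
            continue
        seen.add(key)
        # Lowercase only EVM addresses; Solana's base58 addresses are case-sensitive.
        wallet = r["wallet"].lower() if r["chain"] in EVM_CHAINS else r["wallet"]
        day = datetime.fromtimestamp(r["ts"], tz=timezone.utc).date().isoformat()
        out.append({"chain": r["chain"], "wallet": wallet, "day": day})
    return out


def to_gold_active_wallets(rows):
    """Gold: distinct active wallets per (chain, UTC day)."""
    wallets = defaultdict(set)
    for r in rows:
        wallets[(r["chain"], r["day"])].add(r["wallet"])
    return {key: len(ws) for key, ws in sorted(wallets.items())}


silver = to_silver(bronze)
gold = to_gold_active_wallets(silver)
print(gold)
```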

📡 Real-Time Volatility Signal System

Python · Multi-threading · Backtesting · Telegram Bot API

A live signal engine scanning 500+ DEX pairs across 4+ blockchains for volatility breakouts, generating buy/sell alerts pushed to subscribers via Telegram.

  • 80% buy/sell prediction accuracy across 500+ DEX pairs
  • Backtesting framework validated on 6 months of historical data
  • Telegram bot delivering signals to 200+ subscribers
  • Threshold-based statistical anomaly detection for signal generation
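Here is one way threshold-based breakout detection could work, sketched under stated assumptions. The latest log return is z-scored against a trailing window of returns. Treating an upside breakout as "buy" and a downside breakout as "sell", the 20-return window, and the k = 2.5 threshold are illustrative choices, not the system's actual parameters. The thread pool mirrors the multi-threaded scan. It pays off mainly when each pair's prices arrive over the network, because CPython threads do not speed up pure computation.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor


def breakout_signal(prices, window=20, k=2.5):
    """Return 'buy', 'sell', or None for the most recent price move.

    The latest log return is z-scored against the mean and sample standard
    deviation of the preceding `window` log returns; a move beyond k
    standard deviations counts as a volatility breakout.
    """
    if window < 2 or len(prices) < window + 2:
        return None  # need `window` historical returns plus the latest one
    returns = [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]
    history, latest = returns[-window - 1:-1], returns[-1]
    sigma = statistics.stdev(history)
    if sigma == 0:
        return None  # flat history: z-score undefined
    z = (latest - statistics.fmean(history)) / sigma
    if z > k:
        return "buy"
    if z < -k:
        return "sell"
    return None


def scan_pairs(pairs, window=20, k=2.5, max_workers=8):
    """Evaluate many pairs concurrently; keep only those that fired a signal."""
    names = list(pairs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda n: breakout_signal(pairs[n], window, k), names))
    return {n: s for n, s in zip(names, results) if s is not None}


# Hypothetical pairs: both oscillate quietly, then PAIR_A jumps ~29%.
base = [100.0 if i % 2 == 0 else 101.0 for i in range(22)]
print(scan_pairs({"PAIR_A": base + [130.0], "PAIR_B": base + [100.0]}))
# {'PAIR_A': 'buy'}
```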

🫁 Lung Cancer Detection – Deep Learning

CNN · TensorFlow · Keras · VGG16 · Transfer Learning

A deep learning model that detects early-stage lung cancer in CT scans. It was built by fine-tuning VGG16 with strategic layer unfreezing and evaluated on a held-out test set.

  • 89.4% F1 score, 90.6% Recall, 88.2% Precision, 93.7% AUC-ROC on held-out test set
  • 35% training time reduction via VGG16 transfer learning over InceptionV3 baseline
  • Outperformed InceptionV3 baseline by 2.0% in F1 score
  • Advanced preprocessing: normalization, contrast enhancement, augmentation, noise reduction
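F1 is the harmonic mean of precision (P) and recall (R), so the reported figures can be cross-checked directly:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.882 \times 0.906}{0.882 + 0.906} = \frac{1.598184}{1.788} \approx 0.894
$$

This matches the 89.4% F1 score reported for the held-out test set.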

💳 Credit Card Default Prediction

XGBoost · Logistic Regression · Decision Tree · scikit-learn · Python

A binary classification model that predicts customer default on an imbalanced dataset (78:22). The imbalance was addressed through class weighting and targeted feature engineering.

  • 0.871 PR-AUC on 30K customers with 78:22 class imbalance
  • 4 behavioral risk features engineered — 9% improvement over Logistic Regression
  • Payment history identified as dominant predictor (35% feature importance)
  • Supports targeted interventions for high-risk customer segments
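Class weighting offsets the 78:22 imbalance by making each default count more in the training loss. Below is a minimal sketch that assumes the common "balanced" scheme, n_samples / (n_classes × class_count). The project's exact weights are not stated, so treat these functions as illustrative.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) so both
    classes contribute equally to the total loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_log_loss(labels, probs, weights, eps=1e-12):
    """Average binary cross-entropy, with each sample scaled by its class weight."""
    total = weight_sum = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# 30,000 customers at a 78:22 split: 23,400 non-defaults (0) and 6,600 defaults (1).
labels = [0] * 23_400 + [1] * 6_600
weights = balanced_class_weights(labels)
print(weights)  # approximately {0: 0.641, 1: 2.273}
```

With these weights, the 6,600 defaults and the 23,400 non-defaults each carry a total weight of 15,000. Without weighting, the model can score well simply by favoring the majority class.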

🏦 Loan Approval Prediction System

Logistic Regression · XGBoost · Decision Tree · scikit-learn

An automated loan decisioning system with a three-tier approval framework. Clear approvals and rejections are routed automatically, and borderline cases are reserved for human review.

  • 0.907 ROC-AUC and 88.3% Precision on 2,000 loan applications
  • Three-tier framework: Approve / Review / Reject
  • 8 demographic and financial risk features engineered
  • Cost-sensitive evaluation to minimize high-cost false approvals
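A three-tier framework needs two cutoffs on the model's score plus a cost model to tune them against. The sketch below assumes the model outputs a probability of repayment. The 0.80/0.40 cutoffs and the cost values (a false approval costs 10 times a false rejection, and each manual review has a small fixed cost) are hypothetical placeholders, not the system's tuned values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    # Hypothetical cutoffs on the predicted probability of repayment.
    approve_at: float = 0.80    # at or above: auto-approve
    reject_below: float = 0.40  # below: auto-reject; in between: human review

    def route(self, p_repay: float) -> str:
        if p_repay >= self.approve_at:
            return "Approve"
        if p_repay < self.reject_below:
            return "Reject"
        return "Review"


def decision_cost(decisions, outcomes, fa_cost=10.0, fr_cost=1.0, review_cost=0.5):
    """Total cost of routed decisions against true outcomes (1 = repaid).

    False approvals (approved, then defaulted) are priced far above false
    rejections (rejected, would have repaid); every review adds a handling
    cost. All costs here are illustrative.
    """
    cost = 0.0
    for decision, repaid in zip(decisions, outcomes):
        if decision == "Approve" and not repaid:
            cost += fa_cost
        elif decision == "Reject" and repaid:
            cost += fr_cost
        elif decision == "Review":
            cost += review_cost
    return cost


policy = TierPolicy()
probs = [0.95, 0.62, 0.15, 0.85]
outcomes = [1, 0, 0, 0]
decisions = [policy.route(p) for p in probs]
print(decisions)                           # ['Approve', 'Review', 'Reject', 'Approve']
print(decision_cost(decisions, outcomes))  # 10.5
```

Sweeping the two cutoffs over a validation set and keeping the pair with the lowest total cost is one simple way to make the evaluation cost-sensitive.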

🔍 Automated Data Quality Monitoring

Python · Pandas · Asyncio · Statistical Analysis · Telegram Bot

A monitoring system that continuously validates DeFi protocol data across multiple blockchains. It runs anomaly detection concurrently and sends alerts when it finds missing or anomalous values.

  • 98% reduction in data quality issues
  • Concurrent processing with pandas and asyncio across TVL data for 25+ DeFi protocols
  • Statistical anomaly detection for outliers and missing data
  • Daily email alerts and Telegram bot with CSV reports
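Below is a minimal sketch of concurrent TVL checks with asyncio. It assumes each protocol's TVL history can be fetched as a list in which gaps appear as None. The fetch function, the sample data, and the choice of a median-absolute-deviation (MAD) modified z-score with a 3.5 cutoff are hypothetical stand-ins. Alert delivery by email or Telegram is left out.

```python
import asyncio
import statistics


def find_issues(series, z_max=3.5):
    """Return (index, reason) pairs for missing values and outliers.

    Outliers use a robust modified z-score based on the median absolute
    deviation, which a single bad point cannot inflate the way it inflates
    a standard deviation.
    """
    issues = [(i, "missing") for i, v in enumerate(series) if v is None]
    values = [v for v in series if v is not None]
    if len(values) < 3:
        return issues
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return issues
    for i, v in enumerate(series):
        if v is not None and abs(0.6745 * (v - med) / mad) > z_max:
            issues.append((i, "outlier"))
    return issues


async def check_protocol(name, fetch):
    series = await fetch(name)
    return name, find_issues(series)


async def run_checks(names, fetch):
    """Check all protocols concurrently; keep only those with issues."""
    results = await asyncio.gather(*(check_protocol(n, fetch) for n in names))
    return {name: issues for name, issues in results if issues}


# Hypothetical TVL histories standing in for live API responses.
SAMPLE = {
    "protocol_a": [100.0, 101.0, 99.5, 100.5, 100.2],
    "protocol_b": [50.0, 51.0, None, 49.5, 500.0],
}


async def fake_fetch(name):
    await asyncio.sleep(0)  # placeholder for a real network call
    return SAMPLE[name]


alerts = asyncio.run(run_checks(list(SAMPLE), fake_fetch))
print(alerts)  # {'protocol_b': [(2, 'missing'), (4, 'outlier')]}
```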

Certifications & Awards

🏆 HackerRank SQL (Advanced)

Advanced SQL certification demonstrating mastery of complex queries and performance optimization.

🏅 LeetCode Top 50 SQL Badge

Completed LeetCode's Top 50 SQL problems covering joins, subqueries, and window functions.

📜 Python/Bash/SQL for Data Engineering

Coursera certification in Python, Bash scripting, and SQL essentials for data engineering workflows.

🌟 Innovative Leadership Award

Awarded for leadership as Rotaract Club President, driving community and professional initiatives.

🏆 Best Poster Award

Recognized at the International Conference on Infectious Diseases for a research poster presentation.

Let's Connect

Whether you're looking to collaborate on a project, explore a new opportunity, or just talk data science — I'd love to hear from you.