Hi, I'm Kunj

Software Developer @ Barclays

Regulatory technology at the office, data science at home, and finance on Robinhood

Contact Me

About Me

My Introduction

I collaborate cross-functionally to build RegTech products, currently focused on Market Risk.
With a Master's in Computer Science from Rutgers, I specialize in data science, and my proficiency in Python, R, Java/C#, SQL, and AWS supports end-to-end development.
In my spare time, I maintain a Medium blog with over 40,000 views, focusing on cutting-edge AI research and industry trends.

9 Data Projects
Completed
12 Articles
Written
2 Published
Papers

Skills

My Technical Level

Development

All About the Core

Python

90%

Java

80%

PySpark

75%

R

70%

C++

40%

JavaScript

70%

Android

85%

MS Excel

70%

Photoshop

70%

InDesign

90%

Frameworks

Everyone Needs Support

NumPy

80%

pandas

90%

matplotlib

70%

scikit-learn

85%

Spark MLlib

70%

PyTorch

85%

Deep Graph Library

55%

OpenCV

65%

Pillow

65%

NLTK

60%

streamlit

80%

seaborn

70%

Flask

40%

Machine Learning

Theory, theory!

Linear and Logistic Regression

95%

Decision Trees

95%

Ensemble Models

90%

Clustering

65%

Convolutional Neural Networks

80%

Graph Neural Networks

60%

Recommender Systems

75%

Natural Language Processing

65%

Exploratory Data Analysis

90%

Multi-modal Learning

70%

Time Series

55%

Cloud and Engineering

Fly Fast & High!

AWS SageMaker

65%

AWS EMR

75%

AWS Lambda

70%

BigQuery

40%

Docker

60%

Apache Airflow

40%

Kafka

40%

Databases and Viz

Wow! Factor

MySQL

85%

AWS Redshift

75%

Amazon RDS

70%

Tableau

50%

Power BI

50%

Looker

60%

Qualification

My Personal Journey
Education
Work

Master of Science in Computer Science

Rutgers University, NJ, USA
2021-2023

Exchange - Computer Science

Princeton University, NJ, USA
2022

B.Tech in Computer Engineering

K.J. Somaiya, University of Mumbai, India
2016-2020

Higher Secondary in Science

Pace Junior Science College, Thane, India
2014-2016

Software Developer

Barclays
August 2023 - Present
What I did here

  • Building a pipeline that loads trade data from MSSQL for processing as part of the FRTB RFET regulatory project

  • Wrote unit tests using xUnit and pytest, increasing the codebase's test coverage from 20% to 80%

Data Scientist Intern

Eluvio
June 2022 - August 2022
What I did here

  • Part of the machine learning team building the media meta-tagging framework for media distribution on the blockchain

  • Engineered the logo detection and classification pipeline from supervised to zero-shot learning paradigm. Reduced the number of false logo detections in keyframes in movies from 10% to 2%

  • Pioneered an NFT recommender system end-to-end. Led the build of a near real-time ETL pipeline that ingests model-ready blockchain data for training. Deployed a test version of the recommender that efficiently handled 3,000 concurrent users

  • Designed an external data collection pipeline for the movie speech recognition project, eventually leading to a 35-percentage-point decrease in word error rate

Teaching Assistant

Rutgers University School of Graduate Studies
September 2021 - May 2023
What I did here

  • Taught R, SQL and Amazon Redshift and graded weekly assignments and exams for 200 students across two courses – “Data 101” and “Database Systems for Data Science”

Business Analyst

Quantiphi
October 2020 - August 2021
What I did here

  • Researched and presented highlights of the US stimulus bills to internal stakeholders that informed Quantiphi’s Public Sector business strategy

  • Performed market research on 200 organizations in the US education industry and devised an effective go-to-market strategy that converted four cold leads

  • Led two cold-call introductions, presenting Quantiphi’s solution deck to show how machine learning could be incorporated into the potential clients’ processes

  • Analyzed and reported quarterly revenue figures to internal stakeholders using Looker dashboards

  • Initiated and led the creation of an internal repository to keep track of research advancements in machine learning; this was leveraged by 230 people in the organization including founders

Project Intern

Fractal Analytics
June 2019 - July 2019
What I did here

  • Implemented the object classification phase of a project that analyzed consumer behavior at stores for a Fortune 500 FMCG company. Built a model for classifying 50 product SKUs in the product range with 80% accuracy

  • Designed and developed a data augmentation and ingestion pipeline for the classifier. Coded a script for scraping images of representative products from e-commerce websites to augment the data

Portfolio

My Projects

Food AI

Cross-Modal Representation Learning

  • Built a system for retrieval of food recipes given images of corresponding food

  • Beat the CCA baseline top-10 recall for recipe retrieval in the original im2recipe paper by 20 percentage points, using ResNet and BERT feature extractors and introducing cross-modality through a shared embedding layer

  • Implemented a second approach using triplet loss trained neural networks and attained median retrieval rank of 1 and top-10 recall of 82.49% for 1,000 random food images

  • Tech Stack


    Research Papers Referred

    View Code View Report View Presentation
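The triplet-loss approach mentioned above can be illustrated in a few lines. This is a toy NumPy sketch, not the project's actual code: the function name and the 8-dimensional embeddings are invented for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss: pull the matching recipe embedding
    toward the image embedding, push a non-matching one away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
img = rng.normal(size=8)                       # image embedding (e.g. from ResNet)
recipe_match = img + 0.1 * rng.normal(size=8)  # close in the shared space
recipe_other = rng.normal(size=8)              # unrelated recipe
print(triplet_loss(img, recipe_match, recipe_other))
```

Training embeddings to drive this loss to zero is what pushes matching image/recipe pairs together, which is why a plain nearest-neighbour lookup then achieves the retrieval numbers quoted.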

    Ensembling Large-scale Object Detectors

    Computer Vision

  • Developed algorithms for building YOLO model bagging and boosting ensembles

  • Obtained an average precision of 87.5% on the Flickr-32 dataset using a generic logo detector system of two boosted YOLO models

  • Tech Stack


    Research Papers Referred

    View Report
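The project above used bagging and boosting of full YOLO models; the sketch below shows only the generic detector-ensembling idea of merging overlapping predictions from two models. All names and the confidence-averaging rule are invented for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def fuse_detections(dets_a, dets_b, iou_thr=0.5):
    """Naive ensemble: boxes both models agree on (IoU >= thr) get an
    averaged confidence; unmatched boxes from either model are kept."""
    fused, used = [], set()
    for box_a, conf_a in dets_a:
        best_j, best_iou = None, iou_thr
        for j, (box_b, conf_b) in enumerate(dets_b):
            if j not in used and iou(box_a, box_b) >= best_iou:
                best_j, best_iou = j, iou(box_a, box_b)
        if best_j is not None:
            used.add(best_j)
            fused.append((box_a, (conf_a + dets_b[best_j][1]) / 2))
        else:
            fused.append((box_a, conf_a))
    fused += [d for j, d in enumerate(dets_b) if j not in used]
    return fused
```

Real ensembles typically use weighted box fusion or NMS across models rather than this simple matching, but the agreement-based merging is the core of why ensembling lifts average precision.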

    Movie Recommendation from Conversational Data

    Natural Language Processing

  • Built a movie recommendation system leveraging user conversations, critics’ data, and domain adaptation techniques; a re-implementation of this paper

  • Tuned hyperparameters for three CF approaches (KNN, SVD, and SVD++) to obtain a 3% improvement in results

  • Experimented with neural MF and obtained comparable results of RMSE=1.232 and MAE=0.9569

  • Tech Stack


    Research Papers Referred

    View Code View Report View Presentation
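The SVD-style collaborative filtering tuned above boils down to matrix factorization. This is a minimal from-scratch sketch (not the project's pipeline): plain SGD on latent user/item factors, with all hyperparameter values chosen arbitrarily for the example.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=4, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Matrix factorization trained with SGD on (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(n_users, k))   # user factors
    Q = 0.1 * rng.normal(size=(n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error of predicted ratings, the metric quoted above."""
    errs = [(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(errs)))
```

SVD++ extends this with implicit-feedback terms and biases; KNN predicts from rating similarity instead of latent factors.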

    Logo Detection

    Convolutional Neural Networks

  • Reproduced results for open-set logo detection from the paper here, achieving a 24-percentage-point increase in mean average precision (mAP) over the original using YOLOv5

  • Focused on classifying textual logos and obtained a classification accuracy of 22.56% against 47 classes of the Flickr-47 dataset using a logo classification architecture consisting of YOLOv5 and template matching

  • Tech Stack


    Research Papers Referred

    View Code View Report View Presentation

    Autoencoder Image Colorization

    Convolutional Neural Networks

  • Built an 11-layer deep autoencoder neural network with residual connections that colorizes black-and-white images

  • Trained the network on 10,000 images from FloydHub and deployed online via Streamlit

  • Tech Stack

    View Code
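The residual connections mentioned above simply add a block's input back to its output. A minimal NumPy illustration of the idea (weights, shapes, and function names invented for the example; the real network is a convolutional autoencoder):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = x + F(x): the skip connection means the layers only have to
    learn a residual correction, which eases training of deeper networks."""
    return x + W2 @ relu(W1 @ x)
```

With zero weights the block reduces to the identity, so stacking many such blocks cannot degrade the signal the way a plain deep stack can.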

    New York Taxi Fare Prediction

    Big Data

  • Analyzed a 55-million-record taxi fare dataset to determine how taxi fares vary across location and time

  • Performed feature engineering and zoomed in on trips to and from airports and across different boroughs of NYC

  • Predicted taxi fares to an RMSE score of 4.28 by training a Random Forest model on the augmented dataset

  • Tech Stack

    View Code
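The feature engineering above hinges on deriving trip distance and airport flags from raw coordinates; the standard tool is the haversine formula. A hedged sketch (the JFK coordinates are publicly known; the 2 km radius and helper names are invented for illustration):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km between two (lat, lon) points:
    turns raw pickup/dropoff coordinates into a trip-distance feature."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius * asin(sqrt(a))

JFK = (40.6413, -73.7781)

def near_jfk(lat, lon, km=2.0):
    """Flag trips starting or ending at the airport (flat-fare territory)."""
    return haversine_km(lat, lon, *JFK) <= km
```

Distance plus airport/borough flags are exactly the kind of engineered features a Random Forest can exploit to reach the RMSE quoted above.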

    FPL Team-Maker

    Exploratory Data Analysis

  • Developed and deployed a customizable application that uses pandas and Exploratory Data Analysis to suggest an optimal team to be entered into the Fantasy Premier League fantasy soccer game

  • Gained 50+ monthly active users. Ranked in the top 2% worldwide among 8.2 million players in 2020

  • Tech Stack

    View Code

    Undergrad Final Year Project

    Natural Language Processing

  • Built a text simplification system that simplifies input text by removing difficult-to-understand words

  • Modeled and trained Transformer models that internalized the semantics of the input and recognized complex words in it

  • Improved the performance of the application by preceding the transformer architecture with a Complex Word Identification (90.23% accuracy) model that flagged the complex words beforehand

  • Tech Stack

    View Code

    Abalone Age Prediction

    Machine Learning - Regression

  • Determined the ages of abalones (sea snails) from their physical characteristics using classification techniques

  • Improved the accuracy of determining age using regression techniques and obtained a MAE of 0.936

  • Concluded that the dataset is not large enough to reach the desired MAE of 0.5, which would imply correct age prediction

  • Tech Stack

    View Code
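The regression framing above can be sketched with ordinary least squares. A toy NumPy version for illustration only (the project itself used the UCI abalone features and reached an MAE of 0.936):

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    Xb = np.c_[np.ones(len(X)), X]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def mae(X, y, w):
    """Mean absolute error: the metric used to judge age prediction above."""
    Xb = np.c_[np.ones(len(X)), X]
    return float(np.mean(np.abs(Xb @ w - y)))
```

Treating age as a continuous target and scoring with MAE is what distinguishes this regression approach from the classification formulations in prior work.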

    Alien Shooter

    Python Game Development

  • Expanded the ‘Space Invader’ game to include three modes of play: Arcade, Timed and Survival

  • Tech Stack

    View Code

    Reminder - Todo List

    Android Development

  • Developed an Android application that acts as a combination of a reminder app and a notes app

  • Published the app on the Google Play Store; it currently has 50+ installs and a 4.6 rating

  • Tech Stack

    View Code

    Research

    My Publications

    International Journal of Computer Applications

    Vol. 178, No. 50 (43-49)

    Abstract

    Abalones are sea snails or molluscs, commonly known as ear shells or sea ears. Because of the economic importance of the age of the abalone and the cumbersome process involved in calculating it, much research has been done on predicting abalone age from the physical measurements available in the UCI dataset. This paper reviews the various methods, such as decision trees, clustering, SVM using Tomek links, CGANs and CasCor, used in attempts to solve the problem. Furthermore, in contrast to previous research that treated this as a classification problem, this paper approaches it as a linear regression problem and analyses the results.

    Read it!

    International Journal of Computer Sciences and Engineering

    Vol. 8, Issue 6 (1-5)

    Abstract

    Natural Language Processing is an active and emerging field of research in the computer sciences. Within it is the subfield of text simplification, which aims to teach the computer the so far primarily manual task of simplifying text efficiently. While handcrafted systems using syntactic techniques were the first simplification systems, Recurrent Neural Networks and Long Short-Term Memory networks employed in seq2seq models with attention were considered state-of-the-art until very recently, when the transformer architecture did away with the computational problems that plagued them. This paper presents our work on simplification using the transformer architecture, in the process of making an end-to-end simplification system for linguistically complex reference books written in English, and our findings on the drawbacks/limitations of the transformer during the same. We call these drawbacks the Fact Illusion Induction, Named Entity Problem and Deep Network Problem and try to theorize the possible reasons for them.

    Read it!

    Certifications

    Extra Courses I have Undertaken

    Certified Cloud Practitioner

    Expiry Date: July 17, 2024

    View Certificate

    LookML Developer

    Expiry Date: March 28, 2022

    View Certificate

    AWS Machine Learning Engineer Nanodegree

    Expiry Date: Does not expire

    View Certificate

    Applied Data Science with Python Specialization

    Expiry Date: Does not expire

    View Certificate

    Machine Learning

    Expiry Date: Does not expire

    View Certificate

    Deep Learning Specialization

    Expiry Date: Does not expire

    View Certificate

    Blog

    My Technical Articles

    5 Minute Paper Explanations: Food AI Part I

    Intuitive deep dive of the im2recipe paper “Learning Cross-modal Embeddings for Cooking Recipes and Food Images”

    Read it!

    5-Minute Paper Explanations: Food AI Part II

    Intuitive deep dive of im2recipe related paper “Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA”

    Read it!

    5-Minute Paper Explanations: Food AI Part III

    Intuitive deep dive of im2recipe related paper “Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning”

    Read it!

    5-Minute Paper Explanations: Food AI Part IV

    Intuitive deep dive of im2recipe related paper “Transformer Decoders with Multimodal Regularization for Cross-Modal Food Retrieval”

    Read it!

    Why Gradient Descent Works?

    Everybody knows what Gradient Descent is and how it works. Ever wondered why it works? Here’s a mathematical explanation

    Read it!

    A Philosophical Look at Climate Change

    … And why it's here to stay

    Read it!

    10 Points to Make it Big in the Data Industry

    People want to make careers here. But they are often deafened by the noise that surrounds them.

    Read it!

    What Mainstream AI is (Not) Doing

    The pandemic accelerated AI adoption — and made Big Tech richer — but did AI adoption happen in the places where it was needed?

    Read it!

    Introduction to PySpark via AWS EMR and Hands-on EDA

    Performing EDA on NY Taxi Fare Dataset to see PySpark in action — because cloud computing is the next big thing!

    Read it!

    Fantasy Premier League x Data Analysis: Being Among the Top 2%

    A brief overview of the application I built, in which I have employed data analysis to power my FPL team up the charts

    Read it!

    Kernel Regression from Scratch in Python

    Everyone knows Linear Regression, but do you know Kernel Regression?

    Read it!

    Intro to Machine Learning via the Abalone Age Prediction Problem

    The best way to dive into ML is to see it in action. Here it is!

    Read it!

    Contact Me

    Get in Touch

    Call Me

    +1 (848) 437-1589

    Location

    New York, NY, USA