Parag Jain
[email protected]
+91 9821278883
Bengaluru, India 560035

Summary
Guided multiple organizations through the journey of setting up their analytics and
data science. Currently focused extensively on the Gen AI space: fine-tuning LLMs
for domain-specific tasks, building RAG systems for enhanced contextual responses,
and leveraging advanced prompt engineering techniques to ensure structured and
actionable outputs. Having been part of several 0 to 1 initiatives, I am proficient
in creating robust frameworks tailored to each organization's specific needs.

Education
06/2015
B.E. IN ELECTRICAL ENGG: PEC UNIVERSITY OF TECHNOLOGY
Chandigarh

Skills
• Hugging Face Transformers
• Finetuning LLMs
• Quantization & LoRA Adapters
• OpenAI
• Perplexity
• Exa Search
• SQL
• Python
• Beam
• AWS SageMaker
• Amazon Timestream
• Deep Learning
• NLP
• AWS Redshift
• PG Admin
• Redash
• Metabase
• Dune Analytics
• Firebase
• Docker
• Redis
• Streamlit

Domains
• Decision research
• Opinion Trading
• Online Gaming
• EdTech
• US healthcare

Linkedin Profile
• https://www.linkedin.com/in/parag-jain-07583a167/

Experience
Bridgetown Research - Senior Scientist (Consultant)
Bengaluru, India
08/2024 - Current
• Currently finetuning the Mistral-7B model using LoRA adapters to extract facts
from website content.
• Finetuning BART-Large: Used in-house annotated data and synthetic generation
to fine-tune a checkpoint of BART-large (pretrained on the MNLI dataset) in
SageMaker to predict user concerns and question types with recall of
approximately 98%. Deployed it as an endpoint on Beam.cloud's serverless GPU
instance, achieving inference latency below 300 ms.
• Metrics For Interview: Created a script powered by GPT-4 to analyze interview
data by tagging question type, evaluating answer quality and identifying
concerns raised by respondents.
• Quick Search: Developed a RAG system that quickly searches interview
transcripts to answer user queries. Embeddings of interview Q&A pairs were
created using OpenAI's text-embedding-3-small and stored in Pinecone. The user
query embedding was compared against them to fetch the top 30 similar results,
which were then passed to GPT to provide a relevant summarized answer.
• Automated Question Generation: Designed and implemented a three-step
pipeline that takes research questions as input, identifies relevant domains, and
generates tailored interviewee personas and main interview questions by
leveraging GPT's structured output feature. Used Firebase to store and sync data
in real time.
• Question Bank Optimization: Developed a question adaptation system that
personalizes a fixed question bank based on research objectives and user
personas to generate highly relevant interview questions.

TRADEX - DATA SCIENCE LEAD
08/2022 - Current
• Worked closely with founders and aligned data strategies with core business
objectives. Empowered them to leverage data as a strategic asset right from the
beginning.
• Led growth experiments, using machine learning to cluster users by trading
pattern, pocket size and in-app behavior, improving retention by 25%.
• Sole creator and owner of the organization's complete data science layer.
Automated monitoring of KPIs at daily, weekly and monthly levels.
• Identified bottlenecks and implemented end-to-end logic that automatically
created and settled markets across finance, sports and media by pulling data
from various APIs.
• Improved the health of the entire system by reconciling all trading and payment
activities and flagging mismatches. This brought withdrawal time down from
1-2 days to a couple of hours.
• Created trading strategies end to end to provide the market with liquidity and
depth while profiting from the bid-ask spread. Deployed an arbitrage strategy to
take advantage of differing odds across bookmakers and exchanges.