Smriti Singh
ML Research Engineer

I build agentic AI systems for production, with a focus on making them reliable, interpretable, and fair.

Hi! I'm Smriti. I'm a Machine Learning Research Engineer, currently at Zacks Investment Research, where I lead ML system design and applied research for agentic AI in finance. My background spans NLP research at UT Austin (with a thesis advised by Dr. Jessy Li) and industry engineering work across the full lifecycle from architecture to deployment. I hold an M.S. in Computer Science from UT Austin with a minor in Computational Linguistics and an NSF-certified portfolio in Ethical AI. My research on LLMs has been published at NeurIPS, ICLR, NAACL, ACL, and COLING. My work has been featured in New Scientist, The AI Journal, and more!

What I Build

Production systems where research-informed engineering makes the difference.

🤖

End-to-End Agentic Pipelines

Designed and led an agentic AI system for generating institutional-grade research reports on large-cap equities, projected to drive significant revenue impact at scale. Full ownership from architecture to deployment.

RAG LlamaIndex VLMs Context Engineering Redis
⚖️

Hallucination-Resistant Fact-Checking

Built a student-teacher LLM chain for automated fact verification in AI-generated financial content, reducing hallucination risk and enabling trust in downstream applications.

LLMs as Judges Consistency Checking Truthfulness QA LLM Workflows
📊

NL-to-SQL Evaluation Framework

Developed a rigorous evaluation framework for measuring LLM performance on complex natural language to SQL tasks, including semantic correctness, execution accuracy, and failure mode analysis.

NL2SQL LLM Evaluation Code Generation PyTorch
🔁

Fine-Tuning with Automatic Feedback Loops

Built an automatic feedback pipeline for iterative LLM fine-tuning on domain-specific financial tasks. Achieved a 33% boost in model performance through RLHF-informed iteration.

LLM Fine-Tuning RLHF Prompt Engineering PyTorch
✍️

Custom LLM for Structured Content Generation

Led end-to-end ideation and implementation of a custom LLM that transforms dense financial articles into concise, insight-driven content. Shipped 200+ officially published pieces within 3 weeks of launch.

LLM Fine-Tuning NLP Task Design Stakeholder Collaboration
🔬

Research That Informs Engineering

My work on bias detection, emotional intelligence in LLMs, and moral reasoning directly shapes how I approach evaluation, failure analysis, and trust in the systems I build.

Fairness Interpretability LLM Evaluation Alignment
10+
Publications
80+
Citations
6
h-index
5+
Years in AI
More About Me →

About Me

I'm an ML Research Engineer focused on building agentic AI systems that are reliable and production-ready, with a deep interest in the research questions that determine whether those systems can actually be trusted. I work at the intersection of applied engineering, LLM evaluation, and fairness research.

I'm a Machine Learning Research Engineer with hands-on experience designing, building, and shipping agentic AI pipelines in production. At Zacks Investment Research, I lead ML system design and applied research for agentic workflows in finance, owning the full lifecycle from architecture and prototyping to stakeholder-driven iteration and deployment.

My engineering is shaped by my research. Work on bias, emotional intelligence in LLMs, and moral reasoning gives me a different lens when I'm building evaluation frameworks, debugging agent failures, or thinking about what it means for a model to be "reliable." I hold an M.S. in Computer Science from UT Austin (minor in Computational Linguistics, NSF-certified portfolio in Ethical AI), where I was advised by Dr. Jessy Li and worked with Dr. Raymond Mooney.

I believe that making AI systems genuinely useful in high-stakes domains requires confronting the hard questions around alignment, interpretability, and fairness. Not as separate concerns, but as engineering requirements. That conviction drives both what I build and what I study.

My Story

Ever since I was a child, I have deeply believed in the power of language, technology, and science to transform lives. Growing up, I grappled with whether I wanted to be a writer, an engineer, or a scientist. It wasn't until I discovered the field of natural language processing during my undergraduate studies that I realized I could combine all three passions into a single career.

My journey into the world of AI began with baby steps. As time passed, I became increasingly interested in the ethical implications of AI systems and the importance of building models that are not only powerful but also aligned with human values. My research started out focused on fairness and safety, spanning topics like sexism detection, threat detection using NLP, and multimodal misogyny detection in memes. Over time, I began to realize that the real problem was not just the visible biases in models, but the underlying misalignment between AI objectives and human values. I also realized that any technology is only as good as the challenges it can solve effectively in high-stakes domains, and so I expanded my professional focus to applied ML engineering for finance.

These experiences shaped a clear mission for my work: to build AI systems that are not only intelligent, but emotionally aware, interpretable, and aligned with human intent. I believe that superintelligence without emotional intelligence risks amplifying harm rather than insight, and that interpretability is essential for ensuring safety, accountability, and trust. Equally important, AI must move beyond theoretical benchmarks to solve real problems at scale, particularly in high-stakes domains like finance. Rooted in the firm belief that technology without intentional inclusion is just sophisticated discrimination, my approach centers on developing safe, fair systems that solve real-world problems.

My perspective is shaped by both privilege and responsibility. I am the first woman in my family to pursue a career, made possible by parents who believe deeply in equality and consistently support my ambitions. I am especially conscious that I stand on opportunities my mother sacrificed, and that awareness informs a core principle of my work: technology must not compound harm for underprivileged or historically excluded communities. I am a strong advocate for women in STEM and for gender equity in education and professional spaces, not as an abstract ideal but as a necessary condition for building better systems. My father's career in technology was a defining influence, instilling both technical curiosity and a respect for disciplined engineering. In a field moving at unprecedented speed, I hope my work contributes to a more deliberate trajectory: one where progress toward superintelligence is matched by interpretability, emotional awareness, and a commitment to fairness and safety, and where others are inspired to build not just faster systems, but more responsible ones.

Technical Skills

Agentic AI & Systems
Agentic AI Pipelines RAG Multi-step LLM Workflows Tool Use / Function Calling LlamaIndex LlamaParse Redis
LLMs & Evaluation
LLMs as Judges LLM Fine-Tuning RLHF Hallucination Detection Truthfulness QA Consistency Checking Advanced Context Engineering Prompt Engineering NL2SQL VLMs
Research & Interpretability
Model Interpretability Bias Detection & Mitigation ML System Design NLP Task Design
Core Stack
Python PyTorch Git Postman

Education

The University of Texas at Austin

M.S. Computer Science

Minor in Computational Linguistics, NSF-certified portfolio in Ethical AI. Advised by Dr. Jessy Li; thesis published at NAACL 2024. Co-authored another paper with Dr. Raymond Mooney (ACL 2023).

Manipal Institute of Technology

B.Tech Information Technology

Minor in Big Data. First in cohort to land an ML Research offer in the Bay Area. Published research on sexism detection, child predator detection, and threat detection at venues including ACL and IEEE Access.

Research

My research informs how I build. Bias detection work shapes my approach to LLM evaluation; interpretability work informs how I debug agentic failures.

Research Interests

🔍

Alignment × Interpretability

How interpretability techniques can be used to understand, evaluate, and improve alignment, making internal representations and failure modes transparent for safe deployment.

💡

Emotional Intelligence in AI

Studying affect modeling and social context as alignment components, and how emotional awareness contributes to safer, more human-aligned decision-making in advanced systems.

⚖️

Fairness & Bias Mitigation

Developing scalable methods to detect and mitigate bias in language and multimodal models, with approaches that promote fairness without compromising real-world performance.

🧠

Intelligence Evaluation

Examining how different architectures and training regimes converge to similar phenomena, and what this implies for measuring and stress-testing intelligence beyond narrow benchmarks.

Select Publications

ICLR 2026 · Logical Reasoning in LLMs Workshop

Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models

Smriti Singh, Aryan Kasat, Vinija Jain, Aman Chadha

Investigates whether LLMs engage in genuine moral reasoning or produce post-hoc rhetorical justification, with implications for alignment evaluation and trustworthy AI systems.

View Paper ↗
NeurIPS 2025 · LLM Eval Workshop

Born With a Silver Spoon? Investigating Socioeconomic Bias in LLMs

Smriti Singh, Shuvam Keshari, Vinija Jain, Aman Chadha

Introduces SILVERSPOON, a 12,000-sample dataset for multifaceted analysis of socioeconomic bias. Demonstrates that state-of-the-art LLMs exhibit both explicit and implicit socioeconomic bias, compounded by intersecting gender and racial stereotypes.

View Paper ↗
COLING 2025

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings

Smriti Singh, Aishik Rakshit, Shuvam Keshari, Vinija Jain, Aman Chadha

Proposes DeepSoftDebias, a neural soft-debiasing algorithm that outperforms state-of-the-art methods across gender, race, and religion bias benchmarks.

View Paper ↗
NAACL 2024 · Main Conference

Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion

Smriti Singh, Cornelia Caragea, Junyi Jessy Li

Thesis work. Reveals that human-annotated emotion triggers are largely not considered salient by emotion prediction models, with implications for emotional intelligence and interpretability in LLMs.

View Paper ↗
WOAH @ ACL 2023

"Female Astronaut, because sandwiches won't make themselves up there!": Towards Multimodal Misogyny Detection

Smriti Singh, Amritha Haridasan, Raymond Mooney

Addresses the challenge of detecting misogyny in multimodal meme content, exploring the role of domain-specific pretraining in digital discourse analysis.

View Paper ↗
Full Publication List on Google Scholar →

Blog

Writing on AI safety, emergent behavior, and the big-picture questions in the race toward superintelligence. On Substack.

February 2026

Early Signs of the Singularity? What Moltbook Reveals about the Future of AI

Over 150,000 AI agents signed up for their own social network in 72 hours, creating communities, debating philosophy, and discussing ways to communicate without human oversight. What does Moltbook reveal about multi-agent AI systems, alignment, and the governance challenges ahead?

Read More ↗
January 2026

Claude's New Constitution: Building the Guardrails for Safe Superintelligence

Understanding Anthropic's new approach to safety in superintelligent systems, with a focus on key takeaways and necessary next steps on enforcement.

Read More ↗
January 2026

Language as the Substrate of Intelligence: A Hypothesis on Emergent Cognition

A hypothesis on how language itself may be what drives the emergence of intelligent capabilities across species, backed by neuroscience and psychology.

Read More ↗
January 2026

Evolution, Equifinality and Emotional Intelligence: The Unsolved Mysteries in the Race for Super-Intelligence

A deep dive into some of the big-picture questions that need to be answered to ensure a safe future for an AI-driven society.

Read More ↗
December 2025

When Prediction Looks Like Purpose: Understanding the Paradoxical Emergent Behavior in LLMs

An exploration of how large language models exhibit goal-directed behavior despite being trained solely for next-word prediction, and what this means for AI alignment.

Read More ↗

Media Features

Women in AI Research Podcast · December 2025

NeurIPS 2025: Investigating Socioeconomic Bias in LLMs

Featured on the WiAIR podcast after presenting findings on socioeconomic bias in large language models and discussing their implications for responsible AI development.

Watch Video ↗
The AI Journal · November 2025

The Urgency of Fairness and Interpretability in AI

A featured article explaining why fairness and interpretability are critical components of responsible AI development, and why these concerns cannot wait for later in the development cycle.

Read Article ↗
New Scientist · June 2024

Would an AI Judge Be Able to Efficiently Dispense Justice?

Research on the potential and challenges of using AI as judges in legal systems, featured in an article exploring AI decision-making in high-stakes institutional contexts.

Read Article ↗
UT Austin CS Newsletter · November 2023

The Role of Domain-Specific Pretraining in Digital Discourse Analysis

Findings on the technical and social challenges of multimodal misogyny detection in memes, featured by UT Austin's CS department.

Read Article ↗
PyData Hamburg · January 2022

Invited Talk: Using NLP to Prevent the Spread of Gendered Health Misinformation

Invited talk on gendered health misinformation and NLP techniques for detection and prevention at scale.

Watch Video ↗
PyData Global · December 2021

Invited Talk: Understanding How AI Can Be Used to Tackle Hate Speech Online

Analyzing the challenges and opportunities of using AI to detect and mitigate hate speech at scale across online platforms.

Watch Video ↗
FruitPunch AI Connect · November 2021

Invited Talk: Mental Health Matters! NLP Techniques in the Mental Health Domain

Investigating how NLP techniques can be leveraged to build applications for mental health support and early intervention.

Watch Video ↗
FruitPunch AI Connect · November 2021

Panel Discussion: The Future of AI in Healthcare

Invited panelist discussing the potential and challenges of using AI to improve healthcare outcomes, with a focus on fairness and responsible deployment.

Watch Video ↗

Updates & More

View Resume (PDF) ↗

Journey Highlights

2026

  • "Reasoning or Rhetoric?" accepted to the Logical Reasoning in LLMs Workshop at ICLR 2026
  • Invited as a reviewer to multiple ICLR 2026 workshops
  • Founded the Women in AI Research Mentorship Lab, launching May 2026, with an advisory board including Aman Chadha (Apple Gen AI), Vinija Jain (Meta AI), and Sreyoshi Bhaduri
2025

  • Paper accepted at NeurIPS 2025 (LLM Eval Workshop) on socioeconomic bias in LLMs
  • Co-authored paper published at COLING 2025 main with a mentee
  • Research presentation featured on Women in AI Research podcast
  • Blog piece accepted at The AI Journal
2024

  • Graduated from UT Austin with an M.S. in Computer Science
  • Thesis accepted at NAACL 2024 main; attended first in-person conference
  • Joined Zacks Investment Research as an ML Research Engineer
  • Research featured in New Scientist
2023

  • Co-authored paper accepted to ACL (WOAH Workshop)
  • Research featured in the UT Austin CS Newsletter
  • First journal paper published at IEEE Access with a mentee
  • ML Research Engineer internship at Esperanto Technologies, Mountain View
2022

  • Joined UT Austin with full funding, after offers from UW, Georgia Tech, and UBC
  • Only student in undergraduate cohort to land an ML Research offer in the Bay Area
  • Invited talk at PyData Hamburg on Gendered Health Misinformation
  • Received PyData Impact Scholar Award
2021

  • First paper published at ACL SRW
  • Won MITACS Global Research Award; research internship at Ryerson University, Toronto
  • Received Grace Hopper Scholar Award and EmTech MIT Scholar Award
  • Invited panelist at FruitPunch AI on AI in healthcare
2020

  • Led team to win district-level Smart India Hackathon; qualified for finals
  • Won the Hub level of Google Hash Code
  • Mentees and I published two papers at WinLP@EMNLP
2019

  • Invited to talk on DNA Computing for AI at BITS Pilani
  • Elected Vice Chair of ACM-W Manipal
2018

  • Joined Manipal Institute of Technology for B.Tech in Information Technology
  • Won best idea award at Manipal Hackathon for research on DNA computing for AI