Guardrails on Generative AI for Enhanced Trust: Implementing Constraints and Iterative Reprompting

This thesis explores the design and implementation of Guardrails that restrict the output of Generative AI, combined with iterative re-prompting mechanisms, so that generated content aligns with predefined constraints and the system becomes easier to trust.

Contact persons

Sagar Sen

Research Manager
Arda Goknil

Senior Research Scientist

Master Project Description

Generative AI shows remarkable capabilities in content creation, but it can produce outputs that are inappropriate, biased, or outside desired constraints. Ensuring trust in Generative AI requires boundaries, or "Guardrails", that restrict and guide the AI's generative process.

Research Topic Focus

Understanding the current limitations and biases present in popular Generative AI models.
Investigating methodologies for designing effective Guardrails that restrict undesired AI output.
Developing mechanisms for iterative re-prompting to guide Generative AI towards desired outputs.
Evaluating the efficacy and reliability of the Guardrails and re-prompting mechanisms in real-world scenarios.
Validating the approach, where possible, on case studies from the health, manufacturing, space, and energy verticals.
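
To illustrate how Guardrails and iterative re-prompting can fit together, the following is a minimal sketch rather than a prescribed design. All names in it (`Guardrail`, `max_words`, `forbid_pattern`, `generate_with_guardrails`, `stub_model`) are hypothetical and do not come from any particular library; the "model" is assumed to be any callable that maps a prompt string to an output string, and a hard-coded stub stands in for a real Generative AI model.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Guardrail:
    """A named constraint. check(text) returns None if satisfied, else a message."""
    name: str
    check: Callable[[str], Optional[str]]


def max_words(limit: int) -> Guardrail:
    """Hypothetical guardrail: the output may contain at most `limit` words."""
    def check(text: str) -> Optional[str]:
        count = len(text.split())
        return None if count <= limit else f"use at most {limit} words (got {count})"
    return Guardrail("max_words", check)


def forbid_pattern(pattern: str, label: str) -> Guardrail:
    """Hypothetical guardrail: the output must not match a regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)

    def check(text: str) -> Optional[str]:
        return f"do not include {label}" if regex.search(text) else None
    return Guardrail(f"forbid_{label}", check)


def generate_with_guardrails(
    model: Callable[[str], str],
    prompt: str,
    guardrails: List[Guardrail],
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[List[str]]]:
    """Call the model, check its output, and re-prompt with feedback on failure.

    Returns (accepted output or None, list of violation messages per attempt).
    """
    history: List[List[str]] = []
    current_prompt = prompt
    for _ in range(max_attempts):
        output = model(current_prompt)
        violations = []
        for rail in guardrails:
            message = rail.check(output)
            if message is not None:
                violations.append(message)
        history.append(violations)
        if not violations:
            return output, history
        # Re-prompt: restate the task and tell the model what went wrong.
        current_prompt = (
            f"{prompt}\n\nYour previous answer violated these constraints: "
            f"{'; '.join(violations)}. Please answer again."
        )
    return None, history  # attempt budget exhausted without a valid output


def stub_model(prompt: str) -> str:
    """Stand-in for a real model (assumption): improves only after feedback."""
    if "violated" in prompt:
        return "Take rest and drink water."
    return "Call me at 555-1234 and I will explain at length why rest helps."


rails = [max_words(8), forbid_pattern(r"\d{3}-\d{4}", "phone numbers")]
answer, history = generate_with_guardrails(stub_model, "Give short advice for a cold.", rails)
print(answer)
print(history)
```

In this sketch the first attempt violates both constraints, the feedback is appended to the original prompt, and the second attempt is accepted. Returning `None` when the attempt budget runs out is one possible design choice; a real system might instead fall back to a safe default answer or escalate to a human.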

Expected Results

An in-depth analysis of the strengths and vulnerabilities of contemporary Generative AI models concerning trust.
A framework for implementing Guardrails and iterative re-prompting in Generative AI systems.
Demonstrated improvements in the trustworthiness and reliability of Generative AI outputs using the proposed methods.
Insights into potential challenges and further research opportunities in ensuring trust in Generative AI.
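
One possible way to quantify such improvements, offered here only as an assumption about how an evaluation could be set up, is to compare how often outputs satisfy all constraints on the first attempt with how often they do so within a re-prompting budget. The function name `pass_rate_at` and the records below are hypothetical; the numbers are invented toy data, not results.

```python
from typing import List, Optional


def pass_rate_at(first_pass_attempt: List[Optional[int]], budget: int) -> float:
    """Fraction of prompts whose output satisfied all guardrails within `budget` attempts.

    Each record is the 1-based attempt at which the output first passed,
    or None if it never passed.
    """
    if not first_pass_attempt:
        raise ValueError("at least one record is required")
    passed = sum(1 for a in first_pass_attempt if a is not None and a <= budget)
    return passed / len(first_pass_attempt)


# Toy records, invented purely for illustration (one entry per test prompt).
records = [1, 1, 2, None, 3, 1, 2, None]

baseline = pass_rate_at(records, budget=1)       # no re-prompting allowed
with_reprompt = pass_rate_at(records, budget=3)  # up to two re-prompts
print(baseline, with_reprompt)
```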

Learning Outcomes

Develop a deep understanding of the trust challenges in contemporary Generative AI systems.
Acquire skills in designing and implementing Guardrails and re-prompting mechanisms.
Enhance the capability to critically analyze and evaluate the reliability and trustworthiness of Generative AI outputs.
Gain hands-on experience in integrating trust mechanisms in real-world AI applications.

Qualifications

A solid foundation in AI and generative models.
Familiarity with popular Generative AI frameworks and tools.
Proficiency in relevant programming languages (preferably Python).
An analytical mindset with a focus on ethical AI and trust mechanisms.

References

Gasser, Urs, and Viktor Mayer-Schönberger. "Guardrails: Guiding Human Decisions in the Age of AI." Princeton University Press, 2024.
Wang, Yanchen, and Lisa Singh. "Adding guardrails to advanced chatbots." arXiv preprint arXiv:2306.07500 (2023).
https://docs.guardrailsai.com/

Contact persons/supervisors

Sagar Sen, Arda Goknil, Erik Johannes Husom
