all Codit insights

Exploring Generative AI's Progress and Pitfalls: A Retrospective

Generative AI has transformed the digital landscape in just a few years — but with progress has come new challenges. In this retrospective, we explore where the technology stands today, its successes, risks, and how to apply it responsibly.

In terms of impact, few technologies can compare to Generative AI (GenAI) in the past few years. The launch of ChatGPT in particular prompted a lot of digital ink being spilled on hype, criticism and criti-hype, proclaiming all kinds of things about the technology. So, with the advantage of hindsight, I’d like to take a look at where this technology is at now, well over two years after ChatGPT’s major public release. I’ll be taking a look at various applications of this technology, assessing the progress that has been made since release, as well as evaluating open challenges and problems. 

With this broad overview of the state of GenAI I aim to give the reader insight into when and how to apply this technology. I will start by listing some major applications of GenAI, followed by assessments of challenges and risks carried by the technology from legal, security, privacy and ethics angles. This post will conclude with a summary to outline the key takeaways, as well as how we at Codit approach this technology with our customers. 

Applications

As far as indicators of a technology’s success go, it’s hard to do better than looking at its practical applications. Some prominent ones are as follows: 

  • Translation: While initially lagging behind established machine translation services, the release of GPT-4 has bumped ChatGPT up to perform comparably to services that are specialized for translation tasks like Google Translate and DeepL. A particular technique used to boost its efficacy is so-called Pivot Prompting, a prompting technique to make the model simulate pivot translation. Pivot translation involves translating distant languages via a high-resource language like English as an intermediary, improving accuracy. While classic Machine Translation models retain several key advantages like speed and accuracy when dealing with jargon or other specialized terms, the higher context-sensitivity and more natural output give generative AI translations a strong niche. 
  • Coding assistance: Due to the high accuracy requirements of programming, GenAI is not capable of writing more than boilerplate code on its own. However, it performs quite well as an assistant to a human programmer, with even someone inexperienced with machine-assisted programming often able to complete tasks faster, especially with common languages like Python. Models like GitHub Copilot take on tasks like code completion from human-written code context, alleviating some of the nitty-gritty mental burden to allow the programmer to focus more on design, structure and efficiency. 
  • Information retrieval: A roadblock in using GenAI for information retrieval is hallucination, the phenomenon of GenAI models producing incorrect information due to their lack of an inherent truth-telling capability. But while it may not be suited for this task in isolation, hybrid solutions like Retrieval Augmented Generation (RAG) combine it with separate models for retrieving contextually relevant information and validating the generated output against this data, improving accuracy and reducing the model’s reliance on often outdated information from its training data. Another approach is to use a generative model as a frontend that transforms natural language user input into queries for a more traditional search engine that returns a list of relevant documents. This eliminates the risk of hallucinations in the output by relying on external, indexed content, but at the cost of less specific output, as the user must interpret or extract the desired information from the results. 
  • Content generation: While AI-generated content remains a contentious topic, this applies mostly to fully AI-generated content being presented as or as equivalent to content created by a human. But for applications that do not require content to be particularly polished, the technology can still provide value. For instance, helping write boilerplate emails, illustrations accompanying a presentation or article, or even as an inspiration and conceptualization tool for human writers and artists. The key difference is that in these contexts the content itself is not the point of focus, nor is it expected to be particularly creative and/or polished. In these situations, AI-generated content is a strict timesaver, since it mainly takes the place of copy-pasting pieces of emails or trawling through image search engines yourself, instead of replacing the work of a dedicated writer or artist. 

From these examples it can be seen that while Generative AI has markedly improved over time, it still faces significant accuracy challenges due to its nature as a stochastic parrot, producing human-like language without real knowledge of the meaning of its content. These issues can however be mitigated through use of hybrid approaches, where the GenAI is overseen by a human (as with GitHub Copilot) or by a more specialized AI model (as with RAG), which gives more control over the meaning of the output the GenAI produces. 

With the successful applications and promising new technologies out of the way, it is only fair to look at applications that either ran into significant problems or failed outright. 

Security

Two potential security risks plaguing text-generation AI like ChatGPT are Prompt Leaking and Prompt Hijacking or Prompt Injection. Prompt leaking involves feeding the AI a prompt that coaxes it into reproducing its original instructions. These instructions may contain privacy- or security-sensitive information, proprietary data or PR-sensitive information that should not be known to the end user. A notable example of this is Bing’s internal alias ‘Sydney’ being leaked in this way, despite explicit instructions in its prompts not to disclose this information. 

Prompt hijacking is a more nefarious variant of prompt leaking. Rather than attempting to retrieve the original prompt this approach instead attempts to subvert the intended functionality of the chatbot entirely, making it act contrary to its original instructions. One practical example of this is the bot-operated Twitter account of remote work company remoteli.io being coaxed into posting a threat against the U.S. President. 

In combination with the general prompt engineering technique of Jailbreaking (subverting built-in protections against producing certain types of output) it is possible to make a user-facing chatbot produce sexual, discriminatory and violent output, which may have deep legal ramifications on top of the obvious PR problems this kind of subversion can cause. And while more recent versions of models like ChatGPT such as GPT-4 are more resistant against this kind of attack, the attacks themselves are developing just as rapidly, and it seems unlikely that this kind of subversion will be entirely eliminated in the foreseeable future. 

However, while outright eliminating these threats is not possible, there are ways to mitigate them. An example of this is Azure AI’s Prompt Shield, which helps protect against prompt injection and jailbreaking techniques. It should be noted that this works as mitigation, not outright elimination, and it has been shown that certain indirect prompt injection attacks still bypass Prompt Shield. Therefore, it is best practice to limit the data any user-facing generative AI is trained on and can access as much as possible to limit the risk of leaking sensitive information. It is also advisable to use hybrid approaches like RAG to avoid directly exposing the generative AI to the user to prevent user input from reaching the AI entirely where possible. 

Privacy

Because a lot of Generative AI services by default use their inputs as further training data, and because they are able to reproduce parts of its training data verbatim as discussed earlier, there are privacy risks involved with using these technologies. One example of this is Samsung banning internal use of ChatGPT entirely after it was discovered that employees were entering proprietary code into it. Additionally, user input may be monitored by humans to check for potential abuse of the service, which is another avenue by which sensitive information may be exposed. 

Fortunately, this area has seen a lot of improvement since ChatGPT first launched. For instance, users that meet certain requirements can opt out of the human monitoring for its Azure OpenAI offering, and OpenAI itself has similarly opened up the possibility of opting out of having your input used for training data. Though some risks still exist by way of employees using the free offerings that collect input data by default, these options go a long way towards ensuring compliance with regulations and company policies regarding privacy. 

Additionally, the prevalence of open-source alternatives has increased substantially as well. DeepSeek is the most prominent example, releasing their R1 model as open source, with performance comparable to ChatGPT, and Hugging Face offers a myriad of smaller and more specialized models. The advantage of open-source models like this is that they can be run locally or in a service like Azure AI Foundry, allowing for much stricter access control and securing of both training and input data. 

Bias and ethics

While most Generative AI have systems in place to attempt to give them a neutral point of view, and while these systems have improved over time, they still find themselves succumbing to bias at times. An example is a study of GPT-4 as applied to clinical use cases like diagnosing patients, which found that the model still stereotypes based on demographic, even going so far as to recommend more expensive procedures for patients with certain backgrounds. 

Bias is also an issue from the user side. While most successful applications of Generative AI relegate it to an advisory rather than a truth-telling role, it is easy for users to start treating the advice as truth if it is accurate enough. This is due to the incorrect assumption that because the AI communicates in a human way it also thinks like a human does, and that any mistakes it will make will be humanlike and therefore typically minor in light of its usual accuracy. However, Generative AI is more akin to a highly complex parrot than to a rational agent and can therefore make huge mistakes out of the blue in ways a human never would, which can easily be missed if it is treated as more intelligent than it truly is. Review policies should be in place whenever Generative AI is used in a critical context, even if it is serving a purely advisory role. 

There are also ethical concerns with using Generative AI at all. For example, a TIME magazine investigation found that OpenAI used the services of Sama to annotate its training data to mark violent and sexually explicit content. For this Sama employed workers from Kenya, who were paid as little as $2 per hour, with some of them suffering psychological trauma from having to review this content. Companies that have internal standards regarding exploitative labor may not want to use these services for reasons like these and should take care to research how the Generative AI service they are planning to use operates before making such a decision. 

This topic is a complex one, existing on the intersection of the social and the technological, and therefore isn’t addressable with a single solution. But now that time has passed, experts in the field have created guidelines and best practices to help navigate implementing AI responsibly. One such example is Azure’s Resonsible AI Dashboard, in particular its Fairness Assessment tool, which uses the open-source Fairlearn project as its knowledge base. This tool helps quantify certain aspects of AI bias and fairness to aid in decision-making. The quantifications of fairness this provides help form a more solid framework from which to do a qualitative fairness evaluation, based on the Fairlearn best practices. 

Summary

Generative AI is undoubtedly a powerful technology with plenty of applications. But while it can be used in a large variety of situations, it is important to remember that it is a much more specialized tool than its superficial intelligence might suggest. It shines the most when it is restricted to an assisting role, paired with a human expert, or when the precise accuracy of the output is not very important. Hybrid applications that utilize Generative AI as a component while other modules handle reasoning tasks that generative models cannot handle well on their own are also becoming increasingly prevalent, driven by an increasing understanding of exactly what the technology’s capabilities and limitations are. 

While Generative AI does have certain vulnerabilities, a lot of these only arise when it is made to be directly user-facing. There are also tools available to help mitigate these issues on both technical and social levels. Hybrid approaches can also outright eliminate certain specific issues, such as RAG removing the need for the generative model to be user-facing in the use case of information retrieval. 

On our part, in addition to applying our knowledge at customer projects, we also host hackathons and inspiration sessions. Because this technology is so new, so broad and so easily misunderstood, we host these sessions to cover what AI is and isn’t, what kinds of applications it has, how those applications might be relevant to the customer, all the way to covering the principles of responsible AI. With these we aim to grow an understanding of the technology with our customers so that they can be better equipped to decide if and how they should incorporate this technology. 

Subscribe to our RSS feed

Talk to the author

Contact Jesse

Consultant

Hi there,
how can we help?

Got a project in mind?

Connect with us

Let's talk

Let's talk

Thanks, we'll be in touch soon!

Call us

Thanks, we've sent the link to your inbox

Invalid email address

Submit

Your download should start shortly!

Stay in Touch - Subscribe to Our Newsletter

Keep up to date with industry trends, events and the latest customer stories

Invalid email address

Submit

Great you’re on the list!