ChatGPT has revolutionised the field of conversational artificial intelligence (AI). Beyond its ability to generate effective human-like responses, the breakthrough has been to integrate large language models, trained on big data, with deep reinforcement learning from human feedback (RLHF).1 Reinforcement learning is an AI technique whereby the agents in a model learn from their actions through rewards and penalties assigned to good and bad actions. In the case of OpenAI's ChatGPT (and of other high-profile AI applications such as DeepMind's Sparrow),2 the rewards are specifically provided by a set of human interventions, allowing machines to grasp elements of decision-making distinctly embedded in human experience.
This feedback component, even if proven effective, may nonetheless be subject to bias if human interventions tend to cater to certain preferences. Further, the use cases can be extended by adding gamification, where the reinforcement comes from winning rewards, for instance in education, training, and many other settings.
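To make the reward-and-penalty idea concrete, the following is a minimal sketch, not a description of how ChatGPT or Sparrow is actually trained. It assumes a toy agent choosing among three canned responses, and a hypothetical `simulated_human_feedback` function standing in for a human rater. Production RLHF pipelines typically train a separate reward model on human preference comparisons and then optimise the language model against it; this example only illustrates the basic loop of acting, receiving a human-derived reward, and updating.

```python
import random


def simulated_human_feedback(response: str) -> float:
    """Hypothetical stand-in for a human rater: +1 approves, -1 rejects."""
    return 1.0 if response == "helpful" else -1.0


def train(candidates, feedback, rounds=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learning of which response earns the most reward."""
    rng = random.Random(seed)
    value = {c: 0.0 for c in candidates}  # running average reward per response
    count = {c: 0 for c in candidates}    # how often each response was tried
    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.choice(candidates)           # explore at random
        else:
            choice = max(candidates, key=value.get)   # exploit the best so far
        reward = feedback(choice)                     # reward or penalty
        count[choice] += 1
        # Incremental update of the running average for the chosen response.
        value[choice] += (reward - value[choice]) / count[choice]
    return value


values = train(["helpful", "evasive", "rude"], simulated_human_feedback)
best = max(values, key=values.get)
print(best, values)
```

Because the agent learns only what the rater rewards, any systematic preference built into the feedback function is inherited by the learned values, which is the bias concern raised above. In a gamified setting, the same loop would apply with the feedback function replaced by the game's winning rewards.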