Some Background
ChatGPT has attracted a lot of attention because of its ability to generate text that, while being grammatically correct most of the time, also happens to be very convincing, to the extent that unless you are _actually_ aware of the _reality_, you might believe that what the machine learning model, which is estimated to have cost 1.2 billion USD to build using your and my contributed English text data, says is factually correct.
This has sparked debate on whether students will outsource their school homework to ChatGPT. Let's set aside the fact that homework outsourcing existed even before ChatGPT attracted our eyeballs; it just took the form of paying an actual _human_ to do the task.
Note that using ChatGPT to do homework is interpreted as cheating. Teachers always assume that students are out to find an easy way to accomplish the task assigned to them, never questioning their own inability to frame tasks that would actually require students to demonstrate their learning.
Consequently, there has been an engaging debate on identifying text generated by ChatGPT to _help_ teachers figure out whether the student is indeed the author of the text or whether s/he used the services of an AI bot (the underlying assumption being that they were unfair in completing their task)!
DetectGPT is an outcome of this: a portal (I was not aware there was a portal for this until now!) that will allow one to differentiate between human-generated text and machine-generated text. A plagiarism checker, if you wish.
First we research and build ChatGPT, then we realize it can be used unfairly (by students!), so we come up with a brand-new research problem of detecting ChatGPT-generated text, and newspaper headlines read "Stanford introduces DetectGPT to help educators fight back against ChatGPT".
"Solved Research problems lend themselves to new research problems --Anonymous"
can not be less true!
Going Forward
The hump on the red line indicates the hype: more and more people trying things out with ChatGPT and then putting the generated text out on the internet to shout about their findings! So at some point E, a ChatGPT upgrade is envisioned. The upgraded version would digest more of its own generated text (pictorially, the area between the two red lines) than text that has come from the usual sources of text generation (humans; pictorially, the area between the two blue lines). In some sense, it would be learning from itself! There are two ways this could be avoided:
- After the initial hype, the text generated by ChatGPT reduces (pictorially, the area between the red dotted lines, denoted by the length between points "H" and "I") compared to the usual modes of text generation (by humans; pictorially, the area between the blue lines, denoted by the length between points "G" and "J"), so that the next version of ChatGPT learns from "new" and "usual" text data rather than from data generated by itself.
- There is a way to identify text generated by ChatGPT, namely DetectGPT! This mechanism can then be used to filter out ChatGPT-generated text, thereby allowing the upgraded version of ChatGPT to learn only from human-generated text and not be biased by the text generated by its older version (a minimal sketch of such a filtering step follows this list). All of this can happen only if DetectGPT works to perfection. However, like all explorations, achieving good accuracy takes time.
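To make the filtering idea concrete, here is a minimal sketch of what such a pre-training filter might look like. The detector function, its name `detect_machine_probability`, and the 0.5 threshold are illustrative assumptions of mine, not DetectGPT's actual interface or a documented training pipeline.

```python
# A minimal sketch, assuming a hypothetical detector that returns the
# probability that a piece of text was machine-generated. The function
# name and the threshold are illustrative, not part of any real API.

from typing import Iterable, List


def detect_machine_probability(text: str) -> float:
    """Placeholder for a DetectGPT-style scorer (higher = more likely machine-generated)."""
    raise NotImplementedError("plug in a real detector here")


def filter_human_text(corpus: Iterable[str], threshold: float = 0.5) -> List[str]:
    """Keep only documents the detector judges as likely human-written."""
    return [doc for doc in corpus if detect_machine_probability(doc) < threshold]


# Usage (illustrative): the filtered corpus would then feed the next round
# of training, so the model is not trained on its own older outputs.
# clean_corpus = filter_human_text(raw_web_crawl)
```

The point of the sketch is simply that the detector sits between data collection and training; how well the whole scheme works rests entirely on the detector's accuracy.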
Comments
Every day, around the world, inventors use AI to create innovative models and apps without knowing their consequences. The creators of ChatGPT would never have imagined how students might use it.
Software governance is needed before any AI model or app is made public. It is time the world put restrictions in place before this becomes uncontrollable and massive.