Need for Detect ChatGPT to protect ChatGPT

Some Background

ChatGPT has attracted a lot of attention because of its ability to generate text that, while grammatically correct most of the time, is also very convincing, to the extent that unless you are _actually_ aware of the _reality_ you might believe that the output of the machine learning model, which is estimated to have cost 1.2 billion USD to build with your and my contributed English text data, is factually correct.

This has sparked debate on whether students will outsource their school homework to ChatGPT. Let's set aside the fact that homework outsourcing existed even before ChatGPT attracted our eyeballs; back then it meant paying an actual _human_ to do the task.

Note that using ChatGPT to do homework is interpreted as cheating. Teachers always think that students are out to find an easy way to accomplish the task assigned to them, never questioning their own inability to frame tasks that would actually require the students to demonstrate their learning.

Consequently, there has been an engaging debate on identifying text generated by ChatGPT to _help_ teachers determine whether the student is indeed the author of the text or whether s/he used the services of an AI bot (the underlying assumption being that they were unfair in completing their task)!

Detect ChatGPT is an outcome of this, a portal (I was not aware there was a portal for this until now!) that allows one to differentiate between human-generated text and machine-generated text. A plagiarism checker, if you wish.

First we research and build ChatGPT, and then we realize that it can be used unfairly (by students!), so we come up with a brand new research problem, detecting ChatGPT-generated text, along with newspaper headlines such as "Stanford introduces DetectGPT to help educators fight back against ChatGPT".

 

"Solved research problems lend themselves to new research problems" --Anonymous

could not be more true!

Going Forward

 

Note that according to OpenAI, the creators of ChatGPT, ChatGPT has digested almost all the visible English text data that was available on the internet. So at some point in the near future the current version of ChatGPT is going to lag in terms of being informative.
 
For example, if a company released a new oral vaccine for COVID-19 after ChatGPT was released, ChatGPT would not be aware of it, because the corresponding information would not have been digested by the model.
 
Sooner or later, for the reason stated above, the creators of ChatGPT will see a need to upgrade their large language model.
 
See the above picture; "A" is when ChatGPT was released and "E" is the point at which the upgrade is deemed necessary; the distance between "A" and "E" is the time between two versions of ChatGPT!
 
In the period between A and E, the world is doing what it does best: generating lots of text data. Except that there is now an additional source (you guessed right, ChatGPT) generating data. This is shown by the "red" line. Its text generation starts with the introduction of ChatGPT (point "A"), and it is likely to overtake the "usual modes of" text generation by humans in terms of volume of text generated.
The hump on the red line indicates the hype: more and more people trying things out using ChatGPT and then putting the generated text out on the internet to shout about their findings!
So at some point E, when a ChatGPT upgrade is envisioned, the upgraded version would digest more of its own generated text (pictorially, the area between the two red lines) than text that has come from the usual sources (humans; pictorially, the area between the two blue lines). In some sense it is learning from itself!
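
To make the picture concrete, here is a minimal sketch of the arithmetic. The function name `machine_share` and both volumes are made-up assumptions for illustration only; they are not measurements of anything.

```python
def machine_share(human_text: float, machine_text: float) -> float:
    """Fraction of newly collected text that was machine generated."""
    total = human_text + machine_text
    if total <= 0:
        raise ValueError("need a positive amount of text")
    return machine_text / total


# Hypothetical volumes of new text produced between "A" and "E"
# (arbitrary units, chosen only to illustrate the argument).
human_new = 40.0    # assumed area between the blue lines
machine_new = 60.0  # assumed area between the red lines

share = machine_share(human_new, machine_new)
print(f"machine-generated share of new text: {share:.0%}")
```

With these assumed volumes, more than half of the new text an upgrade would see comes from the model itself, which is the "learning from itself" situation described above.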
 
This scenario could lead the upgraded ChatGPT into chaos, especially if the output of the current ChatGPT is bullshit (as mentioned in this article). So if more and more bullshit text generated by ChatGPT is used to train the next version of ChatGPT, the output is going to be more bullshit (whatever that means), making the upgrade unusable ...

Unless
  1. After the initial hype, the volume of text generated by ChatGPT drops (pictorially, the area between the red dotted lines, denoted by the length between points "H" and "I") compared to the usual modes of text generation (by humans; pictorially, the area between the blue lines, denoted by the length between points "G" and "J"), so that the next version of ChatGPT learns from the "new" and "usual" text data rather than from data generated by itself.
  2. There is a way to identify the text generated by ChatGPT, namely Detect ChatGPT! This mechanism can then be used to filter out text generated by ChatGPT, thereby allowing the upgraded version of ChatGPT to learn only from human-generated text and not be biased by the text generated by its older version. All this can happen if Detect ChatGPT works to perfection. However, as with all explorations, achieving good accuracy takes time.
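
The filtering idea in item 2 can be sketched as follows, to show why detector accuracy matters. This is only an illustration under stated assumptions: `filter_corpus`, `residual_machine_share`, the two rates and the volumes are all hypothetical, and nothing here reflects how any real detector performs.

```python
def filter_corpus(human: float, machine: float,
                  detection_rate: float, false_alarm_rate: float):
    """Return (human_kept, machine_kept) after dropping flagged text.

    detection_rate:   assumed fraction of machine text the detector flags.
    false_alarm_rate: assumed fraction of human text it flags by mistake.
    """
    for rate in (detection_rate, false_alarm_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must lie in [0, 1]")
    human_kept = human * (1.0 - false_alarm_rate)
    machine_kept = machine * (1.0 - detection_rate)
    return human_kept, machine_kept


def residual_machine_share(human_kept: float, machine_kept: float) -> float:
    """Fraction of the filtered corpus that is still machine generated."""
    total = human_kept + machine_kept
    return machine_kept / total if total > 0 else 0.0


# Hypothetical volumes and detector quality (illustrative values only).
kept_h, kept_m = filter_corpus(40.0, 60.0,
                               detection_rate=0.9, false_alarm_rate=0.05)
print(f"machine share after filtering: "
      f"{residual_machine_share(kept_h, kept_m):.1%}")
```

Only a perfect detector (detection rate 1, false alarm rate 0) brings the residual share to zero. Anything less leaves some machine text in the training data and throws away some human text.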

While time will tell about #1, there is a definite need to put all research might behind exploring methods to detect ChatGPT-generated text data. This is probably the only way that ChatGPT will survive.
 
What do you think?

Comments

Shoeb Shaikh said…
It's like someone inventing the AK-47 for better military operations and defence without ever realizing that it could be used by terrorists to harm innocent people.

Every day around the world, inventors are using AI to create innovative models and apps without knowing the consequences. The inventors of ChatGPT would never have thought about how students might use it.

Software governance is needed before any AI model or app is made public. It is time the world put restrictions in place before this becomes uncontrollable and massive.
