To understand the value of ChatGPT as an authoring and translation tool, please read the alternate versions of this article, which have been re-written by ChatGPT in two different styles, cosmic optimistic and cynical. And, the simplied versions translated to Te Reo Māori and 'Ōlelo Hawai'i.
To understand ChatGPT as an information collection and synthesizing tool, it may first help to think about the several ways that people extract or collect information from the internet.
The Searchable Universe
Generally, search engines such as Google search and other robot collection services (commonly called, bots or spiders) gather information from the searchable universe, that is, websites, books, pdf's, image meta-data (alt tags) that are made public and are flagged as searchable by machines (i.e., robot.txt file installed on the root directory of the web server). Daily, people engage in the searchable universe with search engines (e.g. Google, Bing, DuckDuckGo, YouTube) which present the information based on a keyword term as search results from searchable and indexed sources.
The Unsearchable Universe
There is also the unsearchable universe, closed off or walled social media networks such as Instagram, Facebook, SnapChat, Tiktok; or private websites or institutional websites, or services that do not allow their content to be read by machines (e.g lockable format such as ePub, iBooks, etc), or require some authentication (i.e email and password, or Google login) before access.
There are also tools that scrape/collect information from the unsearchable universe, that is, accessing data held in password locked pages, image files and copyright protected files (ePubs) using other tools such as OCR, web-scraping services, password hacking or meta-data extraction. These techniques all cost money and time to run. Each service requires a computer, which means power costs, hardware costs and configuration costs. There is also a legal issue, as many owners prefer to charge for access or restrict to specific users. These costs can be hidden from end users by supplying advertising, or they can be paid upfront or retroactively. Also these results (or outputs) need to be hosted on a public server or zone that allows access.
Public availability is an important part in defining the searchable and unsearchable universes, and in-turn determining the value of synthesizing services like ChatGPT to our societies and communities.
So what exactly is ChatGPT and how does it fit into the searchable and unsearchable universes?
"GPT" stands for generative pre-trained transformer, which is a program that can realistically write like a human. GPT essentially searches a massive amount of written text by reading millions of articles and books online from the searchable universe. It produces work in a chat format (hence 'Chat'-GPT) that has perfect grammar, correct punctuation, and no spelling mistakes. After analyzing a text based on users’ input it then captures the style of writing for all new articles, or it delivers it in a way which is delightful for humans.
Utilisation of the information universes
There are differences and similarities between ChatGPT and an internet search engine and data repository like Google.
- User Engagement. According to one source, ChatGPT is estimated to have reached 100 million monthly active users in January just two months after launch, making it the fastest-growing consumer application in history. In contrast Google search with 93.18% market share has over 5 billion monthly page views (according to StatsCounter).
- Age. Founded in 1998 Google has a 25 year history. ChatGPT is 15 years old and only searches data from 2021 onwards.
- Style and delivery of information. ChatGPT produces contextualised results in natural spoken format (chats) with perfect grammar, correct punctuation, and no spelling mistakes. It can deliver the results in many written styles, for example, cosmic, wistful, sarcastic, authoritative. Search engines deliver a list of URLs (links) on a search results page and leave the user to interpret and contextualise, there is no flavour.
- ChatGPT synthesizes multiple sources together. Internet search only provides a list of search results. ChatGPT provides a contextualised summary and can be queried further in the logical progressive flow of thinking while interacting with it's chat interface, thus providing a faster and deeper experience.
- Extendable. It is not possible to add a data-source to Google search, so results are limited to the websites that the Google crawling bots surface. OpenAI, ChatGPT's owner, launched plug-ins for ChatGPT in March 2023, which extended the ChatGPT's functionality by granting it access to third-party knowledge sources and databases, including other locked websites.
- Re-use. It is not known if Google reuses search term in its models, it may track commonly used terms but not sure if it uses those terms are used to fine-tune search results. ChatGPT's results maybe be used to train their models by default, in the business tier service, OpenAI wrote,
“ChatGPT Business will follow our API’s data usage policies, which means that end users’ data won’t be used to train our models by default”
Comparing Google search to a service like ChatGPT reveals new ways that people will interact with the web and the strengths, weaknesses, opportunities and threats that each services exhibit.
With this in mind, I was curious how ChatGPT might offer value to certain sectors, jobs and industries.
How will ChatGPT disrupt some jobs?
I have conducted several informal short tests for three tasks, a programming job, a writing job and a legal job. The results are attached and summarised below:
- Coder/Programming: research and investigation tasks were faster, ChatGPT provided both code samples and instructions on how to build apps.
- Researcher/Author: research and investigation tasks were faster, and incredibly fast to draft articles in various styles of writing.
- Legal Clerk: research and investigation tasks are fasters, but accuracy needs to be validated with other locked data sources.
Right now these responses are garnered from ChatGPT's access to the searchable universe. If ChatGPT uses both the searchable and unsearchable universes, and stores data for a longer time period (allowing consequential re-aggregation) the results willl change.
One major glaring issue I envision is data source weighting and what context gets priority when the results are delivered to users.
How does ChatGPT prioritise it's sources?
This is unknown to a casual user. There is still more human-based investigation, research and analysis to complete. There is not yet a lens over the contextualising process, and the maintainers of ChatGPT, the people looking after it acknowledge this area is a work-in-progress. The spectrum seems wide, results could improve accuracy or decrease certainty, or amplify biases or increase diversity, or entrench cultural norms or amplify historical inaccuracies unexpectedly.
Moving forward, in either scenario there are a significant impacts on our societies and communities:
1. Research Capability: new synthesizing services increase the use of various data-sets and therefore provide an analysis in seconds that might take someone months. Even if inaccurate, the results can be cross-referenced with other sources to narrow down and evaluate. This is helpful.
2. Research Quality: with ChatGPT's access to the world's free searchable universe and possible paid access to the unsearchable universe it could provide higher quality results. However there are ethical questions around weighting, critical-thinking and analysis that will need to be addressed. If not addressed soon, the results produced by ChatGPT could be harmful.
Underneath these concerns are cost and affordability issues to run ChatGPT. According to OpenAI co-founder and CEO Sam Altman, ChatGPT’s operating expenses are “eye-watering,” amounting to a few cents per chat in total compute costs. There is no doubt in the future that part of ChatGPT will be monetised therefore more easily accessible to folks with money.
In summary, ChatGPT provides some astounding advancements over traditional internet search and provides significant research capability to a variety of tasks or jobs that need synthesized information. Yet there are many unknowns about how ChatGPT will priortize and weight its data, and if it will be able to reutilise data from locked websites or archives.
In Te Reo Māori we say, "Kia Tūpato" or be careful. I prefer to be careful about using synthesizing tools at this early stage for serious projects. Great for fun projects and hobby study. There needs to be some thought about the ethical maturity of the system and how the maintainers and developers at OpenAI, the guardians of ChatGPT might quickly respond to inaccurate syntheses.