manfred feiger

Who owns the truth? – Part two: the web

AI and the future of the web

Published: November 6, 2023
Reading time < 13 minutes

Thoughts on how AI is influencing the web, around one year after ChatGPT and AI image generators took over the net.

This is my second post in a row reflecting on our current status quo in light of the AI developments of the last year.

My first article addressed general issues concerning the training of large language models (LLMs). It delved into the complexities of language and data, and also considered the potential first major application for AI: AI Companions.

In this episode I will focus on the role of AI for the web, how the web is changing, and what implications might follow.

To get this article going, I want to start directly with a video you need to watch, as it is brilliant: Heydon Pickering’s “What’s the deal with large language models”, released a few days ago (30 October 2023).

What's The Deal With Large Language Models? – watch on YouTube

So hopefully you had a chance to watch this brilliant video. If not, there’s one aspect, raised toward the end of the video, which is also one of my major concerns at the moment regarding AI and the World Wide Web: in the future, AI will increasingly train itself on data it generates.

Heydon describes it as:

The more we generate content and the more it is allowed to generate itself, the further the necrosis will spread. Rapidly, generative media will overwhelm the nominally “real” media from which it was originally trained, forcing it to reproduce with itself. This incestuous feedback loop will obliterate what remains of culture, leaving us gasping for brains, whatever the fuck they are.

Who can be trusted on the internet?

As you may suspect, the question of trust prompted me to share my thoughts on my blog. The root question re-emerges: "who owns the truth?" We all understand that no one owns the truth and that there are many credible sources available.

It has a tendency to just make stuff up out of thin air which is just really bad for Wikipedia — that’s just not OK. We’ve got to be really careful about that.

Jimmy Wales, founder of Wikipedia, in the Standard, on ChatGPT

Currently, we have our sources to trust: our daily newspapers, Wikipedia, and some other official sources as well. How AI should be used in all those familiar publications is already under discussion. Wikipedia has guidelines on the use of AI, and as Jimmy Wales explains in an article in the Standard (“Will Wikipedia be written by AI? Founder Jimmy Wales is thinking about it”), the key point is checking whether the provided information is true; the article shows examples of ChatGPT proposing a “truth” that does not exist.

In the article, Jimmy Wales seemed very critical of AI, or specifically of ChatGPT, as it is a serious contender for providing knowledge, especially as more and more capabilities are packed into AI companions. The truth is that Wikipedia is also exploring machine learning: ORES (the Objective Revision Evaluation Service) helps human editors by predicting the quality of edits and flagging problematic ones, supporting the decision to keep or revert information. But as you see, this is AI in the role of the wingman or buddy, not in the role of the “source”.

One of the issues with the existing ChatGPT is what they call in the field 'hallucinating' — I call it lying,

Jimmy Wales, founder of Wikipedia, in the Standard

The issue with Chatbot hallucinations

The term chatbot hallucination refers to instances where a chatbot generates incorrect or nonsensical responses, seemingly “imagining” things that aren’t grounded in its training data or the input it has received. The language model generates fictional responses and presents them to the user as factual information.

Since a large language model is only as good as the data it has to work with, the model will occasionally present a patently false answer as the truth, which many regard as a serious flaw in AI models, especially if they are relied upon to automate tasks.

There’s a good example of how search engines pick up this hallucinated information and present it as fact in the Wired article “Chatbot Hallucinations Are Poisoning Web Search”.

So what happens if the data AI models are trained on is corrupted by AI itself, as described in Heydon Pickering’s video and the concern I opened with?

Quick money and spam

Looking at emerging AI tools, we see many that can write content for you; even entire blog posts or websites can be generated with their help. What is the specific objective? To attract more website traffic, influence search engine algorithms, and generate a higher number of visitors. That wouldn’t be too bad if the sole purpose weren’t making quick money through advertising.

So nowadays you will find tutorials on generating faceless YouTube videos almost entirely with AI, many examples of generating books for Amazon KDP mainly with AI, and of course plenty of automated tools for writing blog posts and influencing search rankings to earn money from display advertising.

From my point of view, most of this stuff is a new breed of spam. Though the content quality of the use cases I mentioned is higher than that of original spam mails, the problem is much bigger and wider. As Vice magazine reported in “AI Spam Is Already Flooding the Internet and It Has an Obvious Tell”, many fake user profiles on social media now use AI to write their posts.

Screenshot of the first spam e-mail, taken from Internet Artifacts
Screenshot from Monty Python – Spam sketch on YouTube
“Spam, Spam, Spam, Spam… Lovely Spam! Wonderful Spam!”

The origin of the word spam is related to a Monty Python sketch broadcast in 1970.

On a personal level, I never had a big issue with spam, as long as it was obviously spam. Spam is part of internet history: Gary Thuerk, a marketing manager at Digital Equipment Corporation, sent the first spam email to around 300 to 400 ARPANET users on 1 May 1978, promoting a mainframe computer.

According to the great website “Internet Artifacts”, Thuerk claimed he sold $13 to $14 million worth of mainframe computers through the campaign.

Today, we categorize spam messages into different types: the obvious one is the classic scam (like a relative in Africa promising an inheritance of five million dollars), and the rest we label as “spammy”. We hardly differentiate between a mass mailing and a merely spammy email; we simply decide which category each belongs to.

The advantage of classical spam mailings: it’s obvious.

The same applies to good old display advertising: it’s obvious.

While the darker parts of content marketing and other advertising sub-formats disguise themselves as content to influence buying behavior, which is not obvious, most people have learned to look for the little hints that distinguish “real” content from advertising.

Now we face content that is generated but not obviously marked as generated. Truth is fading away; identifying it is harder than ever.

And the consequences are not yet obvious. Will the “free” internet in terms of media consumption as we know it finally die? Or will the massive amount of content lead to a final breakdown of the web? Or will the internet in general not be trustworthy anymore?

But these are too many negative perspectives, because there are ways to gain trust: by embracing a culture that shaped the World Wide Web for decades: blogging.

The Garden is the web as topology. The web as space. It’s the integrative web, the iterative web, the web as an arrangement and rearrangement of things to one another.

Mike Caulfield, in “The Garden and the Stream: A Technopastoral”

Digital gardening, Indieweb and trust in people

Exploring the current web, one mustn’t only see the downsides of the harmful use of AI. Of course, it’s always us users who decide to embrace or abandon a technology. Used as a wingman, AI is excellent.

Trust is harder to establish in a world driven by quick-money thinking, so let me show some developments that put trust on a different level: the level of expertise and attitude of thought, all backed by a clearly identifiable sender.

Screenshot of "the Garden" by Maggie Appleton

Digital gardens, Second brains or Zettelkasten

A digital garden is like a mix of a notebook and a blog. It's a place online where people share ideas that they want to grow over time. Some people may refer to the Digital Garden as a "second brain". It's a method for storing and managing knowledge in a maintainable, scalable, and searchable manner. So you could also call it a digital Zettelkasten.

In blogging, you often write a post and then forget about it. But here, we treat posts like plants. Some may not survive, but others will grow and provide ongoing knowledge for both us and the community.

To get an idea of digital gardening, you could explore some sites: Gwern’s site (best check the about section to understand what his site is about), the wiki knowledge of Nikita Voloboev, or the beautifully crafted example of Maggie Appleton. When you examine them, you will clearly observe that every garden is unique in the richness of its content and the quantity being shared.

Here you find a list with more examples on GitHub.

If you want to start with digital gardening, you may face many technical questions. So right now, I wouldn't suggest digital gardening (especially on the web) for those unfamiliar with how to connect tools using APIs or some essential coding skills.

With digital gardening, you’re talking to yourself. You focus on what you want to cultivate over time.

Tom Critchlow, in the MIT Technology Review article “Digital gardens let you cultivate your own little bit of the internet”

To begin, you could try Obsidian Publish. Alternatively, you could use a tool like Roam Research or Notion (search for “Notion as Zettelkasten”).

Currently, there’s no critical mass for digital gardens to go mainstream. But the idea behind them is a great source of trust, because the gathering and thinking process is transparent.

Working with a digital garden seems to be a prototype of how to learn online: you get creative in connecting thoughts, you can collaborate and build upon the ideas of others, you become critical while collecting and connecting thoughts to build your own representation of the world, and finally, especially if you share the process on the web, it is a medium of communication and expression of thought that lets your seeds flourish.

My recommended reading to dive deeper into digital gardens is from Maggie Appleton: “A Brief History & Ethos of the Digital Garden”.

Blog about it – the IndieWeb, a people-focused alternative to the “corporate web” and the Fediverse

Like digital gardens, the IndieWeb movement helps you take back control of your content from social media. Its principles (your content belongs to you, you should be well connected, and you should be in control) inspired me to start this blog in 2021.

My original blog post at that time was written in German and described the tricky journey of setting up the page.

In addition to certain formal methods used in IndieWeb publishing, like Microformats 2 and Webmention, it’s also about how you share your content. This is often done through POSSE (Publish (on your) Own Site, Syndicate Elsewhere). But fundamentally, IndieWeb is about a renaissance of blogging.

Everyone should have their own blog. Blogs let you express your thoughts and what moves you. And best of all, it’s obvious who is sending the information; there’s no anonymous AI source behind the content. And if there’s some false information, it’s quite likely that someone will tell you you’re wrong about this or that.

In a POSSE world, everybody owns a domain name, and everybody has a blog.

David Pierce, on the Verge: “The poster’s guide to the internet of the future”

POSSE is, in general, the best way to publish content to the web. It shouldn’t matter in which channel you post: you post something and can distribute the information elsewhere, but the original content belongs to you, not a platform.

The web itself could be seen as the next social network. With the help of technology like ActivityPub, any website can become a social media profile. This is the main concept of the Fediverse: you create an account on any platform in the Fediverse and interact with users on all other platforms, without needing a new account.

In October 2023, ActivityPub was integrated into WordPress.com. Websites hosted on WordPress.com can now be followed, reposted, liked, and commented upon by people using Mastodon and other social web apps with a single click. For everyone else using WordPress as a content management system, there’s a plugin that has been available for a few years.
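For a feel of the plumbing behind this: Fediverse servers typically resolve a handle like @user@example.com to an ActivityPub actor document via WebFinger (RFC 7033), which is just a GET request to a well-known URL on the user’s domain. A minimal sketch, with a helper name of my own:

```python
from urllib.parse import urlencode

def webfinger_url(handle: str) -> str:
    """Build the WebFinger lookup URL (RFC 7033) for a fediverse handle.

    A handle like "@alice@example.com" maps to a GET on
    https://example.com/.well-known/webfinger?resource=acct:alice@example.com,
    whose JSON response links to the ActivityPub actor document.
    """
    user, domain = handle.lstrip("@").split("@", 1)
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"

lookup = webfinger_url("@alice@example.com")
```

This is why “any website can become a social media profile”: a domain only needs to answer that one well-known URL (and serve an actor document) to become followable.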

Unfortunately, similar to digital gardens, there are barriers that make it harder to get going with the IndieWeb, the Fediverse, or related movements to turn the web back into what it was meant to be. Again, the technical side isn’t ready for the mainstream yet. With ActivityPub integrated into WordPress.com, a first step towards broader popularity has been taken.

Summary on the use of AI for the web

I wouldn’t be able to share all my thoughts in an article like this, so I first wanted to share some recent developments that made me think about the topic itself.

My major concern is the future of the web itself. There’s so much potential, and it would be great if everyone would start to blog and share their thoughts in a free and open space. Not on a platform.

When it comes to AI, it’s more or less the use of AI that is frightening: AI used for quick money, producing a lot of shitty content on the web. This is not a new development; assisted writing tools have helped generate content faster for years, and people were already using them to gain SEO momentum.

Hopefully, search engines as we know them will be dead sooner or later anyway. The lack of quality was an issue I wrote about last year in “Value, objectivity and the risk of being trapped by search”. The issue of being trapped grows bigger and bigger as bad content is generated and taken as fact.

Being trained over and over on the same source doesn’t sound like a bright future. So far I have mainly spoken about content in the form of text. But on the image side, there are (luckily) now ways to poison image data.

Nightshade is a method content creators can use to defend their intellectual property against illegitimate use, though it could of course also be used deliberately to destroy something.

Screenshot of a YouTube talk on data poisoning attacks in neural networks

I guess most of us never thought about the manipulation or poisoning of datasets before. But especially for future AI companions with harm hidden in their agenda, it sounds like a nightmare, particularly if we consider the issue of AI consciousness.

This is a much bigger topic in itself, and I recommend reading “An Introduction to the Problems of AI Consciousness” in the Gradient.

The web is at risk of losing trust, and we all need to protect this valuable resource and restore its integrity.

Maybe initiatives such as the “data provenance explorer” help to prevent misuse and to find more transparency in the use of datasets in AI.

I've never been a big fan of social media like Facebook due to my early experiences with online communities. I saw the negative side of human behavior, like deceit and manipulation, and it discouraged me from participating in social media.

What I care about is the quality of information on the web. It's concerning to see the amount of misleading content and poor-quality information. As a lecturer, I've noticed how unreliable information can mislead students. We need to restore trust in online content and the people who create it.

I know I need to contribute more to this effort. I don't post on my blog as often as I should, and I'm considering revamping my site. I'm thinking of sharing more of my notes and discoveries to contribute to the spread of accurate information.

Technology is not making our lives easier, but it accelerates them enormously. To avoid getting trapped by speed, information needs to be trustworthy.

Truth itself is a concept. Science is searching for truth and truth is mostly a status quo. All information transmitted prioritizes and discards certain contexts, which is why it's helpful to constantly expose ourselves to new ideas.

In my next part I will look at the world of images and AI art more closely.

Further Resources
