Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and then some time on kbin.social.

  • 0 Posts
  • 713 Comments
Joined 8 months ago
Cake day: March 3rd, 2024

  • “My point is that if we turn up our gibberish dial now then at least our llms will be learning the wrong thing & we have some control.”

    We’d be covering ourselves in poop to prevent people from sitting next to us on the train. Sure, people will avoid sitting next to us, but in the meantime we’ll be covered in poop.

    And then other people will learn the trick, cover themselves in poop too, and now everyone’s poopy and the trick stops working.

    “There is still a lot of understanding that we do automatically that an llm will never do.”

    Are you willing to bet the convenience of comprehensible online discourse on that? “Automatically understanding stuff” is basically the one job of LLMs.

    LLMs model language, and coming up with some kind of “gibberish” filter is simply inventing a new language. If there’s semantic meaning in it, the LLMs will figure it out just like any other language, and if there isn’t semantic meaning, then we’ve lost the ability to communicate entirely. I see no upside.



  • Well, the “at least for now” part is my point - if people start using “gibberish” to communicate or to hide their communication, that provides training material for LLMs to let them figure out how to use it too.

    LLMs learn how to communicate based on existing examples of communication. As long as humans are communicating with each other somehow, LLMs will be able to train on those examples and learn to do it too. They have the same communication capabilities that we do at this point, so there’s not really any way we can make a secret clubhouse that they can’t figure out how to infiltrate.

    Personally, I think there are two main routes we can go to deal with this. Either we can simply accept that there’s no way to be 100% sure we’re talking to a human any more and evaluate the value of our conversation based on the content of the words spoken rather than the composition of the entity generating them, or we could come up with some kind of “proof of personhood” system to allow people to label the text they write as coming from them.

    The latter is extremely hard to do, of course, from both a technical and a cultural perspective. And such a system would likely still allow someone’s “person token” to be sneakily used by AI, either by voluntarily delegating it (I could very well be retyping all of this out of a ChatGPT window) or through hackery.

    So I’m inclined toward the former. If I’m chatting with someone and I’m having a good time doing it, and then later I find out it was a bot, why should that change how much fun I had?


  • I don’t see how that would be practical. People who aren’t “in on the joke”, as it were, will call out the gibberish and downvote it. If enough people are “in on the joke” then the whole forum becomes useless and some other forum will be created to fill the role of the original. The AI will train off of that one.

    Basically, if you don’t want an AI training on your content, then don’t post your content in public where an AI will see it. The Fediverse is the last place you should be posting since its very nature is about openly broadcasting your content to whoever wants to see it.


  • Heh. I fell off of contributing in recent years, but there was a time back in the day when my edit count was in the top hundred or so. Your impression is completely wrong.

    Anyway, this discussion here isn’t going to affect what the people on Wikipedia are doing, so it doesn’t really matter. I linked to the project page above and it’s quite clear that even this “AI Cleanup” project is not in any way fundamentally opposed to using AI; they’re just focused on ensuring that editors using it are adhering to Wikipedia’s guidelines. If you think AI can’t do that then clearly your concept of how AI is useful is too limited.


  • You’re probably assuming that someone would just go to an LLM and say “write a Wikipedia article about subject X”? That wouldn’t work well, but that’s very far from the only way to use LLMs for Wikipedia work.

    For starters, it doesn’t have to actually write content at all. You could paste an existing article into an LLM and ask it “What facts in this article lack references to back them up? Are there any weasel-worded statements, or statements that don’t appear to follow a neutral point of view?” and get back a list of things that require attention.
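
    As a rough sketch of that review-only workflow, something like the following would do. It only builds the prompt; `ask_llm` is a hypothetical stand-in for whatever LLM client you actually use, and the instruction wording is my own assumption, not anything Wikipedia prescribes.

```python
from typing import Callable

# The review questions from the comment above, split into one per line.
REVIEW_QUESTIONS = (
    "What facts in this article lack references to back them up?",
    "Are there any weasel-worded statements?",
    "Are there statements that don't appear to follow a neutral point of view?",
)


def build_review_prompt(article_text: str) -> str:
    """Combine the numbered review questions and the article into one prompt."""
    questions = "\n".join(
        f"{i}. {q}" for i, q in enumerate(REVIEW_QUESTIONS, start=1)
    )
    return (
        "You are reviewing a Wikipedia article. Do not rewrite it.\n"
        "Answer each question with a list of quoted sentences.\n\n"
        f"{questions}\n\n"
        f"--- ARTICLE START ---\n{article_text.strip()}\n--- ARTICLE END ---"
    )


def review_article(article_text: str, ask_llm: Callable[[str], str]) -> str:
    """Send the review prompt to a caller-supplied LLM function (hypothetical)."""
    return ask_llm(build_review_prompt(article_text))
```

    The LLM call is passed in as a plain function so the sketch doesn't pretend to know any particular vendor's API. A human editor would still go through the returned list by hand.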

    Or you could paste a poorly-worded article in and tell it to rewrite it with all the same information but better phrasing or structure. You could put a bunch of research materials you’ve gathered into the LLM’s context and tell it to write a summary in the style of a Wikipedia article, with references to the sources for each fact mentioned. Obviously you’d check the LLM’s work afterward and probably do some manual editing, but this would be a great time and effort saver to get a first draft written. You could take an existing article and tell the LLM that some particular fact had changed or been discovered to be incorrect and ask it to rewrite the relevant parts to account for that.

    Wikipedia exists in many, many languages. You could have a multilingual LLM automatically compare the contents of different language versions of a Wikipedia article and ask it to spot differences in content or tone. You could have an LLM translate an article from one language to another as a starting point for creating an article in that new language.

    You could have the LLM check the references of an existing article: look up each referenced work on the web and see whether it genuinely supports the claim that the article cites it for. It could flag all manner of subtle problems that way. Perhaps the reference sounds biased, or whoever used it as a reference misinterpreted it, or the link was simply incorrect and points to unrelated material. Being able to have an AI do a first-pass check of all that in a completely automated way would save huge amounts of time.
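
    A minimal sketch of that first-pass reference check might look like this. It assumes a heavily simplified citation format (a bare URL inside `<ref>` tags; real articles mostly use citation templates, which this does not parse), and both `fetch_text` and `ask_llm` are hypothetical functions the caller supplies.

```python
import re
from typing import Callable, NamedTuple

# Simplifying assumption: each citation is a bare URL wrapped in <ref> tags.
REF_TAG = re.compile(r"<ref>\s*(https?://[^\s<]+)\s*</ref>")


class CitedClaim(NamedTuple):
    claim: str  # the text that precedes the citation
    url: str    # the URL given as its reference


def extract_cited_claims(wikitext: str) -> list[CitedClaim]:
    """Pair each <ref> URL with the text between it and the previous <ref>."""
    claims = []
    previous_end = 0
    for match in REF_TAG.finditer(wikitext):
        claim = wikitext[previous_end:match.start()].strip()
        if claim:
            claims.append(CitedClaim(claim, match.group(1)))
        previous_end = match.end()
    return claims


def check_references(
    wikitext: str,
    fetch_text: Callable[[str], str],  # hypothetical: URL -> page text
    ask_llm: Callable[[str], str],     # hypothetical: prompt -> LLM answer
) -> list[tuple[CitedClaim, str]]:
    """Ask the LLM, claim by claim, whether the cited source backs it up."""
    results = []
    for cited in extract_cited_claims(wikitext):
        source_text = fetch_text(cited.url)
        prompt = (
            "Does the source below support the claim? Answer SUPPORTED, "
            "NOT SUPPORTED, or UNRELATED, then explain briefly.\n\n"
            f"Claim: {cited.claim}\n\n"
            f"Source ({cited.url}):\n{source_text}"
        )
        results.append((cited, ask_llm(prompt)))
    return results
```

    The output is only a list of flags for a human editor to look at, which is the "first pass" part; nothing here edits the article.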

    This is all just brainstorming off the top of my head, so I’m sure there are plenty of other good uses that aren’t coming to mind.



  • They’re not talking about the same thing.

    Last week, researchers at the Allen Institute for Artificial Intelligence (Ai2) released a new family of open-source multimodal models competitive with state-of-the-art models like OpenAI’s GPT-4o—but an order of magnitude smaller.

    That’s in reference to the size of the model itself.

    They then compiled a more focused, higher quality dataset of around 700,000 images and 1.3 million captions to train new models with visual capabilities. That may sound like a lot, but it’s on the order of 1,000 times less data than what’s used in proprietary multimodal models.

    That’s in reference to the size of the training data that was used to train the model.

    Minimizing both of those things is useful, but for different reasons. Smaller training sets make the model cheaper to train, and a smaller model makes the model cheaper to run.