I have a dumb work-related Chrome thing: I’d like to make it so that when a certain notification sound plays in Chromium, my computer does a few things automatically for me.

Does anyone know a good way to make this happen?

I imagine it’d have to be setup like:

when chrome starts playing audio && that audio matches soundfile.ogg && run myscript.sh. But I don’t know any good CLI utilities that could get something like that done, and I’m open to better ideas!
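The first condition, noticing that Chrome has started playing audio at all, can be approximated by watching the signal level of the captured stream. A minimal sketch, assuming the audio arrives as chunks of raw signed 16-bit little-endian mono PCM (for example read from the sink's monitor). The threshold of 500 is an arbitrary placeholder to tune, and only chunks where the level first crosses it would be passed on to the more expensive matching step.

```python
import array
import math
import sys


def rms(chunk: bytes) -> float:
    """Root-mean-square level of a chunk of signed 16-bit PCM samples."""
    samples = array.array("h")
    samples.frombytes(chunk[: len(chunk) - len(chunk) % 2])  # ignore a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is assumed to be little-endian (s16le)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def onsets(chunks, threshold: float = 500.0):
    """Yield the index of every chunk where the stream goes from quiet to audible."""
    playing = False
    for i, chunk in enumerate(chunks):
        loud = rms(chunk) >= threshold
        if loud and not playing:
            yield i  # audio just started: hand this off to the matcher
        playing = loud
```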

edit: to avoid XY-problem issues, I’ve summarized the problem in full here:

  1. I have a work program, this notifies me if I get a call or email, the work program then presents an accept/decline page, and does not proceed until I either accept, decline, or it times out.
  2. I want it to do two different things depending on if it’s a call or email
  3. It provides no notification other than the sound and an “accept” button on the page
  4. I have a chrome window open that does nothing but this, and I never use chrome for anything else
  5. I want to automatically do various things when I receive either this call or email
  6. I want it to be broadly applicable rather than a script designed for the specific website giving me the notification (so not a chrome extension). This prevents me from having to update any code in the event that the backend changes dramatically, and even if the notification sound changes, I’d just record a new sound as the activation noise.
  7. The noise is always the same, and hasn’t changed for many years, and there is a distinct noise between calls and emails
  8. They never overlap, they never play multiple times at the same time, and they never make any noises other than those two. The noises are distinct.

These factors cause me to want to run a script once the noise is recognized, only if the noise is playing in a particular app. I’m using pipewire/hyprland on arch.

My current plan for isolating the noise is to do the following:

pactl load-module module-combine-sink sink_name='Work' slaves='easyeffects_sink'

and then set Chrome to play audio exclusively on Work.

Then have a script listen to the Work sink (via its monitor source, Work.monitor) for audio that matches what I want. That should be simpler than the other methods I’ve seen for isolating the noise.
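Listening to a sink in practice means recording from its monitor source. A minimal sketch, assuming the sink is named Work as in the pactl line above and that parec (the PulseAudio recording client, which also works against PipeWire's PulseAudio-compatible server) is installed. The 16 kHz rate, 0.1 s chunks and 2 s window are arbitrary choices, and the actual comparison is left as a placeholder.

```python
import subprocess
from collections import deque

RATE = 16000          # assumed capture rate (Hz), mono
CHUNK_SAMPLES = 1600  # 0.1 s per read at 16 kHz
BYTES_PER_SAMPLE = 2  # s16le


def parec_command(sink: str = "Work", rate: int = RATE) -> list[str]:
    """Build a parec command that records raw mono s16le PCM from a sink's monitor."""
    return [
        "parec",
        f"--device={sink}.monitor",
        "--format=s16le",
        f"--rate={rate}",
        "--channels=1",
        "--raw",
    ]


def read_chunks(stream, chunk_bytes: int = CHUNK_SAMPLES * BYTES_PER_SAMPLE):
    """Yield fixed-size chunks from a binary stream until it ends."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            return
        yield chunk


def rolling_windows(chunks, window_chunks: int):
    """Yield the last `window_chunks` chunks joined together, once the buffer is full."""
    buf = deque(maxlen=window_chunks)
    for chunk in chunks:
        buf.append(chunk)
        if len(buf) == window_chunks:
            yield b"".join(buf)


if __name__ == "__main__":
    with subprocess.Popen(parec_command(), stdout=subprocess.PIPE) as proc:
        for window in rolling_windows(read_chunks(proc.stdout), window_chunks=20):
            pass  # compare `window` (the last 2 s of audio) against your recorded template here
```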

  • Album@lemmy.ca · 8 months ago

    https://xyproblem.info/

    Don’t give people your solution and ask them how to do it. Start with your problem out of the gate.

    Instead of checking for audio, maybe you can write a userscript to run actions based on what’s happening on the website. Dunno tho, since I’m making assumptions about what the problem is.

    • Communist@lemmy.ml (OP) · 8 months ago

      I actually want the sound thing because I think it would be cool for automating a lot of different things easily

      It wouldn’t be, like, optimal in terms of power consumption, but having my computer recognize an audio signal from a specific program and execute a script is generalizable and usable in many places.
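One way to keep the "many sounds, many scripts" idea generic is a small table mapping each recognized sound's label to a command, with a cooldown so that one notification matched by several overlapping analysis windows only fires once. A sketch under those assumptions: the labels, script paths and 30 s default cooldown are hypothetical, and `clock`/`runner` are injectable only so the logic can be tested without launching anything.

```python
import subprocess
import time
from dataclasses import dataclass, field


@dataclass
class Trigger:
    """One sound label -> one command (placeholders you'd fill in)."""
    command: list[str]
    cooldown_s: float = 30.0  # ignore repeat matches of the same sound for this long
    last_fired: float = field(default=float("-inf"))


class Dispatcher:
    def __init__(self, triggers: dict[str, Trigger], clock=time.monotonic, runner=subprocess.Popen):
        self.triggers = triggers
        self.clock = clock
        self.runner = runner

    def on_match(self, label: str) -> bool:
        """Run the command registered for `label`, unless it already fired within its cooldown."""
        trigger = self.triggers.get(label)
        if trigger is None:
            return False
        now = self.clock()
        if now - trigger.last_fired < trigger.cooldown_s:
            return False
        trigger.last_fired = now
        self.runner(trigger.command)  # fire and forget; the script runs in the background
        return True
```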

        • teawrecks@sopuli.xyz · 8 months ago

          I would be interested in a solution to OP’s specific question. I have a friend who will play a particular annoying meme clip over discord. I would like something that can listen for that clip being played, and immediately disconnect him from the voice channel 😁.

          Doesn’t need to be perfect. Misfires are also acceptable.

        • Communist@lemmy.ml (OP) · 8 months ago

          Not if it’s specified to a single app

          My chrome is literally only used for this, as are my other ideas, so, as long as it’s half-decent at one specific sound per app it should work…

          in theory

              • ReversalHatchery@beehaw.org · 8 months ago

                Or at least you think so. Are you sure for example that the trigger sound can not play more than once at a time, before the first one has finished?

                And then what if the webpage updates the sound it plays?

                • Communist@lemmy.ml (OP) · 8 months ago

                  I am completely sure that cannot happen, the noise plays once until I deal with it

                  if it updates I’ll just make a new recording, but it has been the same for over 3 years

      • Nibodhika@lemmy.world · 8 months ago

        You’re still only explaining the Y problem, not the X one. Want to solve Y? Here you go https://people.csail.mit.edu/hubert/pyaudio/docs/ also prepare to learn a lot about streams and different audio formats, etc. You might have something usable in a few weeks or months depending on how fast you’re able to learn those.

        And just so we’re clear, you mentioned chromium, so I’m 99.9% sure that there are easier solutions if you tell us the actual problem you’re trying to solve. There’s a reason no one is providing you with a simple script that does this, i.e. no one has ever needed this, and whenever you’re in a situation where no one has ever needed something before you might be a visionary or you might be missing something that’s obvious for everyone that came before and had the same problem you did.

        • Communist@lemmy.ml (OP) · 8 months ago

          here you go, if you have a better idea, pitch it:

          1. I have a work program, this notifies me if I get a call or email, the work program then presents an accept/decline page, and does not proceed until I either accept, decline, or it times out.
          2. I want it to do two different things depending on if it’s a call or email
          3. It provides no notification other than the sound and an “accept” button on the page
          4. I have a chrome window open that does nothing but this, and I never use chrome for anything else
          5. I want to automatically do various things when I receive either this call or email
          6. I want it to be broadly applicable rather than a script designed for the specific website giving me the notification (so not a chrome extension). This prevents me from having to update any code in the event that the backend changes dramatically, and even if the notification sound changes, I’d just record a new sound as the activation noise.
          7. The noise is always the same, and hasn’t changed for many years, and there is a distinct noise between calls and emails
          8. They never overlap, they never play multiple times at the same time, and they never make any noises other than those two. The noises are distinct.

          but so far my solution is to set up dejavu to listen to a sink I’ve named Work, set Chrome to play on that sink, and have that sink forward to my default audio device

          https://github.com/worldveil/dejavu

          • Nibodhika@lemmy.world · 8 months ago

            First thought was a Chrome extension that detects whether the button is on the screen; that should be easy. But since you don’t want that, you could check how the site receives the information that you’ve got a call or an email: it’s either a periodic poll from the page or, most likely, a websocket message from the server. Either way, you can use something like mitmproxy to intercept the communication and act on it: https://docs.mitmproxy.org/stable/api/mitmproxy/websocket.html. This lets you analyze exactly what the page is receiving, so if there’s information on who’s calling, the subject of the email, or whatever, it will be captured here as text, which is a lot easier to parse and analyze than audio.
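A minimal mitmproxy addon along those lines, assuming the browser is configured to use the proxy and trusts mitmproxy's CA certificate. The `websocket_message` hook and the `flow.websocket.messages` list are mitmproxy's addon API (mitmproxy 7 and later); `classify`, the byte patterns and the script paths are hypothetical, and you'd first log `message.content` to see what the server really sends.

```python
"""Sketch of a mitmproxy addon; run with `mitmdump -s watch_ws.py` (file name hypothetical).

The patterns below are placeholders: inspect the real traffic first to find
what the server actually sends for an incoming call or email.
"""
import subprocess

# Hypothetical byte patterns -> hypothetical scripts.
RULES = {
    b"incoming_call": ["./on_call.sh"],
    b"new_email": ["./on_email.sh"],
}


def classify(content: bytes):
    """Return the command for the first pattern found in a message, or None."""
    for pattern, command in RULES.items():
        if pattern in content:
            return command
    return None


def websocket_message(flow):
    """mitmproxy hook, called for every WebSocket message passing through the proxy."""
    message = flow.websocket.messages[-1]  # the message that triggered this hook
    if message.from_client:
        return  # only server -> browser pushes are interesting here
    command = classify(message.content)
    if command is not None:
        subprocess.Popen(command)
```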

        • flashgnash@lemm.ee · 8 months ago

          He wants a script to trigger an alarm when he gets a call so he can get away with sleeping

          • Communist@lemmy.ml (OP) · 8 months ago

            It’s actually much more malicious hahaha. But sometimes it may be used while sleeping.

      • onlinepersona@programming.dev · 8 months ago

        Are you on linux? If you’re using pipewire (or pulseaudio), you can connect the chromium audio pipe to your audio analyzer, analyze the audio, and execute commands on a match. Here’s an example of capturing audio with pipewire. It’s in C, but there’s also a Rust crate.
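For the "analyze the audio" part, the simplest technique that fits a sound which is always the same file is normalized cross-correlation against a recorded template. This is a swapped-in technique, not what dejavu or Olaf do (they use spectral fingerprints), shown as a pure-Python sketch assuming both inputs are mono sample sequences at the same rate; the 0.8 threshold is a guess to tune.

```python
import math


def normalized_xcorr_peak(signal, template):
    """Best normalized cross-correlation of `template` slid across `signal`.

    Both are sequences of samples at the same sample rate. Pure Python, O(n*m):
    fine for short, downsampled clips; use numpy/scipy for anything bigger.
    Returns -1.0 if every window is silent, 0.0 if the template can't fit.
    """
    m = len(template)
    if m == 0 or len(signal) < m:
        return 0.0
    t_mean = sum(template) / m
    t = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t))
    if t_norm == 0:
        return 0.0
    best = -1.0
    for start in range(len(signal) - m + 1):
        window = signal[start:start + m]
        w_mean = sum(window) / m
        w = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w))
        if w_norm == 0:
            continue  # a silent window can't match
        score = sum(a * b for a, b in zip(w, t)) / (w_norm * t_norm)
        best = max(best, score)
    return best


def matches(signal, template, threshold=0.8):
    """True if some alignment of the template correlates at least `threshold`."""
    return normalized_xcorr_peak(signal, template) >= threshold
```

Because the score is normalized, a louder or quieter playback of the same clip still scores close to 1.0.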

        Maybe gstreamer could make it easier. Audio analysis will probably be some library that you have to search for.

        What you’re trying to do is not very straightforward, IMO.

        CC BY-NC-SA 4.0

  • BCsven@lemmy.ca · 8 months ago

    This smells of “So I work from home, but want to sleep, but if my boss pings me on teams I want an alarm to wake me up”

    • Communist@lemmy.ml (OP) · 8 months ago

      Is there anything wrong with that? Hahaha, it’s pretty similar but not quite that

      • Quazatron@lemmy.world · 8 months ago

        Lazy people tend to be creative people, which is good, especially when confronted with boring activities.

        I’d solve it in hardware, maybe an ESP32 dongle with a mic pretending to be a keyboard.

        Seriously though, sounds like you need a more creative or fulfilling job.

        • Communist@lemmy.ml (OP) · 8 months ago

          Yeah, the most fulfilling thing about this job has been figuring out how to automate as much of it as possible while still pretending to be a normal worker. It’s pretty terrible; I’m going to switch to herpetology eventually, but can’t do that right now for various reasons I don’t want to get into on a public forum.

          I’m at the top of every performance metric because of my inclination to be lazy as fuck with it though, so, it works.

      • ReversalHatchery@beehaw.org · 8 months ago

        “So I work from home, but want to sleep play games, but if my boss pings me on teams I want an alarm to wake me up”

        Depending on your employer it could be well okay

      • marcie (she/her)@lemmy.ml · 8 months ago

        you're just thinking about it wrong. get an llm going, voice to text, and have a synthesizer copy your voice. 99% of your workload is now gone, no more endless meetings, and you've got notes that can be quickly summarized.

        let's just hope that they don't ask you what 27 times 38 is or something. could maybe prompt it to say ‘Let's circle back around on that in an email’ whenever a complex question is asked, lmao

        • Communist@lemmy.ml (OP) · 8 months ago

          I don’t have any meetings ever. An LLM really wouldn’t be able to do almost any of my work.

    • interdimensionalmeme@lemmy.ml · 8 months ago

      What about having an LLM auto-answer and only wake me up if the bossman keeps talking, and also autofill my timecard so I don’t have to?

      • flashgnash@lemm.ee · 8 months ago

        I’ll have an llm auto answer the day I fancy getting an angry call from my manager because I responded to him asking me to give him access to something with

        “I’m sorry, but as an AI language model I am unable to do that”

        • interdimensionalmeme@lemmy.ml · 8 months ago

          “As an AI model, I cannot in good conscience give you access, explain clearly why you think you need access to this system and I will forward your message to a more competent authority”

  • slickJujitsu@lemmy.today · 8 months ago

    This GitHub project is for detecting sound playing and sending it to Shazam. Perhaps you can use its features to capture audio and find another example of audio comparison for the other half?

    • Communist@lemmy.ml (OP) · 8 months ago

      It’s really not in this case. I can see why people think that since I’ve been vague, but tbh I thought somebody would have already made an easy sound recognition program that I just hadn’t seen, and that once someone pointed me to it the rest would be easy.

      Here is the entirety of the problem:

      1. I have a work program, this notifies me if I get a call or email, the work program then presents an accept/decline page, and does not proceed until I either accept, decline, or it times out.
      2. I want it to do two different things depending on if it’s a call or email
      3. It provides no notification other than the sound and an “accept” button on the page
      4. I have a chrome window open that does nothing but this, and I never use chrome for anything else
      5. I want to automatically do various things when I receive either this call or email
      6. I want it to be broadly applicable rather than a script designed for the specific website giving me the notification (so not a chrome extension). This prevents me from having to update any code in the event that the backend changes dramatically, and even if the notification sound changes, I’d just record a new sound as the activation noise.
      7. The noise is always the same, and hasn’t changed for many years, and there is a distinct noise between calls and emails
      8. They never overlap, they never play multiple times at the same time, and they never make any noises other than those two. The noises are distinct.

      These factors cause me to want to run a script once the noise is recognized, only if the noise is playing in a particular app. I’m using pipewire/hyprland on arch.

      edit: actually they have, it should be really easy with this: https://github.com/worldveil/dejavu

      • flashgnash@lemm.ee · 8 months ago

        This is absolutely an XY problem. Your problem is that you need to programmatically respond to notifications across multiple applications.

        You are asking for help with a solution based on notification sounds which is one possible solution but a bit of a weird one

        • Communist@lemmy.ml (OP) · 8 months ago

          It does not give a desktop notification, or even a proper chrome notification, it’s just a dialogue on a page that says accept/deny

          I said that in the post. The sound is the only thing to hook into. It doesn’t even set chrome as urgent.

        • Communist@lemmy.ml (OP) · 8 months ago

          That’s an interesting solution that I’d rather avoid because it’s proprietary.

          Also, that wouldn’t distinguish the two states of call/email, I don’t think.

          • DaGeek247@fedia.io · 8 months ago

            It was the first result i found for ‘monitor webpage chrome’. There’s a very good chance that what you’re after is just two or three results lower. I promise it would be a lot easier than developing an entirely new solution that works based on speaker sounds. Bonus, these extensions have basic stuff like sms or email support right out of the box.

  • onlinepersona@programming.dev · 8 months ago

    I think it would possibly be easier to write an extension. You can inspire yourself from this extension, intercept media playback, possibly hash the media being played, compare the hash to a known DB you create, and call a script in response to a positive detection.
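The hash-and-lookup half of that idea is small. A sketch in Python rather than extension JavaScript, assuming the intercepted media is available as raw file bytes and that the page fetches the identical file every time; an exact hash like SHA-256 won't match if the file is re-encoded even slightly. The file names, labels and script paths are hypothetical.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_db(files: dict[str, Path]) -> dict[str, str]:
    """Map the SHA-256 of each known notification file to its label, e.g. {"call": Path("call.ogg")}."""
    return {sha256_of(path.read_bytes()): label for label, path in files.items()}


def lookup(db: dict[str, str], media_bytes: bytes):
    """Return the label for byte-identical media, else None."""
    return db.get(sha256_of(media_bytes))


def handle(db, media_bytes, commands, runner=subprocess.Popen):
    """On a positive detection, run the command registered for that label."""
    label = lookup(db, media_bytes)
    if label in commands:
        runner(commands[label])
    return label
```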

    CC BY-NC-SA 4.0

  • tunetardis@lemmy.ca · 8 months ago

    I have some vague recollection of a hacker convention from the 90s where people were challenged to come up with wireless networking in a one night coding marathon. (This was long before wifi.) So some dude used speech synthesis to get a machine to say “one zero one one zero…” and another to assemble the binary data into packets using speech recognition. It was hilarious, and the dev had to keep telling people to shut up and stop laughing so he could complete the demo.

    But anyways… what I’m trying to suggest here is you might have the best luck if your notification sounds contain spoken commands and you use speech recognition to trigger scripts? That tech is pretty mature at this point.
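If the notification sound were (or contained) a spoken phrase, the recognizer's output is just text, so triggering becomes fuzzy string matching. A sketch assuming some speech-recognition engine (not shown) has already produced a transcript; the phrases, commands and 0.75 cutoff are hypothetical, and `difflib.get_close_matches` is used to tolerate small recognition errors.

```python
import difflib

# Hypothetical spoken phrases -> hypothetical commands.
COMMANDS = {
    "incoming call": ["./on_call.sh"],
    "new email": ["./on_email.sh"],
}


def command_for(transcript: str, cutoff: float = 0.75):
    """Map a (possibly imperfect) transcript to a command, or None if nothing is close."""
    text = " ".join(transcript.lower().split())  # normalize case and whitespace
    best = difflib.get_close_matches(text, COMMANDS.keys(), n=1, cutoff=cutoff)
    return COMMANDS[best[0]] if best else None
```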

  • thevoidzero@lemmy.world · 8 months ago

    Someone already talked about the XY problem, so I’ll say this.

    Why sound notifications instead of notification content? If your notification daemon (dunst in my case) has pattern matching or can call scripts based on patterns, and the script has access to the app name, notification title, contents, etc., then it’s just a matter of calling something in your bash script.

    And any time you wanna add that functionality to something else, add one more line with a different pattern or add a condition in your script. Comparing text is a lot more reliable than comparing audio.
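A sketch of such a rule script, assuming dunst's documented behaviour of passing appname, summary, body, icon path and urgency as positional arguments to a rule's `script`. The patterns and script paths are hypothetical, and this only helps when the app actually raises a desktop notification, which OP says this one doesn't.

```python
#!/usr/bin/env python3
"""Hypothetical dunst rule script.

dunst passes: appname, summary, body, icon path, urgency (in that order).
A matching rule in dunstrc might look like (names are placeholders):

    [work_notify]
        appname = "Chromium"
        script = ~/bin/on_notify.py
"""
import re
import subprocess
import sys

# Hypothetical (regex, command) pairs, checked against "summary body".
PATTERNS = [
    (re.compile(r"\bcall\b", re.IGNORECASE), ["./on_call.sh"]),
    (re.compile(r"\bemail\b", re.IGNORECASE), ["./on_email.sh"]),
]


def pick_command(summary: str, body: str):
    text = f"{summary} {body}"
    for pattern, command in PATTERNS:
        if pattern.search(text):
            return command
    return None


def main(argv):
    _appname, summary, body = (argv + ["", "", ""])[:3]  # pad in case fewer args arrive
    command = pick_command(summary, body)
    if command is not None:
        subprocess.Popen(command)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```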

    Of course your use case could be completely different, so maybe give some examples of use case so people can give you different ways to solve that instead of just the one you’re thinking of.

  • JovialSodium@lemmy.sdf.org · 8 months ago

    Maybe you can do something with the tampermonkey extension to catch when that audio is triggered and have it do an api call that your script catches?

    I don’t know if that’ll actually work. I know of the extension but have never used it, nor am I skilled with JavaScript, but it seems feasible.

    • Communist@lemmy.ml (OP) · 8 months ago

      That sounds like a somewhat appealing solution; however, I’d like this to be more broadly applicable. Even if it weren’t Chrome but some other application making a particular noise, I’d like to be able to easily execute a script whenever that noise is played, which would let me automate a bunch of things rather than just one specific weird thing.

  • qjkxbmwvz@startrek.website · 8 months ago

    Can you isolate the call to the sound from the DevTools? And if so, does DevTools allow you to edit the function? Perhaps you could GET/POST something on localhost which could trigger a shell script.
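The receiving end of that idea can be a tiny HTTP server bound to localhost that maps a request path to a script. A sketch assuming the page-side hook (a DevTools override or userscript, not shown) sends a POST to something like http://127.0.0.1:8765/call; the port, paths and scripts are hypothetical, and browser restrictions on pages calling localhost may get in the way and are not covered here.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical routes -> hypothetical scripts. Only these exact paths do anything.
ROUTES = {
    "/call": ["./on_call.sh"],
    "/email": ["./on_email.sh"],
}


class TriggerHandler(BaseHTTPRequestHandler):
    runner = subprocess.Popen  # how commands are launched; swappable for testing

    def do_POST(self):
        command = ROUTES.get(self.path)
        if command is None:
            self.send_response(404)
        else:
            type(self).runner(command)
            self.send_response(204)  # success, no body
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def serve(port: int = 8765):
    # Bind to loopback only so nothing else on the network can trigger scripts.
    HTTPServer(("127.0.0.1", port), TriggerHandler).serve_forever()


if __name__ == "__main__":
    serve()
```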

  • ace_garp@lemmy.world · 8 months ago

    Expect automates things, based on text-input captured from a terminal.

    Not sure if it has been extended/hacked to take sound as an input.
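Expect itself works on terminal text, so the nearest Python stand-in is an expect-style loop: run a command, read its output line by line, and fire the action for the first regex that matches. This is not Expect's API; it assumes some sound detector (hypothetical here) prints a line such as "call" or "email" when it recognizes a sound, which would bridge audio into a text-matching tool.

```python
import re
import subprocess


def react(lines, rules):
    """Expect-style loop: for each line, run the action of the first matching regex."""
    fired = []
    for line in lines:
        for pattern, action in rules:
            if re.search(pattern, line):
                action(line)
                fired.append(pattern)
                break
    return fired


def watch(cmd, rules):
    """Run `cmd` and feed its stdout, line by line, to `react`."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        return react(proc.stdout, rules)
```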

  • bloodfart@lemmy.ml · 8 months ago

    My assumption is that you don’t care if your notification gets spoofed, e.g. someone rings a little bell and the script deletes all cookies from porn websites as if the little bell notification had played.

    So I think the hardest and best way to do this is to have the script run on a separate device from the one the sound plays on.

    First record the sound you want to trigger with. Use the script executing device with the microphone and interface you’ll be using in production set up in the location of production to make it easier on yourself.

    Now reduce the bitrate of the target sound a lot. No, more than that, keep going, a little more, that’s perfect.

    Now write something that will capture the last target_sound_length seconds of audio and compare it with the bitcrushed version. Depending on your device, there may be a buffer object in the ADC you can interface with, although if it’s running a normal operating system you won’t be able to get to it without going through the OS first.
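The "bitcrush, then compare the last N seconds" idea can be sketched with decimation plus coarse quantization and a ring buffer the length of the target. A minimal sketch, assuming integer samples already captured from the microphone (capture not shown). The decimation factor 4 and 16 levels are arbitrary, decimating without a low-pass filter aliases, and an exact per-position comparison like this will likely be too strict for a real mic recording, so treat it as a starting point.

```python
from collections import deque


def bitcrush(samples, factor: int = 4, levels: int = 16, full_scale: int = 32768):
    """Crudely 'reduce the bitrate': keep every `factor`-th sample, quantize to `levels` steps."""
    step = 2 * full_scale / levels
    return [int(s // step) for s in samples[::factor]]


class LastSeconds:
    """Ring buffer holding the most recent `length` crushed samples."""

    def __init__(self, length: int):
        self.buf = deque(maxlen=length)

    def feed(self, samples, target) -> float:
        """Push samples one at a time; return the best similarity seen at any alignment.

        Similarity is the fraction of positions where buffer and target agree
        within one quantization step.
        """
        best = 0.0
        for s in samples:
            self.buf.append(s)
            if len(self.buf) == len(target):
                close = sum(abs(a - b) <= 1 for a, b in zip(self.buf, target))
                best = max(best, close / len(target))
        return best
```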

    If you can go through the chrome notificationing machine, figure out the hook used to trigger the notification you want to respond to and intercept and perform the script. No nyquist needed!

  • flashgnash@lemm.ee · 8 months ago

    Sometimes the easiest solution is multiple solutions

    Maybe just write something to hook into the notification in Chrome; there’s probably a way to get that working within an Electron app too, if the desktop app (Teams?) is Electron.

    • Communist@lemmy.ml (OP) · 8 months ago

      That won’t work if the backend ever changes, and will be locked into a single program

      https://github.com/JorenSix/Olaf I’ve decided to use this. I’ll probably have a solution this week; I have to actually record the sounds on my next workday, then I’ll test it. Seems much easier to do than making a Chrome extension, honestly.