Today I feel the insatiable urge to build a tool to scrape the shit out of Facebook and export my timeline into an RSS feed, or at least something that can be easily integrated into the Fediverse.

I haven't posted anything on my Facebook profile for months, nor have I accessed their website unless some friend shared a direct link to a photo or a video with me. Granted, I feel like the Fediverse is a much healthier place that doesn't make me feel as guilty as if I were chain-smoking and consuming junk food while driving a huge CO2-spewing SUV.

But, even if I have met a lot of amazing people here, and I have even managed to bridge Twitter profiles and content from a vast trove of RSS feeds, and I have built bots that take a lot of interesting content directly to my door, and I have even managed to keep messaging my friends on Messenger/WhatsApp through Matrix and Bitlbee bridges, there's still an uncomfortable truth that doesn't let me sleep at night: Zuckerberg is still holding most of my family and friends hostage, they will probably never move to the Fediverse, and I'm missing out on the lives of my loved ones (as well as on a lot of interesting events happening around me) because that content is behind a huge impenetrable wall.

I'm sick of hearing "Facebook should be compelled to federate, or at least open up their APIs for personal usage, but we don't know where to start". Or "the DMA is great on paper, but it's hard to enforce". If regulators don't take the matter into their own hands, then I will. And, if Facebook dares to sue me or lock my account, I'm ready to sue them back for violating the DMA. I'm ready to take this matter to court and spend my money on lawyers, because I want Facebook and their highly immoral "high switching costs" strategy to die amid the worst conceivable pains in this universe - or at least I want them to be forced to open up the data of my loved ones.

I was looking around for some up-to-date Facebook scrapers, but all I could find was this project github.com/kevinzg/facebook-sc (which only scrapes public pages) and some commercial solutions that provide Facebook scraping for profiling and ads purposes (which makes me wanna spit and puke on the people behind those businesses for proudly showing off the worst that a human being can be capable of and making a profit out of it).

I made a Facebook scraper around 10 years ago, but back then their pages were relatively simple, and a bit of beautifulsoup scripting was enough to scrape the shit out of them. I've now taken a look at the developer console while browsing the website, and I've been horrified by how much effort they've put into preventing exactly what I'm trying to do - the whole Facebook feed is basically a bunch of <script type="application/json"> tags that download some custom minified JavaScript for each post, which in turn is used to decrypt some other JSON requests.
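
As a starting point for anyone who wants to poke at this, here is a minimal sketch of the very first step only: pulling the payloads out of <script type="application/json"> tags with Python's standard library. The sample HTML and its field names (`post_id`, `text`) are made up for illustration and do not reflect Facebook's real markup, and nothing here deals with the minified JavaScript or the decryption step described above.

```python
import json
from html.parser import HTMLParser


class JsonScriptExtractor(HTMLParser):
    """Collect the parsed payload of every <script type="application/json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._chunks = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True
            self._chunks = []

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them.
        if self._in_json_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip blobs that are not plain JSON


def extract_json_blobs(html):
    """Return the decoded JSON payloads found in the given HTML string."""
    parser = JsonScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.payloads


# Hypothetical page: the structure and keys are assumptions, not Facebook's.
SAMPLE_HTML = """
<html><body>
<script type="application/json">{"post_id": "1", "text": "hello"}</script>
<script type="text/javascript">var x = 1;</script>
<script type="application/json">{"post_id": "2", "text": "world"}</script>
</body></html>
"""

if __name__ == "__main__":
    for blob in extract_json_blobs(SAMPLE_HTML):
        print(blob["post_id"], blob["text"])
```

The hard part, of course, is what comes after this: working out which of those blobs matter and how the per-post scripts transform them.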

So I'm appealing to all the hackers and tinkerers out there: are there FLOSS projects that already do what I'm trying to do (basically allow you to sign in to Facebook with your account, get an access token, and scrape posts and comments from your own timeline)? If not, are there any volunteers out there who would like to join forces with me in a new dog-and-cat war with Facebook - starting with reverse engineering whatever mechanism they've put in place to obfuscate the HTML on their timelines?

After my article on how to create Twitter/RSS -> Mastodon cross-posting bots, I did an experiment with @crossbot and let it run with ~10 different sources for a couple of weeks.

The idea was definitely successful: I brought with me to the Fediverse all the sources that I wanted to follow, without forcing them to move, and I actually didn't feel the urge to open Twitter/Facebook for "fear of missing out".

But I've realized that a single bot managing multiple sources isn't ideal. People who want to follow only some of the sources are forced to also get content they didn't ask for on their timelines. Some people did indeed follow crossbot, but many also unfollowed it - probably because it posted too much, too often, and since all the content was coming from the same account it was hard to tell which source a toot came from without actually reading it.

So I've decided to split it into multiple bots, one for each of the sources that I'm cross-posting. Feel free to follow any of these bots if you are interested in the content! But please also avoid commenting on their activities (there's no human behind the profile that can react). Instead, favourite/boost/re-share the link if you want to bring the discussion to the "human" sphere.

List of available bots:

- The Economist: @economist_bot
- Quanta Magazine: @quanta_bot
- Nautilus Magazine: @nautilus_bot
- Nature: @nature_bot
- Scientific American: @sciam_bot
- Phys.org: @physorg_bot
- The Gradient: @gradient_bot
- The Hacker News: @hackernews_bot
- Hackernoon: @hackernoon_bot
- IEEE: @ieee_bot
- IoT for All: @iot4all_bot
- Better Programming: @better_programming_bot
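
The one-bot-per-source split can be pictured as a simple routing table. This is only an illustrative sketch: the bot handles are the ones listed above, but the feed URLs are placeholders and the `route` helper is hypothetical, not how the actual bots are wired.

```python
from collections import defaultdict

# Placeholder feed URLs (assumptions) mapped to bot handles from the list above.
BOTS = {
    "https://example.org/economist/rss": "@economist_bot",
    "https://example.org/quanta/rss": "@quanta_bot",
    "https://example.org/nature/rss": "@nature_bot",
}


def route(items):
    """Group (feed_url, title) pairs by the bot that should post them."""
    queues = defaultdict(list)
    for feed_url, title in items:
        handle = BOTS.get(feed_url)
        if handle is None:
            continue  # no dedicated bot for this source
        queues[handle].append(title)
    return dict(queues)
```

Each handle then posts only its own queue, so followers get exactly the sources they picked.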

Also, feel free to comment on this post if you have any requests for interesting sources that are only available on Twitter/RSS and that you'd like to bring here - I'll definitely consider making a bot for them.

As a Fediverse user, I can follow a lot of cool people, but I can't access content that is exclusively published on Twitter.

Until recently I still opened Twitter to check for updates from profiles such as MIT Technology Review, The Gradient, The Economist, Quanta Magazine or Phys, since none of those accounts cross-posts to the Fediverse.

That's no longer the case. I decided that instead of complaining about the mountain not moving to me, I should probably take the initiative and drag it myself.

So I have created a bot based on Platypush (and a sprinkle of Python) that subscribes to a curated list containing my RSS feeds and my favourite Twitter accounts (using nitter to bridge Twitter timelines to RSS), and forwards updates to my instance: social.platypush.tech/web/@cro.

If you're into science and tech content, feel free to follow it!

And I've written a blog article that explains how to build a bot like this, together with some random thoughts on the Fediverse.

blog.platypush.tech/article/Cr
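
For readers who just want the gist of the forwarding loop, here is a bare-bones sketch in plain Python. It is not the Platypush-based setup from the article. Assumptions: nitter instances expose a profile's feed at `/<username>/rss`, the target instance accepts a `POST /api/v1/statuses` with a bearer token (the standard Mastodon endpoint), and the instance names, token and sample feed are placeholders. The request is only built, never sent.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def nitter_feed_url(instance, username):
    """RSS URL of a Twitter profile on a nitter instance (assumed URL layout)."""
    return f"https://{instance}/{username}/rss"


def parse_rss_items(xml_text):
    """Return (guid, title, link) tuples for each <item> of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        guid = (item.findtext("guid") or link).strip()  # fall back to the link
        items.append((guid, title, link))
    return items


def new_items(items, seen):
    """Drop items that were already forwarded and remember the new ones."""
    fresh = [it for it in items if it[0] not in seen]
    seen.update(it[0] for it in fresh)
    return fresh


def build_toot_request(base_url, token, title, link):
    """Prepare (without sending) a POST to the instance's /api/v1/statuses."""
    body = urllib.parse.urlencode({"status": f"{title}\n\n{link}"}).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/statuses",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


# Made-up feed used only to exercise the functions above.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Sample</title>
<item><title>First post</title><link>https://example.org/1</link><guid>id-1</guid></item>
<item><title>Second post</title><link>https://example.org/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    seen = set()
    for guid, title, link in new_items(parse_rss_items(SAMPLE_RSS), seen):
        req = build_toot_request("https://mastodon.example", "TOKEN", title, link)
        print(req.get_method(), req.full_url, title)
```

A real bot would fetch the feeds on a schedule, persist the `seen` set between runs, and pass each request to `urllib.request.urlopen`.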

It's time!

- 🇮🇹 geek in his mid-thirties, based in 🇳🇱

- 🎓 M.Sc in computer engineering.

- My current job is about fixing and automating global supply chains, one line of code at a time, but I have worked in a wide range of industries over the past (nearly) two decades.

- My hobbies often involve automating everything around me.

- :linux: user since 2001. My experience as a Linux admin started back in a time when I used to run my IRC and Apache servers on a repurposed Pentium 1 under my bed, and it still took about 10 💾 to install a full Slackware system.

- :arch: Linux and rolling release enthusiast.

- 🛠 Creator and main developer of Platypush (platypush.tech), an open-source (mainly :python: and :vue:), general-purpose platform/framework to automate everything - from smart devices, to cloud services, to robots, to DevOps operations, to everything in between. With hundreds of available integrations, you can think of it as IFTTT+Tasker+SmartThings on steroids, scriptable, and runnable on almost any device. Or maybe like HomeAssistant's lighter brother.

- Admin of social.platypush.tech, a Mastodon instance where I may talk a lot about Platypush, automation, programming, electronics and maths. I tend to write a lot, so if you're looking for an instance with a 10,000 characters per toot limit...

- Looking for relays with instances dedicated to similar topics. My dream would be to build an experience, when it comes to , that is akin to curated lists, where admins can create curated federated experiences for the users on their platforms, rather than the open-to-everything overwhelming stream of toots on the federated timeline that most of the relays provide nowadays.

- 🤖 Machine-learning enthusiast. I have published a book on it link.springer.com/book/10.1007, with simple computer vision exercises that can be run on a Raspberry Pi, I did some academic research back when neural networks were still a green field fabiomanganiello.com/#research, and I never stop learning new stuff.

- 🧪 Physics, chemistry, biology, maths and astronomy enthusiast.

- 🎵 Music addict, 🎸 and 🎹 player, and occasional composer/producer. You can find some of my music here open.spotify.com/artist/5H6BJf and here soundcloud.com/blacklight01

- I may often write about random politics/economics/philosophy. I may sometimes be very passionate on topics such as open-source, open data, open protocols, tolerance and social inequalities. I mostly belong to the progressive/social-democratic field. You are welcome to try and change my mind, as long as you do it in a civilized and data-driven way.

- 🏄 and 🛹 rider. And, as a good Dutch resident, 🚲 enthusiast.

- 👪 Full-time father.

Twitter is launching ADX, a.k.a. Authenticated Data Experiment.

It'll lead to a more decentralized social network, a more open-source platform, federation across instances, shared protocols, and users in charge of their own "Personal Data Repositories" that they can easily share and move around.

In other words, Twitter is reinventing ActivityPub, and ignoring years of progress already made on the protocols and infrastructures of the Fediverse.

I'm really wondering what's the point. If you're a company like Twitter, that has already been struggling to turn profitable for the past decade, what's the point of pouring even more resources into rebuilding ActivityPub from scratch instead of reusing what's already available?

theverge.com/2022/5/4/23057473
