# Lemmy Project Priorities Observations

"Initials" by "Florian Körner", licensed under "CC0 1.0". / Remix of the original. - Created with dicebear.comInitialsFlorian Körnerhttps://github.com/dicebear/dicebearLE
## I was wrong about PostgreSQL priority in Lemmy. "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb" comes to mind

I have clearly failed to understand how integral server crashes have become to the social identity of the Lemmy community: the "little guy taking on Reddit and Twitter" for 4 years. I have personal DB2 and PostgreSQL experience in production "mission critical" applications. I should have seen that for 4 years the developers had accumulated social experience of a different kind, more like Hollywood films or Music Industry labels. My brain-damaged mind was so WORRIED about the June 4, 2023 GitHub Issue 2910 that I started organizing [!lemmyperformance@lemmy.ml](https://lemmy.ml/c/lemmyperformance) on June 13, after lemmy.ml upgraded hardware and I saw even more crashes.

It's cultural. I should not have looked into the kitchen and let my personal DB2 and PostgreSQL memories haunt me into treating this as a "production" system crashing. I fucked up, and I've wasted 2 months in WORRY, first about the Reddit API deadline, and now about the Elon Musk X rename, and I admit I never absorbed the social experience the developers gained from 4 years of working "social communities". I study religions; I should have interpreted all of this with that hat on instead of the "production PostgreSQL or DB2" one.

It's been a bad year for me, and I've made several big mistakes. I had to drink to clear my mind for the first time this year since New Year's. Those memories of running "mission critical" servers had to be shut off; I had to stop applying them to a socially-driven Lemmy culture. The developers have been running this site for 4 years, they know modern audiences and what works - I've just been pissing into the wind with the wrong interpretation :(

## Day 65 - perhaps I have been wrong all along, and getting donations for hardware and servers was why they felt server crashes were not an urgent problem to fix

Perhaps I've had the wrong attitude about the server needing to be reliable, about the importance of data loss, about crashes making all the front-ends look bad, and so on. When I step back and study the history of Microsoft vs. Apple: the "open hardware" approach of Microsoft was eventually abandoned, Apple always had the best loyalty by NOT offering hardware choice to their customers, and Linux, which offered the most hardware choices of all, was even less popular. Apple knew people didn't like to pay money for software, so they became an operating system company that packaged it with hardware. I know developers of projects like Lemmy do not get good money.

Maybe I have been far too harsh on them for ignoring PostgreSQL crashes, and should just face up to the fact that the Lemmy community, the users of Lemmy, think it is all perfectly fine. 32 CPU cores to run a server without all that much content; it bothered me a lot to see money spent like that while witnessing the server crashing. But there are people here who seem to actually enjoy all of it. It makes the fight against Big Reddit and Big Twitter seem more dramatic. If you look at how Hollywood productions are run in terms of expenses for sets, costumes, and shooting locations, it can be millions of dollars for 90 seconds of a movie. Lemmy is in the sphere of social media, and films are social too.

Perhaps I've been the wrong kind of fool, and their approach of simply allowing crashes rather than treating it like an important production server is actually intuitive to how social projects of "taking on the big guys" work. If it were closed-source, paid-for software, crashes would be an issue to challenge. But with open source, they seem to have taken the attitude that anyone can fix it, so it's not their problem. They really emphasized to me what they mean by "supported". And if lemmy.ml is their idea of a "supported" server, I should take that as reality and give up any idea that easily-fixed crashes are important. It's product defining. It was hard for me to come to terms with. I should have stuck to topics like ChatGPT, Neil Postman, and Marshall McLuhan.

## Day 65 - the Lemmy audience: the authors of posts and comments, the voters

Hatred of Twitter and hatred of Reddit seem to have been the primary motivation for many people to join Lemmy. In the past 2 months, I have seen many people requesting features in apps to block entire instances of Lemmy or to enhance other blocking features. Bashing Facebook, bashing end-users, was often a thing you saw on Reddit. Are people largely attracted to this? I'm mostly just saying it out loud. Trends like what Elon Musk has done to Twitter, or ChatGPT going mainstream in 2023 with all its misinformation-creation abilities... are happening regardless of anything going on with Lemmy. ChatGPT is probably the bigger real change. The news/world news on Lemmy isn't really that different from Reddit, based on my 60+ days of reading: the same general style of comments, reactions, and flow of postings.

## Day 64 - Lemmy project priorities - GitHub project brags "high performance" Rust code, "full delete" of content, all lies / reality distortion psyche

1. "Full delete" of content: when [I personally create tests](https://github.com/LemmyNet/lemmy/pull/3657) to validate "full delete", it fails, despite a 4-year-old claim bragging about it on the GitHub project page.
2. "High performance": constant server crashes from the site_aggregates table getting EVERY row ***modified*** via a badly-written 2022 SQL TRIGGER when a single comment or post is inserted.
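
To make claim 2 concrete, here is a minimal, self-contained sketch of the failure mode and of the SQL-only shape of a fix. The table, trigger, and function names below are toy stand-ins, not Lemmy's actual schema; only the pattern matters: a per-row trigger firing an UPDATE with no WHERE clause.

```sql
-- Toy reproduction (NOT Lemmy's real schema): one aggregate row per known
-- instance, plus a per-row trigger on comment inserts.
CREATE TABLE site_aggregates_demo (
    site_id  int PRIMARY KEY,
    comments bigint NOT NULL DEFAULT 0
);
INSERT INTO site_aggregates_demo (site_id)
SELECT g FROM generate_series(1, 1500) AS g;

CREATE TABLE comment_demo (id serial PRIMARY KEY, site_id int NOT NULL);

-- The failure mode: no WHERE clause, so every single comment insert rewrites
-- all 1500 aggregate rows.
CREATE FUNCTION comment_demo_count() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    UPDATE site_aggregates_demo SET comments = comments + 1;
    RETURN NEW;
END $$;

CREATE TRIGGER comment_demo_count AFTER INSERT ON comment_demo
    FOR EACH ROW EXECUTE FUNCTION comment_demo_count();

-- The shape of the relief, applied with plain SQL and no lemmy_server rebuild:
-- scope the UPDATE to the single row that belongs to the inserted comment.
CREATE OR REPLACE FUNCTION comment_demo_count() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    UPDATE site_aggregates_demo SET comments = comments + 1
    WHERE site_id = NEW.site_id;
    RETURN NEW;
END $$;
```

Because `CREATE OR REPLACE FUNCTION` swaps the trigger body in place, relief of this kind can be applied from psql alone, which is why an operator-side SQL patch can act as stopgap relief.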

## Lemmy project tries to LURE criticism into PRIVATE conversations/chat

![](https://lemmy.ml/pictrs/image/b7462644-31f1-4862-8d86-60844b4cba4f.png)

## lemmy.ml GitHub Issue 2910, WHY since June 4?

![](https://lemmy.ml/pictrs/image/cdad2989-2b69-4879-991a-99a5068db26c.png)

## Day 64 - I know of Surkov in Russia and Cambridge Analytica in the United Kingdom

I am not ignorant of the post-2013 psychological techniques of Surkov and the IRA in Saint Petersburg, Russia, nor of the post-2013 psychology of Cambridge Analytica.

## I am sorry I have let the Lemmy end-users down, especially newcomers who gave up and abandoned new server instances, and end-users who hit so many server-crash garbage errors

My brain damage leaves me unable to create English prose able to convince developers to listen to and face [June 4 issue 2910 on GitHub](https://github.com/LemmyNet/lemmy/issues/2910). I tried to put compassion, kindness, love, and humanism first with the June 13 creation of [!lemmyperformance@lemmy.ml](https://lemmy.ml/c/lemmyperformance). I am a worthless failure :( Over 50 days of effort since the June 4 issue 2910 on GitHub, and I failed to get them to understand the end-user experience of endless crashes on lemmy.ml and on every server running the TRIGGER UPDATE statements.

## Day 64 - my June 24 request to have developers please document the obfuscated SQL TRIGGER logic in the Rust lemmy_server project...

June 24 was a full week before the Reddit API deadline, and came after my June 13 post in [!lemmyperformance@lemmy.ml](https://lemmy.ml/c/lemmyperformance) about paying more attention to the PostgreSQL code they had put into Lemmy Server, the code that was causing constant crashes, as called out on June 4 by GitHub issue 2910.

The effort the developers have made to DISCOURAGE focused attention on the PostgreSQL TRIGGER code: https://github.com/LemmyNet/lemmy-docs/issues/233

Removing "emergency" from the fix first thing Monday, after not working on the weekend, shows how far out of their way the project leaders have gone to AVOID the TRIGGER code getting scrutiny and getting fixed since the June 4 call-out in issue 2910.

## Day 64 - as fast as humanly possible, lead developers close out an issue about crashes during the upgrade from 0.18.2 to 0.18.3 caused by PostgreSQL - GitHub Issue 3756 today

Just as they do not work on weekends, first thing Monday their priority on GitHub was to EDIT a pull request to remove the word "EMERGENCY" (server crashes), and then to go on adding low-importance new features all week. PostgreSQL-related crashes and mistakes in the Lemmy Project back-end are a pattern, [as June 4, 2023 GitHub issue 2910 clearly documents](https://github.com/LemmyNet/lemmy/issues/2910) - an issue which sat ignored for months.

![](https://lemmy.ml/pictrs/image/091b37eb-61d1-4f60-9d05-7710f27df563.png)

As fast as possible, closing and brushing under the rug GitHub Issue 3756, which was field-testing of the Lemmy 0.18.3 release (there was also NO CALL or announcement by the project leaders to test 0.18.2 or 0.18.3 - there is no quality assurance on the project from all that I have seen). If the developers of the program do not grasp PostgreSQL, why constantly cover it up? Why brush SQL problems under the rug? It's the core of Lemmy, where all the data is stored. I have some new books on order to better understand social hazing.

## Day 64 - before the weekend, 0.18.3 release, this particular phase of GitHub Issue 2910 avoidance and social hazing is ending? What new chapters will begin?

From what I can piece together, there were some smaller flocks of Reddit users coming to Lemmy before I started using Lemmy for hours each day, 64 days ago on May 25.

May 25: every Lemmy instance on the recommended list was crashing for me. Very obvious signs of PostgreSQL performance problems in the code. Refreshing a listing would fail 1 out of 5 times, often as much as half the time. Beehaw, Lemmy.ml, Lemmy.world, and the non-English instances too; I really did not see anything I would consider significant post and comment activity.

May 29: I have now been searching and reading Lemmy content for 4 days (between constant crashing). I cannot find the developers sharing their PostgreSQL problems in communities or trying to fix crashes. They seem to be avoiding using Lemmy. Why? I don't understand. But I keep reading.

June 1: it is clear that server crashing is everywhere and nothing is being done about it. I start reading GitHub issues and pull requests multiple times a day, trying to understand the priorities of the project, since I can find no Lemmy community for discussing the ongoing server overload and performance problems in lemmy_server. It must be somewhere? Discussions are free to host on GitHub, but they are disabled by the project leaders. They are not using GitHub discussions, and they are not using Lemmy communities. I'm perplexed.

June 2: I find out that the project leaders run lemmy.ml, so I focus on hanging out there to witness change management. I finally manage to get an account created on lemmy.ml past all the crashing.

June 4: GitHub Issue 2910 is opened. The PostgreSQL TRIGGER is directly identified as causing servers to "blow up". This is a very easy fix; it can even be done without recompiling lemmy_server and making a back-end release. A bash script, or even just a web page of PostgreSQL steps (like the existing Lemmy "from scratch install" has), would provide huge and immediate relief. June 4 was a Sunday. I watch project leaders who do not work on weekends come in Monday June 5, Tuesday June 6, Wednesday June 7, etc. and ignore the dramatic issue about PostgreSQL, issue 2910. I am still looking over lemmy.ml between constant server crashes, trying to find evidence that the project leaders of Lemmy actually ask the Lemmy community for help in [!postgresql@lemmy.ml](https://lemmy.ml/c/postgresql). **Crickets. They don't use Lemmy to discuss and seek help regarding the crashes of Lemmy. I am wildly perplexed and I do not understand the social hazing yet.** Another weekend the project leaders aren't around. Monday June 12 comes. Issue 2910 and the constant server crashes are not discussed on Lemmy. The whole Lemmy community is abuzz that in 3 weeks Reddit is shutting down the API and something must be done to prepare.

### June 13

With social hazing as the main leadership priority, promoting Matrix chat instead of Lemmy communities such as [!postgresql@lemmy.ml](https://lemmy.ml/c/postgresql), **the project leaders encourage big server hardware upgrades** and upgrade Lemmy.ml: https://lemmy.ml/post/1234235 - but there is no significant improvement, and crashes are still constantly happening because of PostgreSQL, with issue 2910 about PostgreSQL now ignored for 9 days. On June 13 I know the problem is not hardware. It's obviously the PostgreSQL code being fed by lemmy_server. **I am dumbfounded.** Why aren't project leaders asking in [!postgresql@lemmy.ml](https://lemmy.ml/c/postgresql), using the Lemmy platform itself?

I created a community, [!lemmyperformance@lemmy.ml](https://lemmy.ml/c/lemmyperformance), and posted https://lemmy.ml/post/1237418 about PostgreSQL and developer basics, 101. The scope of the social hazing against the USA media environment was not yet clear to me. "Eating your own dog food" as developers, and Matrix Chat as part of the social hazing, were not fully appreciated by me at that time. Only in retrospect can I see what was going on June 13.

June 15: again, the project leaders are sending all kinds of social signals that they are going to ignore server crashes as a topic. I opened one of several postings on Lemmy.ml calling out the constant crashes despite the June 13 hardware upgrade: https://lemmy.ml/comment/948282

### June 19

I am well into a campaign of pleading for developers to install pg_stat_statements and to use Lemmy itself to see the scope of Lemmy.ml and other servers crashing constantly: https://lemmy.ml/post/1361757

### June 30

Reddit users flock to Lemmy again, as had been happening all month, only to find all Lemmy servers constantly crashing, as I had personally seen every day since June 25. Ignoring GitHub Issue 2910 for several weeks and avoiding Lemmy [!postgresql@lemmy.ml](https://lemmy.ml/c/postgresql) and other communities is no accident; I now see social hazing was the primary project leadership concern.

### July 22 - Saturday

Lemmy.ca staff download a clone of their own database and run PostgreSQL EXPLAIN, which again identifies the TRIGGER logic as a bomb within the code, blowing up servers, and it still hasn't gotten any attention. I know avoiding the issue on GitHub has been the social hazing game since June 4, but I still make as much noise as possible on GitHub and create a pull request labeled "EMERGENCY" on Sunday, July 23.

### Monday July 24

The very first priority of the project leaders on GitHub is to edit my pull request title to remove the word "emergency" about the TRIGGER fix. At this point, I have no other explanation for what I have witnessed since May 25 than **social hazing on a mythological scale I have never personally witnessed before**. No postings are made by developers on Lemmy, and they continue to hang out on Matrix Chat as part of their social hazing rituals.

### Friday July 28

Issue 2910 isn't even mentioned in the release notes of 0.18.3, which are created today. Since June 4 I've witnessed it be deliberately ignored, just as I have seen the avoidance of using Lemmy communities like [!postgresql@lemmy.ml](https://lemmy.ml/c/postgresql) to discuss how easy it is to notice on lemmy.ml that TRIGGER statements are crashing the server constantly. It's still nearly impossible for me to describe the scale of this social hazing against all users of Twitter, Reddit, and the World Wide Web / Internet. 1) Don't eat your own dog food; avoid using Lemmy to discuss Lemmy. 2) Avoid GitHub issues and discussions about server crashes and PostgreSQL TRIGGER topics. 3) Say the problem is scaling via hardware, and avoid admitting that hardware upgrades were the wrong answer on June 13. THE. MOST. SOPHISTICATED. SOCIAL HAZING. I have ever witnessed. June 4, GitHub Issue 2910. Elon Musk's rebranding of "Twitter" to "X" went on, the Reddit API change went on, and everything possible was done to avoid posting in Lemmy [!postgresql@lemmy.ml](https://lemmy.ml/c/postgresql) that CPU was at 100% for PostgreSQL. Wild ride.

## Day 63 - the comment delete bug fix went in

The fix for comment deletes not making it to 1498 of 1500 subscribed servers went in. I'm working on testing to confirm the same for moderator removes and other actions (testing did get several merges this week; I'm not sure where problems like feature/sticky of a post in a community reaching all 1500 servers go). I am contemplating whether some kind of existing-data cleanup could be done for all the comment deletes that never got sent out: an admin API could be added to trigger an undelete and then a delete again if the comment is local, and then crawl them (a sketch of a query to find those comments follows this post). Probably too many other fixes ahead of it. Will the server-crashing fix go in before the weekend?
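
A minimal sketch of the query such a cleanup crawl could start from. The `local` and `deleted` columns on the `comment` table are my assumption about the Lemmy schema, so treat this purely as an illustration:

```sql
-- Candidate set for a re-announce crawl: comments authored on this instance
-- that are already flagged deleted locally. (Assumed column names.)
SELECT id, creator_id, post_id, updated
FROM comment
WHERE local = true
  AND deleted = true
ORDER BY id;
```

A hypothetical admin endpoint would walk this list and re-send the delete activity for each row.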

## Day 62 - it isn't just me; it was called out on June 4 and promptly ignored while every Lemmy server was crashing... 25 days before Reddit shut down the API. Social hazing.

June 4 GitHub issue:

![](https://lemmy.ml/pictrs/image/c9da9998-755a-4309-937f-865ce8d5b56e.png)

This has to be social hazing.

## Day 62 - the second major Lemmy runaway SQL statement in the past 60 days has been identified. Lemmy.ca shared about it on GitHub on Saturday; it is now well into Wednesday.

It was shared on GitHub on Saturday, July 22, by lemmy.ca staff. The developers who run the project do not seem to do merges on Saturday and Sunday. I submitted a pull request Sunday evening, given the server-crash relief this would provide, ready for them first thing Monday. I clearly labeled the pull request as "emergency".

![](https://lemmy.ml/pictrs/image/9537f701-6c19-45bf-8aec-7c2cecc46a50.png)

It is well into Wednesday, and merges have been done on new site features for the past 3 days. Yet the server-crash pull request was edited to remove "emergency" from the description first thing Monday, and it sits without any urgent attention. This is the same pattern I have seen for the past 2 months regarding server crashes related to PostgreSQL.

## PostgreSQL crashing on single-comment, single-post deletes - now that pg_stat_statements is accounting for TRIGGER and FUNCTION execution - seeing it on my server

Reports from users today: https://lemmy.ml/post/2460172

The project merged in gzip compression first thing, while the SQL performance crisis fixes are still given low priority.
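
For anyone reproducing this on their own instance, here is a sketch of the setting that makes trigger and function time visible in pg_stat_statements. It assumes superuser access and that `pg_stat_statements` is already listed in `shared_preload_libraries` (that part needs a postgresql.conf edit and a restart); the timing column is named `total_exec_time` on PostgreSQL 13+, `total_time` on older releases.

```sql
-- Count statements executed inside functions and trigger bodies too,
-- not only the top-level statements sent by lemmy_server.
ALTER SYSTEM SET pg_stat_statements.track = 'all';
SELECT pg_reload_conf();

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- After some traffic: where is the database actually spending its time?
SELECT calls,
       round(total_exec_time) AS total_ms,
       rows,
       left(query, 80)        AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```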

## Another new front end, this time in Rust; testing and fixing the server crashes seems to be nobody's priority

All I am doing is irritating people. In the Lemmyverse, people seem to enjoy upgrading hardware to huge numbers of cores and adding big-brand commercial front-end DDOS protection, but they do not like [!lemmyperformance@lemmy.ml](https://lemmy.ml/c/lemmyperformance) discussion on Lemmy itself ("eat your own dog food, developers" is not a mantra here, as I've said before). Each new local comment and post on a local server updates 1800 rows in site_aggregates, yet there seems to be no urgency to get the fix out first thing Monday and stop every large Lemmy server from crashing so frequently. I put "emergency" in the pull request title; leadership edited it out. That says it all. EVERY ONE of the major servers has crashed on my personal visits all weekend, Monday, and now Tuesday. (A query to check that write amplification on a live server is sketched after this post.)

I bring up a direct topic of engineering deletes for scale and concurrency and get push-back from the players on GitHub that it's off-topic to the delete crashing problem, and I have to lecture them on autism and mental differences, which they can't wait to be irritated by. What a shock: people who find autistic thinking irritating and different from what they think. It's the lack of testing and of sharing the back-end data directly that meant nobody even noticed these things (site_aggregates has 1700 rows, all getting incremented for every new post and comment). And there is push-back on every piece of code that doesn't pass beautification tests or has an unused testing function in it - it's like a massive corporation where people only work 9 to 5 and find every excuse to say "not my problem" when it comes to servers crashing every hour.
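
A sketch of how any operator can check that write amplification on a live instance. `site_aggregates` is the table named in the issue; the statistics views are standard PostgreSQL:

```sql
-- How many rows would one unscoped UPDATE touch right now?
SELECT count(*) AS site_aggregates_rows FROM site_aggregates;

-- Cumulative row updates recorded against the table so far.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'site_aggregates';

-- Post one comment through the API and re-run the second query: if n_tup_upd
-- grows by roughly the table's row count for a single comment, the trigger is
-- rewriting every row per insert.
```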

## Day 42 - pg_stat_statements run on the big servers was the key to curing the Lemmy network; for the first time in 41 days, none of the servers on the new code are crashing for me - messages are delivering!

One runaway query, related to the popularity of a server having subscribers, was punishing PostgreSQL.

![](https://lemmy.ml/pictrs/image/13cd0c3d-44fe-4755-932a-9a19ffa88c05.png)

Great to see that the servers are now running stable!

## Day 41 - pg_stat_statements, and the critical role data plays with ORM code (Diesel) in lemmy_server

An ORM like Diesel may make the Rust code look smaller and easier to follow, but it puts more of the burden on you to use a tool like pg_stat_statements in production, on real data, to sanity-check that a single query isn't fetching 50,000 rows when run in production versus a nearly empty testing system. It is critical to check actual real-world data patterns: how frequently the code runs repeat queries, and whether the number of rows being returned is sensible. pg_stat_statements is one of those sanity checks. It was spelled out here: https://lemmy.ml/post/1361757 - along with why getting data out of the big servers was really important: https://lemm.ee/comment/350801
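
A sketch of the rows-per-call sanity check described above, using only pg_stat_statements columns; the 100-call floor is an arbitrary illustration value:

```sql
-- Flag statements whose average result size is far beyond anything a single
-- UI page could need; these are the ORM-generated queries worth reading.
SELECT calls,
       rows,
       round(rows::numeric / NULLIF(calls, 0), 1) AS avg_rows_per_call,
       left(query, 80)                            AS query
FROM pg_stat_statements
WHERE calls > 100
ORDER BY avg_rows_per_call DESC NULLS LAST
LIMIT 20;
```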

## Day 40 - THANK GOD! Finally a big Lemmy site ran pg_stat_statements that I've been begging people to run for weeks!!! YEY!!!

My GitHub comment on this pull request: https://github.com/LemmyNet/lemmy/pull/3482

![](https://lemmy.ml/pictrs/image/f71420c3-3560-4ce1-872c-d20ecefe33bd.png)

## Day 39 - moving federation out of lemmy_server seems to be a key drastic change

Every major site is overloaded, and there have been lots of hardware upgrades. Sites are running multiple Docker images of lemmy_server on the same underlying hardware. This code change shows how much of an issue having the MTA in-process is: https://github.com/LemmyNet/lemmy/pull/3466

The client-facing API of lemmy_server would also benefit from a caching layer that server operators could control, even if only for emergencies.

## Day 38 - lemmy.ml hasn't delivered asklemmy community posts in 20 hours

What a mess. It was a train we could see coming all June, and it's a total mess now. Something drastic is needed: the ReadySet cache, or moving federation out to another app that does things more linearly between servers - something big.

## Day 37 - trying to use what Lemmy 0.18.1rc4 logs
## Day 37 - upgraded lemmy_server and lemmy-ui to a GitHub checkout

The Day 37 posting from Lemmy.ml never made it here? I still don't see it.

## Day 36 - watching massive hardware upgrades and front-end caching being added because a caching layer between the Rust code and the database has not been implemented

Major hardware upgrade: https://lemm.ee/post/523075

The slow votes on Lemmy.ml, the open GitHub issue, the insert times on the data, and lots of other things would benefit from a caching layer. Way too much is being live-queried from PostgreSQL. Create a dummy response if you have to when score: 1 is set, and cache those inserts at the API layer. If server logs had been openly shared weeks ago, a lot of these timeouts and crashes would have been identified in the Rust code. There really is not that much data here; it's the application-internal caching and buffering that is missing.

## Day 36 - already working on Lemmy technical issues for the past 3 hours

Yesterday some major server operators were talking about federation performance, and about database and client API performance... but no serious cries for help, no urgency to the situation. What is needed: a caching layer on something, even if the server operator has a switch to enable/disable it, and some emergency new logging on the API calls to gather performance data on what is slow and to propose specific fixes for those calls. Lemmy.ml is throwing errors constantly for me in casual reading and commenting. I spent 30 minutes reading and looking at new incoming comments and posts on lemm.ee - it seems sluggish to me, with more data than my test server, and routine listing of pages seems slow.

## Day 35 - project leads not opening issues on "fast nginx 500" errors all month, and ignoring extreme performance problems with upvote/like on lemmy.ml

https://github.com/LemmyNet/lemmy/issues/3395

This GitHub issue is being ignored, just as data-size-related crashes in general aren't being reported by the major site operators on GitHub. The admin of lemmy.world posted yesterday that he "talked to the devs", but all this talk seems to be behind the scenes; no GitHub issue was opened about 0.18.1 performance problems, with no server logs as details. The lack of shared server logs from high-activity, more-data servers is really holding back Lemmy as a platform. Is the upvote spawning back-end federation activity into the queue, and is that what is slow? Are database inserts into the comment_like table on that server taking so long given the number of rows accumulated in the table? Federation performance is also showing signs of serious back-end delays that the server logs would reveal. And emergency logging should be added to the code to establish exactly what is going on with timeouts, retries, etc.

## Day 34 - since independently-run production servers have been failing me every single day for 34 days, DRASTIC changes to the API: return failure counts on the error path

[A user on 0.18.1-rc6 on Lemmy.world reported an upvote error.](https://lemmy.world/comment/598593) I think the API should keep a counter for the error and return that count. If it isn't persisted, fine, but maybe return system uptime on errors too. This puts the power to know what is going on inside the Lemmy servers, which are falling flat on their face performance-wise in the 0.18.1 era, more in the hands of clients instead of only server operators.

## Day 34 - loading comments - offering a different API with an eye toward database back-end performance

Refreshing comments on an active post is one of the hardest-hitting requests, performance-wise, for a server in the 0.18.1 era. See comments.

## Day 34 - With so many servers, releasing lemmy_server and lemmy-ui at the same time is slowing things down. The API needs to be stable, and lemmy_server 0.18.1 should be released without needing the UI

More and more smartphone and independent front-ends are being built; the API serves more than just lemmy-ui. The idea that server and UI should share the same version number is a mindset that orphans all the other clients. The admin screens are really a pretty small part of lemmy-ui, and the API is being used for those anyway, isn't it? So split the server setup so it can be done without lemmy-ui being in sync.

## Day 34 - Drastic changes, big responses to the crisis - Lemmy to Lemmy replication tools - two birds with one stone: use the Client API and create new Client API calls

I think the Rust code is holding people back, and the queuing for federation plus the overhead of an HTTP connection for each individual comment and posting is too much. Major bypass surgery is an option here: a "Lemmy to Lemmy" MTA.

### Two Birds with One Stone

1. The Client API is supposed to be high performance for loading 50 postings at a time and 300 comments at a time. Exercising the Client API is a good thing: identify the performance problems and log how it performs on multiple servers during replication.
2. Avoiding the federation path currently used in the Rust code for outbound sending will likely bring an immediate improvement in server performance and stability.
3. Third bird: this tool builds an API that becomes the foundation of bulk back-fill between servers.

### Changes to Rust code

1. A new Client API to query "raw0" instead of "hot" and the other sorting options. "raw0" sorting would be the local server's latest INSERTs into the post table, without concern for pinned/sticky posts, votes, etc. Can we ORDER BY post_id itself? Does that make sense? (A sketch of the query shape follows this post.)
2. Same with comments: can we ORDER BY comment_id itself on a posting?
3. Can we get the RAW JSON of the POST and COMMENT tables as output, even if we need an api/v4 or api/bulk0/ path for a different output format?
4. Do we want to use Rust? Are we confident in the code not binding up the monolithic Rust application, and should we consider having an api/bulk0 path into another app that directly accesses PostgreSQL?
5. Comments and posts are the bulk concern. Federation of user profiles, community announcements, etc. can be left alone. Have a switch in the code to turn off SEND/OUTBOUND comments/posts/votes for a specific peer instance, but leave everything else flowing through federation.

### api/bulk0

1. A query, for a community, returning a list of all peer instances that need comments and posts delivered: at least one subscriber on that remote instance.
2. A query, for an instance, returning all communities that need comment and post delivery: at least 1 subscriber on that remote instance.
3. Edited posts and comments: query all newly edited posts/comments from a specific post_id and comment_id forward. Perhaps limit the query to 300 of each, but allow a higher number in a list return format (without the body of the posting/comment).

Incomplete....
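
A minimal sketch of the "raw0" query shape proposed above. The `post` and `comment` table and column names are my assumptions about the Lemmy schema, so treat this purely as an illustration of insertion-order paging:

```sql
-- Newest local posts in plain insertion order, ignoring hot-rank, sticky,
-- and vote-based sorting entirely. (Assumed table and column names.)
SELECT id, name, published
FROM post
WHERE local = true
ORDER BY id DESC
LIMIT 50;

-- Comments on one post, also in plain insertion order, for bulk back-fill.
SELECT id, creator_id, published
FROM comment
WHERE post_id = $1        -- hypothetical parameter: the post being back-filled
ORDER BY id
LIMIT 300;
```

Paging by primary key keeps each query an index range scan, which is the point of "raw0": no aggregate joins, no rank computation, just rows in the order they were inserted.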

## Day 34 - Servers are all responding sluggishly, with major delays in sharing messages between servers

A post to Lemmy@lemmy.ml took over 4 hours to deliver. The comments haven't made it in 7+ hours. I see no evidence that the load is that high; what I see is failure under moderate message and post load, as in 3 to 4 out of 10 of what "busy" would look like. There really are not that many new postings and comments; the comment counts are pretty low across several communities.

## Day 33 - people reporting problems with outbound federation from peers; this post is a test

https://github.com/LemmyNet/lemmy/issues/3354

## When a Lemmy client creates a comment, what's the code doing? Tracing the system for new Lemmy developers

I don't have time for this; I've used up my time for the month on the scaling problem of busy servers crashing, without ever getting the logs out of the busy servers...

## Journal Day 33 - so much time spent; lemmy.ml logs really would have saved so much time. Who needs sleep?

OK, I had done all the simulations of network delays on 0.17.4, and setting it all up again with 0.18.0rc6 was a pain with the hacked-in code I added to try not to flood the actual Lemmy peers with the outbound queue. So much time would have been saved for the whole project if, 4 weeks ago, lemmy.ml had just started posting their logs.

There is a major problem, still in 0.18.0, if peer servers are offline. If we had lemmy.ml's logs, the whole chain of resource problems shows up bunched together, starting with this logging pattern: https://github.com/LemmyNet/activitypub-federation-rust/blob/325f66ba324037a4f1d330a0dbea6e062ba34f50/src/activity_queue.rs#L117

```
let stats_fmt = format!(
    "Activity queue stats: pending: {}, running: {}, retries: {}, dead: {}, complete: {}",
```

Trying to keep my 4-server simulation running is a pain in the ass (brain overload), but I keep coming to the same conclusion. Damn, it is so frustrating that so much valuable production data is sitting in lemmy.ml server logs that would reveal all of this, compared to the labor it has taken me to reproduce it on an independent network. I'm dumbfounded that nobody else sees that outbound federation is a ticking time bomb that has already brought down several of the big servers. It's right here, among others:

```
warn!(
    "Sending activity {} to {} to the retry queue to be tried again later",
    message.activity_id, message.inbox
);
```

Why keep the logs a secret when you are sending so many replication messages out? This is causing server crashes on all the big servers. Why hide these logs? This isn't just a programming project; this is a living network with dynamics based on user data, peer server connections, and peer server outages. Nobody seems to give a crap about the dynamics of message delivery. Why? Why would you care so little about the data?

## Day 32 again - and why I am so frustrated with the Lemmy project ignoring lost data and peer replication

Over 20 years ago I was building social media messaging systems on top of an e-mail MTA. E-mail MTAs send messages back to senders when their messages never got delivered. Lemmy-to-Lemmy does no such thing: if you send a reply to someone, no "bounce" message comes back when delivery fails. It's sad to me to think of users replying to each other like two ships passing in the night, with no idea that the other person even responded. The queue not being saved on server shutdown, just tossing out undelivered messages to peers, is something I never imagined. Again, e-mail systems in 1993 didn't throw out their delivery retry queues on a server reboot.

## Day 32 - cache, cache, cache

I really wish we had server logs out of the big sites, especially Lemmy.ml, and statistics on API calls. I really feel a crash effort to add a cache layer for reading comments and posts is what the project needs more than anything. Somebody is probably going to do it, I expect, with all the attention on the project, but I could probably knock it out in 40 hours for comments and posts, keep it smart, and still only use a modest amount of RAM (512MB, 1GB, or 2GB) for the amount of data Lemmy.ml has now. We are talking massive improvement under busy load, even if you re-read from SQL every 60 seconds.

## Journal: Day 32 of being an End User and Tester of the Lemmy Application