1

Topic: Server lag

We're again seeing spikes of the kind of lag that started after the 2024/08/23 update (regen messed up, monsters respawning randomly, etc.). It always starts on weekends (5-7 days after a server restart), and if the server isn't restarted on the weekend it keeps lagging through the next week (that's what's going on atm). That's why I suggest a scheduled server restart every Friday or Saturday; I think it would fix that.

Last edited by Supla (September 9th, 2024 5:18 PM)

2

Re: Server lag

Thursdays would be better! Just for anyone that is a "weekend warrior" lol. I suggested this a while ago and still support it being done. At least until James has time to dedicate to fixing the server issues. Whatever is causing it, they are "new" issues created since development started happening again, and possibly even from something changed in the last month.

When the server issues are verified as fixed, it would still be important to do monthly restarts, as there are still deeper issues that have always plagued the server over longer periods of time. These are less important fixes, so I think they can be put on the back burner, with an easy planned restart cycle for the whole server machine.

We appreciate all the work done so far! Hoping these suggestions help with making the server more maintenance free/automatic.


Edited: Typo

Last edited by travis (September 10th, 2024 3:30 AM)

3

Re: Server lag

I really want to add more timing support to the server code, instead of a scheduled reboot, so that I can find the root cause.

The PC has four 2.5GHz CPUs, and while the server is single-threaded and so can only use one (I should remedy this at some point), can you think of any sane reason a game server (no graphics!) would hit 100% CPU? That's what I see when it's "lagging".

You might say, fix it by buying hardware. Well, I could get a CPU that goes to 5GHz these days, but that only kicks the problem down the road to (say) 50 or 60 players. Nothing it does is that exciting or difficult, so nothing should take lots of CPU.

It *would* be worthwhile for me to upgrade the OS drive. Right now the virtual machine host can't install the newest versions of Linux for some reason, and I'd really like to give the game server VM a new version of Linux, and install a GUI on it. The game server is a Windows program, so it runs on Wine, but with a GUI I could run a profiler and actually get an answer to the question.

I've added timing to scripts and basic packets. Some scripts are slow, and there appear to be a few players who are constantly sending commands to the server to pick up items (which actually takes some CPU time... humorously, I can watch one of these players train archery for an hour, with no items around, while continually sending 'pick up item' commands... hmm), but nothing that should add up to 100% CPU.
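
A minimal sketch of the kind of timing described above, assuming C++ and a scope-based timer. The names `TimingTable` and `ScopedTimer` are hypothetical and not taken from the server code; the idea is only to accumulate call counts and total time per label so the slow handlers stand out.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical profiling helpers; none of these names come from the server.
struct TimingStats {
    std::uint64_t calls = 0;
    std::chrono::nanoseconds total{0};
};

class TimingTable {
public:
    // Add one measured call to the running totals for this label.
    void record(const std::string& label, std::chrono::nanoseconds elapsed) {
        assert(elapsed.count() >= 0);  // steady_clock never runs backwards
        TimingStats& s = stats_[label];
        ++s.calls;
        s.total += elapsed;
    }

    const std::map<std::string, TimingStats>& all() const { return stats_; }

private:
    std::map<std::string, TimingStats> stats_;
};

// Measures the lifetime of the enclosing scope and reports it on destruction.
class ScopedTimer {
public:
    ScopedTimer(TimingTable& table, std::string label)
        : table_(table),
          label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        table_.record(
            label_,
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimingTable& table_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

Putting `ScopedTimer t(table, "pickup");` at the top of a handler would charge that handler's time to the "pickup" label on every call, so the totals can be compared across packet types.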

Hopefully this coming week I'll have time to look into this. It's not what I wanted to do for the game in the next week or two, but it's clearly necessary.

4

Re: Server lag

Thx for the update, lag is getting worse too. Hope u have a fix soonish.

5

Re: Server lag

I agree throwing hardware at it is not a solution. The old server used to handle over 100 players with little problem 20 years ago, so with better and newer hardware now it should be a minimal issue. Multi-core would be good, if implemented properly. I know that is usually a developer struggle client side, but server side it should be less of a problem because you know how many cores you have.

I'm not sure if you are using GitHub, but you could always look at the old code vs the new code side by side and see if it comes from some of the changes. There is an add-on for MS Visual Studio for this as well. It can be really handy.

6

Re: Server lag

They're injecting an old .dll hack called Maldon. It hooks into Faldon and spams auto attack and auto pick up commands... looks like API calls, not packet editing. The commands should work separately from one another, so I dunno why the auto pickup was on. It also only works on the old client.

Last edited by pennywise (September 12th, 2024 2:50 PM)

Pennywise - 7 Seconds - Fugazi - Husker Du

7

Re: Server lag

Just as a point of reference, while it was using 100% CPU earlier today with 28 people online, there are 24 people online right now and the server is using 8-20% CPU. I'll bet it is not lagging right now either, correct? This is what makes me think it's something very specific.

I'll watch it for a bit.

4:56PM Aaaand there it goes, up to 57% all of a sudden. It's strange. There are only 23 on. I need to get a profiler on it and find what it is. You'd only need two of... whatever it is... and you'd all be experiencing lag. I don't view this as any player's fault. The server needs to be resilient.

8

Re: Server lag

I'll copy and paste what I wrote on the Faldon Discord channel:
"The problem is we don't know which of our topics James has read and which he hasn't. He still thinks it can be related to the old OS hard drive. But this type of lag was never seen before the latest updates; we've been here for 10 years without an update and the game never lagged that way. The first update that caused it was 2024/08/23: he updated the game to new systems and added diagnostics to find server lag/crash causes, and the lag began. Now we had the 2024/09/10 update with the same kind of changes, updated to a new system and added more diagnostics, and that made the lag worse. It used to take a week of running before lagging; now we're seeing lag 3 days after the last server restart.
Either the server isn't prepared to handle modern systems, or it doesn't like diagnostics."

In addition to that, and considering what Pennywise said about Maldon: that is not true. Maldon is not being used. It only works on the old client and it crashes really often; also, you can't have more than 800 points added to any stat or it will overflow and crash. Maldon is really old. People have tried to make programs like Maldon, yeah, but we have used Maldon in the past and the server never lagged. Also, when the lag started after the 2024/08/23 patch, no one had a Maldon-like program, and no one was sending pick up packets as James mentioned.

If you could copy the current game (whole server + client), run it on a separate test server and fully revert both the 2024/08/23 and 2024/09/10 patches, I'm pretty sure the lag would be fixed.

James probably hasn't yet experienced the type of lag we are talking about. It's not the same as before the server upgrade (changing from old HDs to new SSDs); that lag is completely gone. It was the usual lag we know from all games (you can't perform any action, etc., just like a bad connection). With the current type of lag you can still perform actions, talk in chat, cast spells, etc., but most of the time server ticks are out of sync, and game features that depend on them get messed up (HP/mana regen, monster respawn time, sometimes monsters respawning right after being killed, things like that).

9

Re: Server lag

The 2024/09/10 update did not change the server. It loaded art more slowly, which 2024/09/12 fixed.

Not a given that no one is using that type of program. I get crash reports from a guy using "FaldonInject.dll" all the time.
*That said*, the server needs to be tolerant of whatever comes its way.

2024/08/23 did also add script caching. It's possible that doesn't work as well as it should. I'm going to try and figure out a way to run a profiler on the server code this coming week.

10

Re: Server lag

pennywise wrote:

They're injecting an old .dll hack called Maldon. It hooks into Faldon and spams auto attack and auto pick up commands... looks like API calls, not packet editing. The commands should work separately from one another, so I dunno why the auto pickup was on. It also only works on the old client.

It's not Maldon they're using but "Faldron"; it's been advertised in the game and on Discord. But it's the same thing, except the auto loot is probably always on.

Evil Devil - Prometherion

11

Re: Server lag

Mister Rob wrote:
pennywise wrote:

They're injecting an old .dll hack called Maldon. It hooks into Faldon and spams auto attack and auto pick up commands... looks like API calls, not packet editing. The commands should work separately from one another, so I dunno why the auto pickup was on. It also only works on the old client.

It's not Maldon they're using but "Faldron"; it's been advertised in the game and on Discord. But it's the same thing, except the auto loot is probably always on.

Well, I think there are only 2 ppl using that, and the current lag is older than that program. Plus, that is not the same as Maldon (which is almost complete trash). James is right, people who use it crash multiple times when injecting its DLL, I've seen that with my own eyes too, but I don't think the lag is related to that.

12

Re: Server lag

It does seem to be related to the automatic item pickup thingy people are using. I booted the ones using it/asked them to turn it off. Lag seems to be gone according to the players.

13

Re: Server lag

Hello,

Yes, that is mine. I apologize; it was not my intention to cause issues for players. I'm disabling that feature for anyone using it at this time.

Thanks

14

Re: Server lag

Turns out 25 players use 20% server CPU.
1 player using the macro increases it to 50%. So 3 players cause 100%, and then everyone gets lag.
The main problem is it sends a ton of packets for objects that aren't even there.

I am going to change the pick-up packet to use object ID instead of location next week. That will solve the problem permanently. This is great. I thought the problem would be more difficult to solve.

Thanks for finding the problem Mr Spy, and thanks for revealing the problem bullethead123. I want to make the server resilient to all kinds of packets, and clearly, object pick-up is an area that is very inefficient.
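
A rough sketch of why an ID-based pick-up is cheap to reject, assuming items on the ground are kept in a hash map keyed by object ID. `GroundItems`, `tryPickUp` and the one-tile reach are illustrative assumptions, not the server's actual data structures or rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_map>

// Hypothetical item record: just a tile position.
struct GroundItem {
    int x;
    int y;
};

class GroundItems {
public:
    void add(std::uint32_t id, int x, int y) {
        assert(items_.find(id) == items_.end());  // IDs are assumed unique
        items_[id] = GroundItem{x, y};
    }

    // Removes the item and returns true only if the ID exists and the
    // player is within reach. An unknown ID costs one hash lookup.
    bool tryPickUp(std::uint32_t id, int playerX, int playerY) {
        const auto it = items_.find(id);
        if (it == items_.end()) {
            return false;  // object is not there: reject immediately
        }
        if (std::abs(it->second.x - playerX) > kReach ||
            std::abs(it->second.y - playerY) > kReach) {
            return false;  // too far away
        }
        items_.erase(it);
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    static constexpr int kReach = 1;  // assumed reach in tiles
    std::unordered_map<std::uint32_t, GroundItem> items_;
};
```

In this shape, a request for an object that does not exist returns after a single lookup, and an item can only be picked up once because it is erased on success.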

15

Re: Server lag

James wrote:

Turns out 25 players use 20% server CPU.
1 player using the macro increases it to 50%. So 3 players cause 100%, and then everyone gets lag.
The main problem is it sends a ton of packets for objects that aren't even there.

I am going to change the pick-up packet to use object ID instead of location next week. That will solve the problem permanently. This is great. I thought the problem would be more difficult to solve.

Thanks for finding the problem Mr Spy, and thanks for revealing the problem bullethead123. I want to make the server resilient to all kinds of packets, and clearly, object pick-up is an area that is very inefficient.

I love your mindset as a developer. Rather than crying about the problem (or people doing it) you look to implement a solution.

16

Re: Server lag

Out of curiosity, though, how many pick-up packets per second are you sending? I imagine it's 9 at a time all around the player for starters.

17

Re: Server lag

It's funny you ask. I was just looking over my code and I realised I made a mistake. It's supposed to periodically send collect packets out in a radius around the player, but I had called it from 2 different places, one of them a background thread which spammed it continually with no delay. So not only was it sending them endlessly without delay, it was also calling it an additional time every 1 second.

I've corrected it now, I think that should solve the issue. I also reduced the radius from 3 tiles to 1.
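
One way to guard against this class of bug is to route every call through a single throttle, so a second caller cannot bypass the delay. A minimal sketch, using the hypothetical name `Throttle` and passing the current time in so it is easy to test:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: allows an action at most once per minInterval.
class Throttle {
public:
    using Clock = std::chrono::steady_clock;

    explicit Throttle(Clock::duration minInterval)
        : minInterval_(minInterval) {
        assert(minInterval >= Clock::duration::zero());
    }

    // Returns true if the caller may act now, false if the previous
    // allowed action was less than minInterval ago.
    bool allow(Clock::time_point now) {
        if (hasFired_ && now - lastFired_ < minInterval_) {
            return false;
        }
        lastFired_ = now;
        hasFired_ = true;
        return true;
    }

private:
    Clock::duration minInterval_;
    Clock::time_point lastFired_{};
    bool hasFired_ = false;
};
```

Every place that wants to send collect packets would first check `allow(Throttle::Clock::now())`. The sketch is not thread-safe, so if a background thread shares it with another caller, the call to `allow` needs a mutex around it.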

Last edited by bullethead123 (September 13th, 2024 11:11 AM)

18

Re: Server lag

How many depends on the radius:

// One collect packet per tile in a square of side (2 * collectionRadius + 1) around the player.
for (int dx = -collectionRadius; dx <= collectionRadius; ++dx) {
    for (int dy = -collectionRadius; dy <= collectionRadius; ++dy) {
        int targetX = currentX + dx;
        int targetY = currentYDecoded + dy;

        SendCollectionPacket(targetX, targetY);
    }
}

It was spamming the above function with no delay, so probably hundreds a second.
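
For a sense of scale, the nested loop above sends one packet per tile in a square of side `2 * collectionRadius + 1`. A small sketch of that arithmetic (the function names are made up for illustration):

```cpp
#include <cassert>

// Packets sent by one full pass of the nested loop for a given radius.
constexpr int packetsPerSweep(int radius) {
    return (2 * radius + 1) * (2 * radius + 1);
}

// Packets per second if the loop runs sweepsPerSecond times each second.
inline long packetsPerSecond(int radius, long sweepsPerSecond) {
    assert(radius >= 0 && sweepsPerSecond >= 0);
    return packetsPerSweep(radius) * sweepsPerSecond;
}
```

With a radius of 3 that is 49 packets per sweep, and with a radius of 1 it is 9, so even ten sweeps a second at radius 3 comes to 490 packets a second.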

Last edited by bullethead123 (September 13th, 2024 11:17 AM)

19

Re: Server lag

Ah, so it was only limited by the bandwidth of the player. I'm surprised the effect was only 30% CPU per player. Radius of 3 will not work at any rate -- anything over 1 tile X and 1 tile Y from the player is ignored.

Judging by CPU usage, we have one person online using your macro, and the server bandwidth over 30 seconds is
RX to server: 9547362 bytes (318KB/s)
TX from server: 873424 bytes (29KB/s)
As it's typically 1KB/s both ways per player, I think it's safe to say the macro is DOSing the server. Enough packets per macroer for 200 players. This is doubly funny actually, because the server is only on a 10 megabit (1.25MB/s) line, so not only are you maxing the CPU but also probably running me out of bandwidth.

Thinking of it now, the best way to solve this is a general bandwidth limit per player. No wonder it was lagging.
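
A general per-player bandwidth limit is often done with a token bucket. The sketch below assumes that approach; `TokenBucket`, the refill rate and the burst size are illustrative choices, not anything from the server.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical per-connection limiter: tokens are bytes, refilled over time.
class TokenBucket {
public:
    TokenBucket(double bytesPerSecond, double burstBytes)
        : rate_(bytesPerSecond), capacity_(burstBytes), tokens_(burstBytes) {
        assert(bytesPerSecond >= 0.0 && burstBytes >= 0.0);
    }

    // Refill for the time elapsed since the previous call, then try to
    // pay for a packet. Returns false if the packet should be dropped.
    bool consume(double elapsedSeconds, std::size_t packetBytes) {
        tokens_ = std::min(capacity_, tokens_ + rate_ * elapsedSeconds);
        const double cost = static_cast<double>(packetBytes);
        if (cost > tokens_) {
            return false;
        }
        tokens_ -= cost;
        return true;
    }

private:
    double rate_;      // refill rate in bytes per second
    double capacity_;  // maximum burst in bytes
    double tokens_;    // bytes currently available
};
```

With one bucket per connection, a client at the typical 1KB/s never runs dry under a limit of, say, 4KB/s, while a client sending 318KB/s has almost all of its packets dropped before any game logic runs.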

20

Re: Server lag

Who's online using it now?

Nvm, I know who it is. Keep an eye on anthonyrules. I believe it should not be an issue now, but please do let me know if it is still causing problems. When other users restart, they'll be moved onto the same version anthonyrules is on.

Last edited by bullethead123 (September 13th, 2024 11:51 AM)

21

Re: Server lag

James wrote:

Turns out 25 players use 20% server CPU.
1 player using the macro increases it to 50%. So 3 players cause 100%, and then everyone gets lag.
The main problem is it sends a ton of packets for objects that aren't even there.

I am going to change the pick-up packet to use object ID instead of location next week. That will solve the problem permanently. This is great. I thought the problem would be more difficult to solve.

Thanks for finding the problem Mr Spy, and thanks for revealing the problem bullethead123. I want to make the server resilient to all kinds of packets, and clearly, object pick-up is an area that is very inefficient.

It turns out everyone gets punished because of 3 people's mistakes.

22

Re: Server lag

Mr Spy wrote:

It does seem to be related to the automatic item pickup thingy people are using. I booted the ones using it/asked them to turn it off. Lag seems to be gone according to the players.


Why do you say "people" when you know exactly who did it, and make it look like all of us were causing it?
I even DMed you on Discord about exactly what caused the lag, and your reply was "idc tell james", no?

23

Re: Server lag

Yeah, this was all Spy's fault! Let's get mad because the server vulnerability wasn't fixed within 24 hours! And why didn't the offenders get instant banned, and other imperfect ways this was handled!

https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftv-fanatic-res.cloudinary.com%2Fiu%2Fs--02uNV0Fd--%2Ft_xlarge_l%2Ff_auto%2Cfl_lossy%2Cq_75%2Fv1490231460%2Fattachment%2Fgot-shame.gif&f=1&nofb=1&ipt=f02b1e1d437ace6f6ba7120095d62ef8a2d1d3b8b4c9633fde34de35e65edad8&ipo=images