$ nvim blog/Proxmox_Hell_P2

Proxmox hell pt2

Proxmox · Homelab · Networking

11/12/2024 · 8 min read


This is part 2 of a mini-series about my experience with Proxmox. If you missed the first part, I recommend reading it for context.

We're picking up 2 hours and 14 installs in. With an overly simplistic understanding of IP addresses and networking, we just got the first node to go live!

After I got the first node up, I did the exact same thing with the others, like so:

IP Address: 172.900.3.1 = Hostname: pve.bee1.local

IP Address: 172.900.3.2 = Hostname: pve.bee2.local

IP Address: 172.900.3.3 = Hostname: pve.bee3.local

Now, you more experienced networking wizards can already see the mistake, but more on that in a second.

The next step was to cluster them together, which is pretty easy. Just find the Cluster button under Datacenter and it walks you through the rest: you copy the join information from the first node and paste it into the right box on the others.

I made a cluster called Beehive because the machines I'm using are Beelinks.

Then I went to the next node, pasted the join information, annnd

“failure to cluster”

Dammit… among other cuss words.

Why did this happen?

Name conflicts and user error. When I tried to cluster them together, Proxmox got confused, muttered something under its breath about the same name, then proceeded to flip me off and spit in my face. Not really, but that's how it felt.

As it turns out, in a name like pve.example.invalid the first part is the device's own hostname (the node name) and the rest is the domain, and I got confused by the “example” part.

So in a goofy lapse of judgment, I had named ALL the devices “pve”, and Proxmox wasn't so understanding…

So, a quick reinstall. Yes, a reinstall. Listen, I know I could have just edited some files and changed the name, but at this point I was speedrunning the install like a pro, so I just did it again… and we got it fixed.
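If you want to catch this before clustering, the name Proxmox cares about is just the part before the first dot. Here's a minimal bash sketch, assuming you only want to check a list of planned FQDNs; node_name and find_duplicate_names are made-up helper names, not Proxmox commands:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting node-name collisions before clustering.

# Print the short node name: everything before the first dot.
# e.g. pve.bee1.local -> pve
node_name() {
  printf '%s\n' "${1%%.*}"
}

# Print every short name that appears more than once among the given FQDNs.
find_duplicate_names() {
  local fqdn
  for fqdn in "$@"; do
    node_name "$fqdn"
  done | sort | uniq -d
}
```

Running find_duplicate_names pve.bee1.local pve.bee2.local pve.bee3.local prints pve, which is exactly the conflict I hit. Hypothetical names like bee1.local, bee2.local and bee3.local print nothing.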

So with that out of the way we got them clustered!!!!

It went pretty flawlessly, to be honest. Time for the next…..

Okay... anyways time to...

What the F@#K

What even is that error??? “Connection error”??? I'm using f@#kng ETHERNET!!

When I go to a node's own IP it works, but if I try to access another node in the cluster from there, it throws this 401 crap at me.

Also, it keeps shoving the login page down my throat every 20 seconds.

So 401 error….

So after many, many hours on the interweb, I found a few possible causes of the 401 error and the constant login crap, along with some possible fixes…

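One possible fix for the 401 error is to restart the NTP daemon with:

sudo systemctl restart ntp

(Proxmox VE 7 and later use chrony as the default time daemon, so on a stock install the equivalent is sudo systemctl restart chrony.)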

Because it turns out that if the servers' clocks aren't synced to the second, it causes problems. “Why?” you may ask. I don't really know, but my instinct says it has to do with accurate logs and events, and a highly developed hypervisor environment like Proxmox will just shit itself with authentication and log errors if the clocks are off by even a second. One likely culprit is the login itself: Proxmox authentication tickets carry a timestamp, so a node whose clock disagrees with the node that issued the ticket can reject it as invalid, which comes back as a 401. But in all honesty, I don't know what I'm doing and am just kinda raw dogging this experience.

I did that on all the nodes and it worked… for only one of them. The problem persisted on the rest.
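A quick way to see whether the clocks actually disagree is to compare each node's Unix time against the machine you're on. This is a rough bash sketch assuming root SSH works between nodes; clock_skew and check_nodes are hypothetical helpers, and the SSH round trip itself adds some delay, so treat anything around a second as noise:

```shell
#!/usr/bin/env bash
# Rough clock comparison between Proxmox nodes (hypothetical helpers).
# Assumes passwordless root SSH between nodes; resolution is 1 second.

# Absolute difference between two Unix timestamps, in seconds
clock_skew() {
  local a=$1 b=$2
  if (( a > b )); then
    echo $(( a - b ))
  else
    echo $(( b - a ))
  fi
}

# Compare this machine's clock with each node given as an argument
check_nodes() {
  local node here there
  for node in "$@"; do
    here=$(date +%s)
    if ! there=$(ssh "root@$node" date +%s); then
      echo "$node: unreachable"
      continue
    fi
    echo "$node: $(clock_skew "$here" "$there")s apart"
  done
}
```

Usage would be something like check_nodes node2 node3, where node2 and node3 are placeholder hostnames. If the nodes run chrony, chronyc tracking on each one reports the measured offset far more precisely.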

For those who don't know, NTP stands for “Network Time Protocol”. Your machine reaches out to a server that keeps accurate time and adjusts its local clock to match. NTP itself works in UTC; the time zone is just a local display setting on top of that. It's much more accurate and easier than setting the clock manually, like you would with that watch that always seems a few seconds off because your timing sucks.
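Under the hood, NTP doesn't just copy the server's time; it timestamps the request and the reply on both ends so it can cancel out network delay. Here's a small bash sketch of the offset and delay formulas from the NTP spec (RFC 5905), using made-up millisecond timestamps and assuming the network path is equally fast in both directions. Real daemons like ntpd and chrony also filter many samples, which this doesn't do:

```shell
#!/usr/bin/env bash
# Sketch of the NTP offset/delay math (RFC 5905), in integer milliseconds.
# t1: client sends request   (client clock)
# t2: server receives it     (server clock)
# t3: server sends reply     (server clock)
# t4: client receives reply  (client clock)

# Offset = server time minus client time (positive means the client is behind)
ntp_offset() {
  local t1=$1 t2=$2 t3=$3 t4=$4
  echo $(( ( (t2 - t1) + (t3 - t4) ) / 2 ))
}

# Round-trip network delay, excluding the server's processing time
ntp_delay() {
  local t1=$1 t2=$2 t3=$3 t4=$4
  echo $(( (t4 - t1) - (t3 - t2) ))
}
```

With a client clock 1000 ms behind the server, 20 ms of latency each way, and 5 ms of server processing, the four timestamps are 0, 1020, 1025 and 45: ntp_offset 0 1020 1025 45 prints 1000 and ntp_delay 0 1020 1025 45 prints 40.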

And “daemon” is read like “demon”, or, if you're weird, “day-mon”. A daemon is a type of computer program that runs in the background, performing tasks without any direct interaction from the user. Daemons handle things like your Wi-Fi, Bluetooth, time sync, etc. Knowing the daemons lets you control the services on your machine by restarting them or turning them off and on, and it makes debugging on Linux a lot easier when you have a vague idea of the problem and need to narrow it down.
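As an example of poking at daemons, here's a tiny bash helper (check_services is a made-up name) built on systemctl is-active --quiet, which exits successfully only when a unit is running:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named systemd service is running.

check_services() {
  local svc
  for svc in "$@"; do
    # is-active --quiet prints nothing and signals the state via exit code
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: NOT active"
    fi
  done
}
```

On a Proxmox node you might run check_services pve-cluster corosync chrony, then dig into whichever one fails with journalctl -u followed by the service name.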

Next I tried toggling the time settings on the node whose clock was off, switching its time zone from New York to Los Angeles. That fixed another node, but the one I had just fixed broke on me.

After that I tried something else: removing the nodes from the cluster and making a new one.

Now there is no button to remove a cluster… You have to do it manually.

Remember those daemons I was talking about? The clustering services basically run off of 2 daemons called pve-cluster and corosync.

From what I understand, pve-cluster runs pmxcfs, the Proxmox cluster file system: a database-backed file system mounted at /etc/pve that is replicated to every node in the cluster (Beehive, in my case). Corosync manages the communication between the nodes. Combined, they keep the configuration in sync across the nodes and let you link them together and SSH between them.

When you create a cluster, it writes a corosync.conf to both /etc/pve and /etc/corosync, and when you join another node, that node gets a copy of the same configuration. So all you need to do is stop the pve-cluster and corosync daemons and remove the corosync files and directories.
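Since every node should end up with the same corosync configuration, one sanity check is comparing the config_version value in the totem section of the cluster-wide copy (/etc/pve/corosync.conf) and the local copy (/etc/corosync/corosync.conf). A minimal bash sketch, assuming the default Proxmox paths; config_version and same_corosync_version are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical check that a node's two corosync.conf copies agree.

# Print the config_version value from a corosync.conf file (empty if absent)
config_version() {
  awk '$1 == "config_version:" { print $2; exit }' "$1" 2>/dev/null
}

# Succeed only if both files report the same, non-empty config_version.
# Defaults to the standard Proxmox locations.
same_corosync_version() {
  local cluster_copy local_copy
  cluster_copy=$(config_version "${1:-/etc/pve/corosync.conf}")
  local_copy=$(config_version "${2:-/etc/corosync/corosync.conf}")
  [ -n "$cluster_copy" ] && [ "$cluster_copy" = "$local_copy" ]
}
```

Running same_corosync_version with no arguments on each node checks the default paths; a mismatch or a missing file suggests that node isn't picking up the cluster's config.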

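Like this. First, stop both daemons:

systemctl stop pve-cluster corosync

Then start the cluster file system in local mode. /etc/pve only exists while pmxcfs is running, so without this step there is nothing there to delete:

pmxcfs -l

Next, remove the corosync configuration file from /etc/pve, the contents of /etc/corosync, and the folder of the node you're removing:

rm /etc/pve/corosync.conf

rm -rf /etc/corosync/*

rm -rf /etc/pve/nodes/*name of node to remove*

Then stop the local-mode pmxcfs and start pve-cluster back up (there's no point starting corosync again, since its config is gone):

killall pmxcfs

systemctl start pve-cluster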

And this got rid of the 401 error! For about 10 minutes. Then it came back with a vengeance, and none of the nodes were accessible from each other's IP addresses. I repeated this process a few times with similar results: sometimes 2 connected, other times none.

At this point, the only other fix people online suggested was to reinstall Proxmox.

And I did that…. 7 damn times, and there was always at least ONE node that just did not want to connect and hit me with the 401 error.

After that I decided to take a break from this problem and move on to the next annoying one.

Another weird thing to note: from another node's IP I can still use the terminal of the affected 401 node, but when I try to click around in the GUI, it spits in my face….

The Constant Login Screen

Like I mentioned before, it shoves the Proxmox login page down my throat every 10 to 20 seconds, no matter which machine I'm logged into. I had a feeling it was my browser cache and the fact that I had reinstalled so many times, so I cleared the cache and switched between browsers. That worked, but within 20 minutes it started again, or it would start whenever I went to a node with the 401 error. I didn't really know what that was about and figured it would fix itself once I figured out the 401 bullshit.

The Repeated Requests to Buy a "Valid" Subscription

Now, this isn't really a problem, more of an annoyance, and imma be honest with you, I'm fckin cheap. The first thing I did to get rid of it was switch the repositories on each node from the pve-enterprise repository to the no-subscription/testing one.
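For reference, this is roughly what that repository switch looks like on Proxmox VE 8, which is based on Debian bookworm. It's a bash sketch assuming a default install and root access; the function name is made up, and older releases use a different Debian codename. Note that the subscription popup comes from the web UI and shows up regardless of which repository you use, which is why this alone didn't make it go away:

```shell
#!/usr/bin/env bash
# Sketch: switch a Proxmox VE 8 (Debian bookworm) node from the enterprise
# repository to the no-subscription one. Run as root on a default install.

ENTERPRISE_LIST=${ENTERPRISE_LIST:-/etc/apt/sources.list.d/pve-enterprise.list}
NOSUB_LIST=${NOSUB_LIST:-/etc/apt/sources.list.d/pve-no-subscription.list}

switch_to_no_subscription() {
  # Comment out the enterprise repo line (it needs a paid subscription key)
  sed -i 's|^deb https://enterprise.proxmox.com|# &|' "$ENTERPRISE_LIST"
  # Add the free, public no-subscription repo
  echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
    > "$NOSUB_LIST"
}
```

Run apt update afterwards so the new repository is picked up.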

When that didn't work, I looked online and found this: https://github.com/foundObjects/pve-nag-buster/

I cloned it, ran ./install.sh, and it worked perfectly.

Finally something went right.

Final Thoughts

I still haven't gotten the 401 error or the constant logins to go away.

So I've started looking at other options and really thinking, “What do I want out of a homelab?” I came up with a few points.

Point 1: I have learned a lot from Proxmox, but I feel like I could be gaining a much deeper knowledge base if things were a bit lower level and less of a finished product. Looking forward, like really forward, I can see myself hitting a plateau in what I can learn. I can emulate networks and IoT networks, practice breaking into different operating systems, maybe configure a few private DNS servers and firewalls… but then what? What is the architecture behind these technologies? How can I build a network from the CLI? How do APIs work? How do I manually create an ecosystem I can keep learning from and building on?

Specifically, I want to not only know the “how to do” but the “Why?”

Point 2: I don't need VMs for any of that; I can run it all in containers. And something I really want to learn, which Proxmox doesn't support natively, is Docker. Proxmox uses LXC (Linux Containers) instead, which are system containers that behave more like lightweight VMs, and to me they feel like a nerfed version of Docker containers.

One way around this is to spin up a small VM or LXC container and run Docker inside it, but I feel like that's too many layers of abstraction.

Building off of Docker is Kubernetes. I really want to learn that too… and it would add even more layers of abstraction.

Point 3: It can simulate real-world scenarios, but it lacks real-world complexity. What I mean is that it can spin up machines pretty easily and your environments are safe to play around with, but what about using APIs to connect them together? Or having smaller sub-services contribute to a bigger service? I don't think many companies use Proxmox other than security companies sandboxing ISO files. This ties into point 1 about learning why and how things work.

I figure if I'm going to put this much effort into learning something, I might as well do it right… so I'm changing my homelab from a Proxmox cluster to a Kubernetes cluster and effectively making myself suffer even more in the pursuit of knowledge.

PS, Dear Proxmox

It's been real, and it's been fun, but it has not been real fun.

Adios my friend.

Rebooter by Connal McInnis