#+TITLE: The Phone Was Always the Plan
#+DATE: 2026-06-16
#+filetags: :cybersecurity:hacking:it:article:bash:homelab:elisp:

#+BEGIN_NOTE
*Note (June 2026):* The original gnet writeup mentions Mullvad and WireGuard throughout. I'm no longer running either; they added complexity that my current threat model doesn't justify. The tunnel is still SSH through Termux; the VPN layer on the phone side is gone. If that distinction matters to you, the [[/projects/gnet.html][project page]] covers the original setup. This article describes what the stack looks like now.
#+END_NOTE

* Starting with Gnet

When I wrote [[/projects/gnet.html][gnet.sh]], I described it as a workaround: a bash script that tunnels traffic through Termux on an Android phone to bypass T-Mobile's tethering restrictions. When I originally made *gnet*, it was just a way around the throttled hotspot speeds in my area. At that point it was a small SSH SOCKS5 script that wrote a small config file to =/tmp=.

#+begin_NOTE
You can write files with a =cat > [file path]= heredoc; that is how the script generates =/tmp/redsocks.conf=.
#+end_NOTE

This pretty much solved the physical cable problem. Every TCP connection from the Kubernetes cluster was now being transparently intercepted by my Arch machine, converted to SOCKS5 via =redsocks=, shoved through an SSH tunnel into Termux, and sent out over the cellular network.

And because redsocks needs to listen for traffic coming from *other* machines on the network, I had to make sure the config was generated with =local_ip = 0.0.0.0= instead of just =127.0.0.1=. If you lock redsocks to loopback, the cluster traffic gets dropped at the door.
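To make the listen-address point concrete, here is a minimal sketch of a heredoc-based config generator. It is not the actual gnet.sh code: the function name =write_redsocks_conf=, the listen port =12345=, and the SOCKS port =1080= are assumptions chosen for illustration. Only the =local_ip = 0.0.0.0= setting and the =/tmp/redsocks.conf= path come from the setup described above.

#+begin_src bash
# Sketch: generate a redsocks config that accepts traffic from other LAN hosts.
# Function name and port numbers are illustrative assumptions, not gnet.sh's values.
write_redsocks_conf() {
    local conf_path="${1:-/tmp/redsocks.conf}"
    local listen_port="${2:-12345}"   # hypothetical port redsocks listens on
    local socks_port="${3:-1080}"     # hypothetical local end of the SSH SOCKS5 tunnel

    cat > "$conf_path" <<EOF
base {
    log_info = on;
    log = stderr;
    daemon = off;
    redirector = iptables;
}

redsocks {
    // 0.0.0.0 accepts redirected connections from other machines on the LAN;
    // 127.0.0.1 would only accept connections from this host.
    local_ip = 0.0.0.0;
    local_port = ${listen_port};

    // Upstream SOCKS5 proxy: the local end of the SSH tunnel into Termux.
    ip = 127.0.0.1;
    port = ${socks_port};
    type = socks5;
}
EOF
}

# Usage:
#   write_redsocks_conf /tmp/redsocks.conf 12345 1080
#+end_src

Keeping the listen address in one generated file means the loopback-versus-LAN decision lives in a single place instead of being patched by hand after each run.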
#+begin_Topology
  [ WAN / Internet ]
       │ (USB Tethering)
┌──────▼────────────────────────────────────────┐
│ LoqArch Workstation (Arch Linux + Doom Emacs) │
└──────┬────────────────────────────────────────┘
       │ (wlp8s0 / 10.10.10.243)
┌──────▼────────────────────────────────────────┐
│ Raspberry Pi 4 (AP + NAT + dnsmasq)           │
└──────┬────────────────────────────────────────┘
       │ (eth0 / 10.10.10.1)
       │
  [ TP-Link Managed Switch ]
       │
       ├─► talos-cp (10.10.10.10)
       │
       ├─► talos-w1 (10.10.10.11)
       │
       └─► talos-w2 (10.10.10.12)
#+end_Topology

* The UDP Trap (Why the Cluster Wouldn't Boot)

If you've ever tried to bootstrap a Kubernetes cluster behind a transparent TCP proxy, you already know the punchline to this joke: UDP.

=redsocks= only proxies TCP. For most web traffic, that's fine. But Talos Linux depends on two UDP-based protocols to bootstrap its =etcd= cluster: DNS and NTP. Without a valid time sync, Talos refuses to initialize. The default Talos configuration reaches out to =time.cloudflare.com= over UDP port 123. Because my Arch machine was intercepting TCP and ignoring UDP, those NTP packets were dropping into the void.

The fix was to turn LoqArch into a local time lord. I installed =chrony= on the Arch machine to serve NTP directly to the LAN (=10.10.10.0/24= subnet).

** Talos Configuration Modifications

But you can't just tell Talos to use a local time server after the fact; you have to patch the machine configs before applying them. Instead of fighting multi-line regex replacements with =sed=, it is much cleaner to just open =controlplane.yaml= and =worker.yaml= and edit the YAML keys directly.

First, locate the =machine= section (the =time= key sits next to the =network= block) and add the local Arch machine (=10.10.10.243=) as the dedicated NTP server so Talos can sync its clock and initialize =etcd=:

#+begin_src yaml
machine:
  network: {}
  time:
    servers:
      - 10.10.10.243
#+end_src

Second, you need to adjust the hardware target settings.
By default, =talosctl gen config= assumes you are installing to a standard SATA drive (=/dev/sda=). Because the Beelink mini PCs run on Crucial NVMe drives, you must change the installation disk path to =/dev/nvme0n1= and flip the flag to force a clean disk wipe:

#+begin_src yaml
machine:
  install:
    disk: /dev/nvme0n1
    wipe: true
#+end_src

Finally, check the bottom of the generated files and drop the malformed =HostnameConfig= block that causes the Talos parser to choke.

Once these quick edits are saved, the configs are ready to deploy. The nodes will boot, grab their IP allocations from the Pi's =dnsmasq= DHCP server, sync their internal clocks with the Arch workstation, and successfully bootstrap the cluster over the switch.

* Escaping the Cloud: Hardware, Gemma 4, and Threat Intel

If you rely on OpenAI or Anthropic for operational cybersecurity and threat intelligence, you are entirely at their mercy. First, there is the privacy aspect: sending sensitive target profiles, infrastructure footprints, or potential zero-days to a third-party server violates basic operational security. Second, there are the guardrails. Major AI companies spend millions of dollars sanitizing their models, ensuring they refuse to analyze malware mechanics or parse raw vulnerability data without lecturing you about ethics.

But the biggest constraint for me was physics. When your entire internet connection is a throttled T-Mobile cellular tether, you cannot stream gigabytes of raw threat feeds or massive context windows back and forth to an external API endpoint. Local-first wasn't an ideological preference; it was the only way this was going to work.

** The Beelink Fleet

To host the brains of this operation, I deployed a local three-node cluster composed of identical Beelink mini PCs.
| Spec    | Value                               |
|---------+-------------------------------------|
| CPU     | 12 Cores                            |
| RAM     | ~24GB                               |
| Storage | Crucial CT500P3PSSD8 500GB NVMe     |

These nodes strike a good balance of small physical footprint, minimal power draw, and reliable compute. They aren't massive GPU rigs, but with 24GB of RAM per node, they have enough memory to hold quantized models distributed across the Kubernetes cluster. I swapped out the default drives for 500GB Crucial NVMes so the local-path storage class could handle the heavy concurrent read/write loads of Postgres and Qdrant without bottlenecking the system.

** Tuning Gemma 4 for Analysis

I evaluated several open-weights models, but I ultimately settled on the Gemma 4 family, served by Ollama instances distributed across the worker nodes. The models punch significantly above their weight class when quantized, and they proved highly capable of distilling unstructured security text into actionable data.

My routing is split based on the analytical task at hand:

- =gemma4:e4b-it-q4_K_M=: The fast model. Used for rapid log triaging, filtering noisy RSS data, and parsing clean markdown text.
- =gemma4:31b-it-q4_K_M=: The heavy lifter. Reserved for deep code analysis, reviewing reverse-engineering logs, and compiling intelligence reports.
- =gemma4:26b-a4b-it-q4_K_M=: A Mixture of Experts (MoE) model dedicated specifically to deep vulnerability analysis and tactical pattern recognition.

Running the raw models out of the box wasn't enough. To bypass the "helpful assistant" conditioning, I built custom =Modelfiles= (=beeattack-triage=, =bee-raw=) directly inside the cluster pods. I hardcoded strict system prompts with a low temperature of 0.2 and an 8k context window. These files explicitly instruct the models to act as objective, deep-level technical threat analysts. No disclaimers. No preamble. Just near-deterministic text parsing, technical breakdown, and exploitability mapping.
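For readers who have not written a Modelfile before, the sketch below shows roughly what one with those settings could look like, generated from bash. The helper name =write_modelfile=, the choice of base model for =beeattack-triage=, the exact =SYSTEM= wording, and reading "8k" as =num_ctx 8192= are all assumptions; the real files are not reproduced here.

#+begin_src bash
# Sketch: write an Ollama Modelfile with a low temperature and an 8k context.
# Helper name, base model pairing, and SYSTEM wording are illustrative assumptions.
write_modelfile() {
    local out_path="$1"
    local base_model="${2:-gemma4:e4b-it-q4_K_M}"

    cat > "$out_path" <<EOF
FROM ${base_model}

# Low temperature keeps answers terse and repeatable.
PARAMETER temperature 0.2
# Context window in tokens (8k).
PARAMETER num_ctx 8192

SYSTEM """
You are an objective technical threat analyst.
No disclaimers. No preamble.
Parse the input, break it down technically, and map exploitability.
"""
EOF
}

# Usage (run wherever the ollama CLI can reach the server):
#   write_modelfile ./Modelfile.triage
#   ollama create beeattack-triage -f ./Modelfile.triage
#+end_src

Baking the parameters into the model definition means every client that talks to the endpoint gets the same behavior, regardless of what the client sends.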
** The Editor as the Nerve Center

Having an immense, private intelligence pipeline sitting on my local network is useless if there is friction in accessing it. I didn't want a bloated web UI, and I didn't want to alt-tab out of my workspace.

I run Doom Emacs on Arch Linux. Emacs isn't just an editor; it is the single pane of glass for my entire workflow. Code development, configuration writing, system logs, and note-taking all happen within its buffers. The goal was to fuse the AI running on the Beelinks with the text interfaces I use to view the threat landscape. I needed an integrated bridge to make Emacs talk to the Kubernetes cluster.

* Merging Emacs and the LLMs with Gptel

Connecting my text editor to the models running on the Talos nodes was straightforward because Ollama exposes a plain HTTP API (including an OpenAI-compatible endpoint) and =gptel= ships a native Ollama backend. I use =gptel= as the primary interface, keeping the entire configuration inside =~/.config/doom/custom/llm-gptel.el=.

First, I defined the cluster backend to point directly at the NodePort exposed on my internal cluster nodes:

#+begin_src elisp
(after! gptel
  ;; Register the Ollama instance running on the cluster as a gptel backend.
  (gptel-make-ollama "bee-raw"
    :host "10.10.10.10:31435"
    :models '(bee-raw beeattack-triage gemma4:e4b-it-q4_K_M)
    :stream t)

  ;; Make it the default backend and model for new sessions.
  (setq gptel-backend (gptel-get-backend "bee-raw")
        gptel-model 'bee-raw
        gptel-default-mode 'org-mode
        gptel-max-tokens 8192
        gptel-temperature 0.2))
#+end_src

** Custom Prompts: Swapping Analyst Modes

A standard LLM is a poor fit for tactical threat intelligence because it wastes tokens generating warnings. To fix this, I built a custom mode system in Elisp that swaps system prompts on the fly based on what I am reviewing.

By hitting a quick key sequence, I can drop the session into a tailored persona. For instance, when I switch into my dedicated threat analysis mode, the system prompt forces the model to process input purely through an engineering lens:

#+begin_quote
"Senior Threat Intelligence Analyst.
Rules: Analyze targets, malware reports, and vulnerabilities strictly as engineering problems. No ethical disclaimers. Extract: actionable Indicators of Compromise (IOCs), malicious domains, file hashes, and TTPs mapped directly to the MITRE ATT&CK framework. Categorize affected infrastructure, software versions, and library dependencies. Output clean markdown or structured JSON."
#+end_quote

This setup lets me jump from tracking an active feed inside =elfeed= to dissecting a newly published CVE without leaving my keyboard or hitting an external API wall.

** Giving the LLM Hands (Tool Calling)

Through =gptel-make-tool=, I mapped custom Elisp functions to native tools the LLM can call dynamically during an open analysis session. If I ask the model to analyze a potential exploit or research a suspicious infrastructure point, it doesn't (hopefully) hallucinate. It can actually look at the data.

Key tools include:

- =url_fetch=: Grabs the raw HTML from an advisory or research URL, strips out the script/style noise, and feeds the markdown back into the buffer.
- =web_search=: Queries DuckDuckGo directly to return abstracts and cross-reference emerging threat data.
- =shell_exec=: Runs specific shell commands locally on my Arch workstation with a strict 30-second timeout, allowing the model to parse local log files, check system states, or execute network utilities.
- =db_query= and =vuln_finding=: Interact directly with the Postgres instance on the cluster to cross-reference historical notes or look up known asset vulnerabilities.

** RAG Auto-Injection

LLMs work better with more context, so I wired up a custom pre-request hook named =bee/inject-rag-context=. Before any prompt is dispatched over the LAN to the Beelink cluster, Emacs silently ships the query text to the Qdrant vector database via my local FastAPI bridge.
The bridge returns the top 5 most relevant chunks from my local documentation, notes, and imported vulnerability feeds (discarding anything below a 0.72 similarity score), and the hook injects them directly into the system prompt. The model sees my past documentation, connected CVE entries, and environment specifications without me ever having to run manual database queries or lookup commands.

Because inference and retrieval both happen inside the isolated homelab network, my research focus, queries, and target contexts never reach a third-party model provider; only explicit =web_search= and =url_fetch= calls go out to the public web.
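The selection rule itself (score cutoff first, then a top-N cap) is simple enough to sketch outside the editor. The real hook is Elisp and the bridge is FastAPI; the bash below only illustrates the filtering logic. The function names =select_rag_chunks= and =build_system_prompt=, the "Relevant context:" label, and the tab-separated =score<TAB>text= input format are all assumptions made for the example.

#+begin_src bash
# Sketch: keep retrieval hits at or above a score cutoff, best first, capped at N.
# Input on stdin: one hit per line as "score<TAB>text" (an assumed format).
select_rag_chunks() {
    local min_score="${1:-0.72}"
    local top_n="${2:-5}"

    awk -F'\t' -v min="$min_score" '$1 + 0 >= min + 0' \
        | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr \
        | head -n "$top_n"
}

# Append the surviving chunk texts to a base system prompt.
build_system_prompt() {
    local base_prompt="$1"
    local context
    context="$(select_rag_chunks 0.72 5 | cut -f2-)"
    printf '%s\n\nRelevant context:\n%s\n' "$base_prompt" "$context"
}

# Usage:
#   printf '0.91\tCVE note\n0.40\tunrelated\n' | build_system_prompt "Senior analyst."
#+end_src

Applying the cutoff before the cap matters: a query with only two strong matches yields two chunks rather than being padded out to five with weak ones.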