# Proxmox Management
This repo contains scripts used to manage a multi-tenant Proxmox environment for the [Reudnetz w.V.]().
These scripts were created because the Ansible modules lacked the necessary features.
---

This hacky solution will be replaced by a clean Terraform abstraction!

---
## Overview
We use users, groups, resource pools, and roles to build a multi-tenant Proxmox instance.
Proxmox is by design not a "cloud native" hypervisor platform. Although it is possible to separate users and resources from one another, there is no way to give users/organisations a quota of CPU, memory, and disk from which they can assign virtual machines/containers to themselves.
To overcome this limitation, an admin creates those VMs/containers and then attaches them read-only to the specific user/organisation.
### Moving Parts
For our hypervisor setup we need to administer two components:
* the hypervisor platform
* routing/switching
Currently there is no idempotent automation for the hypervisor platform. State changes can be made via scripts or the CLI directly (or via the web GUI, of course).
The routing/switching side is fully automated. State changes are configured by modifying the `vlan_vms.yml` file.
### Terminology
Our abstraction is based on these objects:
* users
* organisations
* vms
Users can be part of multiple organisations.
Every vm is bound to a single organisation.
### Proxmox Implementation
In Proxmox we use a few more components to implement our abstraction:
* users
* groups (used as organisations)
* resource pools (used to map permissions for groups)
* ACLs
Under the (Proxmox) hood, the users from our abstraction are normal Proxmox users.
Organisations are abstracted via groups.
VMs and file-based storage are assigned to so-called resource pools.
We then map permissions for a specific organisation (group) to a specific resource pool. This separates the resources and creates a multi-tenant platform.
Parts of this implementation are still a work in progress.
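The group-to-pool permission mapping described above can be sketched with `pveum`. The group/pool name `acme` and the `PVEVMUser` role below are assumptions for illustration, not taken from the scripts:

```shell
# hypothetical sketch: create a resource pool and grant the "acme" group
# the PVEVMUser role on it (console/power access to the pool's VMs)
pveum pool add acme --comment "Acme org"
pveum acl modify /pool/acme --groups acme --roles PVEVMUser
```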
## Requirements
How to use these scripts in a PVE environment.
TBD
## Create an Organisation
1. create Proxmox-related objects
```
./create_organisation ${name}
```
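Assuming `create_organisation` creates a group and a resource pool named after the organisation (an assumption about the script, not verified here), the result can be checked with:

```shell
# verify the new group and pool exist (names assumed to match ${name})
pveum group list | grep ${name}
pveum pool list | grep ${name}
```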
## Create a VM
**prerequisites**:
* there should be an organisation created for that vm
* you should have a list of `authorized_keys` for that organisation saved as a file on that node
0. search for the next usable vlan/vm-id in `vlan_vms.yml`
* either use an old but freed id (block should be commented)
* or use next free one
1. check that we have the newest images
```
grep debian download_images
ls /root/images
# execute ./download_images if script url does not match images directory
```
2. create vm
```
./create_vm ${vmid} ${orga}-${vmname}-${vmid} ${orga}
./get_linklocal_for_vm ${vmid}
```
3. add vlan configuration to `vlan_vms.yml` and deploy ansible
```
vim vlan_vms.yml
[...] copy a block or create a new one - change prefixes accordingly
[...] enter the link-local address for the v6 prefix
# deploy routers/switches
ansible-playbook -CD cores.yml -l cores,core_switches
ansible-playbook -D cores.yml -l cores,core_switches
# deploy hyperjump
ansible-playbook -CD vms.yml -l hyperjump
ansible-playbook -D vms.yml -l hyperjump
```
4. configure cloud-init settings for vm
```
qm set ${vmid} --ipconfig0 "ip=${v4-cidr},gw=${v4-gw},ip6=${v6-prefix},gw6=fe80::1"
qm set ${vmid} --sshkeys ${path_to_authorized_keys}
```
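For illustration, the same commands with concrete placeholder values (documentation-range addresses, hypothetical vmid and key path):

```shell
# example only: 192.0.2.0/24 and 2001:db8:42::/64 are documentation ranges
qm set 4242 --ipconfig0 "ip=192.0.2.10/24,gw=192.0.2.1,ip6=2001:db8:42::10/64,gw6=fe80::1"
qm set 4242 --sshkeys /root/keys/acme_authorized_keys
```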
5. change vm specifications to match the order
```
# increase memory / cpu
qm set ${vmid} -memory 8192 --cores 4
# add more storage
# use customer-disks on hyper01 and hyper03
# use customer-disks-slow on hyper02 (dedicated sata ssd pool with more space but less bandwidth)
qm set ${vmid} --virtio1 customer-disks-slow:250,iothread=1,discard=on
```
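The resulting configuration can be double-checked before starting the VM:

```shell
# print the current vm configuration
qm config ${vmid}
```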
6. start vm
```
qm start ${vmid}
```
7. check reachability of vm
```
ping6 ${v6prefix}
nc ${v6prefix} 22
```
8. notify customer
```
vmid: ${vmid}
vmname: ${vmname}
user: debian
authentication: key based auth only
v4: ${v4-cidr}
v4gw: ${v4-gw}
v6: ${v6-prefix}
v6gw: fe80::1
v6 reachable from everywhere - v4 only inside the Reudnetz
ssh reverse proxy (for v4 access only) available via: ssh -p 2${vmid} <user>@hyperjump.reudnetz.org
best effort service - expect downtimes - backup your stuff !
```
## Destroy a VM
commands need to be executed on the node where the vm resides
1. schedule removal of replicated storage (if configured)
```
pvesr list | grep ${vmid}
pvesr delete ${vmid}-${replication_id}
# wait till removal is complete - check with pvesr list
```
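The manual wait-and-recheck step can also be written as a small polling loop. A sketch only; it assumes the replication job id starts with the vmid, as in the `delete` command above:

```shell
# poll until the replication job for ${vmid} is gone
while pvesr list | grep -q "${vmid}-"; do
    sleep 5
done
```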
2. remove vm
```
qm shutdown ${vmid}
qm destroy ${vmid} --purge 1
```
3. remove vms from `vlan_vms.yml` and deploy
```
vim vlan_vms.yml
[...]
# remove router and switch config
ansible-playbook -CD cores.yml -l cores,core_switches
ansible-playbook -D cores.yml -l cores,core_switches
# remove ssh reverse proxy config
ansible-playbook -CD vms.yml -l hyperjump
ansible-playbook -D vms.yml -l hyperjump
```
## Destroy a User
```
./delete_user ${username}@pve
```
## Destroy an Organisation
1. remove orga and associated objects from proxmox cluster
```
./delete_organisation ${orga}
```
2. destroy remaining zfs volumes on other nodes
```
# execute on all pve nodes
zfs destroy -r rpool/customer/${orga}-images
```
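Before destroying anything, `zfs destroy` supports a dry run, which may be worth using first (dataset name as in the command above):

```shell
# -n = dry run, -v = list what would be destroyed
zfs destroy -rnv rpool/customer/${orga}-images
```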
## Add/Remove a User from an Organisation
1. check which groups the user currently belongs to
```
pveum group list | grep ${user}@pve
```
2. change group memberships
```
# add to group
pveum user modify ${user}@pve --groups ${currentgroups},${new_group}
# remove from group
pveum user modify ${user}@pve --groups ${groups-without-other-group}
```
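Computing `${groups-without-other-group}` by hand is error-prone. A sketch of the list surgery in plain shell (the group names here are hypothetical placeholders):

```shell
# current group list as reported by `pveum user list` (placeholder value)
current_groups="netops,acme,labs"
old_group="acme"

# split on commas, drop the group to remove, re-join
new_groups=$(echo "$current_groups" | tr ',' '\n' | grep -v "^${old_group}$" | paste -sd, -)
echo "$new_groups"   # netops,labs
# then apply it:
#   pveum user modify ${user}@pve --groups "$new_groups"
```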