Figuring out Kafka listeners

September 9, 2024 by Jamesrandell

After two and a half years of zero Kafka in my workflow, I’ve just gone and set up a new VM cluster on my Proxmox server.

After faffing for 90 minutes, I went down the Docker route instead, only so I can tear it down and rebuild faster as I go through and touch things, either by accident or on purpose.

Anyway, in doing that I’ve had to deal with the Kafka listener part of the config. In Docker Compose these are exposed with KAFKA_LISTENERS and KAFKA_ADVERTISED_LISTENERS. There are a couple more I’ll try and explain as well.

Listeners in a nutshell

KAFKA_LISTENERS: the addresses the broker itself binds to and listens on (internal to the Kafka cluster). KAFKA_ADVERTISED_LISTENERS: the addresses the broker hands out to clients so they can connect back (external to the Kafka cluster).

When a client connects to any Kafka node (say, in a 3-node Kafka cluster), it gets back info about the cluster, including which node it needs to use and the advertised address to reach it on.

There are some settings I found which allow you to rename things, which I find distracting. KAFKA_LISTENER_SECURITY_PROTOCOL_MAP, for instance, is a way of giving custom names to listeners and mapping each name to a protocol type, i.e. PLAINTEXT, SSL, SASL_PLAINTEXT or SASL_SSL. You can name these BOB, KEV, TOM and JON for example.

When it comes to using them in KAFKA_LISTENERS, you could set TOM as a synonym for PLAINTEXT (which is unencrypted TCP) and use TOM://:9092 instead of PLAINTEXT://:9092, for example. Which still makes little sense typing that out.

Best thing I can do is link to https://rmoff.net/2018/08/02/kafka-listeners-explained/ that has a helpful page on listeners, though this bit in particular helped me out:

KAFKA_LISTENERS: LISTENER_BOB://kafka0:29092,LISTENER_FRED://localhost:9092
KAFKA_ADVERTISED_LISTENERS: LISTENER_BOB://kafka0:29092,LISTENER_FRED://localhost:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_BOB:PLAINTEXT,LISTENER_FRED:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_BOB
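
To convince myself how the three settings line up, here is a rough Python sketch that parses the values above and pairs each listener name with its bind address, advertised address and protocol. The function names are my own invention for illustration; this is not how Kafka itself parses the config.

```python
def parse_listeners(value):
    """Split 'NAME://host:port,...' into {name: (host, port)}."""
    listeners = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        listeners[name] = (host, int(port))  # host is '' for 'NAME://:9092'
    return listeners


def parse_protocol_map(value):
    """Split 'NAME:PROTOCOL,...' into {name: protocol}."""
    return dict(pair.strip().split(":", 1) for pair in value.split(","))


def describe(listeners_value, advertised_value, protocol_map_value):
    """Line up bind address, advertised address and protocol per listener name."""
    binds = parse_listeners(listeners_value)
    advertised = parse_listeners(advertised_value)
    protocols = parse_protocol_map(protocol_map_value)
    return [
        {
            "name": name,
            "binds_to": bind,
            "advertised_as": advertised.get(name),  # None if not advertised
            "protocol": protocols.get(name),         # None if not mapped
        }
        for name, bind in binds.items()
    ]


if __name__ == "__main__":
    rows = describe(
        "LISTENER_BOB://kafka0:29092,LISTENER_FRED://localhost:9092",
        "LISTENER_BOB://kafka0:29092,LISTENER_FRED://localhost:9092",
        "LISTENER_BOB:PLAINTEXT,LISTENER_FRED:PLAINTEXT",
    )
    for row in rows:
        print(row)
```

Seeing it as one row per listener name made it click for me: the name is just a label that ties a bind address, an advertised address and a protocol together.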


Installing Kafka on Ubuntu

September 4, 2024 by Jamesrandell

sudo adduser kafka
sudo adduser kafka sudo
su -l kafka

sudo apt update
sudo apt install openjdk-21-jdk

mkdir ~/downloads
cd ~/downloads
wget https://archive.apache.org/dist/kafka/3.8.0/kafka_2.12-3.8.0.tgz

cd ~
tar -xvzf ~/downloads/kafka_2.12-3.8.0.tgz
mv kafka_2.12-3.8.0/ kafka/

vim ~/kafka/config/server.properties


Proxmox and cloning a VM

August 30, 2024 by Jamesrandell

I’m doing a bunch of database work with Cassandra (and soon Kafka). Because of that I want to be able to duplicate a bunch of base Linux VMs pretty fast.

I’ve started out installing from a 2 GB Ubuntu image. The only changes I’ve made are to update apt and install the QEMU Guest Agent.

When cloning from this new template, I spotted that the IP address assigned is the same as the original VM’s; in fact the cloned VMs all share the same IP.

A bit of googling later, the fix for me was:

sudo su -
echo -n > /etc/machine-id
rm /var/lib/dbus/machine-id

reboot


I had to switch to the root user to be able to change the machine-id file.

From some research, depending on the DHCP service you run, it can use a machine ID and/or the MAC address when assigning a new IP. In my case, with Ubiquiti hardware, it seemed to take the machine ID into account, as Proxmox had already assigned a new random MAC to these VMs.
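
If you want to confirm that shared machine IDs are the culprit before rebooting everything, a small Python sketch like this can flag the clones that still carry the template’s ID. It assumes you have already collected the contents of /etc/machine-id from each VM yourself; the host names and IDs in the example are made up.

```python
from collections import defaultdict


def shared_machine_ids(machine_ids):
    """Return {machine_id: [hosts]} for every ID used by more than one host.

    machine_ids maps a host name to the text read from its /etc/machine-id.
    """
    hosts_by_id = defaultdict(list)
    for host, machine_id in machine_ids.items():
        hosts_by_id[machine_id.strip()].append(host)
    return {
        machine_id: sorted(hosts)
        for machine_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    # Hypothetical values gathered by hand from three cloned VMs.
    collected = {
        "cass-01": "0123456789abcdef0123456789abcdef\n",
        "cass-02": "0123456789abcdef0123456789abcdef\n",
        "cass-03": "fedcba9876543210fedcba9876543210\n",
    }
    print(shared_machine_ids(collected))
```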

Setting up a new Centos 9 Stream VM

August 30, 2024 by Jamesrandell

This is part of my Cassandra learning series.

I stopped developing this part as I’d had enough of the RHEL/CentOS changes that happened a few years ago (they made RHEL closed source and, I think, had to archive the CentOS channels). This meant additional updates to the yum repos, which I will get round to doing. But I’m trying my best to get to the database part (Cassandra, Kafka, Beats, Logstash etc.) and not faff so much with flavours of Linux.

sudo yum update

yum install qemu-guest-agent

yum install net-tools
yum install vim
yum install firewalld
yum -y install openssh-server openssh-clients

systemctl enable firewalld
systemctl start firewalld

firewall-cmd --add-service=ssh --permanent
firewall-cmd --reload

sudo systemctl start sshd

firewall-cmd --permanent --zone=internal --add-source=10.11.0.0/2


Setting up a fresh Ubuntu image

August 25, 2024 by Jamesrandell

This is part of my Cassandra learning series.

vim /etc/ssh/sshd_config
PermitRootLogin yes
sudo systemctl restart sshd

Client machine (my Mac for example)

ssh-keygen
ssh-copy-id -i ansible_host_sshkey.pub root@<IP address>


Ignore the SSH key name. I ended up using the same SSH key for my VMs.

Finding Subscriber and Publisher details from the distributor

April 20, 2022 by Jamesrandell

This will ultimately need tidying up. You can run this query on the distribution node in a SQL replication environment. It’ll give you all the publisher servers and databases, along with the subscriber servers and databases. It will also give you the article count.

Some notes on the code

First, I think this is pretty cool. I added the article count so I can quickly see if there are differences between duplicate publications to either the same subscriber or another. It gives an easy ‘catch’ to check for slightly different publications if you need them to be the same.

Second, and the reason why I’m posting this, is that I was trying to find the server name for the publisher and subscriber. The internet told me to JOIN the sys.servers table, which gives you the *wrong* data. Instead it’s the MSreplservers table in the distribution database itself. I found this interesting as it caused me to waste time trying to figure it out, hence the post.

SELECT
    a.c AS article_count,
    s.subscriber_id,
    ss1.srvname AS publisher_server,
    p.publisher_db,
    p.publication,
    ss2.srvname AS subscriber_server,
    s.subscriber_db,
    da.name AS job_name
FROM MSpublications p
JOIN MSsubscriptions s
    ON p.publication_id = s.publication_id
JOIN MSdistribution_agents da
    ON da.publisher_id = p.publisher_id
    AND da.subscriber_id = s.subscriber_id
-- server names come from MSreplservers in the distribution database, not sys.servers
JOIN MSreplservers ss1
    ON p.publisher_id = ss1.srvid
JOIN MSreplservers ss2
    ON s.subscriber_id = ss2.srvid
-- count the articles in each publication
CROSS APPLY (SELECT COUNT(*) AS c, publication_id FROM MSarticles WHERE s.publication_id = publication_id GROUP BY publication_id) a
GROUP BY
    p.publisher_db,
    p.publication,
    s.subscriber_db,
    da.name,
    a.c,
    ss1.srvname,
    ss2.srvname,
    s.subscriber_id


Configuring gMSA account for MSSQL with PowerShell

March 28, 2022 by Jamesrandell

I’ve started to set up a SQL 2019 Availability Group on my PC at home. You see, I upgraded it last year specifically for this purpose, and up to now I’ve been spinning up various NoSQL databases in Docker containers (Elasticsearch, Cassandra, Kafka). I went as far as setting up a 5-node Cassandra cluster in Hyper-V, amongst various other VMs for testing distributed database tech (I once thought 64 GB of memory was a lot, but now I have to keep an eye on it if I don’t stop previous tests).

Here, for this test I’ve got the following:

A Windows Server 2019 running Active Directory and various AD services (DNS and GPO)
Two Windows Server 2019 VMs running MSSQL2019

All VMs have 2 virtual cores; the SQL boxes have 8 GB RAM whilst the DC has 2 GB. Storage is provided by a 2 TB M.2 drive delivering around 28k read/21k write IOPS.

I did follow a few different guides for this, so if you want the (much better) instructions the links can be found at the end. I wrote this because there were a few other things not included, the typical PowerShell things that throw errors if you’re just getting to grips with it.

Make sure your domain is running

It’s been a couple of years since I last administered Active Directory, DNS and GPO, so it took me half a day to figure out some of the basics again. I’ll list out the sort of stuff I configured:

Installed AD, DNS and AD Web Services on my DC using default settings throughout.
Created a domain during the AD set-up process, using a .local suffix so as not to interfere with my business website of the same name (this VM environment is only ever going to be an internal playground).
In AD Users and Computers, created a new OU called People so we don’t pollute the Users OU
- Created a new user that I can use for my Domain Admin. I will get round to adding a noddy account, but right now I’m doing this as quickly as I can
Created a new Virtual Switch on an Internal network. I don’t need internet access for my servers. Configured my VMs to use this switch.
Back in the VMs, hard-configured all the IPv4 addresses in the network adaptor:
- Used a 172.24.101.1-255 address range (255.255.255.0 subnet)
- Updated the Default Gateway to point to my DC
- Updated primary DNS to my DC and secondary to a random Google one (8.8.8.8)
- Rebooted the VMs to register the new IP in Hyper-V, though this may just be a display issue on the networking tab; in either case it takes but a minute to do.
Joined the two servers to my domain in System Properties. Always a pain to find, but it’s in
- Control Panel
- System and Security
- System
- Advanced system settings
- Computer Name tab
- Hit Change, then change the Member of field. I took this opportunity to change my server names to something a bit more memorable (server1 and server2)
You’ll need to reboot them again
Once back up, you can use the domain account created earlier to log in. You may want to prefix it with your domain’s NetBIOS name the first time to ensure you log in using the domain account.

You should now be able to ping the servers from the DC. You may want to go into the DNS manager and just make sure the Forward Lookup Zones for your domain show your servers with the correct IPs you set. You can also run the following in PowerShell to test the domain connection:

Test-ComputerSecureChannel -Verbose


This tests the communication between the server you’re on and the DC. So long as you get True, you’re good to go and can proceed with generating the gMSAs.

Generating the gMSA

We’ve a few things to run in PowerShell. These commands form part of the PS script I’ve got on GitHub, but I’ll paste them below:

# test communication between server and the DC:
Test-ComputerSecureChannel -Verbose

# you need this to continue with these commands
Install-WindowsFeature RSAT-AD-PowerShell

# This is on the target node (MSSQL in this case)
# Import the AD tools into the current session
Import-Module ActiveDirectory; 

# test to make sure we can get the domain name. This should return whatever you called your domain.
# after this we can move on and fill out our config for the new gMSA
(Get-ADDomain).DNSRoot;


This next block is a PowerShell script to generate the gMSAs. It goes hand in hand with a JSON file (just underneath) that acts as our config file.

Import-Module ActiveDirectory

# change this if this isn't where your MSAs are held.
# I've also run it without that at all and it still worked and put the accounts in the
# correct OU
#$ou = "CN=Managed Service Accounts," + (Get-ADDomain).DistinguishedName

# need this for when we create the account
$domain = (Get-ADDomain).DNSRoot

# load up our config, edit this file to add in your servers/account
$json = Get-Content 'config.json' | Out-String | ConvertFrom-Json

$accountObj = $json | Select-Object -ExpandProperty account
$serverObj = $json | Select-Object -ExpandProperty server

# identify any gMSA accounts in the config file.
$service_gMSA = $accountObj | Select-Object -ExpandProperty service | Where-Object {$_.type -match "gMSA"}

if ($serverObj) {

    # look up the AD computer object for each server allowed to use the accounts
    $serverDetailsObj = $serverObj.ForEach{ return (Get-ADComputer $_.name) }

    if ($service_gMSA) {

        foreach ($sub_key in $service_gMSA) {

            "Checking for " + $sub_key.username + "..."
            Try {
                # this throws if the account doesn't exist, so only report success afterwards
                $existing = Get-ADServiceAccount -Identity $sub_key.username
                "...account already exists! See details below:"
                $existing
            } Catch [Microsoft.ActiveDirectory.Management.ADIdentityNotFoundException] {
                "Creating new gMSA account: " + $sub_key.username
                # $() is needed so the username property, not the whole object, lands in the string
                $dnsHostName = "$($sub_key.username).$domain"
                If ($ou) {
                    New-ADServiceAccount -Name $sub_key.username -Path "$ou" -DNSHostName $dnsHostName -PrincipalsAllowedToRetrieveManagedPassword $serverDetailsObj -TrustedForDelegation $true
                } else {
                    New-ADServiceAccount -Name $sub_key.username -DNSHostName $dnsHostName -PrincipalsAllowedToRetrieveManagedPassword $serverDetailsObj -TrustedForDelegation $true
                }
            }
        }
    }
}


And now the config file:

{
    "account": {
        "service": &#91;
            {
                "name": "MSSQL-server",
                "type": "gMSA",
                "username": "MSSQL-server",
                "password": null
            },
            {
                "name": "MSSQL-agent",
                "type": "gMSA",
                "username": "MSSQL-agent",
                "password": null
            }
        ]
    },
    "server": &#91;
        {
            "name": "server1",
            "instance": "MSSQLSERVER"
        },
        {
            "name": "server2",
            "instance": "MSSQLSERVER"
        }
    ]
}
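
If you want to sanity-check the config before letting the script loose on AD, here is a rough Python sketch that reads the same JSON shape and prints what would be created. It only mirrors the filtering logic of the PowerShell script and touches nothing in AD; the function names and the example.local domain are assumptions of mine for illustration.

```python
import json

# Same shape as config.json above, inlined so the sketch runs on its own.
CONFIG_TEXT = """
{
    "account": {
        "service": [
            {"name": "MSSQL-server", "type": "gMSA", "username": "MSSQL-server", "password": null},
            {"name": "MSSQL-agent", "type": "gMSA", "username": "MSSQL-agent", "password": null}
        ]
    },
    "server": [
        {"name": "server1", "instance": "MSSQLSERVER"},
        {"name": "server2", "instance": "MSSQLSERVER"}
    ]
}
"""


def gmsa_accounts(config):
    """Pick out the service accounts whose type mentions gMSA (case-insensitive)."""
    return [
        service
        for service in config["account"]["service"]
        if "gmsa" in service["type"].lower()
    ]


def planned_accounts(config, domain):
    """List each gMSA with the DNS host name and servers it would be tied to."""
    servers = [server["name"] for server in config["server"]]
    return [
        {
            "name": account["username"],
            "dns_host_name": f"{account['username']}.{domain}",
            "allowed_servers": servers,
        }
        for account in gmsa_accounts(config)
    ]


if __name__ == "__main__":
    config = json.loads(CONFIG_TEXT)
    # "example.local" stands in for whatever (Get-ADDomain).DNSRoot returns.
    for plan in planned_accounts(config, "example.local"):
        print(plan)
```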


This should create as many accounts, tied to as many servers, as you add to the config file.

Now I didn’t have any issues with this. However, I was running these scripts from my SQL boxes as a domain admin. I had to do something back in the day to Enable Delegation that would allow certain boxes on my domain to create MSA accounts and configure SPNs for me. Essentially, there were two options to do this:

Create the accounts from the DC, enabling the advanced view and doing something with the setspn command (this was when I used normal MSAs instead of group ones; I haven’t read into the differences between these, but I think these are a bit easier to get going)
Enable Delegation so that the server can ‘register’ an MSA with the DC. This requires elevated permissions, but these permissions can be revoked after you’ve generated the account.

There are really good guides out there if you Google ‘SQL and setSPN, MSA’ or words to that effect.

https://www.derekseaman.com/2018/09/sql-2017-always-on-ag-pt-3-service-accounts.html

https://medium.com/@jibinpb/create-group-managed-service-account-gmsa-using-powershell-626f8a7a4aa0

https://www.altaro.com/hyper-v/virtual-networking-configuration-best-practices/

Kerberos Authentication to your SQL Server Instance

Hyper-V and Windows Server 2019: Unable to boot

March 19, 2022 by Jamesrandell

It’s been a while since I had to create a new Windows virtual machine using Hyper-V. The last set of VMs I created were all CentOS for my Cassandra experiments, and my Windows 10 Enterprise VM is at least 180 days old, as the evaluation period has long expired.

When creating the VM, either using the Quick Create option or the more advanced New VM one, and using either the ISO or VHDX file, it would always fail to boot, citing an unable to load boot disk error or something similar.

You know that screen where it tells you to ‘Press any key to load from disk’? Pressing any key doesn’t have any effect.

The fix is to mash the keys as it boots. There seems to be a tiny time frame between when Hyper-V starts the VM and the boot loader screen takes over. For whatever reason the keyboard gets locked out if you miss this window, the load process times out and you’re left with a stalled VM.

tl;dr

If you suffer from your VMs not booting from the ISO or VHDX, hammer a few keys as it loads to get it to load. Don’t wait too long!

Python on Windows and aliasing

January 18, 2022 by Jamesrandell

I’m really lucky to have a decent PC that I can spin up multiple VMs on when fiddling with database stacks, and a MacBook Pro that I can code on from bed (being far too lazy helps).

I use VS Code for my IDE and have a multi-root workspace configured for multiple Git repos under a single workspace, which I find easier.

This all leads to slightly different dev environments that I switch between, so I also use some Ansible scripts in my Git repos to make sure I can run things on either machine. I have WSL set up with Ubuntu 20.x running, which acts as my Ansible host on Windows, and Ansible runs on OSX anyway.

This rambling finally leads onto Python. I’ve just started really coding in Python to build my REST API tool for Cassandra, however I’ve been tinkering from an infrastructure standpoint via Ansible for a while now. The number one thing I’ve learnt along the way is that it’s a bit of a mess when it comes to figuring out which version you want to use between pip, python, and whatever modules you use in your app, especially on OSX.

In any case, this is just a small tidbit: I explicitly type python3 on the command line on my OSX machine, but it’s ‘python’ only on my Windows box. The easiest solution to this was to add the following in PowerShell:

Set-Alias -Name python3 -Value python
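
When I’m not sure which interpreter a given name actually resolves to, a quick Python sketch like this helps. It reports the interpreter that is running the script and where python, python3, pip and pip3 resolve on the PATH. The function name is my own, and the list of names to check is just a guess at the usual suspects.

```python
import shutil
import sys


def interpreter_report(names=("python", "python3", "pip", "pip3")):
    """Report the running interpreter and where each command name resolves on PATH."""
    report = {
        "running": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }
    for name in names:
        report[name] = shutil.which(name)  # None when the name is not on PATH
    return report


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key:>8}: {value}")
```

One thing to bear in mind: an alias created with Set-Alias only exists inside PowerShell, so python3 can still come back as None here on Windows even though typing it at the prompt works.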


Python REST API

January 10, 2022 by Jamesrandell

Some quick notes:

Flask is a module for Python that sets up a mini web server, including some routing
Flask-RESTX is a plugin for Flask that gives you a bit more REST-like functionality, and allows you to build an API
RESTX is a fork of Flask-RESTPlus, which apparently is no longer maintained. The team decided to fork it because the original maintainer didn’t hand over the PyPI permissions or something, so the new team couldn’t create any new releases; the latest version is now called RESTX

Instead of regurgitating code samples I’ve found on the web, I’ll list some info that took me far too long to find

Namespaces and Blueprints

The examples on how to build an API are pretty basic and don’t go into the detail of scaling up and using a DRY principle.

Blueprints are a way to group similar functionality together. You can group all your /user endpoints in one blueprint, for example. You can supply a ‘url_prefix’ when you create or register it, which allows you to pick up and move groups of functionality of your API around to a different route if you like; it sort of functions like an entry point. I.e. one day you may want to change the name from /user to /people
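
As a minimal sketch of that idea in plain Flask (no RESTX yet), the blueprint below holds the user endpoints and the prefix is only chosen at registration time. The endpoint names, the sample data and the create_app helper are made up for illustration.

```python
from flask import Blueprint, Flask, jsonify

# All user-related endpoints live on this blueprint; note there is no /user here.
user_bp = Blueprint("user", __name__)


@user_bp.route("/")
def list_users():
    return jsonify(users=["alice", "bob"])


@user_bp.route("/<name>")
def get_user(name):
    return jsonify(user=name)


def create_app(prefix="/user"):
    """Build an app with the user blueprint mounted under the given prefix."""
    app = Flask(__name__)
    app.register_blueprint(user_bp, url_prefix=prefix)
    return app


if __name__ == "__main__":
    # Swap the prefix for "/people" and every endpoint moves with it.
    create_app("/user").run()
```

The route functions never mention /user themselves, which is what makes the rename to /people a one-line change.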