Setting up Kafka with SSL and accessing it with Go
Days of fire Kafka and thunder SSL.
The issue has been some null fields where no nulls should appear, and then not being able to count them. It was very puzzling.
I have been setting up a personal project with Kubeless and Kafka (I saw something similar but way more elaborate at work and found it cool to play with), and I wanted to eventually plug SSL correctly between Kafka and a Go producer (with Sarama). But as sometimes happens, the stars (well…) aligned: I have been helping an adjacent team configure exactly this for a local testing setup.
This post is better read together with this repository.
There are many tutorials and scripts to generate certificates for Kafka (or Java). In a nutshell, they all involve, in some form or another:
- Creating a signing key and certificate, for your fake Certificate Authority (CA).
- Creating a keystore for Kafka.
- Creating a Certificate Signing Request (CSR) into this keystore.
- Signing the request with the certificate from your CA.
- Importing the certificate from the CA into the keystore (that seems to be necessary for JVM clients, maybe related to certificate chaining?).
- Importing the signed certificate into the keystore.
- Importing the CA certificate into the truststore (truststores contain the certificates from CAs, to validate signed stuff).
This is what the Dockerfile here and the script here are prepared to do.
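To make the CA-related steps more concrete, here is a minimal sketch of the same idea done in memory with Go's standard library instead of openssl and keytool. It is not what the repository does: there are no keystores or truststores here, and the names `newCA`, `signServerCert`, `verifyWithCA` and `demo-ca` are made up for illustration. It only shows a fake CA signing a certificate for a host and a "truststore" (a certificate pool) validating it.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a key and a self-signed certificate for a throwaway CA.
func newCA(commonName string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	// Template and parent are the same certificate: that is what self-signed means.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// signServerCert stands in for "create a CSR and sign it with the CA".
// The server's private key is discarded here; a real setup would keep it.
func signServerCert(ca *x509.Certificate, caKey *ecdsa.PrivateKey, host string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	// The parent is the CA certificate and the signing key is the CA key.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// verifyWithCA plays the truststore role: it trusts only ca and checks
// that cert chains to it and is valid for host.
func verifyWithCA(ca, cert *x509.Certificate, host string) error {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := cert.Verify(x509.VerifyOptions{Roots: roots, DNSName: host})
	return err
}

func main() {
	ca, caKey, err := newCA("demo-ca")
	if err != nil {
		panic(err)
	}
	cert, err := signServerCert(ca, caKey, "kafka")
	if err != nil {
		panic(err)
	}
	fmt.Println("verify for kafka:", verifyWithCA(ca, cert, "kafka"))
	fmt.Println("verify for other-host:", verifyWithCA(ca, cert, "other-host"))
}
```

The first check should pass and the second should fail with a host name mismatch, since the certificate was only issued for kafka.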
Both at work and at home I have Mac M1s, and I have found so far that JVM-based Docker images have a tendency to misbehave randomly. Sometimes they hang. Sometimes they look ok and then hang. Sometimes they segfault. Also, I couldn’t get keytool to work locally. Combining all these:
- The Dockerfile has separate steps per key area, so if any step hangs or crashes, the other keys are cached and you don’t need to repeat them.
- The setup may fail locally, repeatedly, with M1s: make sure your Kafka is reachable with kcat (née kafkacat) and that its logs are moving. If they aren’t, run docker compose down and bring it up again. Once it’s up it will work.
There’s one crucial part I have omitted: certificates are associated with entities, that is, organisations and servers. In most tutorials you can find, this is set up using the CN “subject” field of the certificate. CN stands for Common Name, which identifies the server (by server name) the certificate applies to, be it a signing or a client certificate. When a client connects to a server, it checks that the server name matches the name in the certificate offered. Most examples of connecting Go to Kafka (in particular with Sarama) set the flag InsecureSkipVerify. With this flag, the client does not verify the server certificate during the handshake (neither its chain nor the server name), opening the door to man-in-the-middle attacks.
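As a rough sketch of the alternative to InsecureSkipVerify, a client TLS configuration can trust only your CA and name the server it expects. The function names `newVerifyingTLSConfig` and `demoCAPEM` are hypothetical, and the in-memory CA exists only so the example runs without files on disk; in a real setup the PEM bytes would come from your CA certificate file.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newVerifyingTLSConfig builds a client configuration that trusts only the
// given CA and checks the server certificate against serverName.
func newVerifyingTLSConfig(caPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificate found in PEM data")
	}
	return &tls.Config{
		RootCAs:    pool,
		ServerName: serverName,
		MinVersion: tls.VersionTLS12,
		// InsecureSkipVerify stays at its zero value (false), so the
		// certificate chain and the host name are both verified.
	}, nil
}

// demoCAPEM generates a throwaway self-signed CA in PEM form.
// It is illustration-only scaffolding, not part of the original setup.
func demoCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	caPEM, err := demoCAPEM()
	if err != nil {
		panic(err)
	}
	cfg, err := newVerifyingTLSConfig(caPEM, "kafka")
	if err != nil {
		panic(err)
	}
	fmt.Println("server name:", cfg.ServerName, "skip verify:", cfg.InsecureSkipVerify)
}
```

With Sarama, a configuration like this would be assigned to the Net.TLS.Config field of the Sarama config, with Net.TLS.Enable set to true.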
My first head-banging issue came from using the same Common Name for the CA and the client: I used kafka everywhere. What is the problem? Well, think of it in terms of a notary who certifies something.
- Notary: Hi, I’m the notary and my name is Kafka, what do you want?
- Client: I want this certificate saying my name is Kafka.
- Notary: Cool, I’ll sign here with my name, Kafka.
Then the client takes this to the town hall, where they see that Kafka says that Kafka is named Kafka, and the paper is not accepted.
This was the first problem I encountered with the certificates I was using, and neither Kafka nor Go was verbose enough. They just kept failing the SSL handshake, silently except for the final error. It wasn’t until I verified the client certificate against the CA certificate that openssl told me what was wrong.
openssl verify -CAfile our_ca.crt producer-ca-signed.crt
Having signed with the same CN, the command above will tell you that the certificate is invalid because it is self-signed. So, remember: never use the same Common Name for a Certificate Authority and a signed client certificate.
The next issue also took me pretty long, because it was extremely unexpected, and because I don’t know much about SSL and was just trying to get it working via snippets of documentation.
After fixing the certificates with proper, different names, I kept having connectivity issues: Kafka and my Go producer were not shaking hands as they should. I couldn’t figure out any way to debug SSL handshakes “globally” in Go (like you can do with ssh), so I pinged Kafka from Go with this snippet, just after creating a TLS configuration (I added it to the test producer so it’s somewhere I can copy-paste from):
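// Needs the imports crypto/tls, fmt, net and time.
// tlsConfig is the tls.Config value created just before this snippet.
dialer := &net.Dialer{
	Timeout: 60 * time.Second,
}
rawConn, err := dialer.Dial("tcp", "kafka:9094")
if err != nil {
	panic(err) // the TCP connection failed before TLS even started
}
conn := tls.Client(rawConn, &tlsConfig)
fmt.Println(conn.RemoteAddr())
// Run the handshake explicitly to see the detailed TLS error.
err = conn.Handshake()
fmt.Println(err)
fmt.Println(conn.VerifyHostname("kafka"))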
And I got a detailed handshake error…
Turns out Common Name is actually a deprecated field. The correct one to use (and it has been for many years) is a Subject Alternative Name (SAN). Most “things” have started deprecating the use of CN, and Go has been pretty strict about it. But all tutorials on configuring Kafka SSL settings use CNs, if they even care about host name verification.
Then, it took me another pretty long while to convince the openssl commands I had that they should use SANs and not CNs. The trick is to use an extensions file containing the configuration for the SAN extension. You can find the file I used here.
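To see the CN versus SAN difference from the Go side without touching openssl, here is a small sketch that builds two throwaway self-signed certificates in memory, one with only a Common Name and one that also carries a DNS SAN, and asks Go whether each is valid for kafka. The helper `selfSigned` is hypothetical and exists only for this illustration; the behaviour assumes a reasonably recent Go version (CN fallback has been off by default since Go 1.15).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// selfSigned creates a throwaway self-signed certificate with the given
// Common Name and, optionally, DNS Subject Alternative Names.
func selfSigned(commonName string, dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: commonName},
		DNSNames:     dnsNames, // this is what becomes the SAN extension
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cnOnly, err := selfSigned("kafka", nil)
	if err != nil {
		panic(err)
	}
	withSAN, err := selfSigned("kafka", []string{"kafka"})
	if err != nil {
		panic(err)
	}
	// VerifyHostname only looks at SANs, so the CN-only certificate is
	// rejected even though its Common Name matches.
	fmt.Println("CN only: ", cnOnly.VerifyHostname("kafka"))
	fmt.Println("with SAN:", withSAN.VerifyHostname("kafka"))
}
```

The CN-only certificate should come back with an error pointing at the missing SANs, while the second one should verify cleanly. That is the same distinction the extensions file fixes on the openssl side.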
The code in the repository has a simplified example of all the pieces needed:
- Creation of certificates.
- Docker compose with Kafka and Zookeeper
- Small producer in Go sending a row
With all the plumbing needed. Next time I fail badly at Kafka and SSL I’ll come back to this.