# Troubleshooting

Common issues and solutions for Laminar deployments.
## Quick Diagnostics

```shell
# Overall cluster health
kubectl get pods -n laminar
kubectl get events -n laminar --sort-by='.lastTimestamp'

# Component logs
kubectl logs -n laminar -l app=laminar-api --tail=100
kubectl logs -n laminar -l app=laminar-controller --tail=100

# Resource usage
kubectl top pods -n laminar
kubectl top nodes
```

## Installation Issues
### Pods Not Starting

**Symptom:** Pods stuck in `Pending` state

**Diagnosis:**

```shell
kubectl describe pod -n laminar <pod-name>
```

Common causes:

- **Insufficient resources**

  ```shell
  kubectl describe nodes | grep -A 5 "Allocated resources"
  ```

  Solution: Add nodes or reduce resource requests

- **No matching nodes**

  ```shell
  kubectl get nodes --show-labels
  ```

  Solution: Check `nodeSelector` and tolerations

- **PVC pending**

  ```shell
  kubectl get pvc -n laminar
  kubectl describe pvc -n laminar <pvc-name>
  ```

  Solution: Check the StorageClass exists and has capacity
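If the cluster genuinely lacks capacity and adding nodes is not an option, lowering Laminar's resource requests through Helm values is the other path. A hedged sketch of such an override (the `api` key and nesting are assumptions; verify the exact structure against the chart's `values.yaml`):

```yaml
# Hypothetical Helm values override; key names may differ in your chart
api:
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
```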
## Storage Issues

### RocksDB Disk Full

**Symptom:** Controller errors, pipeline failures

**Diagnosis:**

```shell
kubectl exec -n laminar deploy/laminar-controller -- df -h /data
```

**Solutions:**

- **Expand the PVC** (if supported)

  ```shell
  kubectl patch pvc laminar-controller-data -n laminar \
    -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
  ```

- **Compact RocksDB**

  ```shell
  kubectl exec -n laminar deploy/laminar-controller -- \
    laminar admin compact-db
  ```
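The PVC patch above only succeeds when the underlying StorageClass allows volume expansion. For illustration, a StorageClass with that flag enabled might look like this (the name and provisioner are placeholders, not part of Laminar):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd        # placeholder name
provisioner: ebs.csi.aws.com  # example: AWS EBS CSI driver; use your cluster's
allowVolumeExpansion: true    # required for `kubectl patch pvc` to grow volumes
```

You can check the flag on an existing class with `kubectl get storageclass <name> -o jsonpath='{.allowVolumeExpansion}'`.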
### PVC Not Binding

**Diagnosis:**

```shell
kubectl get pvc -n laminar
kubectl describe pvc -n laminar <pvc-name>
kubectl get storageclass
```

**Solutions:**

- **Check the StorageClass exists**

  ```shell
  kubectl get storageclass
  ```

- **Check the CSI driver**

  ```shell
  kubectl get pods -n kube-system | grep csi
  ```
## Network Issues

### Ingress Not Working

**Diagnosis:**

```shell
# Check ingress resource
kubectl get ingress -n laminar
kubectl describe ingress -n laminar laminar

# Check ingress controller
kubectl get pods -n ingress-nginx
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx
```

**Solutions:**

- **DNS not resolving**

  ```shell
  nslookup laminar.example.com
  ```

  Solution: Add a DNS record pointing to the ingress IP/hostname

- **Certificate issues**

  ```shell
  kubectl get certificate -n laminar
  kubectl describe certificate -n laminar
  ```

- **Backend not found**

  ```shell
  kubectl get endpoints -n laminar laminar-api
  ```
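For reference, an ingress configuration tying these pieces together might look like this in Helm values. This is a hedged sketch: the key names follow common chart conventions and may not match your chart exactly.

```yaml
# Hypothetical Helm values; verify key names against the chart's values.yaml
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: laminar.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: laminar-tls   # cert-manager can populate this Secret
      hosts:
        - laminar.example.com
```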
### Service Discovery

**Symptom:** Components cannot communicate

**Diagnosis:**

```shell
# Test DNS resolution
kubectl exec -n laminar deploy/laminar-api -- \
  nslookup laminar-controller

# Test connectivity
kubectl exec -n laminar deploy/laminar-api -- \
  nc -zv laminar-controller 8001
```

**Solutions:**

- **Check the service exists**

  ```shell
  kubectl get svc -n laminar
  ```

- **Check endpoints**

  ```shell
  kubectl get endpoints -n laminar
  ```
## Performance Issues

### High Latency

**Diagnosis:**

```shell
# Check resource usage
kubectl top pods -n laminar

# Check API response time
kubectl exec -n laminar deploy/laminar-api -- \
  curl -w "%{time_total}\n" -s -o /dev/null http://localhost:8000/health
```

**Solutions:**

- **Add replicas**

  ```shell
  kubectl scale deployment laminar-api -n laminar --replicas=3
  ```

- **Increase resources**

  ```yaml
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
  ```

- **Check RocksDB performance**

  ```shell
  kubectl exec -n laminar deploy/laminar-controller -- \
    laminar admin stats
  ```
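Note that scaling by hand with `kubectl scale` does not persist across Helm upgrades; if the chart supports it, autoscaling values are the durable fix. A hedged sketch of a CPU-based variant (exact key names are assumptions and may differ in your chart):

```yaml
# Hypothetical Helm values; verify against the chart's values.yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 70
```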
### Memory Issues

**Symptom:** OOMKilled, high memory usage

**Diagnosis:**

```shell
kubectl describe pod -n laminar <pod-name> | grep -A 5 "Last State"
kubectl top pods -n laminar
```

**Solutions:**

- **Increase memory limits**

  ```yaml
  resources:
    limits:
      memory: 4Gi
  ```

- **Add memory-based HPA**

  ```yaml
  autoscaling:
    targetMemoryUtilizationPercentage: 80
  ```
## Pipeline Issues

### Pipeline Not Starting

**Diagnosis:**

```shell
# Check controller logs
kubectl logs -n laminar -l app=laminar-controller --tail=100

# Check pipeline status
kubectl exec -n laminar deploy/laminar-api -- \
  laminar pipelines list
```

**Solutions:**

- **Check worker availability**

  ```shell
  kubectl get pods -n laminar -l app=laminar-worker
  ```

- **Check resource quotas**

  ```shell
  kubectl describe resourcequota -n laminar
  ```
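If `describe resourcequota` shows the namespace is capped, workers can be unschedulable even when nodes have room. For illustration, a quota like the following (name and values are arbitrary, not Laminar defaults) would limit total requests in the namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: laminar-quota   # illustrative name
  namespace: laminar
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
```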
### Checkpoint Failures

**Diagnosis:**

```shell
kubectl logs -n laminar -l app=laminar-controller | grep checkpoint
```

**Solutions:**

- **Check storage access**

  ```shell
  kubectl exec -n laminar deploy/laminar-controller -- \
    aws s3 ls s3://laminar-data/checkpoints/
  ```

- **Check IAM permissions** (for cloud storage)

  ```shell
  kubectl describe serviceaccount -n laminar laminar
  ```
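On EKS, S3 access for checkpoints is typically granted by annotating the service account with an IAM role (IRSA). A sketch of the corresponding Helm values, with the role ARN as a placeholder (the `serviceAccount` key structure is an assumption; the annotation key itself is AWS's standard one):

```yaml
# Hypothetical Helm values; role ARN is a placeholder
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/laminar-checkpoints
```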
## Upgrade Issues

### Helm Upgrade Failed

**Symptom:** Upgrade stuck or failed

**Diagnosis:**

```shell
helm status laminar -n laminar
helm history laminar -n laminar
```

**Solutions:**

- **Rollback**

  ```shell
  helm rollback laminar -n laminar
  ```

- **Force upgrade**

  ```shell
  helm upgrade laminar laminar/laminar \
    -n laminar \
    --force \
    -f values.yaml
  ```

- **Clean stuck resources**

  ```shell
  kubectl delete jobs -n laminar --all
  ```
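When `helm history` reports a release stuck in `pending-upgrade`, Helm v3's lock is the latest release Secret. A sketch for locating it (the helper function name is ours; the `owner=helm` and `status=pending-upgrade` labels are set by Helm v3 itself):

```shell
# Find the Helm release secret holding a pending-upgrade lock.
find_stuck_release() {
  kubectl get secret -n "${1:-laminar}" \
    -l owner=helm,status=pending-upgrade \
    -o name 2>/dev/null || true
}
find_stuck_release laminar
# Deleting the secret it prints releases the lock (destructive; prefer
# `helm rollback` first):
#   kubectl delete secret -n laminar -l owner=helm,status=pending-upgrade
```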
## Useful Commands

```shell
# Get all events
kubectl get events -n laminar --sort-by='.lastTimestamp'

# Watch pod status
kubectl get pods -n laminar -w

# Get pod details
kubectl describe pod -n laminar <pod-name>

# Shell into container
kubectl exec -it -n laminar deploy/laminar-api -- /bin/sh

# Check environment variables
kubectl exec -n laminar deploy/laminar-api -- env | sort

# Port forward for debugging
kubectl port-forward -n laminar svc/laminar-api 8000:8000

# Check resource usage
kubectl top pods -n laminar --containers

# Get Helm values
helm get values laminar -n laminar

# Check PVC status
kubectl get pvc -n laminar

# Network debugging
kubectl run debug --rm -it --image=nicolaka/netshoot -- /bin/bash
```

## Getting Help
- **Check logs** - most issues are visible in component logs
- **Check events** - Kubernetes events show scheduling issues
- **Check resources** - ensure adequate CPU/memory
- **Check storage** - verify PVCs are bound and have space
- **Check connectivity** - verify network paths between components
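This checklist can be wrapped into a single sweep. A minimal sketch (the function name is ours; the component labels are the ones used throughout this page):

```shell
# Quick health sweep: pods, recent events, and last logs per component.
laminar_sweep() {
  ns="${1:-laminar}"
  kubectl get pods -n "$ns" 2>/dev/null || echo "(cannot list pods in $ns)"
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' 2>/dev/null | tail -20
  for app in laminar-api laminar-controller laminar-worker; do
    echo "== $app logs =="
    kubectl logs -n "$ns" -l "app=$app" --tail=20 2>/dev/null \
      || echo "(no logs for $app)"
  done
}
laminar_sweep laminar
```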
## Next Steps
- Configuration - Configuration reference
- High Availability - HA setup
- Backup & Restore - Data protection