
Troubleshooting

Common issues and solutions for Laminar deployments.

Quick Diagnostics

# Overall cluster health
kubectl get pods -n laminar
kubectl get events -n laminar --sort-by='.lastTimestamp'
 
# Component logs
kubectl logs -n laminar -l app=laminar-api --tail=100
kubectl logs -n laminar -l app=laminar-controller --tail=100
 
# Resource usage
kubectl top pods -n laminar
kubectl top nodes
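
When scanning a busy namespace, it helps to filter the pod list down to anything unhealthy. A small helper (a sketch; it only parses the default `kubectl get pods` column layout, where STATUS is the third column):

```shell
# Print NAME and STATUS for pods that are not Running or Completed.
# Assumes the default `kubectl get pods` table layout (STATUS is column 3).
filter_unhealthy() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage: kubectl get pods -n laminar | filter_unhealthy
```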

Installation Issues

Pods Not Starting

Symptom: Pods stuck in Pending state

Diagnosis:

kubectl describe pod -n laminar <pod-name>

Common causes:

  1. Insufficient resources

    kubectl describe nodes | grep -A 5 "Allocated resources"

    Solution: Add nodes or reduce resource requests

  2. No matching nodes

    kubectl get nodes --show-labels

    Solution: Check nodeSelector and tolerations

  3. PVC pending

    kubectl get pvc -n laminar
    kubectl describe pvc -n laminar <pvc-name>

    Solution: Check StorageClass exists and has capacity
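
For cause 2, pod placement is usually fixed through the chart's scheduling settings. A hedged sketch of a Helm values fragment (the key names and label values here are assumptions; verify them against the chart's values.yaml):

```yaml
# Hypothetical values.yaml fragment -- verify key names against the chart.
controller:
  nodeSelector:
    node-role.kubernetes.io/worker: "true"
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "laminar"
      effect: "NoSchedule"
```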


Storage Issues

RocksDB Disk Full

Symptom: Controller logs show write errors or the pod crashes; pipelines fail or stall

Diagnosis:

kubectl exec -n laminar deploy/laminar-controller -- df -h /data

Solutions:

  1. Expand PVC (requires allowVolumeExpansion: true on the StorageClass)

    kubectl patch pvc laminar-controller-data -n laminar \
      -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
  2. Compact RocksDB

    kubectl exec -n laminar deploy/laminar-controller -- \
      laminar admin compact-db
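
To turn the `df -h` output above into something scriptable (for example, alerting when /data passes a threshold), a small parser helps. A sketch; it assumes the standard `df -h` column order with the mount point in the last column:

```shell
# Print the use% (without the % sign) for a given mount point,
# reading `df -h` output on stdin. Use% is column 5, mount point is last.
disk_use_pct() {
  awk -v mount="$1" '$NF == mount { gsub(/%/, "", $5); print $5 }'
}

# Usage:
# kubectl exec -n laminar deploy/laminar-controller -- df -h /data | disk_use_pct /data
```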

PVC Not Binding

Diagnosis:

kubectl get pvc -n laminar
kubectl describe pvc -n laminar <pvc-name>
kubectl get storageclass

Solutions:

  1. Check StorageClass exists

    kubectl get storageclass
  2. Check CSI driver

    kubectl get pods -n kube-system | grep csi
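
If no suitable StorageClass exists, one may need to be created or marked default. A sketch for a cluster backed by the AWS EBS CSI driver (the class name and parameters are placeholders; the provisioner name must match the CSI driver actually installed):

```yaml
# Example StorageClass sketch -- adjust provisioner/parameters to your cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: laminar-ssd
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
```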

Network Issues

Ingress Not Working

Diagnosis:

# Check ingress resource
kubectl get ingress -n laminar
kubectl describe ingress -n laminar laminar
 
# Check ingress controller
kubectl get pods -n ingress-nginx
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx

Solutions:

  1. DNS not resolving

    nslookup laminar.example.com

    Solution: Add DNS record pointing to ingress IP/hostname

  2. Certificate issues

    kubectl get certificate -n laminar
    kubectl describe certificate -n laminar

    Solution: Verify the cert-manager Issuer/ClusterIssuer is ready and check cert-manager logs
  3. Backend not found

    kubectl get endpoints -n laminar laminar-api

    Solution: If the endpoints list is empty, confirm laminar-api pods are Running and the Service selector matches their labels
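
If the Ingress resource itself is suspect, comparing it against a minimal known-good shape can help. A sketch (the hostname and ingressClassName are placeholders; the service name and port follow the commands used elsewhere on this page):

```yaml
# Minimal Ingress sketch -- host and class name are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: laminar
  namespace: laminar
spec:
  ingressClassName: nginx
  rules:
    - host: laminar.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: laminar-api
                port:
                  number: 8000
```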

Service Discovery

Symptom: Components cannot communicate

Diagnosis:

# Test DNS resolution
kubectl exec -n laminar deploy/laminar-api -- \
  nslookup laminar-controller
 
# Test connectivity
kubectl exec -n laminar deploy/laminar-api -- \
  nc -zv laminar-controller 8001

Solutions:

  1. Check service exists

    kubectl get svc -n laminar
  2. Check endpoints

    kubectl get endpoints -n laminar
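
If DNS and the Service both look correct but connections still fail, a NetworkPolicy may be blocking traffic. A hedged example of a policy that would allow the API to reach the controller on port 8001 (pod labels are taken from the log selectors used above; verify them with `kubectl get pods --show-labels`):

```yaml
# Sketch: allow laminar-api pods to reach laminar-controller on TCP 8001.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-controller
  namespace: laminar
spec:
  podSelector:
    matchLabels:
      app: laminar-controller
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: laminar-api
      ports:
        - protocol: TCP
          port: 8001
```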

Performance Issues

High Latency

Diagnosis:

# Check resource usage
kubectl top pods -n laminar
 
# Check API response time
kubectl exec -n laminar deploy/laminar-api -- \
  curl -w "%{time_total}\n" -s -o /dev/null http://localhost:8000/health

Solutions:

  1. Add replicas

    kubectl scale deployment laminar-api -n laminar --replicas=3
  2. Increase resources

    resources:
      requests:
        cpu: 500m
        memory: 1Gi
  3. Check RocksDB performance

    kubectl exec -n laminar deploy/laminar-controller -- \
      laminar admin stats
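
A single request gives a noisy latency number; sampling the health endpoint repeatedly and averaging is more reliable. A sketch (the averaging helper is plain awk; the commented loop assumes the same in-pod endpoint as above):

```shell
# Average a column of per-request times (seconds, one per line) from stdin.
avg_latency() {
  awk '{ sum += $1; n++ } END { if (n) printf "%.3f\n", sum / n }'
}

# Usage (samples the health endpoint 10 times from inside the pod):
# for i in $(seq 10); do
#   kubectl exec -n laminar deploy/laminar-api -- \
#     curl -w "%{time_total}\n" -s -o /dev/null http://localhost:8000/health
# done | avg_latency
```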

Memory Issues

Symptom: OOMKilled, high memory usage

Diagnosis:

kubectl describe pod -n laminar <pod-name> | grep -A 5 "Last State"
kubectl top pods -n laminar

Solutions:

  1. Increase memory limits

    resources:
      limits:
        memory: 4Gi
  2. Add memory-based HPA

    autoscaling:
      targetMemoryUtilizationPercentage: 80
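
A memory target is usually paired with replica bounds in the same autoscaling block. A hedged values sketch (key names assumed to follow common chart conventions; check the chart's values.yaml):

```yaml
# Hypothetical values.yaml fragment -- verify keys against the chart.
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
```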

Pipeline Issues

Pipeline Not Starting

Diagnosis:

# Check controller logs
kubectl logs -n laminar -l app=laminar-controller --tail=100
 
# Check pipeline status
kubectl exec -n laminar deploy/laminar-api -- \
  laminar pipelines list

Solutions:

  1. Check worker availability

    kubectl get pods -n laminar -l app=laminar-worker
  2. Check resource quotas

    kubectl describe resourcequota -n laminar

Checkpoint Failures

Diagnosis:

kubectl logs -n laminar -l app=laminar-controller | grep checkpoint

Solutions:

  1. Check storage access

    kubectl exec -n laminar deploy/laminar-controller -- \
      aws s3 ls s3://laminar-data/checkpoints/
  2. Check IAM permissions (for cloud storage)

    kubectl describe serviceaccount -n laminar laminar
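
On EKS with IRSA, the service account must carry a role annotation; its absence is a common cause of S3 access failures. A sketch of what the annotated account should look like (the role ARN is a placeholder):

```yaml
# Example ServiceAccount with an IRSA role annotation (EKS).
# The role ARN below is a placeholder -- substitute your own.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: laminar
  namespace: laminar
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/laminar-s3-access
```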

Upgrade Issues

Helm Upgrade Failed

Symptom: Upgrade stuck or failed

Diagnosis:

helm status laminar -n laminar
helm history laminar -n laminar

Solutions:

  1. Rollback

    helm rollback laminar -n laminar
  2. Force upgrade

    helm upgrade laminar laminar/laminar \
      -n laminar \
      --force \
      -f values.yaml
  3. Clean stuck resources (completed or failed Helm hook jobs can block upgrades; review before deleting)

    kubectl delete jobs -n laminar --all

Useful Commands

# Get all events
kubectl get events -n laminar --sort-by='.lastTimestamp'
 
# Watch pod status
kubectl get pods -n laminar -w
 
# Get pod details
kubectl describe pod -n laminar <pod-name>
 
# Shell into container
kubectl exec -it -n laminar deploy/laminar-api -- /bin/sh
 
# Check environment variables
kubectl exec -n laminar deploy/laminar-api -- env | sort
 
# Port forward for debugging
kubectl port-forward -n laminar svc/laminar-api 8000:8000
 
# Check resource usage
kubectl top pods -n laminar --containers
 
# Get Helm values
helm get values laminar -n laminar
 
# Check PVC status
kubectl get pvc -n laminar
 
# Network debugging
kubectl run debug --rm -it --image=nicolaka/netshoot -- /bin/bash

Getting Help

  1. Check logs - Most issues are visible in component logs
  2. Check events - Kubernetes events show scheduling issues
  3. Check resources - Ensure adequate CPU/memory
  4. Check storage - Verify PVCs are bound and have space
  5. Check connectivity - Verify network paths between components

Next Steps