Today, Aulendur Labs is proud to release Cortex, an open-source platform that makes enterprise-grade model serving with vLLM dramatically easier. Licensed under Apache-2.0, Cortex represents our commitment to giving back to the AI community while making production ML infrastructure accessible to everyone.
Why Is vLLM Powerful But Complex to Deploy?
vLLM is an exceptional model serving engine, but deploying it in enterprise environments requires authentication, API management, GPU monitoring, and production configurations that take weeks to build from scratch. Cortex by Aulendur Labs wraps vLLM with all of this infrastructure out of the box.
vLLM is fast, efficient, and highly optimized for production workloads. However, deploying it in an enterprise context requires significant additional infrastructure:
- Authentication and API key management
- OpenAI-compatible API endpoints for easy integration
- Administrative interfaces for model and user management
- GPU monitoring and resource tracking
- Production-ready deployment configurations
- Proper CORS handling for web applications
Building all of this from scratch takes weeks or months. We built Cortex so you don't have to.
What Is Cortex by Aulendur Labs?
Cortex is an open-source, enterprise-grade model serving platform built by Aulendur Labs that wraps vLLM with OpenAI-compatible APIs, role-based authentication, GPU monitoring, and a modern admin UI. Cortex is the model serving infrastructure that will power DeepLoom at scale.
Cortex is a production-ready platform that wraps vLLM with everything you need for enterprise deployment. It provides:
Core Features
- OpenAI-Compatible API: Drop-in replacement for OpenAI's API, making integration seamless
- Admin UI: Modern web interface for managing models, users, and API keys
- Authentication & Authorization: Role-based access control with API key management
- Multi-Model Support: Serve multiple models simultaneously with automatic routing
- GPU Monitoring: Real-time visibility into GPU utilization, memory, and temperature
- System Metrics: Host CPU, memory, disk, and network monitoring via Prometheus
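Because the API is a drop-in replacement for OpenAI's, existing client code only needs a new base URL and API key. Here is a minimal standard-library sketch of how a chat completion request is formed; the base URL, key, and model name below are placeholders for illustration, not Cortex defaults:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Cortex-issued API key
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder values; substitute your gateway URL, key, and model name.
req = build_chat_request(
    "http://localhost:8000",
    "sk-example-key",
    "llama-3-8b",
    [{"role": "user", "content": "Hello"}],
)
```

Sending the request with `urllib.request.urlopen(req)`, or pointing the official `openai` client's `base_url` at the gateway, is all integration requires.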
Developer Experience
We've obsessed over making deployment as simple as possible. The entire stack can be launched with a single command:
```shell
make quick-start
# That's it! Cortex will:
# - Detect your host IP automatically
# - Configure CORS properly
# - Start all services
# - Display your access URLs
#
# Example output:
# ✓ Cortex is ready!
# Login at: http://192.168.1.181:3001/login (admin/admin)
```

How Is Cortex Architected?
Cortex by Aulendur Labs uses a modern, containerized architecture: a Python/FastAPI gateway handles authentication and routing, a Next.js admin UI manages users and models, PostgreSQL stores metadata, and vLLM containers serve models — all monitored via Prometheus, node-exporter, and dcgm-exporter.
Cortex is built on a modern, containerized architecture:
- Gateway (Python/FastAPI): Authentication, API routing, and business logic
- Admin UI (Next.js/React): User-friendly interface for administration
- PostgreSQL: Metadata storage for users, keys, and configurations
- vLLM Containers: Dynamic model serving engines
- Monitoring Stack: Prometheus, node-exporter, dcgm-exporter for observability
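The multi-model routing above can be pictured as a lookup in the gateway: each registered model maps to its own vLLM container. A sketch of that idea, with model names and container URLs invented for illustration:

```python
# Hypothetical model-to-container table; in Cortex this metadata lives in
# PostgreSQL and is managed through the Admin UI rather than hardcoded.
UPSTREAMS = {
    "llama-3-8b": "http://vllm-llama:8000",
    "mistral-7b": "http://vllm-mistral:8000",
}

def route(model: str) -> str:
    """Return the upstream vLLM container URL for a requested model."""
    try:
        return UPSTREAMS[model]
    except KeyError:
        # Surface unknown models as a client error rather than a lookup crash.
        raise ValueError(f"unknown model: {model!r}") from None
```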
What Are Cortex's Real-World Use Cases?
Cortex by Aulendur Labs serves three primary audiences: research labs that need easy access to language models without infrastructure expertise, enterprise AI teams deploying on-premises with full data control, and development teams needing local environments that mirror production.
1. Research Labs
Provide researchers with easy access to powerful language models without requiring deep infrastructure knowledge. The OpenAI-compatible API means existing code just works.
2. Enterprise AI Teams
Deploy models on-premises with full control over data and costs. Built-in monitoring ensures you can track resource utilization and optimize GPU allocation.
3. Development Teams
Local development environment that mirrors production. Test your AI applications against the same API interface you'll use in production.
How Does Cortex Balance Smart Defaults with Full Control?
Cortex by Aulendur Labs automatically detects host IP, available NVIDIA GPUs, and operating system to apply optimal settings — but everything can be customized through environment variables and Docker Compose profiles, giving teams both instant productivity and full configurability.
One of our design principles is "smart defaults, full control." Cortex automatically detects:
- Your host machine's IP address for proper network configuration
- Available NVIDIA GPUs, enabling the appropriate monitoring
- Your operating system (Linux or Windows), applying optimal settings
But everything can be customized through environment variables and Docker Compose profiles.
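"Smart defaults, full control" can be sketched as a settings lookup in which every value has a sensible default but yields to the environment. The variable names below are illustrative, not Cortex's actual configuration keys:

```python
import os

# Illustrative defaults; "auto" stands in for runtime detection.
DEFAULTS = {
    "CORTEX_HOST_IP": "auto",         # auto-detect the host IP unless pinned
    "CORTEX_GPU_MONITORING": "auto",  # enable only when NVIDIA GPUs are found
    "CORTEX_UI_PORT": "3001",
}

def setting(name: str) -> str:
    """An environment variable wins; otherwise fall back to the default."""
    return os.environ.get(name, DEFAULTS[name])
```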
Is Cortex Production-Ready?
Yes — Cortex by Aulendur Labs ships production-ready from day one with built-in health checks, database backup commands, comprehensive logging, proper authentication with CORS and network isolation, and horizontal scalability for adding model containers as demand grows.
Cortex includes features critical for production deployments:
- Health Checks: Built-in endpoints for monitoring service health
- Database Backups: Simple commands for backup and restore
- Logging: Comprehensive logging for debugging and audit trails
- Security: Proper authentication, CORS, and network isolation
- Scalability: Add more model containers as demand grows
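As an illustration of how a health-check endpoint might be consumed, here is a sketch that interprets a JSON health response; the schema shown is an assumption for illustration, not the Cortex API's actual shape:

```python
import json

def is_healthy(raw: bytes) -> bool:
    """Report readiness from a hypothetical /health response body.

    Assumes a body like {"status": "ok", "services": {"db": "up", ...}}.
    """
    body = json.loads(raw)
    return body.get("status") == "ok" and all(
        state == "up" for state in body.get("services", {}).values()
    )
```

A load balancer or uptime monitor can poll such an endpoint and route traffic only to healthy instances.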
Why Is Cortex Open Source?
Aulendur Labs believes in contributing back to the communities that enable its work. Cortex is released under the Apache-2.0 license through AulendurForge, Aulendur Labs' open-source initiative, because production AI infrastructure should be accessible to everyone — not just well-funded enterprises.
vLLM has been instrumental in our defense AI projects, and Cortex is our way of making it easier for others to leverage this powerful technology.
"The best software infrastructure feels invisible. It should just work, so you can focus on what matters—your AI applications."
— Jorden Gershenson, CTO, Aulendur Labs
Getting Started
Ready to try Cortex? Here's how to get started:
- Visit the GitHub Repository: github.com/AulendurForge/Cortex
- Clone and Install Prerequisites: Docker and Make are all you need
- Run Quick Start: make quick-start
- Access the Admin UI: Login at the displayed URL
Documentation & Support
The repository includes comprehensive documentation covering:
- Quick start and installation guides
- Architecture and design decisions
- Model download and configuration
- Deployment best practices
- API reference and examples
- Security considerations
What's Next for Cortex?
Aulendur Labs is actively developing Cortex features including automatic model scaling, request queuing and load balancing, fine-tuning pipeline integration, enhanced analytics, and multi-tenant isolation — building toward the infrastructure that will serve DeepLoom in production.
This is just the beginning. We're actively developing features like:
- Automatic model scaling based on demand
- Request queuing and load balancing
- Fine-tuning pipeline integration
- Enhanced analytics and usage tracking
- Multi-tenant isolation improvements
Join the Community
We'd love your feedback, bug reports, and contributions! Whether you're:
- Using Cortex in production
- Finding bugs or suggesting features
- Contributing code or documentation
- Sharing your use case
Please engage with us on GitHub. Star the repo, open issues, and submit pull requests—we're building this together.
About AulendurForge
AulendurForge is Aulendur Labs' open-source initiative, dedicated to building and sharing AI infrastructure tools that benefit the entire community. While our defense work remains proprietary, we're committed to open-sourcing components that have broad applicability.
Questions or feedback? info@aulendur.com
Frequently Asked Questions
What is Cortex?
Cortex is an open-source, enterprise-grade model serving platform built by Aulendur Labs. It simplifies deploying large language models with vLLM, providing production-ready monitoring, APIs, and authentication out of the box. Cortex is the infrastructure that will power DeepLoom at scale.
Is Cortex open source?
Yes. Cortex is fully open source under the Apache-2.0 license and is available on GitHub at github.com/AulendurForge/Cortex. Aulendur Labs is committed to open-sourcing AI infrastructure tools that benefit the entire community through its AulendurForge initiative.
How does Cortex relate to DeepLoom?
Cortex is the model serving infrastructure that will power DeepLoom in production. By building and proving Cortex as open-source software, Aulendur Labs validates its ability to deploy AI systems at scale with the monitoring, APIs, and authentication that enterprise and defense environments require.