Deploying Bruin on Ubuntu VMs
Managed Option Available
Looking for a fully managed solution? Bruin Cloud provides managed orchestration, monitoring, and scheduling without the operational overhead. Try it free!
This guide walks you through deploying Bruin on Ubuntu-based virtual machines (AWS EC2, Google Cloud Compute Engine, DigitalOcean Droplets, or any Ubuntu server) and scheduling pipeline runs using cron jobs.
Prerequisites
Before you begin, ensure you have:
- An Ubuntu server (18.04 or later recommended)
- SSH access to the server with sudo privileges
- Git installed on the server
- A Bruin project ready to deploy
Step 1: Connect to Your Server
Connect to your Ubuntu VM via SSH:
ssh username@your-server-ip
Replace username with your actual username and your-server-ip with your server's IP address or hostname.
Step 2: Update System Packages
Always start by updating your system packages:
sudo apt update && sudo apt upgrade -y
Step 3: Install Git (if not already installed)
Git is required to clone your Bruin projects:
sudo apt install git -y
Verify the installation:
git --version
Step 4: Install Bruin CLI
Install Bruin using the official installation script:
curl -LsSf https://getbruin.com/install/cli | sh
Alternatively, you can use wget:
wget -qO- https://getbruin.com/install/cli | sh
The installer will automatically add Bruin to your PATH. You may need to restart your shell or run:
source ~/.bashrc # or ~/.zshrc if using zsh
Verify the installation:
bruin --version
Step 5: Clone Your Bruin Project
Clone your Bruin project repository to your server:
cd ~
git clone https://github.com/your-username/your-bruin-project.git
cd your-bruin-project
Replace the URL with your actual repository URL.
Step 6: Configure Credentials
Bruin needs access to your data platforms. Set up your credentials in the .bruin.yml file in your project root.
Create or edit the .bruin.yml file:
nano .bruin.yml
Example configuration:
environments:
  production:
    connections:
      google_cloud_platform:
        - name: "my_gcp"
          service_account_file: "/home/username/.config/gcloud/service-account.json"
          project_id: "my-project-id"
      postgres:
        - name: "my_postgres"
          username: "postgres_user"
          password: "your_password"
          host: "localhost"
          port: 5432
          database: "mydb"
Storing Service Account Files
If you're using service account files (e.g., for Google Cloud):
mkdir -p ~/.config/gcloud
nano ~/.config/gcloud/service-account.json
Paste your service account JSON content, save, and secure the file:
chmod 600 ~/.config/gcloud/service-account.json
Step 7: Test Your Pipeline
Before setting up automation, test that your pipeline runs successfully:
cd ~/your-bruin-project
bruin run .
If you want to run a specific pipeline:
bruin run pipelines/my_pipeline
Check for any errors and resolve them before proceeding.
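If your Bruin CLI version includes the validate command (run bruin --help to confirm), you can also lint the project for configuration and query errors before executing anything:
bruin validate .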
Step 8: Set Up Cron Jobs
Cron is a time-based job scheduler in Unix-like operating systems. You'll use it to run your Bruin pipelines automatically.
Understanding Cron Syntax
Cron uses the following format:
* * * * * command-to-execute
│ │ │ │ │
│ │ │ │ └─── Day of week (0-7, Sunday = 0 or 7)
│ │ │ └───── Month (1-12)
│ │ └─────── Day of month (1-31)
│ └───────── Hour (0-23)
└─────────── Minute (0-59)
Examples:
0 * * * * - Every hour at minute 0
0 9 * * * - Every day at 9:00 AM
*/15 * * * * - Every 15 minutes
0 2 * * 1 - Every Monday at 2:00 AM
0 0 1 * * - First day of every month at midnight
Create a Cron Job
Open your crontab file:
crontab -e
If this is your first time, you'll be asked to choose an editor. Select nano (option 1) for simplicity.
Add a cron job to run your pipeline. Here's an example that runs daily at 3:00 AM:
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project >> /home/username/logs/bruin.log 2>&1
Important notes:
- Use absolute paths for both the Bruin executable and your project directory
- Replace username with your actual username
- The >> /home/username/logs/bruin.log 2>&1 redirects output to a log file
Multiple Pipelines with Different Schedules
You can schedule different pipelines at different times:
# Run data ingestion pipeline every hour
0 * * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/ingestion >> /home/username/logs/ingestion.log 2>&1
# Run analytics pipeline daily at 6 AM
0 6 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/analytics >> /home/username/logs/analytics.log 2>&1
# Run weekly report every Monday at 8 AM
0 8 * * 1 /home/username/.local/bin/bruin run /home/username/your-bruin-project/pipelines/weekly_report >> /home/username/logs/weekly.log 2>&1
Step 9: Set Up Logging
Create a directory for logs:
mkdir -p ~/logs
Your cron jobs will now write outputs to log files in this directory.
View Logs
Check recent logs:
tail -f ~/logs/bruin.log
View last 100 lines:
tail -n 100 ~/logs/bruin.log
Search for errors:
grep -i error ~/logs/bruin.log
Log Rotation
To prevent log files from growing too large, set up log rotation:
sudo nano /etc/logrotate.d/bruin
Add the following configuration:
/home/username/logs/*.log {
    daily
    missingok
    rotate 14
    compress
    notifempty
    create 0644 username username
}
This configuration:
- Rotates logs daily
- Keeps 14 days of logs
- Compresses old logs
- Creates new log files with proper permissions
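To confirm the rotation config parses correctly without actually rotating anything, you can do a dry run (logrotate's -d flag only prints what it would do):
sudo logrotate -d /etc/logrotate.d/bruin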
Step 10: Set Up Environment-Specific Configurations
Use Bruin's environment feature to manage different configurations:
bruin run . --environment production
Update your cron job to use the production environment:
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project --environment production >> /home/username/logs/bruin.log 2>&1
Step 11: Monitoring and Alerting
Email Notifications on Failure
You can have the server email you when a pipeline run fails. First, install a mail utility:
sudo apt install mailutils -y
Configure postfix when prompted (select "Internet Site").
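Before wiring this into a script, it's worth sending yourself a test message to confirm mail delivery works (replace the address with your own):
echo "Mail delivery works" | mail -s "Test from Bruin server" your-email@example.com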
Create a wrapper script to handle errors. Make sure the scripts directory exists, then open the file:
mkdir -p ~/scripts
nano ~/scripts/run-bruin.sh
Add the following:
#!/bin/bash
LOG_FILE="/home/username/logs/bruin.log"
PROJECT_PATH="/home/username/your-bruin-project"
BRUIN_BIN="/home/username/.local/bin/bruin"
echo "=== Starting Bruin run at $(date) ===" >> "$LOG_FILE"
if ! $BRUIN_BIN run "$PROJECT_PATH" --environment production >> "$LOG_FILE" 2>&1; then
    echo "Bruin pipeline failed at $(date)" | mail -s "Bruin Pipeline Failed" your-email@example.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"
Make it executable:
chmod +x ~/scripts/run-bruin.sh
Update your crontab:
0 3 * * * /home/username/scripts/run-bruin.sh
Step 12: Automatic Updates
Keep your Bruin project up to date by pulling changes before each run:
Update your wrapper script:
#!/bin/bash
LOG_FILE="/home/username/logs/bruin.log"
PROJECT_PATH="/home/username/your-bruin-project"
BRUIN_BIN="/home/username/.local/bin/bruin"
echo "=== Starting Bruin run at $(date) ===" >> "$LOG_FILE"
# Pull latest changes
cd "$PROJECT_PATH"
git pull origin main >> "$LOG_FILE" 2>&1
# Run the pipeline
if ! $BRUIN_BIN run "$PROJECT_PATH" --environment production >> "$LOG_FILE" 2>&1; then
    echo "Bruin pipeline failed at $(date)" | mail -s "Bruin Pipeline Failed" your-email@example.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"
Security Best Practices
1. Secure Your Credentials
Never commit credentials to Git:
echo ".bruin.yml" >> .gitignore
echo "*.json" >> .gitignore
2. Use SSH Keys for Git
Set up SSH keys for passwordless Git operations:
ssh-keygen -t ed25519 -C "your-email@example.com"
cat ~/.ssh/id_ed25519.pub
Add the public key to your Git provider (GitHub, GitLab, etc.).
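You can verify the key works before cloning or pulling over SSH; for example, with GitHub (adjust the host for your provider):
ssh -T git@github.com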
3. Restrict File Permissions
chmod 600 ~/.bruin.yml
chmod 600 ~/.config/gcloud/*.json
4. Use a Dedicated User
Create a dedicated user for running Bruin:
sudo useradd -m -s /bin/bash bruin
sudo su - bruin
Then follow all the installation steps as the bruin user.
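Cron jobs must also be installed under this account. From a sudo-capable account you can inspect or edit the bruin user's crontab with the -u flag:
sudo crontab -u bruin -l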
Troubleshooting
Cron Job Not Running
- Check if cron service is running:
sudo systemctl status cron
- Check cron logs:
grep CRON /var/log/syslog
- Verify your crontab:
crontab -l
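If everything above looks fine, a simple sanity check is to add a temporary entry that appends a timestamp every minute and confirm the file updates (remove the entry afterwards):
* * * * * date >> /home/username/logs/cron-test.log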
Bruin Command Not Found in Cron
Cron has a limited environment. Always use absolute paths:
# Find the full path to bruin
which bruin
# Use the full path in crontab
0 3 * * * /home/username/.local/bin/bruin run /home/username/your-bruin-project
Permission Denied Errors
Ensure your user has permission to access all files:
chmod +x ~/.local/bin/bruin
chmod -R 755 ~/your-bruin-project
Connection Issues
Test your connections:
bruin connections --environment production
Pipeline Fails in Cron but Works Manually
This often happens due to environment differences. Export all necessary environment variables in your wrapper script:
#!/bin/bash
export PATH="/home/username/.local/bin:$PATH"
export HOME="/home/username"
# Add other environment variables here
# Run your pipeline
/home/username/.local/bin/bruin run /home/username/your-bruin-project
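To reproduce cron's sparse environment when debugging by hand, you can launch your wrapper script with a nearly empty environment (a rough approximation of how cron invokes it; adjust the paths to match your setup):
env -i HOME=/home/username /bin/bash /home/username/scripts/run-bruin.sh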
Example: Complete Production Setup
Here's a complete example for a production deployment:
Directory Structure
/home/bruin/
├── projects/
│   └── analytics-pipeline/
├── scripts/
│   ├── run-ingestion.sh
│   └── run-analytics.sh
├── logs/
│   ├── ingestion.log
│   └── analytics.log
└── .config/
    └── gcloud/
        └── service-account.json
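One way to create this layout up front, running as the bruin user, is:
mkdir -p /home/bruin/projects /home/bruin/scripts /home/bruin/logs /home/bruin/.config/gcloud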
Crontab
# Pull and run ingestion pipeline every 6 hours
0 */6 * * * /home/bruin/scripts/run-ingestion.sh
# Run analytics pipeline daily at 2 AM
0 2 * * * /home/bruin/scripts/run-analytics.sh
run-analytics.sh
#!/bin/bash
set -e
export PATH="/home/bruin/.local/bin:$PATH"
export HOME="/home/bruin"
LOG_FILE="/home/bruin/logs/analytics.log"
PROJECT_PATH="/home/bruin/projects/analytics-pipeline"
echo "=== Starting analytics run at $(date) ===" >> "$LOG_FILE"
cd "$PROJECT_PATH"
git pull origin main >> "$LOG_FILE" 2>&1
if ! bruin run . --environment production >> "$LOG_FILE" 2>&1; then
    echo "Analytics pipeline failed at $(date)" | mail -s "Alert: Analytics Pipeline Failed" admin@company.com
    exit 1
fi
echo "=== Completed successfully at $(date) ===" >> "$LOG_FILE"
Next Steps
- Explore Bruin Cloud for managed orchestration and monitoring
- Set up CI/CD integration for automated testing
- Learn about quality checks to ensure data quality
- Review best practices for pipeline design