Architect Agent

Hyper-Focused Database Schema Optimization Plan for Freelance Developers

Stop doing this manually. Deploy an autonomous Architect agent to handle database schema optimization planning entirely in the background.

Zero-Shot Command Setup

Generate a database schema optimization plan for an existing PostgreSQL database supporting a high-traffic social media analytics dashboard, currently experiencing slow query performance and high CPU utilization.

Core Benefits & ROI

  • Significantly improves dashboard load times and user experience
  • Reduces server load and operating costs (e.g., cloud resources)
  • Extends the useful life and performance of existing database infrastructure
  • Identifies and resolves critical performance bottlenecks proactively
  • Enhances data retrieval efficiency for complex analytical queries

Ecosystem Integration

This "Architect" agent is a critical component of the **Maintenance & Evolution** pillar, providing actionable plans for improving the performance and longevity of existing systems. Its detailed optimization plan directly informs the **Development** pillar by guiding developers on query refactoring, schema changes, and indexing strategies. The output also has implications for the **Deployment & Operations** pillar, recommending database configuration tuning and maintenance routines. By proactively addressing performance issues, it ensures the application remains responsive and cost-effective, upholding a positive user experience.

Sample Output

# Database Schema Optimization Plan for Social Media Analytics Dashboard

**Current Issues:** Slow query performance, high CPU utilization in PostgreSQL database.

## I. Indexing Strategy Review & Enhancement

1. **Analyze Existing Indexes:**
   * **Action:** Use `pg_stat_user_indexes` and `pg_stat_activity` to identify rarely used or redundant indexes.
   * **Recommendation:** Drop unused indexes to reduce write overhead and storage.
2. **Identify Missing Indexes:**
   * **Action:** Analyze slow queries from `pg_stat_statements` (if enabled) or query logs. Look for columns frequently used in `WHERE` clauses, `JOIN` conditions, `ORDER BY` clauses, and `GROUP BY` clauses that lack appropriate indexing.
   * **Recommendation:**
     * **B-tree Indexes:** Create B-tree indexes on foreign key columns, ID columns, and other columns used for equality and range searches.
     * **Composite Indexes:** For queries with multiple `WHERE` conditions, create composite indexes with the most selective columns first, e.g. `CREATE INDEX ON analytics_data (user_id, dashboard_id, created_at);`
     * **Partial Indexes:** For tables with many rows but only a subset frequently queried (e.g., `status = 'active'`), create partial indexes, e.g. `CREATE INDEX ON users (email) WHERE status = 'active';`
     * **Expression Indexes:** For queries involving functions on columns, create indexes on the expressions, e.g. `CREATE INDEX ON posts (LOWER(title));`
3. **Indexing Best Practices:**
   * Avoid over-indexing, which increases write times and storage.
   * Consider data cardinality: indexes are most effective on columns with high cardinality.
   * Validate index usage with `EXPLAIN ANALYZE`.

## II. Query Optimization & Refactoring

1. **Analyze Slow Queries:**
   * **Action:** Use `EXPLAIN ANALYZE` on the top N slow queries (identified from logs or `pg_stat_statements`) to understand execution plans and identify sequential scans and costly operations.
   * **Recommendation:**
     * **Avoid `SELECT *`:** Only fetch necessary columns.
     * **Optimize `JOIN` clauses:** Ensure joins are on indexed columns. Prefer `INNER JOIN` where appropriate.
     * **Reduce Subqueries & Correlated Subqueries:** Convert to `JOIN`s or CTEs (Common Table Expressions) where possible.
     * **`WHERE` Clause Optimization:** Ensure filters are applied early.
     * **Pagination:** Implement efficient pagination using `LIMIT` and `OFFSET` (or keyset pagination for very large datasets).
     * **Aggregation:** Optimize `GROUP BY` and `ORDER BY` clauses, ensuring relevant columns are indexed.
2. **Materialized Views:**
   * **Recommendation:** For complex, frequently accessed analytical queries that don't require real-time data, create materialized views, which pre-compute and store the query results.
   * **Action:** Identify suitable aggregate queries (e.g., daily/hourly social media metrics). Schedule regular refreshes (`REFRESH MATERIALIZED VIEW`).
3. **Window Functions:**
   * **Recommendation:** Leverage PostgreSQL window functions for complex analytical calculations (e.g., rankings, running totals) to avoid self-joins and improve readability and performance.

## III. Schema Design Refinements

1. **Normalization vs. Denormalization:**
   * **Current State Assessment:** Evaluate whether the current schema is overly normalized (leading to excessive joins for common queries) or insufficiently normalized (leading to data redundancy and update anomalies).
   * **Recommendation (for analytics):** Consider strategic denormalization (e.g., adding frequently joined columns to a fact table) for specific high-read-volume analytical tables, to reduce join complexity and improve query performance at the cost of some data redundancy. This should be done judiciously.
2. **Partitioning:**
   * **Recommendation:** For very large tables (e.g., `analytics_data` with time-series data), implement table partitioning (e.g., by date or `dashboard_id`).
   * **Benefit:** Improves query performance by scanning smaller segments, simplifies maintenance (e.g., dropping old data), and can improve backup/restore times.
   * **Action Plan:** Identify suitable large tables. Choose a partitioning key (e.g., `created_at` for range partitioning).
3. **Data Type Optimization:**
   * **Recommendation:** Review column data types. Use the smallest appropriate data type to reduce storage footprint and improve I/O performance (e.g., `SMALLINT` instead of `INTEGER` if values fit, `DATE` instead of `TIMESTAMP` if the time component isn't needed).
   * **Action:** Audit common data types and propose changes where feasible.

## IV. Database Configuration & Maintenance

1. **PostgreSQL Configuration Tuning:**
   * **Action:** Review `postgresql.conf` parameters.
   * **Recommendation:**
     * `shared_buffers`: Allocate ~25% of system RAM.
     * `work_mem`: Increase for complex queries with sorts/hashes.
     * `effective_cache_size`: Set to ~50-75% of total RAM.
     * `maintenance_work_mem`: Increase for `VACUUM` and index creation.
     * `max_connections`: Ensure sufficient for application needs without being excessive.
2. **`VACUUM` Strategy:**
   * **Recommendation:** Ensure autovacuum is properly configured and running. For very high-traffic tables with frequent updates/deletes, a more aggressive `VACUUM` strategy (or manual `VACUUM ANALYZE` during off-peak hours) may be required to prevent table bloat and ensure up-to-date statistics.
   * **Action:** Monitor table bloat using `pg_stat_all_tables`.
3. **Statistics:**
   * **Recommendation:** Ensure `ANALYZE` is run regularly (often handled by autovacuum) to provide the query planner with accurate statistics for optimal execution plans. Manually run `ANALYZE` after significant data imports.

## V. Hardware & Cloud Resources (If Applicable)

1. **Vertical Scaling (Temporary/Immediate):**
   * **Recommendation:** Consider upgrading the database instance to one with more CPU and RAM if the software optimizations above are insufficient, or as an immediate stop-gap.
2. **Read Replicas:**
   * **Recommendation:** For a read-heavy analytics dashboard, offload read queries to one or more read replicas.
   * **Action:** Configure PostgreSQL streaming replication and modify application logic to direct read queries to replicas.

## Implementation Plan Overview

1. **Audit & Monitor:** Begin by thoroughly auditing current performance, slow queries, and index usage.
2. **Prioritize:** Focus on the highest-impact changes first (e.g., critical missing indexes, worst-performing queries).
3. **Test:** Implement changes in a staging environment. Thoroughly test performance with realistic data and query loads.
4. **Rollout:** Gradually deploy changes to production, monitoring performance closely.
5. **Iterate:** Performance optimization is an ongoing process. Continuously monitor and refine.
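To make the indexing and partitioning recommendations concrete, here is a minimal sketch in PostgreSQL SQL. The table and column names (`analytics_data`, `users`, `posts`, `user_id`, `dashboard_id`, `created_at`) are taken from the plan's own examples; the index names and the `metrics` column are illustrative assumptions, not part of any real schema.

```sql
-- Composite index: most selective columns first (hypothetical index names).
CREATE INDEX idx_analytics_user_dash_created
    ON analytics_data (user_id, dashboard_id, created_at);

-- Partial index: only the subset of rows the dashboard actually queries.
CREATE INDEX idx_users_active_email
    ON users (email) WHERE status = 'active';

-- Expression index for case-insensitive lookups on a function of a column.
CREATE INDEX idx_posts_title_lower ON posts (LOWER(title));

-- Range partitioning by created_at (declarative partitioning, PostgreSQL 10+).
-- The column list here is an assumed shape for illustration only.
CREATE TABLE analytics_data_partitioned (
    user_id      BIGINT      NOT NULL,
    dashboard_id BIGINT      NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL,
    metrics      JSONB
) PARTITION BY RANGE (created_at);

-- One partition per month; old months can later be detached or dropped cheaply.
CREATE TABLE analytics_data_2024_06
    PARTITION OF analytics_data_partitioned
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');
```

As the plan notes, any such statement should first be validated in staging with `EXPLAIN ANALYZE` against realistic query loads.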

Frequently Asked Questions

Does this plan include specific SQL scripts for index creation or query changes?

This plan provides a strategic outline and specific recommendations, including examples of `CREATE INDEX` syntax and types of queries to refactor. While it doesn't generate ready-to-execute SQL for *all* identified issues, it gives you the precise guidance needed to construct those scripts, acting as a detailed architectural brief for the database administrator or developer.
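As an illustration of the kind of script the plan's guidance leads to, a minimal sketch follows. The table and column names are assumptions drawn from the sample plan; `CONCURRENTLY` is used because building an index on a live, high-traffic table would otherwise block writes (note it cannot run inside a transaction block).

```sql
-- Hypothetical production-safe index creation derived from the plan's guidance.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_analytics_dash_created
    ON analytics_data (dashboard_id, created_at DESC);
```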

How can I ensure these optimizations don't break existing application functionality?

The plan emphasizes a rigorous testing methodology. It strongly recommends implementing changes in a staging environment first, thoroughly testing with realistic data and query loads, and using `EXPLAIN ANALYZE` to validate changes before deploying to production. Additionally, monitoring after deployment is crucial to catch any unforeseen regressions.
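A minimal sketch of such a staging-environment check, run before and after a schema change. The query and the `analytics_data` table are hypothetical stand-ins for a real dashboard workload.

```sql
-- Compare execution plans before and after an index or schema change.
-- ANALYZE executes the query for real timings; BUFFERS adds I/O counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT dashboard_id, COUNT(*) AS events
FROM analytics_data
WHERE user_id = 42
  AND created_at >= now() - interval '7 days'
GROUP BY dashboard_id;
```

Before optimization one would typically expect a `Seq Scan` node with high actual time; after adding a suitable index on `(user_id, created_at)`, an `Index Scan` or `Bitmap Heap Scan` with markedly lower cost would indicate the change took effect.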