Software upgrades are essential for keeping production environments current and free of known bugs. Executing one, however, can be complex and requires careful planning. This blog post offers a software upgrade checklist for production environments to help the upgrade go smoothly.

Pre-Upgrade Checklist

  • Identify the Scope of the Upgrade: Determine the version you need to upgrade from and to. Review the upgrade path and associated policies. Make note of any exceptions along the upgrade path.

  • Check the Prerequisites: Consult the compatibility matrix for infrastructure versions, such as the database, secret managers, service mesh, Kubernetes cluster, and Docker runtime. It’s crucial to consider version compatibility to prevent issues post-deployment.

  • Assess the Impact: Review the “What’s New” notes, Release Notes, and Important Changes to identify risks or issues that may arise during or after the upgrade. Estimate the required downtime, and schedule a maintenance window for upgrades that involve schema changes or database migrations.

  • Develop a Rollback Plan: Prepare a rollback plan in case the upgrade fails or causes issues. The plan should outline the steps necessary to revert to the previous software version.

  • Notify Stakeholders: Inform end-users, regulators, and internal stakeholders like the IT team and management. Assign a Person in Charge (PIC) for relevant tasks on the checklist and create a Slack channel or Zoom call for real-time communication.

  • Check Configuration: Review any new fields that need to be configured, or assess whether the default values are appropriate for your environment. Update the configuration files to match the new version.

  • Access Secrets: Ensure that all necessary secrets, like the root database password, are available in the secret manager. Update them as needed.

  • Conduct a Dry Run: Perform the upgrade in a non-production environment to identify potential issues or risks.
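
The prerequisite check above can be partly automated. Below is a minimal sketch in Python, assuming the compatibility matrix has been copied by hand into a dictionary of supported version ranges. The component names, version numbers, and the `find_incompatible` helper are all hypothetical; real values come from the vendor's compatibility matrix.

```python
# Hypothetical compatibility matrix: component -> (lowest, highest) supported version.
# These components and numbers are made up for illustration only.
COMPATIBILITY_MATRIX = {
    "kubernetes": ("1.26", "1.29"),
    "postgres": ("13.0", "16.2"),
}


def parse_version(text):
    """Turn '1.27.3' into (1, 27, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def within(version, low, high):
    """Compare only as many components as each bound specifies,
    so that 1.29.4 still counts as inside an upper bound of 1.29."""
    v = parse_version(version)
    lo, hi = parse_version(low), parse_version(high)
    return lo <= v[:len(lo)] and v[:len(hi)] <= hi


def find_incompatible(installed, matrix):
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    for component, (low, high) in matrix.items():
        version = installed.get(component)
        if version is None:
            problems.append(f"{component}: not found in inventory")
        elif not within(version, low, high):
            problems.append(f"{component}: {version} is outside {low} to {high}")
    return problems


# Example inventory, as it might be collected from the environment.
installed = {"kubernetes": "1.29.4", "postgres": "12.9"}
for problem in find_incompatible(installed, COMPATIBILITY_MATRIX):
    print(problem)
```

The sketch assumes plain dotted integer versions; suffixes such as "-rc1" would need a proper version parser. Running it during the dry run and again just before the production upgrade helps catch drift between the two environments.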

During Upgrade Checklist

  • Execute the Upgrade: Start the upgrade by following the steps in the official documentation. Closely monitor the process to catch any issues or errors.

  • Health Check: After completing the upgrade, assess the system’s status. If you encounter unexpected issues like pods crash-looping, raise a production issue and contact the relevant team.
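
For Kubernetes deployments, the crash-loop part of the health check can be scripted. The following is a minimal sketch that assumes the pod list has already been exported as JSON (for example with `kubectl get pods -o json`) and parsed into a dictionary. The `find_crash_looping` helper, the restart threshold, and the sample pod names are illustrative assumptions, not part of any official tooling.

```python
def find_crash_looping(pod_list, restart_threshold=3):
    """Return names of pods that have a container waiting in CrashLoopBackOff
    or a container restart count at or above the (assumed) threshold."""
    unhealthy = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting") or {}
            crash_looping = waiting.get("reason") == "CrashLoopBackOff"
            too_many_restarts = status.get("restartCount", 0) >= restart_threshold
            if crash_looping or too_many_restarts:
                unhealthy.append(name)
                break  # one bad container is enough to flag the pod
    return unhealthy


# Hand-written sample in the shape of a Kubernetes pod list (names are made up).
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "api-7d9f"},
            "status": {"containerStatuses": [
                {"restartCount": 6,
                 "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
            ]},
        },
        {
            "metadata": {"name": "worker-5c2a"},
            "status": {"containerStatuses": [
                {"restartCount": 0, "state": {"running": {}}},
            ]},
        },
    ]
}

print(find_crash_looping(SAMPLE))
```

A non-empty result is a signal to raise a production issue and, if needed, start the rollback plan.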

Post-Upgrade Checklist

  • Perform Manual Post-Upgrade Steps: Follow any additional instructions as per the documentation. Typical tasks might include garbage collection or removing unused resources.

  • Verify System Functionality: Confirm that the upgraded system functions correctly and that all data and configurations have been properly migrated.

  • Perform a Sanity Check: Run user acceptance tests on key workflows to confirm that the system meets its requirements and behaves as end-users expect.

  • Monitor Metrics: Use dashboards in a tool like Grafana to spot abnormal behavior, and review logs for error messages.

  • (For Database Upgrades Only): Verify data integrity on the schema columns that matter: generate query results before and after the upgrade, sort them, hash them, and compare the hashes. Compare row counts as well.

  • Update Documentation: Revise all relevant internal documentation to reflect changes made during the upgrade.

  • Conduct a Post-Implementation Review: Evaluate the success of the upgrade and identify areas for improvement.
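
The database item in the checklist above (row counting, sorting, hashing, and comparison) can be sketched in a few lines. This example uses Python's built-in `sqlite3` module with in-memory databases purely so that it runs anywhere; the `accounts` table, its columns, and the `table_fingerprint` helper are hypothetical, and a real check would connect to the pre-upgrade snapshot and the upgraded database instead.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, order_by):
    """Return (row_count, sha256 hex digest) for a table, read in a fixed order."""
    # Identifiers cannot be bound as SQL parameters, so pass only trusted names here.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()


def make_db(rows):
    """Build a throwaway in-memory database standing in for a real one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    return conn


before = make_db([(1, "ana", 100), (2, "bo", 250)])
after = make_db([(2, "bo", 250), (1, "ana", 100)])    # same data, different insert order
drifted = make_db([(1, "ana", 100), (2, "bo", 999)])  # one value changed

print(table_fingerprint(before, "accounts", "id") == table_fingerprint(after, "accounts", "id"))
print(table_fingerprint(before, "accounts", "id") == table_fingerprint(drifted, "accounts", "id"))
```

Sorting before hashing matters because databases do not guarantee row order without an ORDER BY clause; two tables with identical content could otherwise produce different hashes. For very large tables, hashing per key range or sampling may be more practical than reading every row.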

Conclusion

Upgrading to a new software version in a production environment is a critical process. A well-structured checklist can make the upgrade go more smoothly. This checklist should be tailored to meet the specific needs of each system and updated regularly to reflect changes in either the production environment or the software version in use.