@weizhouapache
Member

Description

This PR fixes: #12107 #11879

Steps to reproduce the issue

  • create VPC with redundant offering
  • create vpc tier and vm
  • check /etc/dnsmasq.d/cloud.conf

Expected: the VPC tier gateway is the first entry in the DNS option line (DHCP option 6):

dhcp-option=tag:interface-eth2-0,6, VPC tier gateway, DNS1, DNS2
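For anyone verifying this by hand, here is a minimal sketch of how the check on /etc/dnsmasq.d/cloud.conf could be automated. It assumes the line layout shown above (`tag:<name>`, then option number `6`, then the server list); the function names and the sample addresses are hypothetical and are not taken from the patch.

```python
def dns_servers_by_tag(conf_text):
    """Map each dhcp-option tag to its option-6 (DNS) server list.

    Assumes lines shaped like: dhcp-option=tag:<name>,6,<server>,<server>,...
    Lines in any other shape are ignored.
    """
    prefix = "dhcp-option="
    servers = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line.startswith(prefix):
            continue
        fields = [f.strip() for f in line[len(prefix):].split(",")]
        if len(fields) < 3 or not fields[0].startswith("tag:") or fields[1] != "6":
            continue
        servers[fields[0][len("tag:"):]] = fields[2:]
    return servers


def gateway_is_first(conf_text, tag, gateway):
    """True if `gateway` is the first DNS server listed for `tag`."""
    servers = dns_servers_by_tag(conf_text).get(tag, [])
    return bool(servers) and servers[0] == gateway


# Hypothetical sample content; real addresses depend on the VPC tier.
sample = (
    "dhcp-option=tag:interface-eth2-0,3,10.1.1.1\n"
    "dhcp-option=tag:interface-eth2-0,6,10.1.1.1,8.8.8.8,8.8.4.4\n"
)
print(gateway_is_first(sample, "interface-eth2-0", "10.1.1.1"))
```

On a router, the text would come from reading /etc/dnsmasq.d/cloud.conf, and the gateway argument would be the tier gateway address.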

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@codecov

codecov bot commented Nov 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 17.56%. Comparing base (e23c7ef) to head (e5d7cf2).
⚠️ Report is 5 commits behind head on 4.22.

Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #12161      +/-   ##
============================================
- Coverage     17.56%   17.56%   -0.01%     
+ Complexity    15545    15544       -1     
============================================
  Files          5910     5910              
  Lines        529123   529125       +2     
  Branches      64627    64628       +1     
============================================
- Hits          92937    92931       -6     
- Misses       425733   425740       +7     
- Partials      10453    10454       +1     
Flag Coverage Δ
uitests 3.58% <ø> (ø)
unittests 18.63% <ø> (-0.01%) ⬇️


@weizhouapache
Member Author

@blueorangutan package

@blueorangutan

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15857

@weizhouapache
Member Author

@blueorangutan test

@blueorangutan

@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@Jayd603

Jayd603 commented Nov 28, 2025

This patch worked for me.

@weizhouapache weizhouapache marked this pull request as ready for review November 28, 2025 16:38
@weizhouapache
Member Author

@blueorangutan test

@blueorangutan

@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@weizhouapache weizhouapache force-pushed the 4.22-fix-vpc-rvr-dns-list branch from f5b4060 to e5d7cf2 Compare November 30, 2025 12:59
@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 15869

@blueorangutan

[SF] Trillian test result (tid-14892)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 51401 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12161-t14892-kvm-ol8.zip
Smoke tests completed. 149 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@apache apache deleted a comment from blueorangutan Dec 1, 2025
@apache apache deleted a comment from blueorangutan Dec 1, 2025
@DaanHoogland
Contributor

@weizhouapache, does this need testing for non-VPC non-redundant routers? cc @nvazquez @Pearl1594.

@weizhouapache
Member Author

@weizhouapache, does this need testing for non-VPC non-redundant routers? cc @nvazquez @Pearl1594.

Yes, @DaanHoogland.

The change was introduced in #9102, but it may also be related to the Netris plugin (#10458).
@nvazquez @Pearl1594

@Jayd603

Jayd603 commented Dec 3, 2025

I just hit the below error trying to deploy an instance with this fix in place. 169.254.39.227 is the current BACKUP VPC router. Will try to debug further.

2025-12-03 16:52:27,797 DEBUG [c.c.a.t.Request] (AgentManager-Handler-9:[]) (logid:) Seq 28-8558246666887546921: Processing:  { Ans: , MgmtId: 90520744930075, via: 28, Ver: v1, Flags: 10, [{"com.cloud.agent.api.routing.GroupAnswer":{"results":["null - success: Creating file in VR, with ip: 169.254.39.227, file: monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d","null - failed: java.io.IOException: Stream closed

On the second deployment attempt, it tried the PRIMARY router and also failed. Hmm.

@weizhouapache
Member Author

monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d

@Jayd603 can you run the following commands in the VPC VR?

cd /var/cache/cloud
cp processed/monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d.gz .
gzip -dk monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d.gz
update_config.py monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d
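If the copy step fails because the archive is missing, a small helper can make that explicit before anything is decompressed. The following is only a sketch of the `cp` and `gzip -dk` steps above: the function name is hypothetical, it assumes the `processed/<name>.gz` layout implied by those commands, and it does not run update_config.py. Unlike the commands, it writes the decompressed file directly and does not copy the .gz alongside it.

```python
import gzip
import shutil
from pathlib import Path


def restore_processed_config(cache_dir, name):
    """Decompress <cache_dir>/processed/<name>.gz to <cache_dir>/<name>.

    Returns the path of the restored file, or None if the archive does
    not exist. The archive under processed/ is left untouched.
    """
    cache = Path(cache_dir)
    archive = cache / "processed" / (name + ".gz")
    if not archive.is_file():
        return None
    target = cache / name
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Calling it with `/var/cache/cloud` and the monitor_service.json file name from the log would return `None` when the archive is absent, which is quicker to spot than a failed `cp`.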

@Jayd603

Jayd603 commented Dec 3, 2025

monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d

@Jayd603 can you run the following commands in the VPC VR?

cd /var/cache/cloud
cp processed/monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d.gz .
gzip -dk monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d.gz
update_config.py monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d

I noticed I used tabs in the Python file, durr. Fixed that, and now there are other errors:

(r-95-VM) Resource [Host:28] is unreachable: Host 28: Unable to start instance due to Unable to start VM:fcdb10f4-1bde-4cc0-a04f-afd20a2392f2 due to error in finalizeStart, not retrying

I fixed the .py files and attempted a router reboot; it totally fails now.

Also, that file you pasted does not exist.

I'm re-deploying fresh routers without your patch. After that, other than modifying the .py files on each router, what else do I need to do to test this?
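For anyone else hand-applying the patch to the .py files on a router, the tab problem mentioned above can be caught before the file is copied over. This is a minimal sketch with a hypothetical function name; it only flags lines whose indentation contains a tab and does not validate the Python otherwise.

```python
def lines_indented_with_tabs(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank or whitespace-only lines do not affect indentation
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Hypothetical snippet: line 2 is indented with a tab, line 3 with spaces.
edited = "def f():\n\tx = 1\n    return x\n"
print(lines_indented_with_tabs(edited))
```

An empty result means no tab-indented lines were found in the text that was passed in.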

@Jayd603

Jayd603 commented Dec 3, 2025

monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d

@Jayd603 can you run the following commands in the VPC VR?

After deploying fresh routers, I was able to apply your patch and reboot them both, and was then able to deploy with password functionality. Not sure what happened, but it seems OK now. I'm doing more testing and will report back if I notice anything.

@weizhouapache
Member Author

monitor_service.json.23cd5b3b-c47e-457e-bc28-9bb079d34f9d

@Jayd603 can you run the following commands in the VPC VR?

After deploying fresh routers, I was able to apply your patch and reboot them both, and was then able to deploy with password functionality. Not sure what happened, but it seems OK now. I'm doing more testing and will report back if I notice anything.

Good, good to know.


Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Redundant VPC - cloud-init can no longer retrieve passwords from VPC router password server

4 participants