fix(registry/txt): require %{record_type} in prefix/suffix for TXT record management#6293
fix(registry/txt): require %{record_type} in prefix/suffix for TXT record management#6293simonoff wants to merge 2 commits intokubernetes-sigs:masterfrom
Conversation
|
Hi @simonoff. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
We've tried in past. This is not going to work. Go over issues and past PR related. There are all sort of reasons why this is not working. As well as TXT registry uses TXT records to store metadata. /hold |
|
@ivankatliarchuk you mean #5465 ? I will review. There many different issues related to TXT if you point me to specific one you mentioned it will be really helpful for me. As for us its going to be a really big issue as we create and remove DNS records contantly |
|
I could be wrong. But I did a quick search with AI https://github.com/kubernetes-sigs/external-dns/pulls?page=4&q=is%3Apr+txt+registry+txt+record+is%3Aclosed There were few other attempts. Example #3916. Maybe is possible now. Let's say if you could share evidences covering different scenario, maybe we could enable it. Example, just with different TXT scenarios #6253 (comment) |
|
Here is explanation from DeepWiki https://deepwiki.com/search/why-txt-registry-does-not-supp_6dbe3353-4786-40a0-9155-81d2695889dc?mode=deep I would say if we going to enable-it, there are going to be substantial limitations and could be some risk involved |
|
Your example aka |
|
@ivankatliarchuk ok, i will review all this mentions and will try to found a solution. As our case more about managing TXT records for validating domains for mailgun for example. I will do my best to found a good solution. |
|
@ivankatliarchuk i did a research and you mentioned that you have on your table idea to make a pluginable registry. Can I help you with it? And takes it completely? I would prefer to make them out of the tree like providers. what do you think? |
|
To be honest, before implementing a provider webhook, I think we should consider supporting a concrete provider such as etcd or PostgreSQL ( For example, in places where I’ve worked, webhooks that are not part of Kubernetes or CNCF projects are typically not allowed. There is in progress Registry CRD And what should be implemted as well
So, in this order
|
|
@ivankatliarchuk oh such info really interesting and shows the whole case to me. Yeah, the #6182 is a result of not tracking TXT records. We are doing the same and now its a big pain for us. @mloiseleur do you need a help with #5372? For us working both variants - save it in kubernetes itself or using some PostgreSQL db for it. We also made a hack for cleanup of such DNS records, but its a bit manual actions and i would like to avoid them. |
|
Fixes #6182 Ok. Maybe it's actually a bug. Documentation external-dns/docs/registry/txt.md Line 79 in ede1c09 that TXT should be excluded from ownership tracking. /unhold |
|
/ok-to-test |
|
Could you provide evidences #6253 (comment) that this fix actually working? |
|
/retitle fix(registry/txt): TXT records orphaned due to missing type in mapper supportedRecords |
Coverage Report for CI Build 24127033603Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+2.3%) to 80.527%Details
Uncovered ChangesNo uncovered changes found. Coverage Regressions452 previously-covered lines in 18 files lost coverage.
Coverage Stats
💛 - Coveralls |
|
@ivankatliarchuk can you give me a couple day while we will test a few combinations to prove it on 100%, as for now its just my finding based on debug that such records just skipped. Our current configuration is next one: |
|
Good evening. I have tested it and have found an insresting side effect. Still working on the fix and will back when it will be done. |
|
Ok, such side effect was a bug in DO webhook. I think we need more time to be sure more and I think next week will tell on 100%. @ivankatliarchuk one question. Do I need to make TXT optional in the list of supported records only when TXT added into managed list via flags? Or we can add it always? |
|
I've now idea what the risk is and how it should be implemented optionally or always. Worth to execute all sorts of tests in your environment and share configuration and results with us. |
AndrewCharlesHay
left a comment
There was a problem hiding this comment.
Reviewed this carefully — I think it's correct and worth merging. Here's what I traced through:
The bug is real and the fix is right
supportedRecords is used in two places: dropAffixExtractType (for %{record_type} template affixes) and extractRecordTypeDefaultPosition (for the default txt- positional prefix). TXT was the only common record type missing from the list. Without it, ToEndpointName("txt-foo.example.com") returns ("txt-foo.example.com", "") instead of ("foo.example.com", "TXT"), so the ownership label lookup misses and the record gets logged as "missing owner label" and abandoned.
No self-reference risk
My one concern going in: if TXT is now in supportedRecords, could ExternalDNS try to create an ownership record for its own ownership records? Traced through registry/txt/registry.go — it doesn't. In Records(), all TXT records with a valid heritage=external-dns label are consumed into labelMap and never added to the endpoints slice returned to the controller. So ownership records are invisible to the source reconciler and won't get their own ownership records created.
The corrected test was asserting the bug
The renamed test (was "suffix with txt record", now "no affix with TXT record") was asserting:
wantEndpointName: "txt-foo.example.com"
wantRecordType: ""
That's testing the broken behavior. The new expected values are correct.
One gap worth noting (not a blocker)
There are no suffix-mode tests (--txt-suffix=-%{record_type}) for the TXT case. Looking at the existing test suite, this appears to be a broader gap for other record types too, so I don't think it belongs in this PR — but worth a follow-up.
Overall: LGTM. This is a one-line fix to a genuine data loss bug (orphaned records are never cleaned up), with good test coverage and a clear paper trail back to #6281 which did the same for PTR.
AndrewCharlesHay
left a comment
There was a problem hiding this comment.
Apologies — I left an LGTM earlier without reading the full thread. After catching up, I want to revise my assessment.
There is a regression risk for default-prefix users
The fix is safe when %{record_type} is in the prefix/suffix (your config with --txt-prefix=%{record_type}-e-dns. being a good example), because each record type gets a distinct ownership TXT name — a-e-dns.foo.example.com for A records, txt-e-dns.foo.example.com for TXT records. No collision.
But with the default empty prefix, txt-foo.example.com is already in use as the ownership record for foo.example.com A (or CNAME, etc.). After this change, extractRecordTypeDefaultPosition("txt-foo.example.com") now returns ("foo.example.com", "TXT") instead of ("txt-foo.example.com", ""). That entry lands in labelMap under key {DNSName: "foo.example.com", RecordType: "TXT"}.
If there's a pre-existing unmanaged TXT record at foo.example.com (SPF, DKIM, etc.) that passes through Records() without a heritage label, it could inherit those ownership labels and ExternalDNS would start treating it as owned — potentially deleting it on the next sync.
Suggested path forward
Gating on %{record_type} in the affix would make this safe unconditionally — only add TXT to supportedRecords when recordTypeInAffix() is true. That's a mapper-internal check that doesn't require threading managedRecordTypes down to this layer. It would fix your use case without touching behavior for the much larger population of default-prefix users.
Happy to be wrong on the mechanics here if I'm misreading the flow, but I think this is worth validating before merge.
|
Ok, i have finally tested our case and fix really working. Side effects was on webhook side which we also fixed. |
c460fe7 to
f1e3e03
Compare
|
@AndrewCharlesHay I have made the managing of TXT records and txt prefix/suffix required as otherwise there no possibility to manage TXT records properly. |
f1e3e03 to
fc39f07
Compare
|
Hi @szuecs, when you have some time, could you please take a look? Is TXT support something we’re planning to add to the TXT registry? |
Yes, probably. I never find the time to finish it. |
Hi @simonoff could you share setup as described that you used for testing this fix? |
Sure. As I mentioned before we are using a next params for our external-dns instance: Our use case that we need dynamically approve mailgun integrations for our dynamic previews, which requires us to create a CNAME, MX and TXT records. For example we need to make a preview1243.dev.com, we do next via CRD: apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
name: preview1243.dev.com
namespace: preview1243
spec:
endpoints:
- dnsName: preview1243.dev.com
recordType: TXT
targets:
- v=spf1 include:mailgun.org ~all
- dnsName: mta._domainkey.preview1243.dev.com
recordType: TXT
targets:
- k=rsa; p=KEY
- dnsName: email.preview1243.dev.com
recordType: CNAME
targets:
- eu.mailgun.org
- dnsName: preview1243.dev.com
recordType: MX
targets:
- 10 mxa.eu.mailgun.org
- 10 mxb.eu.mailgun.org
- dnsName: preview1243.dev.com
recordType: A
targets:
- 1.2.3.4
- dnsName: '*.preview1243.dev.com'
recordType: A
targets:
- 1.2.3.4So its point a preview domain to our LB and creates a required records for Mailgun to validate such domain. After approval of feature we just remove such CRD and need to cleanup all such preview records. Unfortunately TXT records not cleared, which causes a lot of troubles when we need to spin up such preview again. So fix adding possibility for TXT registry to track TXT records. As a "guardrail" I have made that such possible only in cases when user chooses to manage TXT records which requires to set a txt prefix/suffix. |
|
Problem with TXT registry - it quite tricky to understand how sometimes to move forward, due to certain limitations of the TXT registry. |
Yes, I understand. Which is why i made a guards for such case. We have tested this for a couple weeks and its completele resolved our issue. After removal of CRD all dns records are cleared. And also its not touch existsing TXT recods so, takes only needed. For this required to have such profix/suffix. |
ivankatliarchuk
left a comment
There was a problem hiding this comment.
Just a heads up for other reviewers.
This is a breaking change for any existing deployment using --managed-record-types=TXT with a fixed prefix (e.g., txt-). Basically we introducing a hard startup failure for existing TXT users. I'm not sure if there are any due to multiple bugs.
As we forcing switching to record_type, renames the ownership TXT records for every record type, not just TXT. The consequence for any existing deployment:
- All existing txt.cname-foo., txt.a-foo. etc. ownership records become unrecognized garbage in the zone
- external-dns re-creates ownership records under the new names on the next cycle
- The old ownership records are never cleaned up (they don't match the mapper so they're invisible to external-dns)
This is the real migration cost the PR doesn't address correctly. The PR should probably call this out explicitly: adopting %{record_type}. prefix is a one-way migration that orphans all pre-existing ownership TXT records.
Anyone already using TXT in managed-record-types was already broken most likely (per the bug).
| _, ok = r.mapper.(mapper.AffixNameMapper) | ||
| assert.True(t, ok) | ||
|
|
||
| // TXT in managedRecordTypes requires %{record_type} in prefix or suffix |
There was a problem hiding this comment.
create multiple tests with t.Run and description
| // affixOnlyRecords are record types that can only be mapped when | ||
| // %{record_type} is present in the prefix/suffix. Without it, | ||
| // "txt-" in default position collides with TXT ownership record naming. | ||
| affixOnlyRecords = []string{ |
There was a problem hiding this comment.
affixOnlyRecords doesn't communicate why these records are separate. requiresTemplateRecords or similar maybe better explain the constraint?
|
|
||
| if a.recordTypeInAffix() { | ||
| for _, t := range supportedRecords { | ||
| for _, t := range append(supportedRecords, affixOnlyRecords...) { |
There was a problem hiding this comment.
Not too sure. Hot-path allocation is here. append(supportedRecords, affixOnlyRecords...) is called for every record lookup in dropAffixExtractType. Since supportedRecords is a package-level var without extra capacity, each call allocates a new backing array. A pre-computed package-level allMappableRecords would avoid this entirely. But I think we may going to have a memory issue here.
There was a problem hiding this comment.
Another variant checks the params not in mapper, but on higher level. In such case we can concatenate them only once.
| } | ||
|
|
||
| // When: Apply changes to recreate missing A records | ||
| managedRecords := []string{endpoint.RecordTypeA, endpoint.RecordTypeCNAME, endpoint.RecordTypeAAAA, endpoint.RecordTypeTXT} |
There was a problem hiding this comment.
silent semantic change: TXT was dropped from managedRecords to avoid the new startup error (since that test uses empty prefix/suffix). This is necessary, but it silently reduces coverage - the test no longer exercises TXT-in-managed-records paths. Should be a separate sub-test with a proper %{record_type} prefix
| // "txt-" in default position collides with TXT ownership record naming. | ||
| affixOnlyRecords = []string{ | ||
| endpoint.RecordTypeTXT, | ||
| } |
There was a problem hiding this comment.
var (
supportedRecords = []string{ ... } // existing
affixOnlyRecords = []string{ ... } // added by PR
allMappableRecords = append(supportedRecords, affixOnlyRecords...) // add this
)
^ I'm not too sure, but range append(supportedRecords, affixOnlyRecords...) , does not seems correct too
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
What does it do ?
%{record_type}in--txt-prefix/--txt-suffixwhen--managed-record-types=TXTis set, as without it, data TXT records are not removed/updatedMotivation
Without this fix, the mapper cannot reverse-map registry TXT record names
like
txt-e-dns.example.comback to their endpoint name and type.ToEndpointName()returns("", ""), ownership labels are lost, andexternal-dns logs:
Skipping endpoint example.com 0 IN TXT ... because of missing owner label
This causes managed TXT records to become orphaned — they are created
but never updated or deleted.
More