Skip to content

fix(registry/txt): require %{record_type} in prefix/suffix for TXT record management#6293

Open
simonoff wants to merge 2 commits intokubernetes-sigs:masterfrom
amoniacou:txt-ownership-fix
Open

fix(registry/txt): require %{record_type} in prefix/suffix for TXT record management#6293
simonoff wants to merge 2 commits intokubernetes-sigs:masterfrom
amoniacou:txt-ownership-fix

Conversation

@simonoff
Copy link
Copy Markdown
Contributor

@simonoff simonoff commented Mar 18, 2026

What does it do ?

  • Require %{record_type} in --txt-prefix/--txt-suffix when --managed-record-types=TXT is set, as without it, data TXT records are not removed/updated
  • Add TXT to mapper's supportedRecords so ownership records for TXT endpoints can be resolved back to endpoint names
  • Document the requirement in registry and source docs

Motivation

Without this fix, the mapper cannot reverse-map registry TXT record names
like txt-e-dns.example.com back to their endpoint name and type.
ToEndpointName() returns ("", ""), ownership labels are lost, and
external-dns logs:

Skipping endpoint example.com 0 IN TXT ... because of missing owner label

This causes managed TXT records to become orphaned — they are created
but never updated or deleted.

More

  • [x ] Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 18, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @simonoff. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. registry Issues or PRs related to a registry labels Mar 18, 2026
@ivankatliarchuk
Copy link
Copy Markdown
Member

We've tried in past. This is not going to work.

Go over issues and past PR related. There are all sort of reasons why this is not working. As well as TXT registry uses TXT records to store metadata.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2026
@simonoff
Copy link
Copy Markdown
Contributor Author

@ivankatliarchuk you mean #5465 ? I will review. There many different issues related to TXT if you point me to specific one you mentioned it will be really helpful for me. As for us its going to be a really big issue as we create and remove DNS records contantly

@ivankatliarchuk
Copy link
Copy Markdown
Member

I could be wrong. But I did a quick search with AI https://github.com/kubernetes-sigs/external-dns/pulls?page=4&q=is%3Apr+txt+registry+txt+record+is%3Aclosed

There were few other attempts. Example #3916. Maybe is possible now. Let's say if you could share evidences covering different scenario, maybe we could enable it. Example, just with different TXT scenarios #6253 (comment)

@ivankatliarchuk
Copy link
Copy Markdown
Member

Here is explanation from DeepWiki https://deepwiki.com/search/why-txt-registry-does-not-supp_6dbe3353-4786-40a0-9155-81d2695889dc?mode=deep

I would say if we going to enable-it, there are going to be substantial limitations and could be some risk involved

@ivankatliarchuk
Copy link
Copy Markdown
Member

ivankatliarchuk commented Mar 18, 2026

Your example aka txt-e-dns.example.com is related to apex zone handling. This is another long standing problem

@simonoff
Copy link
Copy Markdown
Contributor Author

@ivankatliarchuk ok, i will review all this mentions and will try to found a solution. As our case more about managing TXT records for validating domains for mailgun for example. I will do my best to found a good solution.

@simonoff
Copy link
Copy Markdown
Contributor Author

@ivankatliarchuk i did a research and you mentioned that you have on your table idea to make a pluginable registry. Can I help you with it? And takes it completely? I would prefer to make them out of the tree like providers. what do you think?

@ivankatliarchuk
Copy link
Copy Markdown
Member

ivankatliarchuk commented Mar 18, 2026

To be honest, before implementing a provider webhook, I think we should consider supporting a concrete provider such as etcd or PostgreSQL (#2).

For example, in places where I’ve worked, webhooks that are not part of Kubernetes or CNCF projects are typically not allowed.

There is in progress Registry CRD #1 #5372

And what should be implemted as well #3

So, in this order

  • #1
  • #2
  • #3
  • pluggable provider

@simonoff
Copy link
Copy Markdown
Contributor Author

@ivankatliarchuk oh such info really interesting and shows the whole case to me.

Yeah, the #6182 is a result of not tracking TXT records. We are doing the same and now its a big pain for us.

@mloiseleur do you need a help with #5372?

For us working both variants - save it in kubernetes itself or using some PostgreSQL db for it. We also made a hack for cleanup of such DNS records, but its a bit manual actions and i would like to avoid them.

@ivankatliarchuk
Copy link
Copy Markdown
Member

Fixes #6182

Ok. Maybe it's actually a bug. Documentation

The `--txt-new-format-only` flag should be used in addition to your existing external-dns configuration flags. It does not implicitly configure TXT record handling - you still need to specify `--managed-record-types=TXT` if you want external-dns to manage TXT records.
explicitly show --managed-record-types=TXT as a supported use case, and confirm that external-dns is supposed to manage TXT records. No indication
that TXT should be excluded from ownership tracking.

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 18, 2026
@ivankatliarchuk
Copy link
Copy Markdown
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 18, 2026
@ivankatliarchuk
Copy link
Copy Markdown
Member

Could you provide evidences #6253 (comment) that this fix actually working?

@ivankatliarchuk
Copy link
Copy Markdown
Member

/retitle fix(registry/txt): TXT records orphaned due to missing type in mapper supportedRecords

@k8s-ci-robot k8s-ci-robot changed the title feat: add TXT to registry mapper fix(registry/txt): TXT records orphaned due to missing type in mapper supportedRecords Mar 18, 2026
@coveralls
Copy link
Copy Markdown

coveralls commented Mar 18, 2026

Coverage Report for CI Build 24127033603

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR. Learn more →

Coverage increased (+2.3%) to 80.527%

Details

  • Coverage increased (+2.3%) from the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • 452 coverage regressions across 18 files.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

452 previously-covered lines in 18 files lost coverage.

Top 10 Files by Coverage Loss Lines Losing Coverage Coverage
gateway.go 79 86.53%
store.go 76 65.08%
rfc2136/rfc2136.go 75 61.34%
azure/azure.go 55 76.0%
istio_virtualservice.go 27 87.88%
testutils/metrics.go 27 0.0%
execute.go 22 76.64%
pdns/pdns.go 19 68.75%
azure/azure_private_dns.go 16 75.87%
istio_gateway.go 13 90.36%

Coverage Stats

Coverage Status
Relevant Lines: 21312
Covered Lines: 17162
Line Coverage: 80.53%
Coverage Strength: 1471.73 hits per line

💛 - Coveralls

@simonoff
Copy link
Copy Markdown
Contributor Author

simonoff commented Mar 18, 2026

@ivankatliarchuk can you give me a couple day while we will test a few combinations to prove it on 100%, as for now its just my finding based on debug that such records just skipped.

Our current configuration is next one:

    - --log-level=debug
    - --log-format=text
    - --interval=1m
    - --source=ingress
    - --source=crd
    - --policy=sync
    - --registry=txt
    - --txt-owner-id=hetzner
    - --txt-prefix=%{record_type}-e-dns.
    - --domain-filter=domain.tld
    - --managed-record-types=A
    - --managed-record-types=CNAME
    - --managed-record-types=TXT
    - --managed-record-types=MX
    - --provider=webhook
    - --ingress-class=nginx-ws
    - --txt-cache-interval=0s
    - --txt-wildcard-replacement=w

@simonoff
Copy link
Copy Markdown
Contributor Author

Good evening. I have tested it and have found an insresting side effect. Still working on the fix and will back when it will be done.

@simonoff
Copy link
Copy Markdown
Contributor Author

Ok, such side effect was a bug in DO webhook. I think we need more time to be sure more and I think next week will tell on 100%.

@ivankatliarchuk one question. Do I need to make TXT optional in the list of supported records only when TXT added into managed list via flags? Or we can add it always?

@ivankatliarchuk
Copy link
Copy Markdown
Member

I've now idea what the risk is and how it should be implemented optionally or always. Worth to execute all sorts of tests in your environment and share configuration and results with us.

Copy link
Copy Markdown
Contributor

@AndrewCharlesHay AndrewCharlesHay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed this carefully — I think it's correct and worth merging. Here's what I traced through:

The bug is real and the fix is right

supportedRecords is used in two places: dropAffixExtractType (for %{record_type} template affixes) and extractRecordTypeDefaultPosition (for the default txt- positional prefix). TXT was the only common record type missing from the list. Without it, ToEndpointName("txt-foo.example.com") returns ("txt-foo.example.com", "") instead of ("foo.example.com", "TXT"), so the ownership label lookup misses and the record gets logged as "missing owner label" and abandoned.

No self-reference risk

My one concern going in: if TXT is now in supportedRecords, could ExternalDNS try to create an ownership record for its own ownership records? Traced through registry/txt/registry.go — it doesn't. In Records(), all TXT records with a valid heritage=external-dns label are consumed into labelMap and never added to the endpoints slice returned to the controller. So ownership records are invisible to the source reconciler and won't get their own ownership records created.

The corrected test was asserting the bug

The renamed test (was "suffix with txt record", now "no affix with TXT record") was asserting:

wantEndpointName: "txt-foo.example.com"
wantRecordType:   ""

That's testing the broken behavior. The new expected values are correct.

One gap worth noting (not a blocker)

There are no suffix-mode tests (--txt-suffix=-%{record_type}) for the TXT case. Looking at the existing test suite, this appears to be a broader gap for other record types too, so I don't think it belongs in this PR — but worth a follow-up.

Overall: LGTM. This is a one-line fix to a genuine data loss bug (orphaned records are never cleaned up), with good test coverage and a clear paper trail back to #6281 which did the same for PTR.

Copy link
Copy Markdown
Contributor

@AndrewCharlesHay AndrewCharlesHay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies — I left an LGTM earlier without reading the full thread. After catching up, I want to revise my assessment.

There is a regression risk for default-prefix users

The fix is safe when %{record_type} is in the prefix/suffix (your config with --txt-prefix=%{record_type}-e-dns. being a good example), because each record type gets a distinct ownership TXT name — a-e-dns.foo.example.com for A records, txt-e-dns.foo.example.com for TXT records. No collision.

But with the default empty prefix, txt-foo.example.com is already in use as the ownership record for foo.example.com A (or CNAME, etc.). After this change, extractRecordTypeDefaultPosition("txt-foo.example.com") now returns ("foo.example.com", "TXT") instead of ("txt-foo.example.com", ""). That entry lands in labelMap under key {DNSName: "foo.example.com", RecordType: "TXT"}.

If there's a pre-existing unmanaged TXT record at foo.example.com (SPF, DKIM, etc.) that passes through Records() without a heritage label, it could inherit those ownership labels and ExternalDNS would start treating it as owned — potentially deleting it on the next sync.

Suggested path forward

Gating on %{record_type} in the affix would make this safe unconditionally — only add TXT to supportedRecords when recordTypeInAffix() is true. That's a mapper-internal check that doesn't require threading managedRecordTypes down to this layer. It would fix your use case without touching behavior for the much larger population of default-prefix users.

Happy to be wrong on the mechanics here if I'm misreading the flow, but I think this is worth validating before merge.

@simonoff
Copy link
Copy Markdown
Contributor Author

Ok, i have finally tested our case and fix really working. Side effects was on webhook side which we also fixed.
@AndrewCharlesHay sure i will add such chnages today. Thank you!

@simonoff simonoff force-pushed the txt-ownership-fix branch from c460fe7 to f1e3e03 Compare March 26, 2026 07:14
@k8s-ci-robot k8s-ci-robot added docs size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 26, 2026
@simonoff simonoff changed the title fix(registry/txt): TXT records orphaned due to missing type in mapper supportedRecords fix(registry/txt): require %{record_type} in prefix/suffix for TXT record management Mar 26, 2026
@simonoff
Copy link
Copy Markdown
Contributor Author

@AndrewCharlesHay I have made the managing of TXT records and txt prefix/suffix required as otherwise there no possibility to manage TXT records properly.

@simonoff simonoff force-pushed the txt-ownership-fix branch from f1e3e03 to fc39f07 Compare March 26, 2026 07:20
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 26, 2026
@ivankatliarchuk
Copy link
Copy Markdown
Member

Hi @szuecs, when you have some time, could you please take a look? Is TXT support something we’re planning to add to the TXT registry?

@mloiseleur
Copy link
Copy Markdown
Collaborator

@mloiseleur do you need a help with #5372?

Yes, probably. I never find the time to finish it.

@ivankatliarchuk
Copy link
Copy Markdown
Member

Could you provide evidences #6253 (comment) that this fix actually working?

Hi @simonoff could you share setup as described that you used for testing this fix?

@simonoff
Copy link
Copy Markdown
Contributor Author

Could you provide evidences #6253 (comment) that this fix actually working?

Hi @simonoff could you share setup as described that you used for testing this fix?

Sure.

As I mentioned before we are using a next params for our external-dns instance:

    - --log-level=debug
    - --log-format=text
    - --interval=1m
    - --source=ingress
    - --source=crd
    - --policy=sync
    - --registry=txt
    - --txt-owner-id=hetzner
    - --txt-prefix=%{record_type}-e-dns.
    - --domain-filter=dev.com
    - --managed-record-types=A
    - --managed-record-types=CNAME
    - --managed-record-types=TXT
    - --managed-record-types=MX
    - --provider=webhook
    - --ingress-class=nginx-ws
    - --txt-cache-interval=0s
    - --txt-wildcard-replacement=w

Our use case that we need dynamically approve mailgun integrations for our dynamic previews, which requires us to create a CNAME, MX and TXT records.

For example we need to make a preview1243.dev.com, we do next via CRD:

apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: preview1243.dev.com
  namespace: preview1243
spec:
  endpoints:
  - dnsName: preview1243.dev.com
    recordType: TXT
    targets:
    - v=spf1 include:mailgun.org ~all
  - dnsName: mta._domainkey.preview1243.dev.com
    recordType: TXT
    targets:
    - k=rsa; p=KEY  
  - dnsName: email.preview1243.dev.com
    recordType: CNAME
    targets:
    - eu.mailgun.org
  - dnsName: preview1243.dev.com
    recordType: MX
    targets:
    - 10 mxa.eu.mailgun.org
    - 10 mxb.eu.mailgun.org
  - dnsName: preview1243.dev.com
    recordType: A
    targets:
    - 1.2.3.4
  - dnsName: '*.preview1243.dev.com'
    recordType: A
    targets:
    - 1.2.3.4

So its point a preview domain to our LB and creates a required records for Mailgun to validate such domain.

After approval of feature we just remove such CRD and need to cleanup all such preview records. Unfortunately TXT records not cleared, which causes a lot of troubles when we need to spin up such preview again.

So fix adding possibility for TXT registry to track TXT records. As a "guardrail" I have made that such possible only in cases when user chooses to manage TXT records which requires to set a txt prefix/suffix.

@ivankatliarchuk
Copy link
Copy Markdown
Member

Problem with TXT registry - it quite tricky to understand how sometimes to move forward, due to certain limitations of the TXT registry.

@simonoff
Copy link
Copy Markdown
Contributor Author

Problem with TXT registry - it quite tricky to understand how sometimes to move forward, due to certain limitations of the TXT registry.

Yes, I understand. Which is why i made a guards for such case. We have tested this for a couple weeks and its completele resolved our issue. After removal of CRD all dns records are cleared. And also its not touch existsing TXT recods so, takes only needed. For this required to have such profix/suffix.

Copy link
Copy Markdown
Member

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a heads up for other reviewers.

This is a breaking change for any existing deployment using --managed-record-types=TXT with a fixed prefix (e.g., txt-). Basically we introducing a hard startup failure for existing TXT users. I'm not sure if there are any due to multiple bugs.

As we forcing switching to record_type, renames the ownership TXT records for every record type, not just TXT. The consequence for any existing deployment:

  1. All existing txt.cname-foo., txt.a-foo. etc. ownership records become unrecognized garbage in the zone
  2. external-dns re-creates ownership records under the new names on the next cycle
  3. The old ownership records are never cleaned up (they don't match the mapper so they're invisible to external-dns)

This is the real migration cost the PR doesn't address correctly. The PR should probably call this out explicitly: adopting %{record_type}. prefix is a one-way migration that orphans all pre-existing ownership TXT records.

Anyone already using TXT in managed-record-types was already broken most likely (per the bug).

Comment thread registry/txt/registry_test.go Outdated
_, ok = r.mapper.(mapper.AffixNameMapper)
assert.True(t, ok)

// TXT in managedRecordTypes requires %{record_type} in prefix or suffix
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create multiple tests with t.Run and description

Comment thread registry/mapper/mapper.go Outdated
// affixOnlyRecords are record types that can only be mapped when
// %{record_type} is present in the prefix/suffix. Without it,
// "txt-" in default position collides with TXT ownership record naming.
affixOnlyRecords = []string{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

affixOnlyRecords doesn't communicate why these records are separate. requiresTemplateRecords or similar maybe better explain the constraint?

Comment thread registry/mapper/mapper.go Outdated

if a.recordTypeInAffix() {
for _, t := range supportedRecords {
for _, t := range append(supportedRecords, affixOnlyRecords...) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too sure. Hot-path allocation is here. append(supportedRecords, affixOnlyRecords...) is called for every record lookup in dropAffixExtractType. Since supportedRecords is a package-level var without extra capacity, each call allocates a new backing array. A pre-computed package-level allMappableRecords would avoid this entirely. But I think we may going to have a memory issue here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another variant checks the params not in mapper, but on higher level. In such case we can concatenate them only once.

}

// When: Apply changes to recreate missing A records
managedRecords := []string{endpoint.RecordTypeA, endpoint.RecordTypeCNAME, endpoint.RecordTypeAAAA, endpoint.RecordTypeTXT}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

silent semantic change: TXT was dropped from managedRecords to avoid the new startup error (since that test uses empty prefix/suffix). This is necessary, but it silently reduces coverage - the test no longer exercises TXT-in-managed-records paths. Should be a separate sub-test with a proper %{record_type} prefix

Comment thread registry/mapper/mapper.go
// "txt-" in default position collides with TXT ownership record naming.
affixOnlyRecords = []string{
endpoint.RecordTypeTXT,
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var (
      supportedRecords = []string{ ... }  // existing
      affixOnlyRecords = []string{ ... }  // added by PR
      allMappableRecords = append(supportedRecords, affixOnlyRecords...)  // add this
  )

^ I'm not too sure, but range append(supportedRecords, affixOnlyRecords...) , does not seems correct too

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ivankatliarchuk. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. docs ok-to-test Indicates a non-member PR verified by an org member that is safe to test. registry Issues or PRs related to a registry size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants