Skip to content

Fix #134: (protobuf) proto3 packed-default and oneof field resolution#707

Open
cowtowncoder wants to merge 4 commits into
2.21from
fix-134-protobuf-proto3-packed-default-2.21
Open

Fix #134: (protobuf) proto3 packed-default and oneof field resolution#707
cowtowncoder wants to merge 4 commits into
2.21from
fix-134-protobuf-proto3-packed-default-2.21

Conversation

@cowtowncoder

@cowtowncoder cowtowncoder commented Jul 2, 2026

Copy link
Copy Markdown
Member

Summary

This started as a fix for #134 (incorrect decoding of repeated fields under proto3 syntax, as reported by @vogt31337), and grew to include two more related bugs found while investigating proto3 compatibility more broadly.

1. Proto3 "packed by default" for repeated scalar/enum fields (#134)

proto3 defaults repeated scalar/enum fields to "packed" wire encoding unless explicitly overridden with packed=false. This library only ever honored an explicit packed option on a field, defaulting to unpacked when none was present -- correct for proto2, but wrong for proto3, where that same absence means packed. As a result, content produced by any real protoc-based proto3 encoder (Go, Python, C++, etc.) that relies on the packed default failed to decode.

The fix threads the parsed .proto file's syntax (proto2 vs proto3) down through the schema-resolution pipeline: NativeProtobufSchema (captured from ProtoFile.syntax()) -> TypeResolver -> ProtobufField, which now computes the default packed value as repeated && isProto3 && type.isPackable() only when no explicit packed option is present.

Backward compatibility:

  • Schemas without an explicit syntax = "..."; statement (the common case in this repo's existing tests) get isProto3=false, i.e. unchanged behavior -- verified via ProtoFile.syntax() returning null in that case.
  • Explicit [packed=true]/[packed=false] options are unaffected and always take precedence, for both proto2 and proto3.
  • Message/string/bytes fields remain non-packable regardless of syntax (FieldType.isPackable()).
  • Reading is unaffected either way for old data: ProtobufField.isValidFor() already accepts both packed and unpacked wire types for a repeated field.
  • The one intentional behavior change: writing a repeated scalar/enum field under a proto3 schema with no explicit packed option now produces packed (spec-compliant) output instead of unpacked. No existing test relied on the old proto3 write behavior.
  • All new methods/constructors are additive; no public signatures changed.

2. oneof fields silently dropped during schema resolution

MessageElement (from com.squareup.protoparser) keeps fields declared inside a oneof block in a separate oneOfs() list from regular fields(). TypeResolver._resolve() only ever iterated fields(), so any field inside a oneof group vanished from the resolved schema -- no error, it just could never be read or written. This affects proto2 oneof too, not just proto3, but oneof is far more idiomatic in typical proto3 schemas.

Fix: merge each oneOf.fields() into the field list resolved by TypeResolver, alongside the message's regular fields. ProtobufField's constructor already treated FieldElement.Label.ONE_OF correctly (falls through to non-required/non-repeated, i.e. optional-like) once given the field -- the bug was purely that these fields never reached that constructor at all.

3. FileDescriptorSet-based schema loading crashed on proto3 descriptor sets

Found while reviewing this PR: FileDescriptorSet.FileDescriptorProto.getSyntax() called ProtoFile.Syntax.valueOf(syntax) directly on the raw descriptor string. Real FileDescriptorProto.syntax values (as written by protoc) are lowercase ("proto2"/"proto3"), but the enum constants are PROTO_2/PROTO_3 -- so valueOf("proto3") threw IllegalArgumentException for any proto3-origin binary descriptor set, meaning the FileDescriptorSet.schemaFor()/schemaForFirstType() loading path (as opposed to parsing .proto text directly) was completely broken for proto3.

Fix: map the string explicitly ("proto3" -> PROTO_3, anything else -> PROTO_2, matching the pre-existing null-handling default) instead of relying on valueOf().

Test plan

  • Proto3PackedDefault134Test: proto3 implicit-packed read, proto3 explicit packed=false override, proto2 default (unpacked) unaffected, no-syntax-declaration unaffected, and a proto3 write+read round trip.
  • OneofFieldResolutionTest: verifies all fields inside a oneof block (plus a sibling regular field) are present in the resolved schema, and round-trip correctly through write+read.
  • FileDescriptorSetProto3Test: builds FileDescriptorProto/DescriptorProto objects directly with syntax = "proto3" / "proto2" and verifies FileDescriptorSet.schemaFor() no longer crashes and produces the correct packed default for each.
  • Full protobuf module test suite passes with no regressions after each change.

…chemas

proto3 defaults repeated scalar/enum fields to "packed" wire encoding
unless explicitly overridden with packed=false; this library only ever
honored an explicit packed option (defaulting to unpacked when absent),
causing decode failures against real protoc-based proto3 producers.
Thread the parsed schema's proto2/proto3 syntax through
NativeProtobufSchema -> TypeResolver -> ProtobufField so the default
"packed" value is computed correctly per spec.
during schema resolution

Found while investigating #134 (proto3 compatibility): MessageElement
keeps `oneof`-block fields in a separate `oneOfs()` list from regular
`fields()`, and TypeResolver only ever resolved the latter, so any
field inside a `oneof` group vanished from the resolved schema with
no error -- it just could never be read or written. Merge `oneOfs()`
member fields in alongside regular fields during resolution.
@cowtowncoder cowtowncoder changed the title Fix #134: (protobuf) repeated fields incorrectly decoded for proto3 schemas Fix #134: (protobuf) proto3 packed-default and oneof field resolution Jul 2, 2026
@cowtowncoder cowtowncoder added this to the 2.21.5 milestone Jul 2, 2026
Tatu Saloranta added 2 commits July 1, 2026 21:57
…escriptor sets

Found while investigating #134 (proto3 compatibility): FileDescriptorProto.getSyntax()
called ProtoFile.Syntax.valueOf(syntax) directly on the raw descriptor string, but
protoc writes lowercase "proto2"/"proto3" while the enum constants are PROTO_2/PROTO_3.
Any proto3-origin FileDescriptorSet crashed with IllegalArgumentException instead of
loading. Map the string explicitly instead of relying on valueOf().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant