Building an SDK Generator: Maintaining Custom Files

We've been dealing with a persistent problem in SDK generation: when you add custom files to a generated codebase, they get deleted during the next update. Similarly, any custom code added to generated files gets overwritten in subsequent updates.

After building API client SDK generators for dozens of companies, we kept hearing the same frustration: "Your generated SDK is great, but I need to add my own wrapper class here, or a custom error handler there, and they keep getting deleted when you regenerate."

So we built something different: a code generator that treats your custom files and code as first-class citizens. Our generator also preserves edits made to the generated files themselves, an even harder problem, but that solution will be covered in a future post. This article explains how we preserve custom files, using a solution that leans heavily on a Rust macro.

You generate a Python SDK:

my_sdk/
├── client.py            # Generated
├── models/user.py       # Generated
└── exceptions.py        # Generated

You add custom utilities:

my_sdk/
├── client.py            # Generated
├── models/user.py       # Generated
├── exceptions.py        # Generated
├── retry_wrapper.py     # Your custom retry logic
└── test_helpers.py      # Your testing utilities

The next API update comes in. You regenerate. What happens to retry_wrapper.py and test_helpers.py?

In most generators: Gone.

The Roach System: API-First Generation

We call it the "Roach" system because, like cockroaches, your custom files survive everything.

This can't just be a CLI tool. The file resolution problem requires tracking what was generated previously so that obsolete files can be cleaned up intelligently. But we can't force users to keep old API specs around. The workflow needs to be simple enough to fit into a modern API development flow:

  1. Upload new spec

  2. Regenerate

  3. All customizations maintained and new logic added

So we built our code generator as a synchronous HTTP API:

First SDK generation: POST /sdk → returns a complete tar of the SDK

Updates: POST /sdk/{id}/update → returns a git patch containing only the changes needed to reflect the updated API

Smart File Resolution

Here's the tricky part. Your API removes the /legacy-reports endpoint. When you regenerate:

  • legacy_report.py should be deleted (no longer needed)

  • retry_wrapper.py should be kept (you added it)

But both look identical to the generator: files that exist but aren't in the current generation spec.

The solution is tracking what was generated before. We built a simple macro in Rust that defines exactly which files belong to the generated SDK structure. Not familiar with Rust macros? Learn more here.

The Roach Directory Macro

Our roach_dir! macro creates structured definitions of what files should exist in each part of the SDK:

#[macro_export]
macro_rules! roach_dir {
    (
        $struct_name:ident,
        {
            $(
                $field:ident: {
                    path: $path:expr,
                    boilerplate: $boilerplate:expr
                }
            ),* $(,)?
        }
    ) => {
        #[derive(Debug, Clone)]
        #[allow(dead_code)]
        pub struct $struct_name {
            pub root: camino::Utf8PathBuf,
            $(pub $field: $crate::RoachPath,)*
        }

        impl $crate::RoachDirTrait for $struct_name {
            fn new(root: &camino::Utf8PathBuf) -> Self {
                Self {
                    root: root.clone(),
                    $($field: $crate::RoachPath::new(&root.join($path), $boilerplate),)*
                }
            }

            fn paths(&self) -> Vec<$crate::RoachPath> {
                vec![$(self.$field.clone(),)*]
            }
        }
    };
}

This macro generates structs that track every file that should be part of the generated SDK, along with their boilerplate content.
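For readers new to macro_rules!, the repetition pattern `$( ... ),* $(,)?` does most of the work in roach_dir!. The sketch below isolates that pattern in a much smaller macro. The name `file_list!` and its output shape are hypothetical and only for illustration; it is not part of the Roach system.

```rust
// Hypothetical mini-macro: same repetition pattern as `roach_dir!`,
// but it only collects (field name, path) pairs into a Vec.
macro_rules! file_list {
    ( $( $field:ident: $path:expr ),* $(,)? ) => {
        vec![ $( (stringify!($field), $path) ),* ]
    };
}

fn main() {
    // The trailing comma after the last entry is accepted thanks to `$(,)?`.
    let files: Vec<(&str, &str)> = file_list! {
        client: "client.py",
        environment: "environment.py",
    };

    for (field, path) in &files {
        println!("{field} -> {path}");
    }
}
```

Each `field: path` entry is matched once per repetition, and the same `$( ... )*` syntax on the right-hand side emits one piece of output per entry. roach_dir! uses this twice: once to emit the struct fields and once to emit the constructor and paths() bodies.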

Defining SDK Structure

We use the macro to define the complete structure of a Python SDK:

// Package-level files
roach_dir!(
    PyPkg,
    {
        pyproject: {
            path: "pyproject.toml",
            boilerplate: include_str!("./boilerplate/pkg/pyproject.toml")
        },
        gitignore: {
            path: ".gitignore",
            boilerplate: include_str!("./boilerplate/pkg/gitignore.txt")
        },
    }
);

// Source directory files
roach_dir!(
    PySrc,
    {
        init: {
            path: "__init__.py",
            boilerplate: ""
        },
        client: {
            path: "client.py",
            boilerplate: include_str!("./boilerplate/src/client.py")
        },
        environment: {
            path: "environment.py",
            boilerplate: include_str!("./boilerplate/src/environment.py")
        },
        readme: {
            path: "README.md",
            boilerplate: ""
        }
    }
);
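To make the expansion concrete, here is a hand-written approximation of what the PyPkg definition above could expand to. It is a sketch under assumptions: the shapes of RoachPath and RoachDirTrait are inferred from the macro body rather than taken from the real crate, std::path::PathBuf stands in for camino::Utf8PathBuf so the example has no dependencies, and short placeholder strings replace the include_str! boilerplate files.

```rust
use std::path::{Path, PathBuf};

/// Assumed shape of `RoachPath`: a target location plus its boilerplate content.
#[derive(Debug, Clone, PartialEq)]
pub struct RoachPath {
    pub path: PathBuf,
    pub boilerplate: &'static str,
}

impl RoachPath {
    pub fn new(path: &Path, boilerplate: &'static str) -> Self {
        Self { path: path.to_path_buf(), boilerplate }
    }
}

/// Assumed shape of `RoachDirTrait`, inferred from the macro body.
pub trait RoachDirTrait {
    fn new(root: &PathBuf) -> Self;
    fn paths(&self) -> Vec<RoachPath>;
}

/// Roughly what `roach_dir!(PyPkg, { pyproject: ..., gitignore: ... })` produces.
#[derive(Debug, Clone)]
pub struct PyPkg {
    pub root: PathBuf,
    pub pyproject: RoachPath,
    pub gitignore: RoachPath,
}

impl RoachDirTrait for PyPkg {
    fn new(root: &PathBuf) -> Self {
        Self {
            root: root.clone(),
            // Placeholder strings stand in for the `include_str!` contents.
            pyproject: RoachPath::new(&root.join("pyproject.toml"), "[project]\n"),
            gitignore: RoachPath::new(&root.join(".gitignore"), "__pycache__/\n"),
        }
    }

    fn paths(&self) -> Vec<RoachPath> {
        vec![self.pyproject.clone(), self.gitignore.clone()]
    }
}

fn main() {
    // Every declared file is resolved relative to the SDK root.
    let pkg = PyPkg::new(&PathBuf::from("my_sdk"));
    for p in pkg.paths() {
        println!("{}", p.path.display());
    }
}
```

Because every field becomes an entry in paths(), the list of generated files is derived from the declarations themselves. A file cannot be added to the SDK layout without also showing up in the ownership list.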

The Resolution Algorithm

With this structure defined, we can resolve which files should be kept, updated, or deleted:

pub fn resolve_paths_boilerplate(
    sdk_paths: &[RoachPath],        // What will be generated now
    prev_sdk_paths: &[RoachPath],   // What was generated before
    git_paths: &[RoachPath],        // What exists in your repo
) ->

The core logic differentiates between generated and custom files:

// Paths generated previously that are absent from the current generation
let deleted_sdk_paths: Vec<_> = prev_sdk_paths
    .iter()
    .filter(|prev| !sdk_paths.iter().any(|curr| curr.path == prev.path))
    .collect();

let mut result = Vec::new();
for git_path in git_paths {
    let is_generated_now = sdk_paths.iter().any(|curr| git_path.path == curr.path);
    // Compare by path, consistent with the checks above
    let was_generated_before = deleted_sdk_paths.iter().any(|prev| prev.path == git_path.path);

    if !is_generated_now && !was_generated_before {
        // Exists in Git, not being generated, wasn't generated before = custom file
        result.push(git_path.clone());
    }
}

How It Works in Practice

Each RoachPath knows both its filesystem location and its expected boilerplate content. This means we can:

  1. Track ownership: Every generated file is explicitly declared in our macro definitions

  2. Detect changes: Compare current boilerplate against what should be generated

  3. Preserve custom files: If a file exists but isn't in any roach_dir! definition, it's custom

The macro system eliminates guesswork. When your API removes the /legacy-reports endpoint, the corresponding legacy_report.py file simply won't appear in the new sdk_paths list, but it will be in prev_sdk_paths, so we know to delete it.

Meanwhile, retry_wrapper.py appears in neither list—it exists in your Git repository but was never part of our generated structure, so it gets preserved.
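The legacy_report.py and retry_wrapper.py scenario can be run end to end with a simplified version of the resolution step. This is a minimal sketch, not our production code: it uses plain PathBuf values instead of RoachPath, and the `Resolution` struct and `resolve` function are hypothetical names introduced for the example.

```rust
use std::path::PathBuf;

/// Outcome of comparing the previous generation, the current one, and the repo.
#[derive(Debug, PartialEq)]
pub struct Resolution {
    /// Files in the repo that were never generated: preserved.
    pub custom: Vec<PathBuf>,
    /// Files generated previously but not now: safe to delete.
    pub obsolete: Vec<PathBuf>,
}

pub fn resolve(current: &[PathBuf], previous: &[PathBuf], repo: &[PathBuf]) -> Resolution {
    let mut custom = Vec::new();
    let mut obsolete = Vec::new();

    for path in repo {
        if current.contains(path) {
            // Still generated: handled by the regular update, not by cleanup.
            continue;
        }
        if previous.contains(path) {
            // Generated before, dropped from the new spec.
            obsolete.push(path.clone());
        } else {
            // Never generated by us, so it belongs to the user.
            custom.push(path.clone());
        }
    }

    Resolution { custom, obsolete }
}

fn main() {
    let p = |s: &str| PathBuf::from(s);

    let previous = [p("client.py"), p("legacy_report.py")];
    let current = [p("client.py")];
    let repo = [p("client.py"), p("legacy_report.py"), p("retry_wrapper.py")];

    let res = resolve(&current, &previous, &repo);
    println!("delete: {:?}", res.obsolete);
    println!("keep:   {:?}", res.custom);
}
```

Here legacy_report.py lands in `obsolete` because it appears in the previous generation but not the current one, while retry_wrapper.py lands in `custom` because it appears in neither. Without the `previous` list, the two files would be indistinguishable.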

Real Example

You call the update endpoint with a new API spec:

curl -X

Get back a git patch:

diff --git a/client.py b/client.py
+    def create_organization(self, data: dict):
+        return self._request("POST", "/organizations", data)

diff --git a/models/organization.py b/models/organization.py
new file mode 100644
+@dataclass
+class Organization:
+    id: str
+    name: str

Apply it with git apply patch.diff. Your custom files (retry_wrapper.py, test_helpers.py) are untouched. Only generated files that actually changed are in the patch.

Traditional generators force an ugly choice: accept generated code as-is, or maintain complex separation between generated and custom code.

Our API-first approach fixes both the technical problem and the workflow problem:

  • Instant feedback: Synchronous code generation, no waiting

  • Precise updates: Git patches show exactly what changed

  • Easy integration: HTTP APIs work with CI/CD and automation

  • Smart cleanup: Removes obsolete generated files, keeps your custom ones

Your teammates can confidently add files knowing they won't disappear. Your CI can call a simple endpoint to get updates. You get code generation without the usual constraints.

Maintaining Edits to Generated Files: Post Coming Soon

We will publish a follow-up post explaining how our approach to codegen makes it possible to preserve edits to generated files.

Scale your DevEx and Simplify Integrations

Time Saved (Automation)

Automate API connections and data flows, eliminating repetitive manual coding.

Ship Cleaner Code

Production-ready, native-quality code: clean, debuggable, custom SDK structures to your standards.

Always Up-to-Date Docs

SDKs and integrations remain consistent with API and language version updates.


Copyright© 2025 Sideko, Inc. All Rights Reserved.
