Human-assisted scientific manuscripts with LLMs

23 May 2026

git version-control prose ai deep-learning llm scientific-publishing reproducibility rigor

By popular request, I’m going to share how I wrote some recent projects and their corresponding LaTeX manuscripts in Overleaf using the Scientific Repositories approach I outlined in my previous post in this series.

Previously:

Scientific repositories

Scientific conventional commits

Paper-writing with LLMs (expect some major updates soon!)

If you’re just here for a Skill.md for your favorite LLM, you can just click through to my GitHub repository.

This approach assumes you’re using the scientific-repos pattern to organize your project, and Overleaf to collaboratively write your manuscript in LaTeX with your coauthors.

Here are a few patterns and tools I now use daily to conduct my human-machine team science without sacrificing rigor or reproducibility…and critically, without the horrible, lossy process of copy-pasting text between Overleaf and an LLM chat.¹

Make your Overleaf repository a git submodule of your scientific repository

In my Scientific Repositories repository, I recommend that you (and your favorite bot) make your Overleaf repository a git submodule of your scientific repository. This keeps your manuscript in sync with the rest of your project, and lets git track changes to your manuscript alongside changes to your code and data.

To do this,

Go to your Overleaf project and click on the “Integrations” button on the side panel

Integrations side panel button in Overleaf

DON’T CLICK ON GITHUB! Instead, click on “Git” and follow the instructions to clone your Overleaf repository to your local machine. This will give you a git URL that you can use to add the Overleaf repository as a submodule to your scientific repository. It will give you a prompt like,

git clone https://git.overleaf.com/{YOUR PROJECT ID HERE}

Instead, cd to your scientific repository and run,

git submodule add https://git.overleaf.com/{YOUR PROJECT ID HERE} paper

Note the paper at the end of the command, which is the name of the subdirectory that will be created in your scientific repository to hold your manuscript.
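Once the submodule exists, day-to-day syncing comes down to two moves: pull your coauthors' Overleaf edits in, and push your own changes out. Here is a minimal sketch of both, not a definitive implementation. The function names `paper_pull` and `paper_push` are made up for illustration, and I'm assuming the Overleaf remote's branch is called `master` (override `PAPER_BRANCH` if yours differs).

```shell
# Hypothetical helpers; run them from the root of your scientific repository.
PAPER_DIR="${PAPER_DIR:-paper}"          # the submodule path chosen above
PAPER_BRANCH="${PAPER_BRANCH:-master}"   # assumption: Overleaf's branch name

# Pull coauthors' latest Overleaf edits into the submodule, then record
# the new submodule commit in the parent repository.
paper_pull() {
  git -C "$PAPER_DIR" pull --ff-only origin "$PAPER_BRANCH" || return 1
  git add -- "$PAPER_DIR" || return 1
  # Commit only if the recorded submodule commit actually moved.
  git diff --cached --quiet -- "$PAPER_DIR" ||
    git commit -m "Update paper submodule" -- "$PAPER_DIR"
}

# Commit manuscript changes inside the submodule, push them to Overleaf,
# then record the new submodule commit in the parent repository.
paper_push() {
  git -C "$PAPER_DIR" add -A || return 1
  # Commit only if something is staged inside the submodule.
  git -C "$PAPER_DIR" diff --cached --quiet ||
    git -C "$PAPER_DIR" commit -m "${1:-Update manuscript}" || return 1
  git -C "$PAPER_DIR" push origin "HEAD:refs/heads/$PAPER_BRANCH" || return 1
  git add -- "$PAPER_DIR" || return 1
  git diff --cached --quiet -- "$PAPER_DIR" ||
    git commit -m "Update paper submodule" -- "$PAPER_DIR"
}
```

Anyone cloning the scientific repository afterwards needs `git clone --recurse-submodules` (or `git submodule update --init` in an existing clone) to get the manuscript too. The `--ff-only` flag is a deliberate choice: if your local manuscript has diverged from Overleaf, the pull stops and lets you resolve it by hand.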

Feed your results directly into your manuscript

We want our manuscript to be tightly coupled to our analysis; when you change your code, you want your manuscript to change with it. This is the whole point of the scientific repository approach! So instead of copy-pasting results from your analysis into your manuscript, you can feed them directly into your LaTeX document.

Here’s how we’ll do that:

Paper-ready results and numbers from your analysis (which, remember, are already saved in a results/ folder in your scientific repository) also get saved, each under a clear name, into paper/_results.tex (or wherever you put your paper submodule). This is a LaTeX file that defines one macro per result, like \newcommand{\NumberOfCellsInsideRegion}{137}. Load it from your main document with \input{_results}.
In your LaTeX document, you can then use these macros to insert your results directly into your manuscript. For example,
```
% The empty braces keep LaTeX from swallowing the space after the macro name.
We found that there were \NumberOfCellsInsideRegion{} cells inside the region of interest.
```

This way, when you change your code and re-run your analysis, your results will automatically update in your manuscript without any copy-pasting.
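How the macro file gets written depends on your analysis code. As one hedged sketch, here are two small shell helpers that regenerate it at the end of an analysis run. The names `reset_result_macros` and `write_result_macro` are hypothetical, and `RESULTS_TEX` simply points at the paper/_results.tex file described above.

```shell
# Hypothetical helpers; RESULTS_TEX is the macro file inside the paper submodule.
RESULTS_TEX="${RESULTS_TEX:-paper/_results.tex}"

# Start a fresh macro file so results from an old run never linger.
reset_result_macros() {
  printf '%% Auto-generated by the analysis. Do not edit by hand.\n' > "$RESULTS_TEX"
}

# Append one macro definition. Usage: write_result_macro NAME VALUE
write_result_macro() {
  # LaTeX control words may contain letters only (no digits or underscores).
  case "$1" in
    '' | *[!A-Za-z]*)
      echo "invalid macro name: $1" >&2
      return 1
      ;;
  esac
  printf '\\newcommand{\\%s}{%s}\n' "$1" "$2" >> "$RESULTS_TEX"
}
```

A run might end with `reset_result_macros` followed by `write_result_macro NumberOfCellsInsideRegion "$(cat results/n_cells_inside_region.txt)"`, where that results file name is invented for the example. Rejecting non-letter names up front is the point of the `case` check: a name like `Cells2` would otherwise produce a file that fails to compile on Overleaf.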

Convenient Byproducts

This also means that running your analysis “en-stale-ifies” your manuscript automatically whenever your results change. So when you push a new analysis, it doesn't just update your figures and tables per the Sci-Repos pattern and mark your manuscript as stale; it also updates the text of the manuscript itself to reflect the new results.
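If you want to see that staleness for yourself, git already knows the answer. This is a small sketch of one way to ask, not necessarily how the Sci-Repos tooling does it; the function name `paper_results_changed` is made up, and it assumes the macro file is called _results.tex at the top of the paper submodule.

```shell
# Hypothetical helper: exits 0 when _results.tex in the paper submodule
# differs from what is committed there (or is new and untracked).
# Usage: paper_results_changed [PAPER_DIR]
paper_results_changed() {
  paper_dir="${1:-paper}"
  [ -n "$(git -C "$paper_dir" status --porcelain -- _results.tex)" ]
}
```

After an analysis run, `paper_results_changed && git -C paper diff -- _results.tex` shows exactly which numbers moved, which is a handy thing to read before you approve the updated text.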

–

Bonus points: tools for LLM editing in Overleaf directly

A caveat that rides on all of these repositories: When you publish a manuscript, you (the human author) are responsible for the contents of the manuscript. If you are not reviewing automated edits to the manuscript and approving its contents, shame on you! That arXiv will ban authors for this behavior is not only appropriate but, in my opinion, a bare minimum.

Despite the obvious, incredible utility of LLMs for generating code and analyses, I think they are still pitiful at generating prose! Better, in my opinion, to use them as editorial tools rather than as originators (for now!). Nevertheless, if you want your favorite autonomous agent to have access to Overleaf projects without requiring that they live inside a parent repository, check out my tool Overphloem, a programmatic interface to Overleaf that I’ve been using and that you can build LLM tools on top of.

¹ That’s important because the process of chatting with a generalist LLM is a major bottleneck for both productivity and reproducibility: Copy-pasting is slow; the LLM’s context goes out of scope rather than living alongside your code; and you should be using tool-enabled LLMs that can read and write files in your repository, not just a chat interface.


Comments? Let's chat on bsky or mastodon!