I Built a Whole AI Grading App Because My Bio Professor Asked a Simple Question

TL;DR: My biology professor casually asked if anyone knew a smart way to grade research proposals using AI. While most people thought about prompting ChatGPT, I spent a few days building a full agent-based grading system for our class. It graded proposals with a rubric, clustered them by topic, evaluated feasibility, edited the writing, and even emailed grades and feedback automatically. I called it 201ai. Slightly unhinged, but it worked.

December 16, 2025

It started with a simple question in class.

My biology professor asked whether anyone knew a smart way to grade research proposals using AI. He meant it in a very reasonable way. Maybe use ChatGPT. Maybe automate part of the process. Just something to make grading less painful.

My brain immediately went somewhere else.

It was break, I had time, and instead of suggesting prompts or tools, I decided to build an entire grading system and agent-based app for our class. Not a demo. Not a proof of concept. Something that could actually be used by everyone.

I called it 201ai.

At first glance, 201ai looks like a clean AI tool with a nice interface. You could easily assume it is just another wrapper on top of a language model. That is what most people would build, and honestly, that is what I could have built too.

But the real problem was not how to ask an AI to grade a proposal. The real problem was that grading is a workflow, not a single prompt. Proposals need to be evaluated consistently, compared to each other, checked for feasibility, and turned into feedback that students can actually act on. Professors also do not need more text. They need structure, clarity, and less admin work.

So I designed 201ai around the workflow instead of the chat.

Each proposal is graded against a defined rubric, with a clear score breakdown and reasoning tied directly to the content. The system does not just output a number. It explains why that number makes sense. To make proposals easier to compare, 201ai automatically generates standardized abstracts so that ideas can be scanned and grouped without reading every document line by line.
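To make "a score breakdown with reasoning" concrete, here is a minimal sketch of how one graded proposal could be represented. It is an illustration, not 201ai's actual code: the names `CriterionScore` and `summarize`, the criteria, and the point values are all made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CriterionScore:
    """One rubric line (hypothetical shape): points plus the reason for them."""
    criterion: str
    points: float
    max_points: float
    reasoning: str


def summarize(scores: List[CriterionScore]) -> dict:
    """Validate each rubric line, then roll the breakdown up into a total."""
    for s in scores:
        if not 0 <= s.points <= s.max_points:
            raise ValueError(f"{s.criterion}: {s.points} outside 0..{s.max_points}")
        if not s.reasoning.strip():
            # A bare number is not allowed: every score must be explained.
            raise ValueError(f"{s.criterion}: a score needs a written reason")
    total = sum(s.points for s in scores)
    possible = sum(s.max_points for s in scores)
    return {
        "total": total,
        "possible": possible,
        "percent": round(100 * total / possible, 1) if possible else 0.0,
        "breakdown": [(s.criterion, s.points, s.max_points) for s in scores],
    }


# Invented example data, for illustration only.
example = summarize([
    CriterionScore("Hypothesis", 8, 10, "Testable, but the control group is not defined."),
    CriterionScore("Methods", 12, 20, "Sampling plan is missing a sample size."),
])
```

Rejecting a score that has no reasoning attached is one way to enforce the idea that the system never outputs just a number.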

One of the most useful parts ended up being clustering. The agent groups proposals by topic, similarity, and shared themes. This gives a quick overview of what the class is actually interested in and where ideas overlap. Instead of seeing proposals in isolation, you can see the entire landscape of the class in one view.
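The post does not say how the agent measures similarity, so the sketch below swaps in a deliberately simple stand-in: keyword overlap (Jaccard similarity) on the standardized abstracts, with a greedy single pass to form groups. The stopword list, the `0.25` threshold, and the function names are assumptions; a real system would more likely use embeddings or the language model itself.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumed stopword list; tiny on purpose.
STOPWORDS = {"the", "of", "on", "in", "and", "a", "an", "to", "for", "with", "effect", "effects"}


def keywords(text: str) -> Set[str]:
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared keywords divided by all keywords across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(abstracts: Dict[str, str], threshold: float = 0.25) -> List[List[str]]:
    """Greedy grouping: join the first cluster whose seed abstract is similar enough."""
    clusters: List[Tuple[Set[str], List[str]]] = []
    for name, text in abstracts.items():
        words = keywords(text)
        for seed_words, members in clusters:
            if jaccard(words, seed_words) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster matched, so this abstract seeds a new one.
            clusters.append((words, [name]))
    return [members for _, members in clusters]


# Invented abstracts, for illustration only.
groups = cluster({
    "ana": "Effect of caffeine on zebrafish heart rate",
    "ben": "Caffeine dose and zebrafish heart rate variability",
    "cy": "Soil bacteria diversity in urban gardens",
})
```

With these three invented abstracts, the two caffeine proposals land in one group and the soil proposal in another, which is the "landscape of the class" view in miniature.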

Beyond grading and grouping, 201ai also evaluates feasibility. It flags proposals that sound interesting but are unrealistic for the course timeline, lack a clear method, miss key materials, or raise ethical or logistical issues. This was important to me because good ideas are not always executable ideas, especially in an academic setting.
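The four kinds of flags listed above can be sketched as plain rules over facts extracted from a proposal. This is a hypothetical illustration: the `ProposalFacts` fields, the 10-week course length, and the flag wording are assumptions, and in 201ai the judgment presumably comes from the agent rather than from hard-coded checks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProposalFacts:
    """Facts assumed to be extracted from a proposal upstream (hypothetical fields)."""
    estimated_weeks: int
    has_method_section: bool
    missing_materials: List[str] = field(default_factory=list)
    involves_human_subjects: bool = False
    has_ethics_plan: bool = False


def feasibility_flags(p: ProposalFacts, course_weeks: int = 10) -> List[str]:
    """Return one human-readable flag per feasibility problem found."""
    flags = []
    if p.estimated_weeks > course_weeks:
        flags.append(f"timeline: needs {p.estimated_weeks} weeks, course allows {course_weeks}")
    if not p.has_method_section:
        flags.append("method: no clear method described")
    if p.missing_materials:
        flags.append("materials: missing " + ", ".join(p.missing_materials))
    if p.involves_human_subjects and not p.has_ethics_plan:
        flags.append("ethics: human subjects without an ethics plan")
    return flags


# Invented example: an interesting idea that does not fit the course.
flags = feasibility_flags(
    ProposalFacts(estimated_weeks=16, has_method_section=True, missing_materials=["PCR machine"])
)
```

Returning a list of flags instead of a single pass/fail keeps the feedback actionable: a student can see exactly which constraint their idea runs into.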

Then I went a step further.

Instead of just giving feedback, the system can apply it. It can rewrite sections, improve clarity, tighten scientific language, restructure methods, and strengthen hypotheses. Think of it as Cursor, but specifically designed for research proposals. The goal was not just to critique students’ work, but to help them improve it in a concrete way.

To close the loop, 201ai can automatically email students their grade and feedback if an email address is detected. That includes the rubric breakdown and, if desired, an improved version of the proposal. This matters because grading does not end when the grade is decided. The follow-up and communication are often the most time-consuming part.
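A rough sketch of the "detect an email, then send feedback" step, using only Python's standard library. The regex, the function names, and the addresses are assumptions for illustration, and the sketch stops at building the message; how 201ai actually detects addresses and delivers mail is not described here.

```python
import re
from email.message import EmailMessage
from typing import Optional

# Simplified pattern (an assumption); real address validation is more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email(proposal_text: str) -> Optional[str]:
    """Return the first email address found in the proposal, if any."""
    match = EMAIL_RE.search(proposal_text)
    return match.group(0) if match else None


def build_feedback_email(
    proposal_text: str, sender: str, total: int, possible: int, feedback: str
) -> Optional[EmailMessage]:
    """Build the feedback message, or return None when no address is detected."""
    recipient = find_email(proposal_text)
    if recipient is None:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Proposal grade: {total}/{possible}"
    msg.set_content(feedback)
    return msg


# Invented example data. Sending is left out on purpose; with a configured
# server it could go through smtplib.SMTP(...).send_message(msg).
msg = build_feedback_email(
    "Proposal by Jane Doe (jane.doe@example.edu)",
    sender="grader@example.edu",
    total=20,
    possible=30,
    feedback="Methods: the sampling plan is missing a sample size.",
)
```

Returning `None` when no address is found mirrors the "if an email address is detected" condition: proposals without one are simply skipped rather than causing an error.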

None of this was required. It was not an assignment. Nobody told me to do this. But I like moments like this, when a simple question exposes a deeper problem. Most people hear a question like that and think about tools. I tend to think about systems.

My professor was talking about prompting ChatGPT. I saw an opportunity to build something that actually saves time, improves grading quality, and respects both the professor’s and the students’ time. I also knew that nobody else in the class was going to do this. Not because they are not capable, but because building something like this requires a mix of technical depth and a slightly unhealthy level of obsession.

The funny part is how disproportionate the outcome was compared to the question. It started with “Does anyone know a smart way to grade using AI?” and ended with a multi-agent grading platform that clusters proposals, checks feasibility, edits writing, and emails students automatically.

Looking back, the biggest takeaway for me is that AI is not impressive when it just generates text. It becomes impressive when it removes friction from real workflows. The best projects are not the flashiest ones. They are the ones that quietly make people’s lives easier and actually get used.

Sometimes being a tryhard is not a flaw. It is just energy that needs the right problem.