Measuring estimate accuracy from GitHub history with AI
An AI-powered internal tool built by Leanware engineers to measure how closely task time estimates matched the actual commit history. CodiQ analyzes GitHub commits with OpenAI and surfaces over- and under-estimation patterns for the team.
- Geography: Internal
- Stage: Internal Leanware build
- Team: 1 senior + 1 mid full-stack engineer, 1 product designer, 1 product owner
The situation
Every engineering team estimates task duration, and every engineering team is wrong about it. The gap between estimated and actual time is data the team is already sitting on, but most teams have no structured way to surface that data and learn from it. Leanware was no exception: estimates were made, work shipped, and the gap stayed invisible.
CodiQ was started as an internal build to solve that. The brief was an AI-powered tool, owned by Leanware, that pulled commit history from a project's GitHub repo, compared the time engineers actually spent against the estimate they had committed to at scoping, and surfaced the over- and under-estimation patterns clearly enough to feed into future estimates.
What we built
Because the engagement is internal, the usual client framing reverses: Leanware is both the team and the customer. The team was one senior and one mid-level full-stack engineer, paired with a product designer and a product owner running the build like any other AI Product Engineering project.
CodiQ connects to a project's GitHub repository, pulls the commit history, and pairs it with the task-level estimates that originally scoped the work. OpenAI runs the analysis: which tasks took longer than estimated, which finished faster, and which patterns repeat across an engineer or a team.
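The pairing step described above can be illustrated with a minimal sketch. It is not CodiQ's implementation: it assumes commit messages carry a task ID such as `CQ-12`, it approximates actual time as the span between the first and last commit for a task, and it uses a 20% tolerance band. All of these, along with the names `Commit`, `actual_hours_by_task`, and `classify`, are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    message: str
    timestamp: datetime


# Assumption: task IDs look like "CQ-12" and appear in the commit message.
TASK_ID = re.compile(r"\b([A-Z]+-\d+)\b")


def actual_hours_by_task(commits):
    """Approximate hours per task as last commit time minus first commit time."""
    spans = {}
    for commit in commits:
        match = TASK_ID.search(commit.message)
        if match is None:
            continue  # commit is not linked to a task
        task_id = match.group(1)
        first, last = spans.get(task_id, (commit.timestamp, commit.timestamp))
        spans[task_id] = (min(first, commit.timestamp), max(last, commit.timestamp))
    return {
        task_id: (last - first).total_seconds() / 3600
        for task_id, (first, last) in spans.items()
    }


def classify(estimates, actuals, tolerance=0.2):
    """Label each task by its actual/estimate ratio (tolerance is an assumption)."""
    verdicts = {}
    for task_id, estimated in estimates.items():
        if task_id not in actuals or estimated <= 0:
            continue  # no commit data, or an unusable estimate
        ratio = actuals[task_id] / estimated
        if ratio > 1 + tolerance:
            label = "under-estimated"  # the work took longer than planned
        elif ratio < 1 - tolerance:
            label = "over-estimated"  # the work finished faster than planned
        else:
            label = "on target"
        verdicts[task_id] = (round(ratio, 2), label)
    return verdicts
```

A commit span is a rough proxy for effort, since it ignores nights, weekends, and work done before the first commit. A table of ratios like this would be one plausible input to the AI analysis, not a replacement for it.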
The user journey is short: connect the repository, select the scope window, run the analysis, review the AI verdict. The output is structured insight into estimation patterns, which feeds directly into the next round of scoping. During beta testing, the tool also served as a proof point that Leanware's AI fluency is operational, not theoretical.
Tech stack: React on the frontend, Python and Django on the backend, Cloud Run for compute, Cloud Run Functions for the AI-call layer, PubSub for event handling, PostgreSQL for storage, and the OpenAI API for the analysis itself.
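As a rough illustration of how an analysis request might travel from the Django backend through PubSub to the AI-call layer, the sketch below encodes and decodes a request in the envelope shape Pub/Sub uses for push delivery, where the payload sits base64-encoded under `message.data`. The payload fields (`repo`, `since`, `until`) and both function names are hypothetical; the source does not describe CodiQ's actual event schema.

```python
import base64
import json


def encode_analysis_request(repo, since, until):
    """Wrap a request in a Pub/Sub-style push envelope (payload fields are assumed)."""
    payload = {"repo": repo, "since": since, "until": until}
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"message": {"data": data}}


def decode_analysis_request(envelope):
    """Recover the request payload from the base64-encoded message data."""
    raw = base64.b64decode(envelope["message"]["data"])
    return json.loads(raw.decode("utf-8"))
```

Keeping the event to a small, serializable payload lets the function that calls the OpenAI API stay decoupled from the web backend, which is the usual reason to put a queue between the two.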
Outcome
- AI analysis of GitHub commit history paired with task estimates running in beta
- Estimation-pattern insight delivered to Leanware engineers as the first cohort of users
CodiQ landed inside the firm as a real working tool. The beta gave Leanware engineers data-driven feedback on their own estimation patterns, and the planned expansions (advanced models, customizable dashboards, multi-VCS integration) are scoped against that working baseline rather than against a green-field redesign.
Engagement line