Baris Soybas

Engineer at heart. Data Scientist with an Operations Research background and three years' experience as an in-house consultant, owning data-driven solutions with stakeholders up to the C-level.

Building machine learning models, LLM solutions, and the full-stack web apps that deliver them. End to end, from raw data to production: same person across the arc.

  1. Problem Analysis & Requirements

    What problem, for whom, against what success metric. Whatever isn't agreed here gets argued about in the demo — so I keep this phase as long as it needs to be.

  2. Defining Roadmap

    Work broken into milestones, each ending in a deliverable I can demo. Estimates account for integration delays — three years in regulated finance taught me they're the rule, not the exception.

  3. Data, System & Concept Design

    Structure before substance — decide the shape of the data, the system, and the deliverable before producing any of it. The cost of getting the foundation wrong is always higher than the cost of debating it for a day, and code review isn't the place to have that debate.

  4. Build & Deployment

    Iterate in demoable increments — quality is a habit from the first commit, not a phase at the end. Deployment stays reproducible and automated, so shipping isn't a hero exercise.

  5. Hand-off

    A clean hand-off so the work survives me — documented, reproducible, and owned by the team that inherits it.

Data Scientist, ARAG SE

Consulting · ML & LLM · Full-Stack · Project Management
  • Built secure RESTful APIs with FastAPI, SQLAlchemy, Pydantic and Alembic migrations, backed by SQL Server.
  • Designed Next.js frontends with OAuth flows, integrating LLM-powered interfaces and interactive visualizations delivered up to the C-level.
  • Deployed ML models to forecast costs & premiums for management outlooks — supported by EDA with pandas, NumPy and matplotlib to cluster transactions and detect duplicates across creditor / debtor data.
  • Prototyped a Text-to-SQL chatbot with LangChain, RAG and vector databases, giving 20+ users natural-language querying of internal data.
  • Defined branching, testing & CI/CD strategies for dockerized apps across three Azure Kubernetes environments (dev, int, prod) under regulatory compliance.
  • Set the team's technical stack and coding standards — adopted across 7 projects.
  • Owned end-to-end delivery of 3 projects as technical lead — coordinating business and technical stakeholders from scoping through deployment.
  • Mentored working students and junior team members; conducted interviews for technical roles.

M.Sc. Industrial Engineering, RWTH Aachen

Specialized in Operations Research & Management · Product Development

Thesis: Implementation of artificial intelligence in assistance systems for the digitalization of production processes, at the Institute of Textile Technology, RWTH, with Heimbach GmbH, Düren.

A web tool for production-line operators at a textile manufacturer to browse the catalog of known defects and their fixes, or to describe a new problem and get the most likely causes from past reports.

B.Sc. Industrial Engineering, RWTH Aachen

Specialized in Design Technology

Internship — Implementation of Intralogistics 4.0 in manufacturing at Bosch Rexroth AG, Stuttgart.

Helped digitalize in-plant material flow and built a plug-and-play automation script for end-of-line testing of the produced ActiveShuttle — removing the manual per-unit setup.

Thesis: Integration of a 3.3 MW generator into the 850 kW drivetrain of a wind turbine, at the Chair for Wind Power Drives, RWTH.

Worked out the requirements, created and compared several design concepts, modeled the design in CAD, and verified its strength under the test loads through calculations and simulations.

Languages

  • Python primary
  • TypeScript · JavaScript
  • SQL

Frameworks

  • FastAPI backend
  • React · Next.js frontend

Data & ML

  • pandas · NumPy data handling
  • scikit-learn · MLflow machine learning
  • matplotlib · plotly visualisation
  • LangChain · vector DBs LLM
  • SQLAlchemy · Alembic database
  • Pyomo · Gurobi optimisation

Infra & Cloud

  • SQLite · Jupyter prototype
  • Linux · git · Docker development
  • GitHub Actions · pytest · ruff · pyright ci/cd
  • Azure (Databricks, OpenAI, AKS, SQL Server) production

ARAG SE

Landing Page

User management · Authentication · Authorisation · Internal portal

A single entry point that brings the team's dashboards and internal applications under one roof — pairing a curated landing page with centralised authorisation and user management.

P&L Commentary

Reporting · Data engineering · LLM application

Generates analyst commentary on individual Profit & Loss positions (GuV) by feeding curated financial data into an Azure OpenAI model — turning line-item numbers into draft narratives ready for review.

Reserve Analytics

Visualisation · Actuarial dashboard · Data engineering

Computes statutory claims and equalisation reserves (Schadenreserve & Schwankungsrückstellung) for German insurance reporting, paired with an interactive dashboard that surfaces the underlying drivers and year-over-year fluctuations.

Cost Allocation Assistant

RAG · Vector databases · Text-to-SQL · Chatbot

A LangChain prototype served as an internal API endpoint that answers natural-language questions in two ways: a Text-to-SQL path that queries internal business data directly, and a RAG path that retrieves from text-based internal knowledge stored in vector databases.

Personal

Running Dinner

Mixed Integer Problem · Optimisation · Network

An optimizer for a running dinner — given a group of friends and their hosting preferences for appetizer, main or dessert, it assigns who hosts each course so the group's total walking distance across the city stays minimal under specified constraints.

Progrista UI

Component Library · Design System

A React component library published as an npm package — Radix UI primitives styled with Tailwind v4, built with tsup and documented in Storybook.

Meta Backend (WIP)

FastAPI · PostgreSQL · Redis · S3

A FastAPI backend to power my future services — GitHub OAuth authentication, async SQLAlchemy over PostgreSQL, Redis caching and S3/MinIO storage, with Alembic migrations and Docker / Kubernetes deployment.

Football Betting Model

Poisson · Statistics · Web scraping

A Poisson goal-scoring model that predicts football match outcomes — scraping five seasons of results across Europe's top-five leagues, deriving home/away attack and defence strengths, then comparing the model's scoreline probabilities against published odds to surface positive expected-value bets.
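
The core arithmetic can be sketched as follows. This is a minimal illustration under the usual independent-Poisson assumption, not the project's actual code: the function names, the goal cap of 10 and the example inputs are all hypothetical, and expected goals are assumed to be already derived from the attack and defence strengths.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected goal count is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def outcome_probabilities(home_xg: float, away_xg: float, max_goals: int = 10):
    """Sum scoreline probabilities into (home win, draw, away win).

    Assumes home and away goals are independent Poisson variables and
    truncates the scoreline grid at max_goals (an illustrative cap).
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


def expected_value(probability: float, decimal_odds: float) -> float:
    """Expected profit per unit staked at the given decimal odds."""
    return probability * decimal_odds - 1.0


# Hypothetical inputs: expected goals of 1.5 (home) and 1.0 (away).
home_p, draw_p, away_p = outcome_probabilities(1.5, 1.0)
```

A bet would count as positive expected value when `expected_value(model_probability, published_odds)` is greater than zero.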

Trading Tax Calculator

PDF parsing · Data Handling

Turns broker statements into German tax reports — parses PDF and CSV exports, converts every USD trade to EUR using ECB historical reference rates, then aggregates realized gains, losses, dividends and fees per month and year for filing.
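
The conversion and aggregation step might look roughly like this. It is a sketch only: the rate table holds placeholder numbers rather than real ECB values, and the function names and the `(date, amount)` trade shape are assumptions. ECB reference rates are quoted as USD per 1 EUR, so a USD amount is divided by the rate.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Placeholder rates (USD per 1 EUR), NOT real ECB values.
USD_PER_EUR = {
    date(2024, 1, 15): Decimal("1.10"),
    date(2024, 2, 1): Decimal("1.25"),
}


def usd_to_eur(amount_usd: Decimal, on: date) -> Decimal:
    """Convert a USD amount to EUR with the reference rate of that day."""
    return amount_usd / USD_PER_EUR[on]


def monthly_totals(trades):
    """Aggregate realized USD results into EUR per (year, month).

    trades is an iterable of (trade_date, result_usd) pairs, a
    hypothetical shape chosen for this illustration.
    """
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for trade_date, result_usd in trades:
        key = (trade_date.year, trade_date.month)
        totals[key] += usd_to_eur(result_usd, trade_date)
    return dict(totals)


example_trades = [
    (date(2024, 1, 15), Decimal("110")),
    (date(2024, 2, 1), Decimal("-25")),
]
report = monthly_totals(example_trades)
```

Reference rates are not published on weekends or holidays, so a fuller version would need a rule for trades on those days, for example falling back to the previous published rate.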

Investing Experiments

Forecasting · LSTM · Backtesting

A cluster of personal market experiments based on yfinance data: stock analysis with indicators such as RSI, MACD and Bollinger Bands plus ML price forecasting, an LSTM deep-learning price predictor, an ETF ROI backtester over custom date ranges, and a compound-interest projector for recurring monthly investments.
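
As an illustration of the compound-interest projection only, here is a minimal sketch. It assumes monthly compounding at one twelfth of the annual rate and contributions paid at the end of each month; the function name and parameters are hypothetical.

```python
def project_balance(monthly_contribution: float, annual_rate: float,
                    years: int, initial: float = 0.0) -> float:
    """Project the balance after `years` of recurring monthly investments.

    Assumes monthly compounding at annual_rate / 12 and a contribution
    added at the end of every month.
    """
    monthly_rate = annual_rate / 12
    balance = initial
    for _ in range(years * 12):
        # Grow the existing balance first, then add this month's contribution.
        balance = balance * (1 + monthly_rate) + monthly_contribution
    return balance


# Hypothetical scenario: 100 per month for 10 years at 6 % per year.
final_balance = project_balance(100.0, 0.06, 10)
```

With a zero rate the result is just the sum of the contributions, which makes a quick sanity check.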

Pokémon Analytics

Data analysis · Linear Programming

An optimisation model that finds the optimal six-unit team using move data, searching across 9 generations, 1,025 Pokémon and 937 moves, plus analyses that surface curiosities.

Academic

KEA Assistant

NLP · Assistance Systems · Industry 4.0

A web tool built for production-line operators to browse known fabric defects with reference images, plus a free-text German search where an MLP classifier over TF-IDF features and text embeddings predicts the most likely root causes from historical defect-cause records, served from PostgreSQL.

Angle-Distance TSP

Machine Learning · Data Analytics

Predicts the objective-function cost of an Angle-Distance Traveling Salesman variant, where edge cost depends on the turning angle at each vertex, not just on distance. ML models are trained to estimate route cost without solving the full optimization.
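
To make the cost structure concrete, here is a sketch of one plausible angle-distance objective for a closed tour: total edge length plus a weighted sum of turning angles. The exact cost definition used in the project is not stated, so the additive form, the `angle_weight` parameter and the function names are assumptions.

```python
import math


def turning_angle(prev, cur, nxt) -> float:
    """Absolute change of heading (radians, 0..pi) when passing through cur."""
    heading_in = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
    heading_out = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
    diff = abs(heading_out - heading_in) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


def tour_cost(points, angle_weight: float = 1.0) -> float:
    """Cost of visiting points in order and returning to the start.

    Assumed objective: sum of Euclidean edge lengths plus angle_weight
    times the sum of turning angles at every vertex.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        prev, cur, nxt = points[i - 1], points[i], points[(i + 1) % n]
        total += math.dist(cur, nxt)
        total += angle_weight * turning_angle(prev, cur, nxt)
    return total


# Unit square: four edges of length 1 and four 90-degree turns.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_cost = tour_cost(square)
```

A value like this, computed for solved instances, is the kind of target a regression model could learn to estimate from instance features.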

Practical Optimization

Mathematical modeling · Linear programming · Column & Row generation

Six mixed-integer optimization problems modeled & solved in Gurobi — bin packing with conflicts, hospital network design, political districting via column generation, longest-path knapsack, demand-side management and university timetabling.