Transit Data Ingestion Platform Dev 101: Lyondle & Colosse

Fri May 29 2026

Argumentum: For as long as I can remember, I have been passionate about trains. Over the years, as my computer science skills grew, projects such as MAGGALY2 (a from-scratch implementation of the Lyon Métro D autopilot) and Lyondle became proof of this passion.

With the release of Métrodoku nearly two months ago, one of my closest friends asked me to develop a similar platform for Lyon's transit network. So, on May 12th, I began developing a trilogy of mini-games, now available as Lyondle.

Data Ingestion & Tasse de thé

Initially, I intended to develop only a Sudoku-like game for TCL stations. Still, even a single game required knowing several attributes of each station:

  • Name
  • Identifier
  • Address
  • Geographic location
  • Connections

Although the list is quite short, manually entering every detail for each of the 160 stations is a straight-up waste of time. Therefore, I had to find a way to automatically ingest all station data from some source and enrich the final datasets with flags, deductions, etc.

To achieve this, I developed an engine that retrieves all station and line data from Data Grand Lyon, cleans the datasets (and God knows how messy and outdated they are), and enriches them.
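The clean-then-enrich idea can be sketched as follows. This is only an illustration, not the actual engine: the raw column names (nom, desserte), the "Example Stop" row, the LINE_TYPES mapping, and the way linesType is derived are all assumptions. The retrieval step is left out, and the raw rows are hard-coded.

```python
# Hypothetical raw rows standing in for an open-data export.
# Only the Perrache id and connections come from the article.
RAW_ROWS = [
    {"id": "32103", "nom": "  Perrache ", "desserte": "T1,MA,T2"},
    {"id": "", "nom": "Ghost stop", "desserte": ""},        # no id: dropped
    {"id": "32103", "nom": "Perrache", "desserte": "T1"},   # duplicate id: dropped
    {"id": "30001", "nom": "Example Stop", "desserte": "MD"},
]

# Assumed mapping from the first letter of a line name to a line type.
LINE_TYPES = {"T": "tram", "M": "metro"}


def clean(rows):
    """Drop rows without a numeric id, trim names, and de-duplicate by id."""
    seen = set()
    cleaned = []
    for row in rows:
        raw_id = row.get("id", "").strip()
        if not raw_id.isdigit():
            continue
        station_id = int(raw_id)
        if station_id in seen:
            continue
        seen.add(station_id)
        connections = [c.strip() for c in row.get("desserte", "").split(",") if c.strip()]
        cleaned.append({
            "id": station_id,
            "name": row["nom"].strip(),
            "connections": connections,
        })
    return cleaned


def enrich(stations):
    """Add a derived linesType list to each station, based on its connections."""
    for station in stations:
        types = {LINE_TYPES[c[0]] for c in station["connections"] if c[0] in LINE_TYPES}
        station["linesType"] = sorted(types)
    return stations


if __name__ == "__main__":
    for station in enrich(clean(RAW_ROWS)):
        print(station)
```

Keeping cleaning and enrichment as separate functions over plain dictionaries is what makes it cheap to regenerate the whole dataset whenever the source changes.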

Still, because the datasets were outdated (SYTRAL doesn't seem too concerned about the quality of the data it uses), many station IDs or names were missing or wrong, so a quick manual verification (2 min) was still needed.

You might think it would have been quicker to enter the station details manually in an Excel file. It would not. After a data loss, the creation of new stations, etc., I can regenerate the whole dataset with a single command, whereas with a hand-made file everything would be lost and would have to be redone.

Providing Reliable Data

After finishing the first game, my friend and I decided I should create two more. With those new games came the idea of standardising access to the cleaned-up and enriched datasets (referred to simply as "the datasets" from now on).

Happily, and because I love databases, I created a small (660 lines) in-memory SQL-compatible database that reads and writes the datasets and serves them. The database can be made read-only at startup so that it can serve content publicly and securely.
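As a rough illustration of the read-only idea (not Colosse's actual code), an in-memory table can simply refuse writes when a flag is set at construction time. The InMemoryTable class and its method names are hypothetical.

```python
class InMemoryTable:
    """Tiny in-memory table; hypothetical, for illustration only."""

    def __init__(self, rows, read_only=False):
        # Copy the rows so that callers cannot mutate the table from outside.
        self._rows = [dict(row) for row in rows]
        self._read_only = read_only

    def select(self, predicate):
        """Return copies of all rows for which predicate(row) is true."""
        return [dict(row) for row in self._rows if predicate(row)]

    def set(self, predicate, prop, value):
        """Set prop to value on every matching row; refused in read-only mode."""
        if self._read_only:
            raise PermissionError("table is read-only")
        changed = 0
        for row in self._rows:
            if predicate(row):
                row[prop] = value
                changed += 1
        return changed


if __name__ == "__main__":
    public = InMemoryTable([{"id": 32103, "name": "Perrache"}], read_only=True)
    print(public.select(lambda row: row["name"] == "Perrache"))
    try:
        public.set(lambda row: True, "terminus", False)
    except PermissionError as error:
        print("write refused:", error)
```

Returning copies from select means a publicly served instance cannot be altered through the objects it hands out either.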

This database powers all the algorithms, serverless functions, games, and tools that make Lyondle work. As a nod to SYTRAL's TITAN database, I called it Colosse.

CQL because SQL was not a fit

Although I said it was SQL-compatible, it is not powered by SQL at all but by an SQL lookalike called CQL (ColosseQL), tailored for the retrieval of transit data.

The reason for this decision is simply that SQL is far too vast for something as small as retrieving stations, lines, and live data. For instance, SQL has 929 keywords while CQL has only 18 (including comparators).

| CQL Keyword | Description | Example |
| --- | --- | --- |
| FROM ["gcs"/"fs"]:[PATH] | Determines the database that must be loaded to execute this query. When accessing Colosse from api.lyondle.fr, this keyword is ignored. | FROM gcs:db-files/db.csv |
| SELECT [PROP1] [COMPARATOR] [PROP2] | Selects all rows that match the comparison. A property can be written in several ways. First, it can begin with int:, str:, or list: to tell the database which type to use when reading the value; if no type is specified, the value is treated as a string. Second, a plain value is read as a property name (int:id = int value of column id); to provide a literal value, put it between brackets (int:[12] = int value of 12). SELECT supports the following comparators: ==, <=, >=, !=, in, out. | FROM gcs:db-files/db.csv SELECT name == [Perrache] |
| PUSH [PROP] [VALUE] | Appends a value to the given property of all selected rows. Only use it with properties intended to store lists. | FROM gcs:db-files/db.csv SELECT int:id >= int:[30000] IF int:id <= int:[30050] PUSH connections T1 |
| SET [PROP] [VALUE] | Replaces the value of the given property with the given value. | FROM gcs:db-files/db.csv SELECT name == [Perrache] SET terminus true |
| DELETE [PROP?] | Deletes all selected rows when PROP is omitted. When used before a SELECT statement with PROP given, deletes the whole column named by PROP. | FROM gcs:db-files/db.csv DELETE dglNames |
| IF [PROP1] [COMPARATOR] [PROP2] | Exactly like SELECT, except that it narrows down a SELECT query. IFs can be chained. | FROM gcs:db-files/db.csv SELECT int:id >= int:[30000] IF int:id <= int:[30050] |
| OR [PROP1] [COMPARATOR] [PROP2] | Exactly like SELECT, except that it widens a SELECT query. ORs can be chained. | FROM gcs:db-files/db.csv SELECT int:id <= int:[30000] OR int:id >= int:[30050] |
| ADD [COLUMN] | Adds a new column to the database. | FROM gcs:db-files/db.csv ADD freq |
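To make the operand rules concrete, here is a minimal sketch of how SELECT, IF, and OR could be evaluated over a list of rows. It is not Colosse's parser. Several points are assumptions: the tokenizing regex, left-to-right evaluation of chained clauses, the reading of "in"/"out" as "left value is (not) contained in right value", and comma-separated list literals. FROM and the write keywords are left out, and the two "Example" rows are made up.

```python
import operator
import re

# A token is either an optional type prefix followed by a bracketed literal
# (which may contain spaces) or any run of non-space characters.
TOKEN = re.compile(r"\S*\[[^\]]*\]|\S+")

CASTS = {
    "int": int,
    "str": str,
    # Assumption: list literals are comma-separated; stored lists pass through.
    "list": lambda v: v if isinstance(v, list) else str(v).split(","),
}

COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge,
    "in": lambda a, b: a in b,       # assumed meaning: left is contained in right
    "out": lambda a, b: a not in b,  # assumed meaning: left is not contained in right
}


def resolve(operand, row):
    """Resolve 'int:id' to the row's id as an int, and 'int:[12]' to the literal 12."""
    prefix, sep, rest = operand.partition(":")
    if sep and prefix in CASTS:
        cast, body = CASTS[prefix], rest
    else:
        cast, body = str, operand  # no type prefix: treated as a string
    if body.startswith("[") and body.endswith("]"):
        return cast(body[1:-1])    # bracketed: literal value
    return cast(row[body])         # plain: property lookup


def select(query, rows):
    """Evaluate chained SELECT / IF / OR clauses from left to right."""
    tokens = TOKEN.findall(query)
    if len(tokens) % 4:
        raise ValueError("expected KEYWORD LEFT COMPARATOR RIGHT clauses")
    kept = set()
    for i in range(0, len(tokens), 4):
        keyword, left, comparator, right = tokens[i:i + 4]
        compare = COMPARATORS[comparator]
        matches = {n for n, row in enumerate(rows)
                   if compare(resolve(left, row), resolve(right, row))}
        if keyword == "SELECT":
            kept = matches
        elif keyword == "IF":
            kept &= matches   # narrow the current selection
        elif keyword == "OR":
            kept |= matches   # widen the current selection
        else:
            raise ValueError(f"unsupported keyword: {keyword}")
    return [rows[n] for n in sorted(kept)]


ROWS = [
    {"id": 30010, "name": "Example A", "connections": ["T1"]},
    {"id": 30060, "name": "Example B", "connections": ["MD"]},
    {"id": 32103, "name": "Perrache", "connections": ["T1", "MA", "T2"]},
]

if __name__ == "__main__":
    print(select("SELECT int:id >= int:[30000] IF int:id <= int:[30050]", ROWS))
```

Tracking row indices in a set keeps IF and OR down to plain set intersection and union, and the result keeps the original row order.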

Colosse is meant to ingest live data soon. Stay tuned so your app or service can benefit from cleaned-up, easy-to-retrieve live data.

Accessing stations and lines data from api.lyondle.fr

To access stations and lines data from api.lyondle.fr (read-only data), you just have to submit POST requests to https://api.lyondle.fr/cql with the following body:

{
    "target": "lines" | "stations",
    "query": "CQL query"
}

For instance, if I want to get details about the Perrache station:

{
    "target": "stations",
    "query": "SELECT name == [Perrache]"
}

I get back:

{
    "selected": [
        {
            "id": 32103,
            "name": "Perrache",
            "nameCharacteristics": [],
            "linesType": [
                "tram",
                "metro"
            ],
            "linesColor": [
                "pink",
                "blue",
                "green"
            ],
            "connections": [
                "T1",
                "MA",
                "T2"
            ],
            "stationCharacteristics": [
                "std"
            ],
            "street": "25 COURS DE VERDUN PERRACHE",
            "lon": 4.827112617066,
            "lat": 45.749567419463,
            "borough": 2,
            "city": "Lyon",
            "terminus": true,
            "cognitiveScore": 0.81
        }
    ],
    "success": true
}
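The same request can be sent from Python using only the standard library. This is a sketch under assumptions: the Content-Type header and the 10-second timeout are my own choices, not documented requirements of the API, and build_request and run_query are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://api.lyondle.fr/cql"


def build_request(target, query):
    """Build the POST request carrying the {"target", "query"} body."""
    if target not in ("lines", "stations"):
        raise ValueError("target must be 'lines' or 'stations'")
    body = json.dumps({"target": target, "query": query}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def run_query(target, query, timeout=10):
    """Send the query and return the decoded JSON response."""
    request = build_request(target, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    result = run_query("stations", "SELECT name == [Perrache]")
    for station in result.get("selected", []):
        print(station["id"], station["name"], station["connections"])
```

Building the request in its own function means the body and method can be inspected without touching the network.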

Colosse & CQL benchmark

Benchmarks of Colosse and CQL were run with pytest-benchmark on a machine with the following relevant characteristics:

  • Intel Core i7-10750H (6 cores / 12 threads, up to 5.00GHz)
  • 30.32GiB of DDR4 usable RAM
  • 936.34GiB on a KIOXIA KXG60ZNV1T02 SSD
  • Wired RJ-45 internet connection

Benchmark results:

| Action / Test | Interpretation | Avg. Time to Complete | Actions Handled Per Second |
| --- | --- | --- | --- |
| Init. database from local save | Instantaneous | 0.071ms | 14,113 |
| Init. database from remote (GCS) save | Slow due to network latency | 75.34ms | 13 |
| Processing 1,000 individual CQL read queries | Fetching data via CQL queries is smooth and efficient, with no penalty linked to the size of the returned data batch | 27.18ms or 0.027ms/query | 37 |
| Processing 1,000 individual CQL write queries | Saving data takes twice as long as reading it, which is typical behaviour | 53.39ms or 0.053ms/query | 18 |
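The last column appears to be the reciprocal of the average time. For the two query rows, one "action" would then be a whole batch of 1,000 queries rather than a single query. The short sketch below spells out that conversion. It is my reading of the table, not the author's benchmark script, and the results differ slightly from the published figures because the averages above are rounded.

```python
def actions_per_second(avg_ms: float) -> float:
    """Convert an average duration in milliseconds into actions per second."""
    if avg_ms <= 0:
        raise ValueError("average time must be positive")
    return 1000.0 / avg_ms


BATCH_SIZE = 1_000  # each query benchmark processes 1,000 queries per run

if __name__ == "__main__":
    for label, avg_ms in [("read", 27.18), ("write", 53.39)]:
        batches = actions_per_second(avg_ms)
        print(f"{label}: {batches:.1f} batches/s, about {batches * BATCH_SIZE:,.0f} queries/s")
```

Read this way, the read path handles tens of thousands of individual queries per second, which the "37" in the table does not show at first glance.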

Conclusion & Next steps

This project has been really fun to develop. Now, I will focus on fixing bugs, building a community around it, and enabling reliable live transit data with Colosse through HTTP and WS.

You can try Lyondle at lyondle.fr, and start building on Colosse today!