Tiny project: where to live in London

I had an idea this afternoon for a little app to do something that I have wanted done at various times.

Namely, something to help me find a good place to live in London based on commute times to a number of locations.

For example, given your workplace, a family member’s house, a friend’s house, your housemate’s workplace etc. what is the best place to live based on average commute time.

I’m trying to get better at doing tiny projects which I can build quickly, rather than bigger projects I never finish, so I set myself a challenge to deploy something to the web in an afternoon, which at least partially solved the problem I had posed.

I had a rough idea about how it might work, so I scribbled some notes and got going trying to prove it. The gist of those notes, to give an idea of the process, was as follows.

The plan: use the TfL journey planner API, split London into a grid of latitude/longitude tiles, and test how long it takes to get from the middle of each grid tile to the locations the user has specified.

Hopefully this would give me a dictionary of tile locations, along with the journey times from each tile to the user's locations.
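Something like this, as a rough illustration of the shape I was after (each key is a tile's 'lat,long', each value is a list of journey times in minutes):

const gridJourneys = {
  "51.404,-0.044": [58, 62],
  "51.446,-0.206": [58, 63],
};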

And lastly, to avoid going off on a tangent, these are the things I wanted to prove I could do in some form:

  • Split London into a grid of tiles, defined by the latitude and longitude of the centre of the tile.
  • Calculate travel times from the centre of each grid tile to the user specified locations.
  • Sort the results based on the best average commute time and fairest split of times, and report back with the best ones.

It ended up working quite well.

First of all I played about with the Transport for London (TfL) API, and figured out that if I registered and got an API key, I could make 500 requests a minute for free, which is more than enough for my purposes for now.

Then I used Postman to test the API, pass it some locations, and see what form the data came back in. This let me write a simple JavaScript script, run with Node.js, that finds the time taken for the shortest journey from point A to point B.

I had to remind myself how to make HTTP requests from inside Node.js, and ended up using axios, which appears to be quite a nice promise-based HTTP client.

At this point, I had proved that I could pass the API a postcode, and a latitude/longitude coordinate, and get the shortest journey time between the postcode and the lat/long coordinate.
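The proof of concept looked roughly like this (a sketch rather than the exact script; it assumes the TfL response contains a journeys array, each entry with a duration in minutes, which is what Postman showed me, and the app_id/app_key placeholders stand in for however you store your TfL credentials):

const axios = require("axios");

// Credentials from registering with the TfL API (the env var names here are just placeholders)
const appId = process.env.TFL_APP_ID;
const appKey = process.env.TFL_APP_KEY;

// Pick the quickest journey's duration (in minutes) out of the journey results
const fastestJourneyTime = (journeyResults) =>
  Math.min(...journeyResults.journeys.map((journey) => journey.duration));

// from and to can be postcodes or "lat,long" strings
const shortestJourney = (from, to) =>
  axios
    .get(
      `https://api.tfl.gov.uk/journey/journeyresults/${from}/to/${to}?app_id=${appId}&app_key=${appKey}`
    )
    .then((res) => fastestJourneyTime(res.data));

// e.g. from a grid tile to a postcode
shortestJourney("51.404,-0.044", "SW1A1AA").then(console.log);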

Next, I had to figure out how to split London up into a grid of lat/long coordinates. This was fairly hacky. I ended up clicking on Google Maps roughly where I wanted to define the north, south, east and west limits of 'London', and copying the values it returned as hard-coded values for my program, in order to structure my grid of coordinates.

This highly fuzzy methodology eventually gave me the following piece of code:

const generateGrid = () => {
  // Rough bounding box for 'London', eyeballed from Google Maps
  const bottom = 51.362;
  const top = 51.616;
  const left = -0.3687;
  const right = 0.1722;

  // 10 x 6 = 60 sample points across the bounding box
  const gridHeight = 6;
  const gridWidth = 10;

  const heightIncrement = (top - bottom) / gridHeight;
  const widthIncrement = (right - left) / gridWidth;

  const grid = [];

  // Walk up each column of tiles, then move one column to the right.
  // (Strictly these points are the bottom-left corners of the tiles rather
  // than the centres, but for sampling purposes they are close enough.)
  let centeredPoint = [bottom, left];

  for (let i = 0; i < gridWidth; i++) {
    for (let j = 0; j < gridHeight; j++) {
      grid.push([...centeredPoint]);
      centeredPoint[0] += heightIncrement;
    }
    centeredPoint[1] += widthIncrement;
    centeredPoint[0] = bottom;
  }

  return grid;
};

The grid is wider than it is tall, because London is wider than it is tall, which I think makes sense…

So now I had an array of coordinates, representing the centres of my grid tiles spanning London, and the ability to figure out how long it takes to get from one of these coordinates to a given postcode.

I wrote some quite ugly code to loop through all the tiles, calculate the commute times from each tile to each of the user locations, and then work out the average of those times and the spread (the difference between the largest and smallest time). With a 10 by 6 grid and up to five locations, that is at most 300 requests per search, which stays comfortably under the 500-a-minute limit:

const getGridJourneys = async (grid, locationsToCheck) => {
  const gridJourneys = {};

  console.log("requesting");

  let withAverageAndSpread = {};

  await axios
    .all(
      // Fire off the requests for every tile/location pair concurrently
      grid.map(([gridLat, gridLong]) => {
        return axios.all(
          locationsToCheck.map((location) => {
            return axios
              .get(
                `https://api.tfl.gov.uk/journey/journeyresults/${gridLat},${gridLong}/to/${location}?app_id=${appId}&app_key=${appKey}`
              )
              .then((res) => {
                const key = `${gridLat},${gridLong}`;
                if (gridJourneys[key]) {
                  gridJourneys[key].push(fastestJourneyTime(res.data));
                } else {
                  gridJourneys[key] = [fastestJourneyTime(res.data)];
                }
              })
              .catch((err) => {
                // Log which request failed if the API responded with an error;
                // network errors won't have a response attached
                if (err.response) {
                  console.error(err.response.config.url);
                  console.error(err.response.status);
                }
                return err;
              });
          })
        );
      })
    )
    .then(() => {
      console.log("request done");
      withAverageAndSpread = Object.keys(gridJourneys).reduce(
        (results, gridSquare) => {
          return {
            ...results,
            [gridSquare]: {
              journeys: gridJourneys[gridSquare],
              spread: gridJourneys[gridSquare].reduce(
                (prev, curr) => {
                  const newHighest = Math.max(curr, prev.highest);
                  const newLowest = Math.min(curr, prev.lowest);

                  return {
                    highest: newHighest,
                    lowest: newLowest,
                    diff: newHighest - newLowest,
                  };
                },
                {
                  lowest: Number.MAX_SAFE_INTEGER,
                  highest: 0,
                  diff: 0,
                }
              ),
              average:
                gridJourneys[gridSquare].reduce((prev, curr) => {
                  return prev + curr;
                }, 0) / gridJourneys[gridSquare].length,
            },
          };
        },
        {}
      );
    });
  return {
    gridJourneys,
    withAverageAndSpread,
  };
};

And with a bit more fiddling (and flushing out a lot of bugs), I had my program delivering results in the following format:

[
  {
    location: '51.404,-0.044',
    averageJourneyTime: 60,
    spread: 4,
    journeys: [ 58, 62 ]
  },
  {
    location: '51.446,-0.206',
    averageJourneyTime: 61,
    spread: 5,
    journeys: [ 58, 63 ]
  }
]

So now, in theory I had proved that my program worked, and after putting in some values, the answers I got from it seemed to make sense.

I was pretty happy given this had only taken an hour or so, and considered leaving it there, but I decided to spend another few hours and deploy it to the web.

I started a new Express project using its scaffolding tool, and pasted all my semi-working code into it. This took a bit of wrangling to make everything work as it did before, but wasn't too bad.
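If you have not come across it, the scaffolding tool is express-generator, and getting a Pug-flavoured project running is roughly this (the app name is just an example):

npx express-generator --view=pug where-should-i-live
cd where-should-i-live
npm install
npm start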

Then I spent half an hour or so reminding myself how to use Jade/Pug to put together HTML templates, and how routing works in Express.

Eventually I ended up with two views:

A simple form

extends layout

block content
  h1 Where should I live in London
  p Add postcodes of up to 5 locations you want to be able to travel to
  form(name="submit-locations" method="get" action="get-results") 
    div.input
      span.label Location 1
      input(type="text" name="location-1")
    div.input
      span.label Location 2
      input(type="text" name="location-2")
    div.input
      span.label Location 3
      input(type="text" name="location-3")
    div.input
      span.label Location 4
      input(type="text" name="location-4")
    div.input
      span.label Location 5
      input(type="text" name="location-5")
    div.actions
      input(type="submit" value="Where should I live?")

and a results page

extends layout

block content
  h1 Results
  ul
    each item in data[0]
      li
        div Location: #{item.location}
        div Average journey time: #{item.averageJourneyTime}
        div Journey times: #{item.journeys}
        div Journey time spread: #{item.spread}
        a(href="https://duckduckgo.com/?q=#{item.location}&va=b&t=hc&ia=web&iaxm=maps" target="_blank") Find out more

along with the routing for the results page

router.get("/get-results", async function (req, res, next) {
  const locations = Object.values(req.query)
    .filter((v) => !!v)
    // strip any spaces from the postcodes so they slot straight into the TfL URL
    .map((l) => l.replace(/\s+/g, ""));

  const results = await getResults(locations);

  if (results[0]?.length === 0) {
    res.render("error", {
      message: "Sorry that did not work. Please try again in a minute!",
      error: {},
    });
  } else {
    res.render("results", { data: results });
  }
});

For context, here is the getResults method

const getResults = async (locationsToCheck) => {
  const { gridJourneys, withAverageAndSpread } = await getGridJourneys(
    generateGrid(),
    locationsToCheck
  );

  // Sort grid squares by ascending average journey time
  const sorted = Object.keys(gridJourneys).sort(
    (a, b) => withAverageAndSpread[a].average - withAverageAndSpread[b].average
  );

  const sortedListWithDetails = sorted.map((key) => {
    return {
      location: key
        .split(",")
        .map((i) => i.slice(0, 6))
        .join(","),
      averageJourneyTime: Math.round(withAverageAndSpread[key].average),
      spread: Math.round(withAverageAndSpread[key].spread.diff),
      journeys: withAverageAndSpread[key].journeys,
    };
  });

  return [sortedListWithDetails.slice(0, 5)];
};

I added the generic error message, because the API is rate limited, and if the rate is exceeded, the app doesn’t handle it very well…

I then used Heroku to deploy the app for free, which was, as ever, a dream.
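From memory, the deployment boils down to something like this (assuming the Heroku CLI is installed and you are logged in; the app name and config variable names are placeholders for however you store the TfL credentials):

heroku create where-should-i-live-london
heroku config:set TFL_APP_ID=xxx TFL_APP_KEY=xxx
git push heroku master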

Here is a video of the app in action.

And here is the deployed app (assuming it is still up when you are reading this!)

https://where-should-i-live-london.herokuapp.com/

And here is the code.

Overall, I really enjoyed this little exercise, and while there are obviously huge improvements that could be made, it is already a better option (for me at least), than trying individual areas of London one at a time and checking CityMapper to see how long it takes to get to the places I care about.

If I come back to it, I might look into displaying the results on a map, which would allow me to clarify better that the results represent tiles, rather than a specific granular location.

I love how easy it is to quickly prototype and build things using free tooling these days, and it was really refreshing to write a server-side web app instead of another single page application. I really believe that for quick proof-of-concept work and prototypes, Express and Heroku are a powerful combination. The code is nothing special, but it is enough to prove the idea, and to get something running which can be improved upon if I want to later.

WTF is a walking skeleton (iterative design/development)

I’m going to argue that when developing software, it can be a good idea to be skeletal.

What does that mean?

It means starting with an ugly, or clunky version of your piece of software.

Not only must it be ugly, raw and unsuitable for public consumption to be skeletal, it must also be functional.

That is to say, the skeleton must walk.

Your job once you have this skeletal form, is to go back and attach all of the things that make a skeleton less horrifying (skin and the like).

I’ll give an example and some background to hopefully clarify this idea further, as I think it is an important one, that often gets overlooked.

It’s time to learn React (again)

I have a hard time convincing myself to learn stuff for the sake of it. I like to learn things that increase my understanding of the world, make my life easier, or otherwise help me to solve a problem that I have.

For a long time, this has pushed ‘learning React’ to the bottom of the pile of things I want to do with my spare time.

I am (I think) a solid JavaScript developer, and I actually used React professionally for a brief stint years ago. At that point I had no problem picking up the framework, and I was productive pretty quickly. I like React. I also like Angular.

That said, I am definitely not as productive using React as I would be using Angular, and React has moved on since I last used it, so I tend to stick with what I know.

In an ideal world I’d probably ignore React. I don’t have religious feelings about software tools or frameworks, and I don’t like learning things for the sake of it. I like building things.

I am also currently working as a contractor. This means I need to be very good at the thing I do, in order for people to justify paying a higher rate for my services.

However… the demand for Angular developers in London, where I live, is pretty low at the moment. It seems to largely be large, slow financial organisations that are using it. These are not typically qualities I look for in a workplace.

React on the other hand is booming.

So, TL;DR it’s time to get familiar with React, even though I don’t want to.

Rob’s super simple tips for how to learn technologies really fast

  • DON’T READ THE MANUAL (yet)
  • Read the quickstart
  • Start building things
  • Get stuck
  • Read the relevant part of the manual
  • Get unstuck
  • Repeat

These are the steps I try to follow when learning a new technology. I basically like to get enough knowledge to get moving, then jump in at the deep end and start building stuff.

Once I (inevitably) get stuck, I will go back and actually read the documentation, filling in the blanks in my knowledge, and answering all the many questions I have generated by trying to build things unsuccessfully.

I find that this keeps my learning tight and focussed (and interesting), and means that I don’t spend hours reading about theoretical stuff which I might not even need yet.

So, in order to carry out my learning steps, I needed something to build.

I settled on a video game curation tool, which allows users to sign in, and record their top 5 favourite games in a nice list view, along with some text saying why.

This data can then be used to show the top 5 games per platform (Switch, Xbox, PS4, PC etc.), determined by how often they appear on users’ lists.

I also wanted the ability to see other users’ top 5 lists, via shareable links.

I don’t think this is a website that is going to make me my millions, but it is complex enough to allow me to use React to actually build something.

OK, so what does this have to do with Skeletons?

Well, when I build things, I like to make skeletons first.

So in this case, a walking skeleton of my application should be able to do all of the things I outlined above.

In order for it to be a true functional skeleton, that can be iteratively improved upon, it needs to be as close to the final product as possible, and solid enough to support iterative improvements.

So it can’t be one big blob of spaghetti code which only works on my machine on Tuesdays.

I am building a web application which will persist user preferences, so it has to be:

  • deployed to the web
  • connected to a database
  • able to authenticate a user

Regardless of what I said above about diving straight in, you shouldn't dive in completely blind.

Firstly, figure out, without getting bogged down in what tech to use, what it is you want your stuff to do.

For a web app, you probably want to have some idea about the data structures/entities you are likely to use and what they will represent, and the user flow through the front end.

In this case, I knew I wanted to do something with user generated lists of favourite games, and that I wanted to store and update them.

This meant that a cheap way to get started was to come up with some data structures, and play around with them. So that’s what I did:

/**
 * We should optimise the data structures around the most common/massive/searched entities.
 * I think the reviews are likely to be the chunkiest data set as each user can make muchos reviews.
 * Platforms doesn't fucking matter as there are so few
 * Games are also potentially quite large and need to be searchable
 *
 * Reviews need to be searchable/filterable by: [ platform, game-name, username, tags, star-rating ]
 *
 * Games need to be searchable/filterable by: [ platform, name, tags, star-rating ]
 */

const platforms = {
  "uuid-222": { name: "PS4" },
  "uuid-223": { name: "switch" },
  "uuid-224": { name: "X Box One" },
  "uuid-225": { name: "PC" },
};

// Shape: { [platformId]: { [gameId]: { [userId]: "user comment" } } }
const includedBy = {
  "uuid-222": {
    "uuid-312": {
      robt1019: "I loved throwing coconut at ppls hedz",
    },
  },
};

// Group PS4 games ("uuid-222") by how many users included them in a top 5
let rankedPs4Games = {};

Object.keys(includedBy["uuid-222"]).forEach((gameId) => {
  const includedCount = Object.keys(includedBy["uuid-222"][gameId]).length;
  if (rankedPs4Games[includedCount]) {
    rankedPs4Games[includedCount].push(gameId);
  } else {
    rankedPs4Games[includedCount] = [gameId];
  }
});

const games = {
  "uuid-313": {
    name: "Zelda Breath of the Wild",
    platforms: ["uuid-223"],
  },
  "uuid-312": {
    name: "Hitman 2",
    platforms: ["uuid-222", "uuid-223", "uuid-224", "uuid-225"],
  },
};

const users = {
  robt1019: {
    name: "Rob Taylor",
    top5: [{ gameId: "uuid-312", platformId: "uuid-222" }],
  },
  didarina: {
    name: "Didar Ekmekci",
    top5: [{ gameId: "uuid-313", platformId: "uuid-223" }],
  },
};

/**
 * Use includedBy count for aggregate views. Only viewable by platform. No aggregate view.
 */
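As a rough sanity check that these structures can support the aggregate view, turning the counts into a ranked list for PS4 might look something like this (illustrative only, leaning on the games lookup above for names):

const top5Ps4 = Object.keys(rankedPs4Games)
  .map(Number)
  // highest inclusion count first
  .sort((a, b) => b - a)
  .flatMap((count) =>
    rankedPs4Games[count].map((gameId) => ({
      name: games[gameId].name,
      includedBy: count,
    }))
  )
  .slice(0, 5);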

It is important to note that these data structures have since turned out to be slightly wrong for what I want, and I have changed them… iteratively. But playing with them in this form before writing any code allowed me to iron out some nasty kinks, and to make sure I wasn't trying to do anything that would be truly horrible from a data perspective later on.

I also spent a good hour scribbling in a note pad with some terrible drawings of different screens, to mentally go through what a user would have to do to navigate the site.

At all times we’re trying to make a solid, rough and ready skeleton that will stand up on its own, not a beautifully formed fleshy ankle that is incapable of working with any other body parts!

Be a Scientist

The main benefit of this approach, is that you are continually gathering extremely useful information, and you can very quickly prove, or disprove your hypotheses about how the application should be structured, and how it should perform.

By emphasising getting a fully fledged application up and running, deployed to the web and with all of the key functionality present, you are forced to spend your time wisely, and you take away a lot of the risk of working with a new set of tools.

What did I learn/produce in four days thanks to the Skeleton:

  • React can be deployed with one command to the web using Heroku and a community build pack.
  • How to deploy a React application to Heroku at a custom domain.
  • How to do client side routing in a modern React application.
  • The basics of React hooks for local state management.
  • How to protect specific endpoints on an API using Auth0 with Express (there is a sketch of this just after the list).
  • The IGDB (Internet Games Database) is free to use for hobby projects, and is really powerful.
  • How to set up collections on Postman to make testing various APIs nice and easy.
  • A full set of skeleton React components, ready for filling in with functionality.
  • A more thought-through entity relationship model for the different entities, and a production-ready, managed MongoDB database.
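For the Auth0 part, the Express middleware ends up looking roughly like this. This is a sketch based on Auth0's standard express-jwt and jwks-rsa recipe rather than my exact code, and the tenant domain, audience and routes are placeholders:

const express = require("express");
const jwt = require("express-jwt");
const jwksRsa = require("jwks-rsa");

const app = express();

// Validate Auth0-issued JWTs against the tenant's public signing keys
const checkJwt = jwt({
  secret: jwksRsa.expressJwtSecret({
    cache: true,
    rateLimit: true,
    jwksRequestsPerMinute: 5,
    jwksUri: "https://YOUR_TENANT.eu.auth0.com/.well-known/jwks.json",
  }),
  audience: "https://my-fave-games-api",
  issuer: "https://YOUR_TENANT.eu.auth0.com/",
  algorithms: ["RS256"],
});

// Public: anyone can read the aggregated top 5s for a platform
app.get("/api/top5/:platformId", (req, res) => res.json([]));

// Protected: only an authenticated user can save their own top 5
app.put("/api/users/me/top5", checkJwt, express.json(), (req, res) =>
  res.sendStatus(204)
);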

If you want to see just how Skeletal the first iteration is, see here:

https://www.youtube.com/watch?v=k2uMrrVkzDk&feature=youtu.be

I didn’t get the users search view working, so that is the first thing to do this week.

After that, my product is functionally complete, so I’ll probably start on layout/styling and maybe some automated testing.

I can already breathe pretty easy though, as I know that nothing I’m trying to do is impossible, as I have already done it.

Anything from here on out is improvements, rather than core functionality.

Frontend:

https://github.com/robt1019/My-Fave-Games

Backend:

https://github.com/robt1019/My-Fave-Games-Express

Adventures in Node town (hacking Slack’s standard export with Node.js)

One benefit of changing jobs quite a lot, is that I have built up an increasingly wide network of people that I like, who I have worked with previously.

A really nice thing about staying in contact with these people is that we are able to help each other out, sharing skills, jobs, jokes etc.

Recently a designer I used to work with asked whether somebody would be able to help with writing a script to process the exported contents of his ‘question of the week’ Slack channel, which by default gets spat out as a folder filled with JSON files, keyed by date:

https://slack.com/intl/en-gb/help/articles/201658943-Export-your-workspace-data

https://slack.com/intl/en-gb/help/articles/220556107-How-to-read-Slack-data-exports

My response was rapid and decisive:

Data munging and a chance to use my favourite JavaScript runtime, Node.js. Sign me up!!!

First, WTF is data munging?

Data munging, or wrangling, is the process of taking raw data in one form, and mapping it to another, more useful form (for whatever analysis you’re doing).

https://en.wikipedia.org/wiki/Data_wrangling

Personally, I find data wrangling/munging to be pretty enjoyable.

So, as London is currently practicing social distancing because of covid-19, and I have nothing better going on, I decided to spend my Saturday applying my amateur data munging skills to Slack’s data export files.

Steps for data munging

1) Figure out the structure of the data you are investigating. If it is not structured, you are going to have trouble telling a computer how to read it. This is your chance to be a detective. What are the rules of your data? How can you exploit them to categorise your data differently?

2) Import the data into a program, using a language and runtime which allows you to manipulate it in ways which are useful.

3) Do some stuff to the data to transform it into a format that is useful to you. Use programming to do this, you programming whizz you.

4) Output the newly manipulated data into a place where it can be further processed, or analysed.

In my case, the input data was in a series of JSON files, keyed by date (see below), and the output I ideally wanted, was another JSON file with an array of questions, along with all of the responses to those questions.
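Concretely, something shaped like this (the content is made up, but it is the structure I was aiming for, and it matches what the script at the end of this post spits out):

[
  {
    "date": "2018-06-29",
    "question": "QOTW: what is your favourite sandwich?",
    "reactions": ["thumbsup"],
    "responses": [
      { "answer": "Cheese and pickle", "user": "robt1019" }
    ]
  }
]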

Shiny tools!!!

Given that the data was in a JSON file, and I am primarily a JavaScript developer, I thought Node.js would be a good choice of tool. Why?

  • It has loads of methods for interacting with file systems in an OS agnostic way.

  • I already have some experience with it.

  • It’s lightweight and I can get a script up and hacked together and running quickly. I once had to use C# to do some heavy JSON parsing and mapping and it was a big clunky Object Oriented nightmare. Granted I’m sure I was doing lots of things wrong but it was a huge ball-ache.

  • From Wikipedia, I know that ‘Node.js is an open-source, cross-platform, JavaScript runtime environment that executes JavaScript code outside of a web browser. Node.js lets developers use JavaScript to write command line tools‘.

  • JavaScript all of the things.

So, Node.js is pretty much it for tools…

https://nodejs.org/en/

So, on to the data detective work. I knew I very likely needed to do a few things:

1) Tell the program where my import files are.

2) Gather all the data together, from all the different files, and organise it by date.

3) Identify all the questions.

4) Identify answers, and link them to the relevant question.

The first one was the easiest, so I started there:

Tell the program where my import files are



// Bail out early if no folder was passed in. (Checking filePath after building
// the string wouldn't work, because the string would always be truthy.)
if (!process.argv[2]) {
  console.error(
    "You must provide a path to the slack export folder! (unzipped)"
  );
  process.exit(1);
}

const filePath = `./${process.argv[2]}`;

console.log(
  `Let's have a look at \n${filePath}\nshall we.\nTry and find some tasty questions of the week...`
);

To run my program, I will have to tell it where the file I’m importing is. To do that I will type this into a terminal:

node questions-of-the-week.js Triangles\ Slack\ export\ Jan\ 11\ 2017\ -\ Apr\ 3\ 2020

In this tasty little snippet, questions-of-the-week.js is the name of my script, and Triangles\ Slack\ export\ Jan\ 11\ 2017\ -\ Apr\ 3\ 2020 is the path to the file I’m importing from.

Those weird looking back slashes are ‘escape characters’, which are needed to type spaces into file names etc. when inputting them on the command line on Unix systems. My terminal emulator that I use autocompletes this stuff. I think most do now… So hopefully you won’t have to worry too much about it.

This is also the reason that many programmers habitually name files with-hyphens-or_underscores_in_them.

But basically this command is saying:

‘Use node to run the program “questions-of-the-week.js”, and pass it this filename as an argument’

What are we to do with that file name though?

Node comes with a global object called process which has a bunch of useful data and methods on it.

This means that in any Node program you can always do certain things, such as investigating arguments passed into the program, and terminating the program.

In the code sample above, we do both of those things.

For clarity, process.argv is an array of the command-line arguments passed to the program. In the case of the command we put into our terminal, it looks like this:

[
  '/Users/roberttaylor/.nvm/versions/node/v12.16.1/bin/node',
  '/Users/roberttaylor/slack-export-parser/questions-of-the-week.js',
  'Triangles Slack export Jan 11 2017 - Apr 3 2020'
]

As you can see, the first two elements of the array are the location of the node binary, and the location of the file that contains our program. These will be present any time you run a node program in this way.

The third element of the array is the filename that we passed in, and in our program we stick it in a variable called filePath.

WE HAVE SUCCEEDED IN OUR FIRST TASK. CELEBRATE THIS MINOR VICTORY

Now…

Gather all the data together, from all the different files, and organise it by date

const fs = require("fs");
// path is needed for the path.join calls below
const path = require("path");

const slackExportFolders = fs.readdirSync(filePath);

const questionOfTheWeek = slackExportFolders.find(
  (f) => f === "question-of-the-week"
);

if (!questionOfTheWeek) {
  console.error("could not find a question-of-the-week folder");
  process.exit(1);
}

const jsons = fs.readdirSync(path.join(filePath, questionOfTheWeek));

let entries = [];

jsons.forEach((file) => {
  const jsonLocation = path.join(__dirname, filePath, questionOfTheWeek, file);
  entries = [
    ...entries,
    // each file is named YYYY-MM-DD.json, so stash that date on every message
    ...require(jsonLocation).map((i) => ({ ...i, date: file.slice(0, -5) })),
  ];
});

The Slack channel I am looking at munging is the ‘question of the week’ channel.

When this is exported, it gets exported to a ‘question-of-the-week’ folder.

So first of all I check that there is a question-of-the-week folder. If there is not, I exit the program, and log an error to the console.

If the program can find it, then it gets to work gathering all of the data together.

Here we start to see the benefit of using Node.js with JSON. We are writing JavaScript, to parse a file which uses a file format which originally came from JavaScript!

This means that pulling all of this data together is as simple as getting a list of file names with fs.readdirSync.

This gets all of the names of the files under the question-of-the-week folder in an array, which is, you know, pretty useful.

Once we have those file names, we iterate through them using forEach, and pull all of the data from each file into a big array called entries. We can use require to do this, which is very cool. Again, this is because Node and JavaScript like JSON, they like it very much.

We know we are likely to need the date that the slack data is associated with, but it is in the file name, not in the data itself.

To solve this, we take the file name and insert it into each data item as a ‘date’ field, using map.

The file.slice stuff just takes a file name like 2018-06-29.json and chops the end off it, so it becomes 2018-06-29, without the .json bit.
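In other words:

"2018-06-29.json".slice(0, -5); // => "2018-06-29"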

Coooool we done got some slack data by date. Munging step 2 complete.

Identify all the questions

This is trickier. We need our detective hats for this bit.

I won’t lie, I fucked around with this a lot, and I re-learnt something that I have learned previously, which is that it is really hard to take data that has been created by messy, illogical humans, and devise rules to figure out what is what.

What I ended up with is below. The process of figuring it out involved lots of trial and error, and I know for a fact that it misses a bunch of questions and answers. However, it probably finds 80% to 90% of the data that is needed, which would take a human a long time to do by hand, so it is better than nothing. The remaining 10% to 20% would need to be mapped manually somehow.

const questions = entries
  .filter((e) => e.topic && e.topic.toLowerCase().includes("qotw"))
  .map((q) => ({
    date: q.date,
    question: q.text,
    reactions: q.reactions ? q.reactions.map((r) => r.name) : [],
  }));

‘qotw’ is ‘question of the week’ by the way, in case you missed it.

I find them by looking for Slack data entries that have a topic including ‘qotw’. I then map these entries so they just include the text and date, and I also pull in the reactions (thumbs up, emojis etc.) for the lols.

Now we have an array of questions with information about when they were asked. We’re getting somewhere.

Identify answers, and link them to the relevant question

const questionsWithAnswers = questions.map((question, key) => {

  // Find the date of this question and the next one.
  // We use these to figure out which messages were sent after
  // a question was asked, and before the next one
  const questionDate = new Date(question.date);
  const nextQuestionDate = questions[key + 1]
    ? new Date(questions[key + 1].date)
    : new Date();

  return {
    ...question,
    responses: entries
      .filter(
        (e) =>
          new Date(e.date) > questionDate &&
          new Date(e.date) < nextQuestionDate &&
          e.type === "message" &&
          !e.subtype
      )
      .map((r) => ({
        answer: r.text,
        user: r.user_profile ? r.user_profile.name : undefined,
      })),
  };
});

// put them in a file. the null, 4 bit basically pretty prints the whole thing.
fs.writeFileSync(
  "questions-with-answers.json",
  JSON.stringify(questionsWithAnswers, null, 4)
);

console.log('questions with answers (hopefully...) saved to "questions-with-answers.json"');

This bit is a bit more complex… but it’s not doing anything non-standard from a JavaScript point of view.

Basically just search all the entries for messages which fall after a question being asked, and before the next one, and put them in an array of answers, with the user profile and the message text. Then save to a new JSON file and pretty print it.

We are done! We now have a new JSON file, with an array of questions, and all the answers to each question.

It is worth noting that this approach is far from optimal from an ‘algorithmic’ point of view, as I am repeatedly checking the entire data set.

Thing is, I don’t give a shit, because my dataset is small, and the program runs instantly as it is.

If it started to choke and that became a problem I would obviously improve this, but until that point, this code is simpler to understand and maintain.

More efficient algorithms normally mean nastier code for humans, and until it’s needed, as a nice developer you should prioritise humans over computers.

(sorry, computers)

What did we learn?

Slack’s data is quite nicely structured, and is very parseable.

JavaScript is great for manipulating JSON data thanks to its plethora of array manipulation utilities.

You can write a script to automatically categorise Slack export data and put it into a semi-useful state with less than 80 lines of code, including witty console output and formatting to a nice narrow width.

This confirms my suspicion that for quick and dirty data munging, Node.js is a serious contender.

If paired with TypeScript and some nice types for your data models, it could be even nicer.

Here is the result of my labours https://github.com/robt1019/slack-export-parser/blob/master/questions-of-the-week.js