Exploring data from Companies House

This video is an introduction to getting data from Companies House. It walks through retrieving company details using both the web interface and the API.

In it, I discuss the limitations of the API, particularly when it comes to searching and retrieving non-standard details on multiple companies. These are more or less the steps I took when creating the Tech Companies in Edinburgh visualisation for my talk at Edinburgh.js.

Tools used:

  • Python with CSV, HTTPX, and JSON libraries (see below for code)
  • cURL
  • LibreOffice Calc

Notes

companies.py

import csv
import json
import time

import httpx

# Read the search results exported from the Companies House web interface.
with open('Companies-House-search-results.csv', newline='') as f:
  reader = csv.reader(f)
  data = list(reader)

# Sanity check: the company number is expected in the second column.
for row in data:
  company_number = row[1]
  print(company_number)

# The API key is sent as the HTTP Basic auth username, with an empty password.
auth_token = "insert-your-key"

# Fetch the officers list for each company, one request at a time.
officer_results = []
for row in data:
  company_number = row[1]
  print(company_number)
  response = httpx.get("https://api.company-information.service.gov.uk/company/%s/officers" % company_number, auth=(auth_token, ""))
  result = json.loads(response.content)
  result["company_number"] = company_number  # the result doesn't include the number, so we save it here
  officer_results.append(result)
  time.sleep(2.1)  # pause between requests to stay within the API rate limit

# Write one summary row per company.
with open('details.csv', 'w', newline='') as file:
  writer = csv.writer(file)
  writer.writerow(["company_number", "active_officers", "sole_directorship"])
  for result in officer_results:
    resigned_count = result.get("resigned_count", 0)
    active_count = result.get("active_count", 0)
    # Sole directorship: exactly one active officer and nobody has ever resigned.
    sole_directorship = (resigned_count == 0 and active_count == 1)
    writer.writerow([result.get("company_number", ""), result.get("active_count", ""), sole_directorship])
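
To try the sole-directorship check without calling the API, the logic from the last loop can be pulled out into a small function. This is only a sketch: the name officer_summary and the sample dictionaries are made up for illustration, and the samples are trimmed to the fields the script above actually reads.

```python
def officer_summary(result):
    """Build one details.csv row from a parsed officers response."""
    resigned_count = result.get("resigned_count", 0)
    active_count = result.get("active_count", 0)
    # Same rule as in companies.py: one active officer, no resignations.
    sole_directorship = resigned_count == 0 and active_count == 1
    return [
        result.get("company_number", ""),
        result.get("active_count", ""),
        sole_directorship,
    ]


# Hypothetical sample data, not real companies.
samples = [
    {"company_number": "SC000001", "active_count": 1, "resigned_count": 0},
    {"company_number": "SC000002", "active_count": 2, "resigned_count": 1},
    {"company_number": "SC000003"},  # e.g. a response without any counts
]

rows = [officer_summary(sample) for sample in samples]
for row in rows:
    print(row)
```

A response that lacks the count fields, such as an error body, falls through with an empty active_officers cell and sole_directorship set to False, so it is worth checking those rows by hand.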

Links
