StatBank API
Table of contents
IntroductionConsole
General introduction
Function
Language
Format
Tables and subject
Metadata and data
Change log
Introduction
With Statistics Denmark’s API for StatBank it is possible to get access to all the published data in StatBank.
Using Statistics Denmark’s API for Statbank Denmark, you can access all published data in Statbank Denmark programmatically. As a user, you can use Statistics Denmark’s open data directly in various own applications. You can access Statbank Denmark data either via a user interface or via an API. You can make free use of data in the development of various services, e.g. the development of smartphone apps.
StatHost: Get your own data bank
If you want to convey statistical information through your own website, you can make use of our StatHost solution.
Read more about prices and possibilities of getting a StatHost solution
Statistikbanken.dk gives you an overview of the open data that Statistics Denmark makes available from all areas for which we produce statistics. Data can be downloaded in table format, graphics, maps, a number of data formats or as bulk downloads – depending on whether you use the user interface or the API entry point. In Statbank Denmark, you always get the most recently updated data. You can use all statistical data at Statistikbanken.dk and the web page dst.dk free of charge for commercial as well as non-commercial purposes, as long as you include a source reference. This corresponds to the licence named Creative Commons, CC 4.0 BY.
Console
We have made a console available whereby one can try out the API.
The console is a manual tool for programmers who want to use the API. The tool helps the programmer in terms of finding out how the API is called.
The console is in Danish, if the browser has that as preferred language, otherwise in English.
General introduction
Queries are sent to the preferred functions’ URL and the necessary information is posted as a JSON-object.
As a rule, all communication must be UTF-8 encoded.
The API can respond in respectively JSON or XML. Regarding DATA call, other return formats apply.
It is possible to provide parameters in the URL, but it is recommended to POST data. When supplying parameters in the URL (GET), the information must be URL encoded. For examples, see the console.
The base address for the API is https://api.statbank.dk.
Communicate using TLS 1.2
All requests made to the API must use TLS 1.2 or higher (Transport Layer Security). Most newer browsers and programming languages does this by default.
However, in some cases the code base needs to be updated to a later version of the programming language.
For example, the .NET Framework 4.5 does not use TLS 1.2 per default and the following line of code needs to be inserted prior to the request:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
For the .NET Framework 4.0 it may be necessary to change this to:
ServicePointManager.SecurityProtocol = (SecurityProtocolType)3072;
Some users will use Visual Studio along with SQL Server Integration Services to fetch data. This can cause TLS issues even if .NET 4.7 is used. A script task needs to be inserted in the relevant Integration Services package in Visual Studio (e.g. Sql Server Data Tools) with the above mentioned code line prior to execution of a "Data Flow Task".
If Delphi/Pascal is used, the solution can be something like this::
uses IdHTTP, IdGlobal, IdSSLOpenSSL; ... begin
FHTTP := TIdHTTP.Create;
FIOHandler := TIdSSLIOHandlerSocketOpenSSL.Create;
FIOHandler.SSLOptions.SSLVersions := [sslvTLSv1_2];
FHTTP.IOHandler := FIOHandler;
...
end;
Function
Version number and name of the desired function must always be provided:
http://api.statbank.dk/v1/subjects
http://api.statbank.dk/v1/tables
http://api.statbank.dk/v1/tableinfo
http://api.statbank.dk/v1/data
Language
From the outset, the API will respond in Danish. For every call, the language can be stated with a parameter:
{ "lang": "en" }
Either ”da” for Danish or ”en” for English can be stated.
In some cases the API will return error messages in both Danish and English if the error has resulted in lacking determination of language.
Format
For all other functions than DATA, the formats JSON or XML can be stated:
{ "format": "JSON" }
Tables and subjects
There are two ways of finding tables via the API: the functions SUBJECTS and TABLES.
SUBJECTS
The function SUBJECTS gives access to the hierarchy of subjects that contains all the tables. The hierarchy can be queried one or several levels at a time.
If the function is called without parameters, the upper level in the hierarchy will be returned. Afterwards, these subjects’ subtopics can be found by calling SUBJECTS once again with statement of the desired subjects etc..
The following function specific parameters can be provided:
{ "subjects": [ "02", "2401" ], "includeTables": true, "recursive": true,
"omitInactiveSubjects": true }
subjects: Subjects for which subtopics are wanted.
includeTables: Whether the result should contain tables. Otherwise, only subjects are returned and tables can subsequently be retrieved with the function TABLES.
recursive: Whether subtopics/tables will be retrieved all the way down the hierarchy. Otherwise, only the closest level under the provided subjects will be retrieved.
omitInactiveSubjects: Whether subtopics/tables will be omitted if they or a parent subtopic is no longer updated.
TABLES
The function TABLES return tables that are criteria filtered. If no criteria are specified, all tables are returned.
The following function specific parameters can be provided:
{ "subjects": [ "02", "2401" ], "pastdays": 1, "includeinactive": true }
subjects: Subjects from which tables will be returned. Subject codes are retrieved with the function SUBJECTS.
pastdays: Number of days backwards, for which to return updated tables.
includeInactive: Whether to return tables that no longer are updated.
Metadata and data
A table consists of metadata and data.
Metadata is information about e.g. the table’s title and the unit of data. For certain data formats, e.g. CSV, this information is only available through metadata. Therefore, the information has to be read in connection with every data request.
In terms of other formats such as Excel or chart (PNG), the data response itself contains information about e.g. title and unit. Yet, it might be beneficial to read metadata in order to see e.g. a possible note for data.
TABLEINFO
The function TABLEINFO returns metadata for a table. The answer contains information about variables and their codes. Use these codes when data is requested with the function DATA.
Table must always be stated:
{ "table": "folk1c" }
DATA
The function DATA returns data from a table.
The API tries to eliminate the variables for which values have not been selected. If region is not selected in the table FOLK1C, for instance, data will be returned for the entire country.
In case that elimination is not desired for certain variables, the variable by its code must be provided along with the codes for the desired values. Codes for variables and values are found via the function TABLEINFO.
For instance, the table FOLK1C’s variable “country of origin” has the code “ieland”, and the value “Denmark” for this variable has the code “5100”. If data is desired for those observations where the country of origin is Denmark, the following must be provided:
{ "table": "folk1c", "format": "CSV", "variables": [ { "code": "ieland", "values": [ "5100" ] } ] }
”*” can be used as wildcard for value codes, indicating that ”*” means all values and e.g. ”*K1” means all values with a code ending with “K1”.
Some tables contain variables that cannot be eliminated automatically. Values always have to be selected for these variables. Data formats that are streamed, do not allow elimination of variables.
A total is stated as e.g. “sum(0-4;5-9;10-14;15-19;20-24)”, which will return a sum for the stated codes. A text can be added to a sum: "sum(0-24 year=0-4;5-9;10-14;15-19;20-24)" as well as a code: "sum(0_24|0-24 year=0-4;5-9;10-14;15-19;20-24)". If the same request is used to retrieve data in different languages, it can be useful to add specific texts for every language: "sum(0_24|[da 0-24 år][en 0-24 years]=0-4;5-9;10-14;15-19;20-24)".
Other methods for calculation include multiply, divide and subtract. They work in the same way as sum with the exception that they need precisely two codes, e.g. "subtract(15-19;10-14)" which for the table folk1c will return the difference between 15-19-year-olds and 10-14-year-olds. The other methods follow the same pattern.
Data formats that are streamed, do not currently allow for calculations as mentioned above.
Sequences of codes can be supplied by stating a before/equals/after condition. E.g. the period first quarter of 2010 to fourth quarter 2015 can be stated as ">=2010K1<=2015K4". The condition can be stated with or without "=" and may be combined with other selections for the same variable. Please notice that each selection does not affect other selections. Therefore ">=2010K1" and "<=2015K4" as seperate selections will result in all codes from and including 2010K1 and also all codes until and including 2015K4, which is actually all of the variables codes.
Periods are returned as distinct and sorted (either increasingly or as stated in timeOrder), regardless of the order and duplicates in the request. Periods can – as with other variables’ values – be stated by code and possibly include the wildcard ”*”. Moreover, nth rules can be used for statement of periods. Nth rules always have to be encircled with brackets. +/- placed in front or first/last specifies the desired sequence of the periods. The three latest periods can e.g. be stated (-n +3). Three oldest periods -(-n+3), newest period (1), second newest period (2), second oldest period -(2) etc. Codes and rules can be mixed in a comma separated list. The nth rule works as the css selector described at http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#nth-child-pseudo.
If a period is not stated, data will be returned for the latest period in the table, corresponding to the statement “(1)”.
There are the following function specific parameters. “table” and “format” is mandatory.
{
"table": "folk1c",
"format": "CSV",
"valuePresentation": "Default",
"timeOrder": "Ascending",
"variables": [
{
"code": "OMRÅDE",
"placement": "stub",
"values": [
"000",
"185",
"791",
"787"
]
},
{
"code": "KØN",
"values": [
"*"
]
},
{
"code": "Tid",
"values": [
"2010k2",
"(1)"
]
}
]
}
table: Table that data are retrieved from.
format: Desired format for return of data. See the paragraph Data formats bellow.
valuePresentation: How text is shown. The parameter can be omitted or one of the following values can be stated:
Default, Code, Value, CodeAndValue or ValueAndCode
timeOrder: The sequence for periods stated as ascending or descending.
variables: Collection of variable objects with statement of its code and the codes for the desired values. In the example, data are retrieved for the areas: All Denmark (000), Tårnby (185), Viborg (791) and Thisted (787), for both sexes (*) for the period second quarter 2010 (2010K2) and the latest period ((1)). Moreover, it is stated that the variable “OMRÅDE” is to be placed in the table’s stub (stated as head or stub).
Special calculations
A few special calculations is implemented, namely percent and pro mille calculations.
Below is a description of all the settings for these operations:
- Operation is OPTIONAL. The default is Percent and can also be set to Permille.
- SelectedVariableCode is SEMI-OPTIONAL. It MUST be defined if more than one variable has been chosen in the independent field Variables. A code must be defined for the variable the calculation will be based on, e.g. KØN.
- AddAsNewValue is OPTIONAL. The default is False. If set to True an extra variable will be added that defines the content type (e.g. if it is a specific number or percentage a value refers to).
- Below is the OPTIONAL settings to be used if AddAsNewValue is set to True:
- NewValueName is OPTIONAL. If nothing is defined the value "Procent" or "Promille" is selected in Danish and "Percent" or "Per Mille" in English. This will be the value of the new content type variable. If defined the value will be used as the content type. The original value will automatically be defined by the API (e.g. Number)
- NewValuePlacement is OPTIONAL. If nothing is defined the content type variable will be placed in Head (the other value is Stub). Has no consequence in e.g. CSV files, but can be important to define in the format HTML.
- ContentTypeCode is OPTIONAL. If nothing is defined the content type variable heading will be "IndholdsType" in Danish and "ContentType" in English.
- BasisValueForCalculation is OPTIONAL. If nothing is defined then percent or per mille will be distributed on all values. If a value has been chosen then the distribution will be calculated with the value being 100 per cent (or 1000 per mille).
Data formats that are streamed, do not allow for these special calculations.
An example of all the settings in use:
{
"table": "FOLK1A",
"format": "CSV",
"variables": [
{
"code": "KØN",
"values": [
"*"
]
},
{
"code": "CIVILSTAND",
"values": [
"*"
]
}
],
"valueTransformationSettings": {
"operation": "percent",
"selectedVariableCode": "KØN",
"addAsNewValue": "true",
"newValueName": "Percentage",
"newValuePlacement": "Head",
"contentTypeCode": "Content_Type",
"basisValueForCalculation": "TOT"
}
}
The following data formats can be selected for the DATA function:
PX : PC-Axis (ANSI)
CSV: Semicolon separated
JSONSTAT: JSON-stat
DSTML: StatBank's XML-format
PNG: Graph as picture (see description below)
AREMOS: AREMOS
The following are streaming formats, which are described below:
BULK: Semicolon separated file
SDMXCOMPACT: SDMX-ML Compact
SDMXGENERIC: SDMX-ML Generic
The following are presentation formats for which small changes may be implemented in the production environment, even though the system does not change version number.
XLSX: Excel
HTML: HTML
Special data formats
PNG
Data can be returned as a chart. For this format a number of parameters can be stated, which determine how the chart will be presented in the returned image file.
{ "table": "folk1c", "format": "PNG", "variables": [ { "code": "køn", "values": [ "1",
"2"
] } ], "chartsettings": { "type": "column", "width": "900", "height": "400", "showlegend": "true", "showtitle": "true", "fontsize": "13", "sort": "false", "autopivot": "true", "labelsangle": "0",
"titlecolor": "ff000000",
"seriescolors": [ "00ff00", "0000ff" ], "valuecolors": [ "2ff0000" ] } }
In the example, data is collected from the table FOLK1C. The values for Men (1) and Woman (2) are chosen for the variable sex (KØN). In addition, a number of settings about the chart’s presentation are stated.
type: Chart type for presentation of data. The following values can be stated:
Line, Column, StackedColumn, StackColumn100, Bar, StackedBar,
StackedBar100, Area, StackedArea, StackedArea100, Pie, ColumnAndLines, Population.
width: The image’s width in pixels.
height: The image’s height in pixels.
When changing the size, not only scaling of the image will take place. The chart’s layout itself is determined by the available space.
showlegend: Whether the image must contain a box with specification of series texts.
showtitle: Whether the image must contain the table’s title and subtitle.
fontsize: Font sixe.
sort: Whether the observations must be sorted.
autopivot: Whether the variable with most values must be removed to the label axis.
labelsangle: Angle of texts on the label axis.
seriescolors: Collection of colours for series. Colours are specified as e.g. “ff0000” for red. In the example above, green and blue are specified as the first colours that will be used for series.
valuecolors: Collection of colours for specific values. Specified by code followed by colour. In the example above, it is specified that observations with the code (women) will be displayed in red.
STREAMING FORMATS
If many values are chosen for a table, some extracts may be too large to handle at the server. Some extracts may contain billions of cells. A limit for the number of observations possible to extract for non-streaming formats has been implemented which can vary from time to time. In the update log the current limit will be mentioned when changed. By using streaming formats this limitation is not in effect.
Since dataset of this size easily can be used in e.g. databases, we have decided to make this output type available.
The result is different from the other formats as regards several important aspects:
- Some observations are omitted. If a combination of values is missing, the observation for this combination is 0 (the number zero). If the combination is present, but the number is missing, it either means that the observation is missing, it is confidential or it is too insecure to be stated.
- Variables cannot be omitted (eliminated) and the rows (the values) are not necessarily sorted in the same order as prescribed in the query.
Limitations for extraction of data
Through the years the usage of the API has grown substantially. Therefore we have introduced a limitation on the amount of retrieved cells for regular download formats where all data is retrieved at once. This limitation does not apply to the so-called "bulk" formats where data is streamed in smaller portions with no upper limit.
The maximum number of cells retrieved using "non-bulk" (e.g. CSV, px etc.) has been set to 1,000,000 cells, but can be changed in the future based on user feedback and needs.
The number of cells for a query is calculated based on a simple formula: The maximum number of rows to be extracted multiplied with the number of chosen fields where time and the observation value both are mandatory fields. An observation consists of its value and a number of variables such as the time period and other describing parts (e.g. area or gender). Each part of the observation is a cell itself.
Below is a random example with five variables, including the mandatory time variable that is always retrieved and the value itself (5,825,337). This one row therefore consists of six columns which in total sums up to six cells when number of rows and columns are multiplied.
OMRÅDE |
KØN |
ALDER |
CIVILSTAND |
TID |
INDHOLD |
All Denmark |
Total |
Total |
Total |
2020Q3 |
5,825,337 |
Some queries will not contain the calculated maximum cell count, the reason being missing observations for periods, sums or likewise, but this is not accounted for in the calculation.
This basic algorithm will presumably continuously be tweaked.
Change log
We continuously develop the source code in order to improve and optimize the API. API version number will only be increased when changes will break existing use of API.
Some changes which apply to the usage of the API will – in this period – be added to the following list.
May 22th 2024
- Version 1.5.3: Small upgrade to the technology stack.
March 10th 2023
- Version 1.5.1: Minor changes in the output engine.
August 10th 2022
- Version 1.5: The PX format now contains the optional key VARIABLECODE. This provides a mapping for each variable name to its corresponding variable code. An example: The table STRAFNA3 has a variable with the code IELAND which formerly only was used by its name, "country of origin", in the PX data. The following is added for this variable in the revised version of the PX format: VARIABLECODE("country of origin")="IELAND".
- Version 1.4.2 implements a small change to the interpretation of saved queries.
June 20th 2022
- Version 1.4.1 implements certain performance improvements.
April 26th 2022
- Version number has been implemented, starting with 1.4 due to a number of previous publications since the first edition.
- The API has been prepared for a new "next generation" format for saved queries which is used in conjunction with the "Saved" API method.
- Three calculation methods added: Multiply, divide and subtract. These new methods are called in the same way as the sum method, but for now only two codes can be handled for any of the functions. Read more in the DATA section.
- Data can now be extracted in per cent or per mille using a number of settings. Read more in the DATA section.
- New formats added: HTML5 and HTML5InclNotes. Both return data as HTML, but the latter includes optional notes on the table. The HTML code are different than the API format HTML due to better compliance with web standards.
September 16th 2021
- New functionality for the SUBJECT method added: omitInactiveSubjects can be used to omit subject areas that are no longer active.
- As of this day, we organise our statistics in a new way. Among other things, this means that the subject codes and texts in the Statbank will change, which can affect the use of the API. Tables, and requests towards the tables is not affected.
May 27th 2021
- Adjusted upper limit for retrieved cells to 1,000,000 for standard data formats (number of rows multiplied with number of columns consisting of the variables and the value). Streaming formats are not influenced by this.
June 30th 2020
- Adjusted upper limit for retrieved cells to 500,000 for standard data formats (number of rows multiplied with number of columns consisting of the variables and the value). Streaming formats are not influenced by this.
June 25th 2020
- New version that limits large downloads of more than 160,000 data cells/observations on standard "non-streaming" data formats. If using the so-called streaming formats, i.e. BULK or SDMXCOMPACT/SDMXGENERIC, this limit is not in effect.
December 10th 2019
- The API does not support TLS 1.1 or lower when requests are made to the API. If this causes problems, e.g. with clients using the .NET framework, then the code base must be updated to .NET framework 4.6 or later. Alternatively, the following statement needs to be inserted prior to a request to force the client to use TLS 1.2:: ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;.
June 2019
- All HTTP requests are redirected to HTTPS.
- Now official release - no more BETA.
January 2018
- When extracting data in CSV, content column now always has the code INDHOLD.
- Property "active" added to TABLEINFO. The value indicates whether the table is still expected to be updated.
- Documentation links now with https.
December 2017
- Contact persons returned as collection.
- Property "suppressedDataValue" added to TABLEINFO. Value is to be used for missing row observations in streaming formats.
- Observations for stored rows with missing values returned as ".." instead of "".
- Possibility for supplying a sequence of codes by before/equals/after condition.
October 2017
- HTTPS now supported. Please use right away. Later on we will 301-redirect all HTTP requests, in which case a POST request body will be lost.
- Now possible to allow multiple variables in head of CSV-output
- Now possible to let value presentation override codes in column names of CSV-output
- Bug fixed where third party component failed on dataset with exactly one timeperiod, where it's code contained "-"
- Changed in backend regarding caching of data
September 2017
- Elimination value can no longer be part of a sum
- Sum values can be delimited by comma
- Console use own jQuery, so it can be used in a closed system opened for that specific site
June 2017
- Headers trimmed for special characters to improve usage in other systems
- Correct transfer of chartsettings when using GET request
March 2017
- Improved CORS
- Charts limited to a maximum of 50 series
January 2017
- Added returned status code and headers to console
- Added contact information to TABLEINFO
October 2016
- Added information about documentation of statistics to TABLEINFO
June 2016
- An error when extracting data in JSONSTAT, when multiple variables has geo-information is fixed
April 2016
- Added information on geographic map for variable (map) to TABLEINFO
March 2016
- Added information on subject status (active) to SUBJECTS
December 2015
- Improved error messages for errors when using saved queries
October 2015
- Corrections for content variable in JSONSTAT
- Corrections for selection of valuepresentation
Juli 2015
- An error where time code was presented in English when extracted in Danish is fixed
- Now possible to eliminate a variable by one of its values
Juni 2015
- Presentation text as value followed by code no longe available
April 2015
- The data format in the functions SUBJECTS and TABLES does now correspond to the format in TABLEINFO
- Restrictions for sums for certain tables, variables and method of measurement (stock/flow etc.)
- Comma can be used in summation texts
March 2015
- Incorporations of sums in data extract
November 2014
- Statements of times with nth rules (something resembling http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#nth-child-pseudo)
- AREMOS format is removed
October 2014
- Column headline is added for the formats CSV and BULK
- The format BULK is able to show text now. If codes are desired, choose explicitly valuePresentation=code