We explore the data to identify entries that represent shell companies. The approach is to define criteria that classify a firm, with confidence, as either regular or shell. The criteria are applied in order, starting with the most unequivocal; each later refinement is applied only to firms that have not yet been classified.
require(data.table)
## Loading required package: data.table
setwd("D:/Users/macii/Documents/Dubrovnik/")
load("APR-processed-v4.rda") ## loads d
names(d) ## print column names as a reference
## [1] "year" "EURRSD" "deflator"
## [4] "id" "capital" "costs"
## [7] "empl" "fixedAssets" "foreignOwners"
## [10] "interest" "liabLT" "liabST"
## [13] "liabSTfin" "loansLT" "revenues"
## [16] "totalAssets" "wages" "state.connection"
## [19] "state.owner.type" "state.share" "state.legal.form"
## [22] "priv.method" "reg.code" "region"
## [25] "district.code" "district" "town.code"
## [28] "town" "muni.code" "muni"
## [31] "estab.date" "removal.date" "sector.mod"
## [34] "ind.code" "trade.sector" "trade.name"
## [37] "legal.form" "status" "sitc"
## [40] "compet.cms" "market.growing" "entrepreneur"
## [43] "restructuring" "export.rank" "export"
## [46] "is.sme"
setkey(d, "id")
The main criterion we use to determine if a firm listing is a shell is the number of employees. Since this number changes with time, and we assume companies do not switch between regular and shell, we will consider the highest number of employees for each firm.
setkey(d, id) ## we will be grouping by firm ID
d.max <- d[, list(max.empl = max(empl, na.rm = TRUE)), by = id]
empl.max.trunc <- factor(pmin(d.max$max.empl, 10), labels = c(0:9, "10+"))
# make barplot and save midpoint coordinates
tt <- table(empl.max.trunc) ## summarize by level
bplt <- barplot(tt/1000, xlab = "Highest number of employees", ylab = "Occurrences (in 000)",
main = paste0("Distribution of highest number of employees, N=", sum(tt)))
# get list of percentage share in each category
pctg <- paste0(round(100 * tt/sum(tt)), "%")
text(x = bplt, y = tt/1000 - 1, labels = pctg, cex = 0.8) ## use % as labels
More than 46 thousand companies (25%) have never had a single employee. Some may be independent professionals (shop owners, lawyers, architects), but these are prime suspects for representing shell companies.
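One caveat with the per-firm maximum used above: `max(x, na.rm = TRUE)` returns `-Inf` (with a warning) when every value in the group is missing. The sketch below uses invented toy data and a hypothetical helper, `safe.max`, to show one way of returning `NA` instead. It is an illustration under those assumptions, not part of the original analysis.

```r
library(data.table)

# toy data (invented): firm "c" never reports employee numbers
toy <- data.table(id   = c("a", "a", "b", "b", "c", "c"),
                  empl = c(0, 0, 3, 12, NA, NA))

# hypothetical helper: highest non-missing value, or NA if all are missing
safe.max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

toy.max <- toy[, list(max.empl = safe.max(empl)), by = id]

# truncate at 10 with explicit levels, so empty categories keep their label
toy.max[, empl.trunc := factor(pmin(max.empl, 10), levels = 0:10,
                               labels = c(0:9, "10+"))]
print(toy.max)
```

Passing `levels = 0:10` explicitly also keeps the labels aligned when some headcount between 0 and 10 happens not to occur in the data.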
We create a new variable that identifies the firm type and initialize it to “Uncertain” for every firm.
d[, `:=`(firm.type, "Uncertain")] # planned levels: Regular, Shell, Uncertain
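The sequential logic described in the introduction (each rule may only label firms that are still “Uncertain”) can be sketched on a few invented rows. The helper `mark.firms` is hypothetical and only illustrates the pattern; the analysis itself applies each rule inline.

```r
library(data.table)

# invented firm-year rows, all starting as "Uncertain"
toy <- data.table(id = c("a", "a", "b", "c", "c"),
                  firm.type = "Uncertain")

# hypothetical helper: label the given firms, but only where still "Uncertain"
mark.firms <- function(dt, ids, label) {
  dt[id %in% ids & firm.type == "Uncertain", firm.type := label]
  invisible(dt)
}

mark.firms(toy, c("a", "b"), "Regular")  # first, most clear-cut rule
mark.firms(toy, c("b", "c"), "Shell")    # later rule: "b" stays Regular
print(unique(toy))
```

Because `:=` updates by reference, the order of the calls determines the outcome: a firm labelled by an earlier rule is never relabelled by a later one.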
Before 1991, private enterprise was heavily regulated in Serbia (then part of Yugoslavia). The only permitted form of private enterprise was the small business, which could employ up to X people. Entrepreneurs were mostly hairdressers, grocers, tailors, small farmers, etc. The only large entities were government-owned and tightly controlled. A shell company is therefore unlikely to date from that period, and we treat all firms established before 1991 as regular.
# store IDs in vector
before.1991 <- unique(d$id[d$estab.date < as.Date("1991-01-01")])
# match on the id column explicitly, so this works whether id is character or numeric
d[id %in% before.1991, `:=`(firm.type, "Regular")]
Status after change:
cat("Before 1991:", length(before.1991))
## Before 1991: 8227
d[, length(unique(id)), by = firm.type]
## firm.type V1
## 1: Uncertain 177105
## 2: Regular 8227
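When marking firms by a vector of IDs, note that data.table reads a numeric vector in `i` as row numbers and joins on the key only for character vectors or a list built with `.()`. The sketch below, on invented data that assumes numeric IDs, compares two forms that match on the ID itself.

```r
library(data.table)

# toy data (invented) with numeric firm IDs
toy <- data.table(id = c(101, 101, 205, 205, 309),
                  firm.type = "Uncertain")
setkey(toy, id)

ids <- c(205)  # firms we want to mark as regular

# matches on the id column, whatever its type
toy[id %in% ids, firm.type := "Regular"]
print(toy)

# a bare numeric vector in i would address ROW 205, not firm 205;
# wrapping it in .() forces a join on the key instead
toy2 <- copy(toy)[, firm.type := "Uncertain"]
toy2[.(ids), firm.type := "Regular"]
print(identical(toy$firm.type, toy2$firm.type))
```

Either form is safe; `id %in% ids` has the advantage of not depending on the key being set.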
Shell firms do not have large numbers of employees, because employees are cumbersome from a reporting/compliance point of view. We mark as regular all firms that at some point have had 10 employees or more.
# create dataset with highest number of reported employees
d.max <- d[, list(max.empl = max(empl, na.rm = TRUE)), by = id]
# store IDs in vector
many.empl <- d.max[max.empl >= 10, id]
# we can safely apply to all firms, no shells yet
d[id %in% many.empl, `:=`(firm.type, "Regular")]
Status after change:
cat("More than 9 employees:", length(many.empl))
## More than 9 employees: 24456
d[, length(unique(id)), by = firm.type]
## firm.type V1
## 1: Uncertain 155387
## 2: Regular 29945
Firms that have never had any employees but report unreasonably high profits, revenues, or assets are marked as shell companies.
# define profit
d[, `:=`(profit, revenues - costs)]
# create dataset with highest number of reported employees, and revenues,
# profits, assets
d.max <- d[, list(max.empl = max(empl, na.rm = TRUE), revenues = max(revenues,
na.rm = TRUE), profit = max(profit, na.rm = TRUE), assets = max(totalAssets,
na.rm = TRUE)), by = id]
Let us look at the distribution of revenues, profits, and assets for small firms (those that never had 5 or more employees) in our data. Specifically, we look at the higher percentiles.
quantile(d.max[max.empl < 5, revenues], c(0.8, 0.9, 0.95, 0.99))
## 80% 90% 95% 99%
## 9812 20360 38206 145347
quantile(d.max[max.empl < 5, profit], c(0.8, 0.9, 0.95, 0.99))
## 80% 90% 95% 99%
## 631 1433 2925 11702
quantile(d.max[max.empl < 5, assets], c(0.8, 0.9, 0.95, 0.99))
## 80% 90% 95% 99%
## 7001 16630 37258 197002
Based on the above statistics, we select upper bounds for all three measures.
mr <- 35000 ## more than EUR 300K revenue
mp <- 6000 ## more than EUR 50K profit
ma <- 50000 ## more than EUR 430K assets
# store IDs in vector
no.employees <- d.max[max.empl == 0 & (revenues > mr | profit > mp | assets >
ma), id]
# mark all firms currently listed as 'Uncertain'
d[id %in% no.employees & firm.type == "Uncertain", `:=`(firm.type, "Shell")]
Status after change:
cat("No employees, high performance:", length(no.employees))
## No employees, high performance: 2180
d[, length(unique(id)), by = firm.type]
## firm.type V1
## 1: Uncertain 153271
## 2: Regular 29945
## 3: Shell 2116
It is also possible that shell companies have a small number of employees. We would like to identify firms that have unreasonably high revenues or assets per employee compared to others in the same sector. We exclude companies with no employees because the ratio is Inf. We consider a firm to be a shell company if its revenues or assets per employee are more than 3 times the 90th percentile for that sector.
d.sum <- d[, list(empl = max(empl, na.rm = TRUE), assets = mean(totalAssets, na.rm = TRUE),
revenues = mean(revenues, na.rm = TRUE), sector = sector.mod[1]), by = id]
d.sum.nz <- d.sum[empl > 0] ## remove firms with no employees
d.sum.nz[, `:=`(rpe, revenues/empl)]
d.sum.nz[, `:=`(ape, assets/empl)]
# get sector 90th percentiles
d.sum.nz[, `:=`(rpe.sector, quantile(rpe, probs = 0.9, na.rm = TRUE)), by = sector]
d.sum.nz[, `:=`(ape.sector, quantile(ape, probs = 0.9, na.rm = TRUE)), by = sector]
d.sum.nz[, `:=`(high.ape, ape > 3 * ape.sector)]
d.sum.nz[, `:=`(high.rpe, rpe > 3 * rpe.sector)]
high.value <- d.sum.nz[high.ape | high.rpe, id]
# mark all firms currently listed as 'Uncertain'
d[id %in% high.value & firm.type == "Uncertain", `:=`(firm.type, "Shell")]
Note that, because of the previous criteria, all firms with 10 or more employees have already been marked as regular, so this step can only reclassify firms whose highest headcount is between 1 and 9.
Status after change:
cat("High revenues/assets per employee:", length(high.value))
## High revenues/assets per employee: 5285
d[, length(unique(id)), by = firm.type]
## firm.type V1
## 1: Uncertain 148899
## 2: Regular 29945
## 3: Shell 6488
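The per-employee rule can be traced on a small invented sector. The sketch below shows only the revenue ratio and uses made-up numbers; the column names mirror those above, but nothing here comes from the real data.

```r
library(data.table)

# toy firm-level summary (invented numbers): one sector, ten firms
toy <- data.table(id       = sprintf("f%02d", 1:10),
                  sector   = "trade",
                  empl     = c(2, 3, 1, 4, 2, 5, 3, 2, 1, 1),
                  revenues = c(200, 330, 90, 480, 210, 500, 270, 220, 100, 5000))

toy[, rpe := revenues / empl]                                  # revenues per employee
toy[, rpe.sector := quantile(rpe, probs = 0.9), by = sector]   # sector 90th percentile
toy[, high.rpe := rpe > 3 * rpe.sector]                        # 3x rule from the text

print(toy[high.rpe == TRUE, list(id, rpe, rpe.sector)])
```

Note that the outlier itself enters the sector percentile, which pulls the threshold up; in a sector with few firms this makes the rule more conservative.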
Some shell firms are established only for a few transactions, as a way to shift assets or profits. We mark firms that have only one year of recorded activity and very high profits, revenues, or assets. We first identify the firms that have only ever reported once:
# summarize firms by number of years reported and key dates
one.year <- d[, list(years = .N, estab.year = year(estab.date[1]), removal.year = year(removal.date[1]),
year = year[1]), by = id]
one.year <- one.year[years == 1] ## only keep those with a single year
Some of these firms have only one entry because of our period of observation. For this reason, we remove from this list all firms whose only year is 2005 but that were not established in that year, and all those whose only year is 2012 but that were not removed from the register in 2012, since they may have kept operating after the end of our window.
one.year <- one.year[!(year == 2005 & estab.year != 2005)]
# a missing removal date gives NA here; data.table drops rows where i is NA
one.year <- one.year[!(year == 2012 & removal.year != 2012), id]
length(one.year) ## how many one-year firms are there?
## [1] 21675
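The edge-of-window filter can be checked on a handful of invented firms. The sketch assumes the observation window runs from 2005 to 2012, as the text implies, and that a firm observed only in 2012 counts as genuinely short-lived only if it was also removed from the register in 2012. Both the data and that reading are assumptions.

```r
library(data.table)

# invented single-year firms: the only reporting year plus register dates
toy <- data.table(
  id           = c("a", "b", "c", "d", "e"),
  year         = c(2005, 2005, 2008, 2012, 2012),
  estab.date   = as.Date(c("2001-03-01", "2005-02-10", "2008-06-01",
                           "2012-01-15", "2011-09-30")),
  removal.date = as.Date(c("2005-12-31", "2005-11-30", "2008-12-01",
                           NA, "2012-10-01")))

toy[, estab.year := data.table::year(estab.date)]
toy[, removal.year := data.table::year(removal.date)]

# left edge: observed only in 2005 but established earlier -> history cut off
keep <- toy[!(year == 2005 & estab.year != 2005)]
# right edge: observed only in 2012 and not removed that year -> may continue;
# %in% is used so that a missing removal date counts as "not removed"
keep <- keep[!(year == 2012 & !(removal.year %in% 2012))]
print(keep$id)
```

Firm "a" is dropped because its history before 2005 is unobserved, and firm "d" because it has no removal date and may still be active after 2012.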
We can now compute some performance measures for these firms. As in the previous section, we are looking for firms that reported in only one year but had very high revenues, profits, or assets (compared to other one-year firms in the same sector). The cutoff for “high” is defined as in the previous section: more than 3 times the 90th percentile for the sector.
# subset the measures for one.year firms; keep id, which is needed for the selection below
d.one <- d[id %in% one.year, list(id, revenues, profit, assets = totalAssets, sector.mod)]
# sector-level 90th percentiles (ignoring missing values)
d.one[, `:=`(rt, quantile(revenues, probs = 0.9, na.rm = TRUE)), by = sector.mod]
d.one[, `:=`(pt, quantile(profit, probs = 0.9, na.rm = TRUE)), by = sector.mod]
d.one[, `:=`(at, quantile(assets, probs = 0.9, na.rm = TRUE)), by = sector.mod]
# subset for those who meet at least one criterion
one.hit.wonder <- d.one[revenues > 3 * rt | profit > 3 * pt | assets > 3 * at,
id]
# change to 'Shell' one.hit.wonder IDs that are now marked 'Uncertain'
d[id %in% one.hit.wonder & firm.type == "Uncertain", `:=`(firm.type, "Shell")]
Status after change:
cat("Short life, high r/p/a:", length(one.hit.wonder))
## Short life, high r/p/a: 1779
d[, length(unique(id)), by = firm.type]
## firm.type V1
## 1: Uncertain 148024
## 2: Regular 29945
## 3: Shell 7363
shells <- d[, list(firm.type = firm.type[1]), by = id]
# save(shells, file='~/Dubrovnik/APR-shell-ID-v1.rda')