Identifying shell firms

We explore the data to identify entries that represent shell companies. The approach is to define criteria that confidently classify a firm as either regular or shell. The criteria are applied in order, starting with the most unequivocal; each later refinement is applied only to firms that have not yet been classified.

require(data.table)

## Loading required package: data.table

setwd("D:/Users/macii/Documents/Dubrovnik/")
load("APR-processed-v4.rda")  ## loads d
names(d)  ## print column names as a reference

##  [1] "year"             "EURRSD"           "deflator"        
##  [4] "id"               "capital"          "costs"           
##  [7] "empl"             "fixedAssets"      "foreignOwners"   
## [10] "interest"         "liabLT"           "liabST"          
## [13] "liabSTfin"        "loansLT"          "revenues"        
## [16] "totalAssets"      "wages"            "state.connection"
## [19] "state.owner.type" "state.share"      "state.legal.form"
## [22] "priv.method"      "reg.code"         "region"          
## [25] "district.code"    "district"         "town.code"       
## [28] "town"             "muni.code"        "muni"            
## [31] "estab.date"       "removal.date"     "sector.mod"      
## [34] "ind.code"         "trade.sector"     "trade.name"      
## [37] "legal.form"       "status"           "sitc"            
## [40] "compet.cms"       "market.growing"   "entrepreneur"    
## [43] "restructuring"    "export.rank"      "export"          
## [46] "is.sme"

setkey(d, "id")

The main criterion we use to determine whether a firm listing is a shell is the number of employees. Since this number changes over time, and we assume that companies do not switch between regular and shell, we use the highest number of employees ever reported by each firm.

setkey(d, id)  ## we will be grouping by firm ID
d.max <- d[, list(max.empl = max(empl, na.rm = TRUE)), by = id]
# fix the levels explicitly so the labels line up even if a count is absent
empl.max.trunc <- factor(pmin(d.max$max.empl, 10), levels = 0:10, labels = c(0:9, "10+"))
# make barplot and save midpoint coordinates
tt <- table(empl.max.trunc)  ## summarize by level
bplt <- barplot(tt/1000, xlab = "Highest number of employees", ylab = "Occurrences (in 000)", 
    main = paste0("Distribution of highest number of employees, N=", sum(tt)))
# get list of percentage share in each category
pctg <- paste0(round(100 * tt/sum(tt)), "%")
text(x = bplt, y = tt/1000 - 1, labels = pctg, cex = 0.8)  ## use % as labels

[Figure: bar plot of the distribution of the highest number of employees per firm, truncated at 10+]

More than 46 thousand companies (25%) have never had a single employee. Some may be independent professionals (shop owners, lawyers, architects), but firms in this group are the prime suspects for being shell companies.
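
One caveat with the per-firm maximum: `max(x, na.rm = TRUE)` returns `-Inf` (with a warning) when every value is missing. The sketch below, on made-up data, shows one way to guard against that. The table `toy` and the helper `safe.max` are illustrative names and are not part of the analysis.

```r
library(data.table)

# Toy data (hypothetical): firm "c" never reports a head count
toy <- data.table(
  id   = c("a", "a", "b", "b", "c", "c"),
  empl = c(0, 3, 12, NA, NA, NA)
)

# Hypothetical helper: per-firm maximum that yields NA instead of -Inf
# when every observation is missing
safe.max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

toy.max <- toy[, list(max.empl = safe.max(empl)), by = id]
print(toy.max)
```

Firms with a missing maximum would then drop out of comparisons such as `max.empl == 0` rather than being compared against `-Inf`.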

We create a new variable that identifies the firm type and initialize it to “Uncertain” for every firm:

d[, `:=`(firm.type, "Uncertain")]  # planned levels: Regular, Shell, Uncertain
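
The stepwise labelling used in the rest of this document can be sketched on toy data as follows. The IDs and the two ID vectors are invented for illustration; the point is only that a later criterion leaves already-labelled firms untouched.

```r
library(data.table)

toy <- data.table(id = c("a", "b", "c", "d"), firm.type = "Uncertain")

# Hypothetical ID vectors produced by two successive criteria
regular.ids <- c("a", "b")  # first, more reliable criterion
shell.ids   <- c("b", "c")  # later criterion; "b" is already labelled

# The first criterion is applied to everyone
toy[id %in% regular.ids, firm.type := "Regular"]
# Later criteria only touch firms that are still 'Uncertain'
toy[id %in% shell.ids & firm.type == "Uncertain", firm.type := "Shell"]
print(toy)
```

Firm "b" stays "Regular" even though the second criterion also flags it, which is why the order of the criteria matters.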

Established before 1991: Regular

Before 1991, private enterprise was heavily regulated in Serbia (then part of Yugoslavia). The only permitted form of private enterprise was the small business, which could employ up to X people. Entrepreneurs were mostly hairdressers, grocers, tailors, small farmers, and the like. The only large entities were government-owned and tightly controlled. We therefore treat all firms established before 1991 as regular.

# store IDs in vector
before.1991 <- unique(d$id[d$estab.date < as.Date("1991-01-01")])
# match on the id column explicitly; this works whether id is character or numeric
d[id %in% before.1991, `:=`(firm.type, "Regular")]

Status after change:

cat("Before 1991:", length(before.1991))

## Before 1991: 8227

d[, length(unique(id)), by = firm.type]

##    firm.type     V1
## 1: Uncertain 177105
## 2:   Regular   8227

Having 10 employees or more: Regular

Shell firms do not have large numbers of employees, because employees are cumbersome from a reporting and compliance point of view. We mark as regular all firms that at some point have had 10 or more employees.

# create dataset with highest number of reported employees
d.max <- d[, list(max.empl = max(empl, na.rm = TRUE)), by = id]
# store IDs in vector
many.empl <- d.max[max.empl >= 10, id]
# we can safely apply to all firms, no shells yet
# (matching on the id column works whether id is character or numeric)
d[id %in% many.empl, `:=`(firm.type, "Regular")]

Status after change:

cat("More than 9 employees:", length(many.empl))

## More than 9 employees: 24456

d[, length(unique(id)), by = firm.type]

##    firm.type     V1
## 1: Uncertain 155387
## 2:   Regular  29945
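
As a quick consistency check on the figures above, inclusion-exclusion gives the number of firms that satisfy both criteria applied so far. This is plain arithmetic on the printed counts, under the assumption that every ID in both vectors matched a firm in `d`.

```r
n.before.1991 <- 8227   # firms established before 1991
n.many.empl   <- 24456  # firms that ever had 10 or more employees
n.regular     <- 29945  # firms labelled 'Regular' after both steps

# |A and B| = |A| + |B| - |A or B|
n.both <- n.before.1991 + n.many.empl - n.regular
n.both
```

Under that assumption, the result is the number of pre-1991 firms that also reached 10 or more employees at some point.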

Having high profit/revenue/assets and 0 employees: shell

Firms with unreasonably high profits, revenues, or assets that have never had any employees are marked as shell companies.

# define profit
d[, `:=`(profit, revenues - costs)]
# create dataset with highest number of reported employees, and revenues,
# profits, assets
d.max <- d[, list(max.empl = max(empl, na.rm = TRUE), revenues = max(revenues, 
    na.rm = TRUE), profit = max(profit, na.rm = TRUE), assets = max(totalAssets, 
    na.rm = TRUE)), by = id]

Let us look at the distribution of revenues, profits, and assets for small firms (those that never had 5 or more employees) in our data, focusing on the upper percentiles.

quantile(d.max[max.empl < 5, revenues], c(0.8, 0.9, 0.95, 0.99))

##    80%    90%    95%    99% 
##   9812  20360  38206 145347

quantile(d.max[max.empl < 5, profit], c(0.8, 0.9, 0.95, 0.99))

##   80%   90%   95%   99% 
##   631  1433  2925 11702

quantile(d.max[max.empl < 5, assets], c(0.8, 0.9, 0.95, 0.99))

##    80%    90%    95%    99% 
##   7001  16630  37258 197002

Based on the above statistics, we select upper bounds for all three measures.

mr <- 35000  ## more than EUR 300K revenue
mp <- 6000  ## more than EUR 50K profit
ma <- 50000  ## more than EUR 430K assets
# store IDs in vector
no.employees <- d.max[max.empl == 0 & (revenues > mr | profit > mp | assets > 
    ma), id]
# mark all firms currently listed as 'Uncertain'
d[id %in% no.employees & firm.type == "Uncertain", `:=`(firm.type, "Shell")]

Status after change:

cat("No employees, high performance:", length(no.employees))

## No employees, high performance: 2180

d[, length(unique(id)), by = firm.type]

##    firm.type     V1
## 1: Uncertain 153271
## 2:   Regular  29945
## 3:     Shell   2116
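
The number of flagged firms (2180) is larger than the number labelled "Shell" (2116) because the update only touches firms that are still "Uncertain". A small arithmetic sketch on the printed counts makes the gap explicit; the variable names are illustrative.

```r
n.flagged        <- 2180    # zero-employee firms above a threshold
uncertain.before <- 155387  # 'Uncertain' firms before this step
uncertain.after  <- 153271  # 'Uncertain' firms after this step

# firms that actually changed label in this step
n.new.shell <- uncertain.before - uncertain.after
# flagged firms that had already been labelled 'Regular'
n.already.regular <- n.flagged - n.new.shell
c(new.shell = n.new.shell, already.regular = n.already.regular)
```

The firms in the second group can only have been labelled through the pre-1991 rule, since a firm that never reported an employee cannot meet the 10-employee criterion.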

Having much higher revenues or assets / employee than sector: shell

It is also possible that shell companies have a small number of employees. We would like to identify firms that have unreasonably high revenues or assets per employee compared to others in the same sector, using each firm's highest head count and its average revenues and assets. We exclude companies with no employees because the ratio would be infinite. We consider a firm to be a shell company if its revenues or assets per employee are more than 3 times the 90th percentile for that sector.

d.sum <- d[, list(empl = max(empl, na.rm = TRUE), assets = mean(totalAssets, na.rm = TRUE), 
    revenues = mean(revenues, na.rm = TRUE), sector = sector.mod[1]), by = id]
d.sum.nz <- d.sum[empl > 0]  ## remove firms with no employees
d.sum.nz[, `:=`(rpe, revenues/empl)]  ## revenues per employee
d.sum.nz[, `:=`(ape, assets/empl)]  ## assets per employee
# get sector 90th percentiles
d.sum.nz[, `:=`(rpe.sector, quantile(rpe, probs = 0.9, na.rm = TRUE)), by = sector]
d.sum.nz[, `:=`(ape.sector, quantile(ape, probs = 0.9, na.rm = TRUE)), by = sector]
d.sum.nz[, `:=`(high.ape, ape > 3 * ape.sector)]
d.sum.nz[, `:=`(high.rpe, rpe > 3 * rpe.sector)]
high.value <- d.sum.nz[high.ape | high.rpe, id]
# mark all firms currently listed as 'Uncertain'
d[id %in% high.value & firm.type == "Uncertain", `:=`(firm.type, "Shell")]

Note that, because of the previous criteria, all firms with 10 or more employees have already been marked as regular, so this step only affects firms with 1 to 9 employees.
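
The per-sector cutoff can be illustrated on invented data. The sketch below uses made-up firms, sectors, and revenues, and covers only the revenue-per-employee rule; it is meant to show the mechanics, not to reproduce the analysis.

```r
library(data.table)

# Toy per-firm summary (hypothetical values), 10 firms per sector
toy <- data.table(
  id       = sprintf("f%02d", 1:20),
  sector   = rep(c("trade", "services"), each = 10),
  empl     = rep(2, 20),
  revenues = c(seq(10, 90, by = 10), 1000,  # one outlier in trade
               seq(20, 180, by = 20), 200)  # no outlier in services
)

toy[, rpe := revenues / empl]  # revenues per employee
# sector-level 90th percentile, repeated on every row of the sector
toy[, rpe.sector := quantile(rpe, probs = 0.9, na.rm = TRUE), by = sector]
# flag firms above 3 times the sector percentile
toy[, high.rpe := rpe > 3 * rpe.sector]
flagged <- toy[high.rpe == TRUE, id]
flagged
```

In a small sector the outlier itself raises the 90th percentile, so the effective cutoff is higher than it would be without that firm. This matters less in sectors with many firms.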

Status after change:

cat("High revenues/assets per employee:", length(high.value))

## High revenues/assets per employee: 5285

d[, length(unique(id)), by = firm.type]

##    firm.type     V1
## 1: Uncertain 148899
## 2:   Regular  29945
## 3:     Shell   6488

Having short life and high profits/revenues: shell

Some shell firms are established for only a few transactions, as a way to shift assets or profits. We mark firms that have only one year of recorded activity and very high profits, revenues, or assets. We first identify the firms that have only ever reported once:

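# summarize firms by number of years reported and key dates
# NOTE: removal.year is derived from estab.date here, so it equals estab.year;
# the 2012 filter below therefore tests the establishment year, as the text describes
one.year <- d[, list(years = .N, estab.year = year(estab.date[1]), removal.year = year(estab.date[1]), 
    year = year[1]), by = id]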
one.year <- one.year[years == 1]  ## only keep those with a single year

Some of these firms have only one entry because of the limits of our observation period. For this reason, we remove from the list all firms whose only year is 2005 but that were not established in 2005, and all firms whose only year is 2012 but that were not established in 2012.

one.year <- one.year[!(year == 2005 & estab.year != 2005)]
one.year <- one.year[!(year == 2012 & removal.year != 2012), id]
length(one.year)  ## how many one-year firms are there?

## [1] 21675
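
The filter described above can be sketched on a handful of invented firms. All IDs, years, and dates below are hypothetical, and the column name `only.year` is illustrative.

```r
library(data.table)

toy <- data.table(
  id         = c("a", "b", "c", "d", "e", "e", "f"),
  year       = c(2005L, 2005L, 2012L, 2008L, 2009L, 2010L, 2012L),
  estab.date = as.Date(c("2001-03-01", "2005-06-01", "2012-02-01",
                         "2008-01-15", "2009-05-01", "2009-05-01",
                         "2010-09-01"))
)

# one row per firm: number of reports, establishment year, first reported year
per.firm <- toy[, list(years      = .N,
                       estab.year = data.table::year(estab.date[1]),
                       only.year  = year[1]), by = id]

single <- per.firm[years == 1]  # firms with a single report
# drop firms seen only in 2005 that were established earlier
single <- single[!(only.year == 2005 & estab.year != 2005)]
# drop firms seen only in 2012 that were established earlier
single <- single[!(only.year == 2012 & estab.year != 2012)]
single$id
```

Firm "a" (established 2001, seen only in 2005) and firm "f" (established 2010, seen only in 2012) are dropped, and firm "e" is excluded because it reported twice.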

We can now compute some performance measures for these firms. As in the previous section, we are looking for firms that reported for only one year but had very high revenues, profits, or assets (compared to one-year firms in the same sector). The cutoff for “high” is similar to the previous section: 3 times the 90th percentile for the sector.

# subset the measures for one.year firms; keep id so we can select on it below
d.one <- d[id %in% one.year, list(id, revenues, profit, assets = totalAssets, sector.mod)]
# define thresholds at 90%ile
d.one[, `:=`(rt, quantile(revenues, probs = 0.9, na.rm = TRUE)), by = sector.mod]
d.one[, `:=`(pt, quantile(profit, probs = 0.9, na.rm = TRUE)), by = sector.mod]
d.one[, `:=`(at, quantile(assets, probs = 0.9, na.rm = TRUE)), by = sector.mod]
# subset for those who meet at least one criterion
one.hit.wonder <- d.one[revenues > 3 * rt | profit > 3 * pt | assets > 3 * at, 
    id]
# change to 'Shell' one.hit.wonder IDs that are now marked 'Uncertain'
d[id %in% one.hit.wonder & firm.type == "Uncertain", `:=`(firm.type, "Shell")]

Status after change:

cat("Short life, high r/p/a:", length(one.hit.wonder))

## Short life, high r/p/a: 1779

d[, length(unique(id)), by = firm.type]

##    firm.type     V1
## 1: Uncertain 148024
## 2:   Regular  29945
## 3:     Shell   7363
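
As a sanity check, the counts printed after each step should always add up to the same number of firms. The sketch below copies the printed tables into plain vectors (the object names are illustrative) and also derives the final share of each label.

```r
# counts copied from the status tables above
counts <- list(
  step1 = c(Uncertain = 177105, Regular = 8227),
  step2 = c(Uncertain = 155387, Regular = 29945),
  step3 = c(Uncertain = 153271, Regular = 29945, Shell = 2116),
  step4 = c(Uncertain = 148899, Regular = 29945, Shell = 6488),
  step5 = c(Uncertain = 148024, Regular = 29945, Shell = 7363)
)

# the total number of firms must not change between steps
totals <- sapply(counts, sum)
stopifnot(all(totals == totals[1]))

# final share of each label, in percent
shares <- round(100 * counts$step5 / sum(counts$step5), 1)
shares
```

Most firms remain "Uncertain" after all five criteria, so the labels identify only the clearest cases at either end.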

Saving firm type for use in other analysis

shells <- d[, list(firm.type = firm.type[1]), by = id]
# save(shells, file='~/Dubrovnik/APR-shell-ID-v1.rda')
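
A minimal sketch of how the saved labels might be reused elsewhere, assuming an `.rda` file written with `save()`. The temporary path, the toy table `shells.toy`, and the table `other` are hypothetical stand-ins for the real file and datasets.

```r
library(data.table)

# Toy stand-in for the 'shells' table
shells.toy <- data.table(id = c("a", "b", "c"),
                         firm.type = c("Regular", "Shell", "Uncertain"))

path <- tempfile(fileext = ".rda")  # hypothetical location
save(shells.toy, file = path)
rm(shells.toy)

# load() restores the object under its original name and returns that name
loaded <- load(path)

# attach the labels to another (hypothetical) firm-level dataset
other  <- data.table(id = c("a", "c"), x = c(1.5, 2.5))
merged <- merge(other, shells.toy, by = "id")
print(merged)
```

Because `load()` restores objects under their saved names, the analysis that reads the file needs to know that name, or can take it from the value `load()` returns.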