Exploratory data analysis of SME in APR data

Here we look at some summary statistics for Small and Medium Enterprises from the APR data to form an idea of its contents.

require(data.table)

## Loading required package: data.table

setwd("~/Dubrovnik/")
load("APR-processed-stage1-v3.rda")  ## load d, the main data
setkey(d, id)
names(d)  ## list of columns available

##  [1] "year"             "deflator"         "id"              
##  [4] "capital"          "costs"            "empl"            
##  [7] "fixedAssets"      "foreignOwners"    "interest"        
## [10] "liabLT"           "liabST"           "liabSTfin"       
## [13] "loansLT"          "revenues"         "totalAssets"     
## [16] "wages"            "state.connection" "state.owner.type"
## [19] "state.share"      "state.legal.form" "priv.method"     
## [22] "reg.code"         "region"           "district.code"   
## [25] "district"         "town.code"        "town"            
## [28] "muni.code"        "muni"             "estab.date"      
## [31] "removal.date"     "sector.mod"       "ind.code"        
## [34] "trade.sector"     "trade.name"       "legal.form"      
## [37] "status"           "sitc"             "compet.cms"      
## [40] "market.growing"   "entrepreneur"     "restructuring"   
## [43] "export.rank"      "export"

Some companies did not file a report in any year of our 8-year observation period, so their year is missing. We remove them.

sum(is.na(d$year))  ## how many are there?

## [1] 3399

d <- d[!is.na(year)]  ## only keep those with non-NA year

We are only interested in SMEs, which have a clear definition in Serbian law. However, since Serbia is planning to align its definition with that of the EU, we will use the EU definition, which has been in place since 2003. A company is considered an SME if:

It has fewer than 250 employees
It has less than EUR 50M yearly revenues
It has less than EUR 43M in total assets

Since our data is in Serbian dinars, whose value relative to the Euro has changed significantly during 2005-2008, we will use the end-of-year exchange rate to convert revenues and total assets to their euro equivalents.
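As a minimal sketch of the conversion and the three criteria, the toy example below classifies three made-up firm-years. The firm records and the exchange rates are hypothetical, chosen for illustration only. It also assumes that revenues and total assets are recorded in thousands of dinars, so that dividing by the RSD-per-EUR rate yields thousands of euros and the EUR 50M and EUR 43M limits become 50000 and 43000.

```r
library(data.table)

# Hypothetical firm-year records; amounts assumed to be in thousands of RSD
firms <- data.table(
  id          = c(1, 2, 3),
  year        = c(2005, 2008, 2008),
  empl        = c(12, 300, 40),
  revenues    = c(80000, 900000, 5000000),
  totalAssets = c(50000, 400000, 1000000)
)

# Hypothetical end-of-year exchange rates (RSD per EUR), illustration only
rates <- data.table(year = c(2005, 2008), EURRSD = c(85, 89))

# Attach the exchange rate of the matching year to every firm-year
firms <- rates[firms, on = "year"]

# thousands of RSD / (RSD per EUR) = thousands of EUR
firms[, is.sme := empl < 250 &
                  revenues / EURRSD < 50000 &
                  totalAssets / EURRSD < 43000]

firms[, list(id, year, is.sme)]
```

Firm 2 fails on headcount and firm 3 on revenues, so only firm 1 is marked as an SME.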

load("~/Dubrovnik/deflators-v2.rda")  ## adds deflate

# add exchange rate to data
setkey(d, year)
d <- deflate[, EURRSD, by = year][d]

# add SME marker
d[, `:=`(is.sme, TRUE)]  ## all firms as SME
d[empl >= 250 | revenues/EURRSD >= 50000 | totalAssets/EURRSD >= 43000, `:=`(is.sme, 
    FALSE)]  ## large firms not SME

# save data here
save(d, file = "~/Dubrovnik/APR-processed-v4.rda")

# only keep SME for rest of analysis
d <- d[is.sme == TRUE]

Company legal form

The dataset encompasses all firms registered in Serbia through 2012. This includes the largest corporations, small entrepreneurs, and independent professionals.

Below is a summary of companies by type

d.form <- d[, list(legal.form = legal.form[1], sector.mod = sector.mod[1], entrepreneur = entrepreneur[1], 
    estab.year = year(estab.date[1])), by = id]
(tt <- table(d.form$entrepreneur))  ## TRUE for entrepreneurs

## 
##  FALSE   TRUE 
## 139358  45519

round(100 * tt/sum(tt))  ## TRUE for entrepreneurs

## 
## FALSE  TRUE 
##    75    25

and by legal form.

tt2 <- table(d.form$legal.form)
(tt2 <- sort(tt2, decreasing = TRUE))

## 
##          Limited liability                Independent 
##                     126552                      43304 
##        General partnership                Cooperative 
##                       4127                       3261 
## Unincorporated partnership           Open joint-stock 
##                       2215                       1246 
##                Joint-stock             Employee-owned 
##                        996                        813 
##                      Other        Limited partnership 
##                        689                        669 
##                     Public             Foreign branch 
##                        506                        268 
##         Closed joint-stock       Business association 
##                        140                         88 
##          Cooperative union     Foreign representation 
##                          2                          1

# percentage of most common types
round(100 * tt2[1:5]/sum(tt2))

## 
##          Limited liability                Independent 
##                         68                         23 
##        General partnership                Cooperative 
##                          2                          2 
## Unincorporated partnership 
##                          1

Firms by establishment and sector

The time of establishment is very important for Serbian firms. Companies established before 1990 (during socialism) were either large publicly-owned entities or small independent firms. The decade from 1990 to 1999 marks the regime of Slobodan Milosevic, which was defined by severe isolation because of international sanctions. In the years 2000 to 2004 Serbia experienced significant liberalizing reforms, but only from 2005 do we have the institutions that pushed for individual entrepreneurship. Since we have detailed activity data from 2005 through 2012, we will treat these years individually.

year.breaks <- c(1900, 1989, 1999, 2004:2012)  ## 2005-2012 separate
period.labs <- c("Socialism", "90ies", "2000-2004", 2005:2012)
d.form[, `:=`(period, cut(estab.year, breaks = year.breaks, labels = period.labs))]
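To see how these breaks assign years that sit on a boundary, the short check below applies the same breaks and labels to a handful of hypothetical establishment years. By default cut() builds intervals that are open on the left and closed on the right, so 1989 falls under socialism and 2004 under 2000-2004, while a year of exactly 1900 would not be assigned to any period.

```r
# Same breaks and labels as in the analysis
year.breaks <- c(1900, 1989, 1999, 2004:2012)
period.labs <- c("Socialism", "90ies", "2000-2004", 2005:2012)

# Hypothetical establishment years placed on the interval boundaries
estab.year <- c(1900, 1989, 1990, 1999, 2000, 2004, 2005, 2012)
period <- cut(estab.year, breaks = year.breaks, labels = period.labs)

data.frame(estab.year, period)
```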

(tt3 <- table(d.form$period))  ## count by establishment period

## 
## Socialism     90ies 2000-2004      2005      2006      2007      2008 
##      2651     47142     38074     14650     14402     14018     12734 
##      2009      2010      2011      2012 
##     11633     10905      9671      8997

round(100 * tt3/sum(tt3))

## 
## Socialism     90ies 2000-2004      2005      2006      2007      2008 
##         1        25        21         8         8         8         7 
##      2009      2010      2011      2012 
##         6         6         5         5

We are also interested in the distribution of firms by sector

(tt4 <- table(d.form$sector.mod))

## 
##     A     B     C     D     E     F     G     H     I     J     K     L 
##  5698  2290  7662    11  3946  5706  1053    78  1399  4118  4141  5152 
##     M     N     O     P     R     S     T     U     X 
## 41643 33515  8553 14423  9710  3673  2424 27629  2039

round(100 * tt4/sum(tt4))  ## in percent

## 
##  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  R  S  T  U  X 
##  3  1  4  0  2  3  1  0  1  2  2  3 23 18  5  8  5  2  1 15  1

Companies that have missed 2+ years before being removed

We look into the number of years a company goes without reporting before being removed from the register. If a company files all of its reports, for example if it reports last in 2010 and is removed sometime during 2011, we count this value as 0.

# get year of last report and year of removal
d.rep <- d[, list(last = max(year), removal = year(removal.date[1])), by = id]
d.rep[, `:=`(no.report, removal - last - 1)]  ## -1: no report is expected for the removal year
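The same calculation can be traced on a few made-up firms. The records below are hypothetical and only illustrate the four possible outcomes: a firm that filed every report (0), a firm that skipped two years (2), a firm that still filed in its removal year (-1), and a firm with no removal date (NA).

```r
library(data.table)

# Hypothetical reports: one row per firm-year filed
reports <- data.table(
  id   = c(1, 1, 2, 2, 3, 4),
  year = c(2009, 2010, 2007, 2008, 2011, 2012),
  removal.date = as.Date(c("2011-03-01", "2011-03-01",
                           "2011-06-15", "2011-06-15",
                           "2011-09-30", NA))
)

# Year of last report and year of removal per firm
gap <- reports[, list(last = max(year), removal = year(removal.date[1])), by = id]

# Years without a report; the removal year itself is not counted
gap[, no.report := removal - last - 1]
gap
```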

We look at the distribution of the number of years with no report

(tt <- table(d.rep$no.report, useNA = "ifany"))  ## summary of years w/o report

## 
##     -4     -3     -2     -1      0      1      2      3      4      5 
##      1      1     10    873  36113   6657   3966   3024   2258    972 
##      6   <NA> 
##    155 130847

tt <- data.frame(tt)
names(tt) <- c("no.report", "count")
require(ggplot2)

## Loading required package: ggplot2

ggplot(tt, aes(x = no.report, y = count)) +
    geom_bar(stat = "identity", width = 1, fill = "gray50") +
    geom_text(aes(label = round(count/1000)), vjust = -0.4, size = 4) +
    scale_y_continuous(breaks = 50000 * 0:2, labels = 50 * 0:2) +
    labs(x = "Years w/o report before removal", y = "Count of firms (in 000)") +
    ggtitle("Distribution of no-report years before removal from registry") +
    theme_bw()

Figure: Distribution of no-report years before removal from registry

In a few odd cases this value is negative, though rarely below -1. A value of -1 arises when a company still files a report for the year in which it is removed. NAs represent companies that are still active.

Classification of firms by size

We now classify firms by size based on their revenues, number of employees, and total assets. We summarize each firm by its average of each measure over the years it reported.

d.sum <- d[, list(revenues = mean(revenues), empl = mean(empl), assets = mean(totalAssets)), 
    by = id]
# define breaks for size segments
rev.br <- c(-1, 100, 200, 2500, 10000, 2e+08)  ## 200M theoretical highest
empl.br <- c(-1, 4, 9, 50, 250, 21000)
asset.br <- c(-1, 50, 100, 1000, 5000, 4e+08)  ## 400M highest
size.labels <- c("Micro 5-", "Micro 5+", "Small", "Medium", "Large")
d.sum[, `:=`(revenues.group, cut(revenues, breaks = rev.br, labels = size.labels))]
d.sum[, `:=`(empl.group, cut(empl, breaks = empl.br, labels = size.labels))]
d.sum[, `:=`(assets.group, cut(assets, breaks = asset.br, labels = size.labels))]
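A small illustration of the two steps, averaging per firm and then cutting into size classes, using the employee breaks from above on hypothetical headcounts. Because the classes are applied to the average, a firm that reported 3 and then 8 employees averages 5.5 and lands in "Micro 5+", while an average of exactly 4 stays in "Micro 5-".

```r
library(data.table)

# Hypothetical firm-year headcounts, for illustration only
toy <- data.table(
  id   = c(1, 1, 2, 2, 3, 4),
  empl = c(3, 5, 3, 8, 50, 120)
)

# Average each firm over the years it reported
toy.sum <- toy[, list(empl = mean(empl)), by = id]

# Same employee breaks and labels as in the analysis
empl.br <- c(-1, 4, 9, 50, 250, 21000)
size.labels <- c("Micro 5-", "Micro 5+", "Small", "Medium", "Large")
toy.sum[, empl.group := cut(empl, breaks = empl.br, labels = size.labels)]
toy.sum
```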

Distribution of size by revenue

(tt <- table(d.sum$revenues.group))

## 
## Micro 5- Micro 5+    Small   Medium    Large 
##    30237     6133    55527    44902    48078

tt <- data.frame(tt)
names(tt) <- c("revenues.group", "count")
ggplot(tt, aes(x = revenues.group, y = count)) +
    geom_bar(stat = "identity", fill = "gray50") +
    geom_text(aes(label = round(count/1000)), vjust = -0.4, size = 4) +
    scale_y_continuous(breaks = 20000 * 0:2, labels = 20 * 0:2) +
    labs(x = "Company size", y = "Count of firms (in 000)") +
    ggtitle("Distribution of company size by revenue") +
    theme_bw()

Figure: Distribution of company size by revenue

Distribution of size by number of employees

(tt <- table(d.sum$empl.group))

## 
## Micro 5- Micro 5+    Small   Medium    Large 
##   149085    18936    13928     2928        0

tt <- data.frame(tt)
names(tt) <- c("empl.group", "count")
ggplot(tt, aes(x = empl.group, y = count)) +
    geom_bar(stat = "identity", fill = "gray50") +
    geom_text(aes(label = round(count/1000)), vjust = -0.4, size = 4) +
    scale_y_continuous(breaks = 20000 * 0:2, labels = 20 * 0:2) +
    labs(x = "Company size", y = "Count of firms (in 000)") +
    ggtitle("Distribution of company size by number of employees") +
    theme_bw()

Figure: Distribution of company size by number of employees

Distribution of size by total assets

(tt <- table(d.sum$assets.group))

## 
## Micro 5- Micro 5+    Small   Medium    Large 
##    14693     5693    47865    54382    62244

tt <- data.frame(tt)
names(tt) <- c("assets.group", "count")
ggplot(tt, aes(x = assets.group, y = count)) +
    geom_bar(stat = "identity", fill = "gray50") +
    geom_text(aes(label = round(count/1000)), vjust = -0.4, size = 4) +
    scale_y_continuous(breaks = 20000 * 0:2, labels = 20 * 0:2) +
    labs(x = "Company size", y = "Count of firms (in 000)") +
    ggtitle("Distribution of company size by total assets") +
    theme_bw()

Figure: Distribution of company size by total assets