Here we look at some summary statistics for Small and Medium Enterprises from the APR data to form an idea of its contents.
require(data.table)
## Loading required package: data.table
setwd("~/Dubrovnik/")
load("APR-processed-stage1-v3.rda") ## load d, the main data
setkey(d, id)
names(d) ## list of columns available
## [1] "year" "deflator" "id"
## [4] "capital" "costs" "empl"
## [7] "fixedAssets" "foreignOwners" "interest"
## [10] "liabLT" "liabST" "liabSTfin"
## [13] "loansLT" "revenues" "totalAssets"
## [16] "wages" "state.connection" "state.owner.type"
## [19] "state.share" "state.legal.form" "priv.method"
## [22] "reg.code" "region" "district.code"
## [25] "district" "town.code" "town"
## [28] "muni.code" "muni" "estab.date"
## [31] "removal.date" "sector.mod" "ind.code"
## [34] "trade.sector" "trade.name" "legal.form"
## [37] "status" "sitc" "compet.cms"
## [40] "market.growing" "entrepreneur" "restructuring"
## [43] "export.rank" "export"
Some companies never filed a report in any year of our 8-year observation period, so their year is missing. We remove them.
sum(is.na(d$year)) ## how many are there?
## [1] 3399
d <- d[!is.na(year)] ## only keep those with non-NA year
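We are only interested in SMEs, which have a clear definition in Serbian law. However, since Serbia is planning to align its definition with the EU, we will use the EU definition, which has been in place since 2003. A company is considered an SME if it has fewer than 250 employees and either an annual turnover of at most 50 million euro or a balance sheet total of at most 43 million euro.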
Since our data is in Serbian dinars, whose value relative to the Euro has changed significantly during 2005-2008, we will use the end-of-year exchange rate to convert revenues and total assets to their euro equivalents.
load("~/Dubrovnik/deflators-v2.rda") ## adds deflate
# add exchange rate to data
setkey(d, year)
d <- deflate[, EURRSD, by = year][d]
# add SME marker
d[, `:=`(is.sme, TRUE)] ## all firms as SME
d[empl >= 250 | revenues/EURRSD >= 50000 | totalAssets/EURRSD >= 43000, `:=`(is.sme,
FALSE)] ## large firms not SME
# save data here
save(d, file = "~/Dubrovnik/APR-processed-v4.rda")
# only keep SME for rest of analysis
d <- d[is.sme == TRUE]
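The marker above drops a firm as soon as any one of the three ceilings is reached, whereas the EU wording asks for the headcount limit plus at least one of the two financial limits. The sketch below compares the two readings on made-up firms. Everything in it is an assumption for illustration: the firm values, the exchange rates, the column names `sme.strict` and `sme.eu`, and the reading that amounts are recorded in thousands of dinars (which is what the thresholds 50000 and 43000 suggest). Base R is used so that it runs without extra packages.

```r
# Hypothetical firms; amounts assumed to be in thousands of dinars
firms <- data.frame(
  id          = 1:4,
  year        = c(2005, 2005, 2008, 2008),
  empl        = c(12, 300, 40, 120),
  revenues    = c(9.0e4, 2.0e5, 5.0e6, 1.0e6),
  totalAssets = c(5.0e4, 1.0e5, 1.0e6, 4.5e6)
)
# Made-up end-of-year exchange rates (dinars per euro)
rates <- data.frame(year = c(2005, 2008), EURRSD = c(85, 90))

# Attach the exchange rate of the matching year to every firm-year
firms <- merge(firms, rates, by = "year")
firms <- firms[order(firms$id), ]

# Convert to thousands of euro
rev.eur    <- firms$revenues / firms$EURRSD
assets.eur <- firms$totalAssets / firms$EURRSD

# Strict reading: all three ceilings must hold
firms$sme.strict <- firms$empl < 250 & rev.eur < 50000 & assets.eur < 43000
# EU-style reading: headcount plus at least one financial ceiling
firms$sme.eu <- firms$empl < 250 & (rev.eur < 50000 | assets.eur < 43000)

firms[, c("id", "sme.strict", "sme.eu")]
```

In this toy data firms 3 and 4 each exceed only one financial ceiling, so they are SMEs under the EU-style reading but not under the strict one. How many real firms sit in that gap would have to be checked on the data.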
The dataset covers all firms registered in Serbia up to 2012. This includes the largest corporations, small entrepreneurs, and independent professionals.
Below is a summary of companies by type:
d.form <- d[, list(legal.form = legal.form[1], sector.mod = sector.mod[1], entrepreneur = entrepreneur[1],
estab.year = year(estab.date[1])), by = id]
(tt <- table(d.form$entrepreneur)) ## TRUE for entrepreneurs
##
## FALSE TRUE
## 139358 45519
round(100 * tt/sum(tt)) ## TRUE for entrepreneurs
##
## FALSE TRUE
## 75 25
and by legal form.
tt2 <- table(d.form$legal.form)
(tt2 <- sort(tt2, decreasing = TRUE))
##
## Limited liability Independent
## 126552 43304
## General partnership Cooperative
## 4127 3261
## Unincorporated partnership Open joint-stock
## 2215 1246
## Joint-stock Employee-owned
## 996 813
## Other Limited partnership
## 689 669
## Public Foreign branch
## 506 268
## Closed joint-stock Business association
## 140 88
## Cooperative union Foreign representation
## 2 1
# percentage of most common types
round(100 * tt2[1:5]/sum(tt2))
##
## Limited liability Independent
## 68 23
## General partnership Cooperative
## 2 2
## Unincorporated partnership
## 1
The time of establishment is very important for Serbian firms. Companies established before 1990 (during socialism) were either large publicly owned entities or small independent firms. The decade from 1990 to 1999 marks the regime of Slobodan Milosevic, which was defined by severe isolation because of international sanctions. In the years 2000 to 2004 Serbia experienced significant liberalizing reforms, but only from 2005 do we have the institutions that pushed for individual entrepreneurship. Since we have detailed data on firm activity from 2005 through 2012, we will treat these years individually.
year.breaks <- c(1900, 1989, 1999, 2004:2012) ## 2005-2012 separate
period.labs <- c("Socialism", "90ies", "2000-2004", 2005:2012)
d.form[, `:=`(period, cut(estab.year, breaks = year.breaks, labels = period.labs))]
(tt3 <- table(d.form$period)) ## counts by establishment period
##
## Socialism 90ies 2000-2004 2005 2006 2007 2008
## 2651 47142 38074 14650 14402 14018 12734
## 2009 2010 2011 2012
## 11633 10905 9671 8997
round(100 * tt3/sum(tt3))
##
## Socialism 90ies 2000-2004 2005 2006 2007 2008
## 1 25 21 8 8 8 7
## 2009 2010 2011 2012
## 6 6 5 5
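Because `cut` builds intervals that are open on the left and closed on the right by default, a break at 1989 puts 1989 itself into "Socialism" and 1990 into "90ies". The short check below reuses the breaks and labels from above on a handful of hypothetical establishment years (the vector `estab.year` here is made up) to show where the boundary years land.

```r
# Same breaks and labels as in the analysis
year.breaks <- c(1900, 1989, 1999, 2004:2012)
period.labs <- c("Socialism", "90ies", "2000-2004", 2005:2012)

# Hypothetical establishment years sitting on or next to a break
estab.year <- c(1985, 1989, 1990, 1999, 2000, 2004, 2005, 2012, 2013)

# Intervals are (lower, upper], so each break year belongs to the earlier period
period <- cut(estab.year, breaks = year.breaks, labels = period.labs)
data.frame(estab.year, period)
```

Years outside the range of the breaks, such as 2013 here, come back as `NA` rather than raising an error, so a quick count of `NA` values in the period column is a cheap sanity check.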
We are also interested in the distribution of firms by sector:
(tt4 <- table(d.form$sector.mod))
##
## A B C D E F G H I J K L
## 5698 2290 7662 11 3946 5706 1053 78 1399 4118 4141 5152
## M N O P R S T U X
## 41643 33515 8553 14423 9710 3673 2424 27629 2039
round(100 * tt4/sum(tt4)) ## in percent
##
## A B C D E F G H I J K L M N O P R S T U X
## 3 1 4 0 2 3 1 0 1 2 2 3 23 18 5 8 5 2 1 15 1
We look at the number of years a company goes without reporting before being removed from the register. If a company files all its reports, for example if it last reports in 2010 and is removed sometime during 2011, we count this value as 0.
# get year of last report and year of removal
d.rep <- d[, list(last = max(year), removal = year(removal.date[1])), by = id]
d.rep[, `:=`(no.report, removal - last - 1)] ## subtract 1: no report expected in the removal year
We look at the distribution of the number of years with no report:
(tt <- table(d.rep$no.report, useNA = "ifany")) ## summary of years w/o report
##
## -4 -3 -2 -1 0 1 2 3 4 5
## 1 1 10 873 36113 6657 3966 3024 2258 972
## 6 <NA>
## 155 130847
tt <- data.frame(tt)
names(tt) <- c("no.report", "count")
require(ggplot2)
## Loading required package: ggplot2
ggplot(tt, aes(x = no.report, y = count)) +
    geom_bar(stat = "identity", width = 1, fill = "gray50") +
    geom_text(aes(label = round(count/1000)), vjust = -0.4, size = 4) +
    scale_y_continuous(breaks = 50000 * 0:2, labels = 50 * 0:2) +
    labs(x = "Years w/o report before removal", y = "Count of firms (in 000)") +
    ggtitle("Distribution of no-report years before removal from registry") +
    theme_bw()
There are a few odd cases where this value is negative, but almost never by more than 1 year. A value of -1 arises when a company still has an entry for the year of its removal. NAs represent companies that are still active.
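The arithmetic behind these values can be traced on a few hypothetical firms. The data frame `toy` below is made up; only the formula `removal - last - 1` is taken from the analysis.

```r
# Hypothetical firms: year of last report and year of removal
# (removal is NA for a firm that is still registered)
toy <- data.frame(
  id      = 1:4,
  last    = c(2010, 2007, 2009, 2012),
  removal = c(2011, 2011, 2009, NA)
)

# Years between the last report and the removal year, not counting either
toy$no.report <- toy$removal - toy$last - 1
toy
```

Firm 1 reported up to the year before removal and gets 0, firm 2 skipped 2008 to 2010 and gets 3, firm 3 has an entry in its removal year and gets -1, and firm 4 has no removal date, so the result is `NA`.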
We now classify firms by size based on their revenues, number of employees, and total assets. Each firm is summarized by the average of each measure across its yearly observations:
d.sum <- d[, list(revenues = mean(revenues), empl = mean(empl), assets = mean(totalAssets)),
by = id]
# define breaks for size segments
rev.br <- c(-1, 100, 200, 2500, 10000, 2e+08) ## 200M theoretical highest
empl.br <- c(-1, 4, 9, 50, 250, 21000)
asset.br <- c(-1, 50, 100, 1000, 5000, 4e+08) ## 400M highest
size.labels <- c("Micro 5-", "Micro 5+", "Small", "Medium", "Large")
d.sum[, `:=`(revenues.group, cut(revenues, breaks = rev.br, labels = size.labels))]
d.sum[, `:=`(empl.group, cut(empl, breaks = empl.br, labels = size.labels))]
d.sum[, `:=`(assets.group, cut(assets, breaks = asset.br, labels = size.labels))]
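Since the grouping is applied to per-firm averages, fractional values occur, and it helps to see where they fall relative to the breaks. The sketch below applies the employee breaks and labels from above to a few hypothetical averages (the vector `avg.empl` is made up for illustration).

```r
# Same breaks and labels as used for empl.group
empl.br <- c(-1, 4, 9, 50, 250, 21000)
size.labels <- c("Micro 5-", "Micro 5+", "Small", "Medium", "Large")

# Hypothetical average headcounts, including values between integers
avg.empl <- c(0, 4, 4.5, 9, 9.5, 50, 249.9)

# Intervals are (lower, upper]: (-1,4], (4,9], (9,50], (50,250], (250,21000]
grp <- cut(avg.empl, breaks = empl.br, labels = size.labels)
data.frame(avg.empl, grp)
```

Under these breaks a firm averaging 4.5 employees is labelled "Micro 5+" and one averaging 9.5 is labelled "Small", because the upper bounds are 4 and 9 rather than 5 and 10. Whether that is the intended treatment of fractional averages is a choice for the analyst.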
Distribution of size by revenue
(tt <- table(d.sum$revenues.group))
##
## Micro 5- Micro 5+ Small Medium Large
## 30237 6133 55527 44902 48078
tt <- data.frame(tt)
names(tt) <- c("revenues.group", "count")
ggplot(tt, aes(x = revenues.group, y = count)) +
    geom_bar(stat = "identity", fill = "gray50") +
    geom_text(aes(label = round(count/1000)), vjust = -0.4, size = 4) +
    scale_y_continuous(breaks = 20000 * 0:2, labels = 20 * 0:2) +
    labs(x = "Company size", y = "Count of firms (in 000)") +
    ggtitle("Distribution of company size by revenue") +
    theme_bw()
Distribution of size by number of employees
(tt <- table(d.sum$empl.group))
##
## Micro 5- Micro 5+ Small Medium Large
## 149085 18936 13928 2928 0
tt <- data.frame(tt)
names(tt) <- c("empl.group", "count")
ggplot(tt, aes(x = empl.group, y = count)) +
    geom_bar(stat = "identity", fill = "gray50") +
    geom_text(aes(label = round(count/1000)), vjust = -0.4, size = 4) +
    scale_y_continuous(breaks = 20000 * 0:2, labels = 20 * 0:2) +
    labs(x = "Company size", y = "Count of firms (in 000)") +
    ggtitle("Distribution of company size by number of employees") +
    theme_bw()
Distribution of size by total assets
(tt <- table(d.sum$assets.group))
##
## Micro 5- Micro 5+ Small Medium Large
## 14693 5693 47865 54382 62244
tt <- data.frame(tt)
names(tt) <- c("assets.group", "count")
ggplot(tt, aes(x = assets.group, y = count)) +
    geom_bar(stat = "identity", fill = "gray50") +
    geom_text(aes(label = round(count/1000)), vjust = -0.4, size = 4) +
    scale_y_continuous(breaks = 20000 * 0:2, labels = 20 * 0:2) +
    labs(x = "Company size", y = "Count of firms (in 000)") +
    ggtitle("Distribution of company size by total assets") +
    theme_bw()