21 Feb 2015, Shailendra Kadre and Venkat Reddy
By the end of this blog post, you will have a basic idea of the following concepts, which are essential for proceeding with business analytics techniques:
1. The difference between population and sample
2. Different types of sampling
3. The difference between variable and parameter
4. The differences between descriptive, inferential, and predictive statistics
5. The steps involved in solving a business analytics problem
1. Population and Sample
Population is the complete set of objects or data records that are available for an analytics project or data analysis. For example, in a countrywide marketing campaign, a narrowed-down list of the country’s citizens will form the population for the analytics problem. Generally, it might not be possible to analyze the entire population because of the sheer size of the data or limits on time, funding, or the processing power of the available computers. These constraints may compel you to consider only a subset of the population. This subset is usually referred to as a sample in statistical terminology. If properly chosen, analyzing a sample can be as good as analyzing the full population.
2. Different Types of Sampling
A sample can be formally defined as the subset of a population that is selected for analysis. The procedure of creating or collecting this subset is called sampling. Sometimes, it might be necessary to manually collect some records from the overall population. There are several types of sampling techniques. The following are the ones that are most commonly used in business analytics projects.
Simple random sampling is the most commonly used sampling method. Randomly choosing n records from a population, where n denotes the sample size, is called simple random sampling. There are several methods for deciding on the right sample size. Sometimes the business problem at hand gives us an idea of the sample size. Once the sample size (n) has been decided, records are randomly selected from the population. Convenient functions are available in SAS for this purpose.
A classic example of random sampling is a blindfolded man picking ten apples from a basket full of apples. All the apples have an equal probability of being picked from the basket.
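As a minimal sketch of the idea, the snippet below draws a simple random sample in Python rather than SAS (an assumption made only for illustration; the article itself refers to SAS functions). The population of 100 record IDs, the sample size of 10, and the seed are all hypothetical.

```python
import random

# Hypothetical population: 100 record IDs standing in for the apples in the basket.
population = list(range(1, 101))

n = 10  # sample size, chosen here only for illustration

rng = random.Random(42)  # fixed seed so the draw is repeatable
sample = rng.sample(population, n)  # n distinct records, each equally likely to be picked

print(sample)
```

Because `sample` draws without replacement, no record can appear twice, just as the blindfolded man cannot pick the same apple twice.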
Consider an example population that has preexisting segments of the same or different sizes. Segments are subgroups into which the population records have already been classified. In such a case, it is best to draw a random sample from each segment, because such a sample will truly represent the nature of the population.
The size of the sample drawn from each segment can be based on the proportion of that segment in the entire population. Such segments are usually referred to as strata. The process of simple random sampling from each stratum is called stratified sampling. Segments can also be created manually, so stratified sampling can be performed even when there are no obvious segments in the population.
For example, if 1,000 random candidates are to be picked from across the country for a sporting event, it might be a good idea to pick them proportionately from each state.
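The proportional approach can be sketched roughly as follows, again in Python as an illustrative assumption. The function name `stratified_sample`, the three state codes, and the population sizes are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, total, seed=0):
    """Draw about `total` records, allocated to each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding means the total can be off by a few records.
        k = round(total * len(members) / len(records))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical population: candidates tagged with a made-up state code.
candidates = (
    [("A", i) for i in range(5000)]
    + [("B", i) for i in range(3000)]
    + [("C", i) for i in range(2000)]
)
picked = stratified_sample(candidates, key=lambda rec: rec[0], total=1000)
counts = {s: sum(1 for rec in picked if rec[0] == s) for s in "ABC"}
print(counts)  # {'A': 500, 'B': 300, 'C': 200}
```

Each state contributes to the sample in the same 50/30/20 ratio it holds in the population, while the records within a state are still chosen at random.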
Systematic sampling is based on a fixed rule, like picking every fifth or seventh observation from a given population. It is different from random sampling, wherein observations are picked at random. This type of sampling is generally done when testing is a continuous process. Recording the room temperature every 60 minutes or measuring the blood pressure of a patient every 10 minutes are examples of systematic samples.
Example: Consider a mass-manufacturing machine that produces simple bolts to be used in a chemical plant erection project. Every 30th bolt manufactured by the machine can be collected as a sample. This may look like a random sample from the whole lot, but you are not actually waiting for the whole lot to form; instead, you are collecting your sample well before the heap is complete.
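The bolt example can be sketched as a rule applied to a stream of items as they are produced. This is only an illustration in Python; the helper `every_kth` and the run of 300 serial numbers are hypothetical.

```python
from itertools import islice

def every_kth(stream, k):
    """Yield the k-th, 2k-th, 3k-th, ... item of a stream as it is produced."""
    return islice(stream, k - 1, None, k)

# Hypothetical production run: bolts come off the machine one at a time,
# identified by serial numbers 1 to 300.
bolts = (serial for serial in range(1, 301))

sample = list(every_kth(bolts, 30))
print(sample)  # [30, 60, 90, 120, 150, 180, 210, 240, 270, 300]
```

Since the bolts are consumed lazily from a generator, the sample is collected while production is under way, without first storing the whole lot.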
3. The Difference Between Variable and Parameter
Simply put, a variable in a statistical data table is nothing but a column or a field in the table, a feature that may change its value from one record to another. It may be numeric, meaning it can be measured for each record, or non-numeric, such as city, gender, or a status field containing Yes or No entries. Examples of numeric variables are age, monthly income, daily sales, and cost data. The following are the major types of variables that a population or a sample may contain.
Non-numeric variables, also called qualitative or categorical variables, represent a quality or a characteristic.
Examples are shirt sizes expressed as S, M, L, XL, and XXL, or distance expressed as near and far. It can also be a Boolean value such as pass or fail, or a yes or no field.
Parameter
A parameter is a measure that is calculated on the entire population. Any summary measure that gives information about the population is called a parameter.
For example, take the data on electricity utility bills of an entire state like California. It will be huge by any standard because it holds variables such as name, address, type of connection, month, units consumed, and bill amount for all households in the state. Now, for planning purposes, say to forecast the electricity demand for the next five years in the state, if you calculate averages over all the state’s households for variables like units consumed and bill amount, those averages are termed parameters. So two example parameters, the entire state’s average units consumed per household and the average bill amount, may look like 650 units and $100, respectively. These parameters are calculated on the entire population, which might be really large at times. So, it is not hard to see that calculating them may require a huge amount of computational effort.
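To make the calculation concrete, here is a tiny Python sketch. The four household records are made up for illustration and are chosen only so that the averages match the example figures above; a real state-level table would have millions of rows.

```python
from statistics import mean

# Made-up records for illustration only: (units_consumed, bill_amount) per household.
households = [
    (620, 95.0),
    (700, 108.0),
    (580, 90.0),
    (700, 107.0),
]

# A parameter is computed over every record in the population, not over a subset.
avg_units = mean(units for units, _ in households)
avg_bill = mean(bill for _, bill in households)

print(avg_units, avg_bill)
```

The same two lines of arithmetic apply to the full population; only the number of records that must be read, and hence the computational effort, changes.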
There are three methods of statistical analysis: descriptive, inferential, and predictive. In descriptive statistics, the data is simply summarized using measures of central tendency and variation. In inferential statistics, a sample is drawn from the population to draw inferences about the full set of data, or population. Predictive statistics, as the name suggests, predicts a dependent variable using methodologies such as linear and logistic regression.
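The three methods can be contrasted in one small Python sketch. Everything here is an assumption made for illustration: the data is simulated, the variable names are hypothetical, and the predictive step uses a hand-written ordinary least squares fit of a single straight line rather than any particular library routine.

```python
import random
from statistics import mean, stdev

rng = random.Random(7)  # fixed seed so the run is repeatable

# Made-up population: x could be ad spend and y sales, with y roughly 50 + 2x.
population = []
for _ in range(10_000):
    x = rng.uniform(0, 100)
    population.append((x, 50 + 2 * x + rng.gauss(0, 10)))

# Descriptive: summarize the data with a central tendency and a variation.
all_y = [y for _, y in population]
print("mean:", round(mean(all_y), 1), "std dev:", round(stdev(all_y), 1))

# Inferential: use a random sample to estimate the population mean.
sample = rng.sample(population, 200)
sample_x = [x for x, _ in sample]
sample_y = [y for _, y in sample]
estimated_mean = mean(sample_y)

# Predictive: fit y = a + b * x by ordinary least squares on the sample.
x_bar, y_bar = mean(sample_x), mean(sample_y)
b = sum((x - x_bar) * (y - y_bar) for x, y in sample) / sum(
    (x - x_bar) ** 2 for x in sample_x
)
a = y_bar - b * x_bar

def predict(x):
    """Predict the dependent variable y for a new value of x."""
    return a + b * x

print("estimated mean:", round(estimated_mean, 1))
print("prediction at x = 40:", round(predict(40), 1))
```

The descriptive step only reports what is in the data, the inferential step generalizes from 200 records to all 10,000, and the predictive step produces a value for an x that need not appear in the data at all.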
The typical steps in problem solving in Business Analytics are as follows
Many thanks for spending time reading this article. Much more on this and many other topics is available in the book Practical Business Analytics Using SAS: A Hands-on Guide by Venkat Reddy and Shailendra Kadre. You can buy it right now at Amazon.
The authors are reachable at shailendrakadre@gmail.com and 21.venkat@gmail.com.