What is data? Discuss the various types of data and
their sources available to the
researcher. Evaluate each of these sources in terms of their merits,
demerits and limitations.
1) In computing, data
is information that has been translated into a form that is more
convenient to move or process. Relative to today's computers and
transmission media, data is information converted into binary digital form.
2) In computer component
interconnection and network communication, data is often distinguished from
"control information," "control bits," and similar terms to
identify the main content of a transmission unit.
3) In telecommunications,
data sometimes means digital-encoded information to distinguish it
from analog-encoded information such as conventional telephone voice
calls. In general, "analog" or voice transmission requires a
dedicated continual connection for the duration of a related series of
transmissions. Data transmission can often be sent with intermittent
connections in packets that arrive in piecemeal fashion.
4) Generally and in
science, data is a gathered body of facts.
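The computing and telecommunications senses above can be illustrated with a small sketch. It converts a short text into binary digits, splits the bits into numbered packets, lets them "arrive" out of order, and reassembles them. The packet layout (a sequence number plus a payload) is a hypothetical simplification chosen for illustration, not a real network protocol; the sequence number plays the role of control information and the payload is the data.

```python
import random

def to_bits(text):
    """Encode text as UTF-8 and render each byte as eight binary digits."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))

def from_bits(bits):
    """Inverse of to_bits: group the digits into bytes and decode them."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

def to_packets(bits, size=16):
    """Split a bit string into (sequence_number, payload) pairs.

    The sequence number is control information; the payload is the data.
    """
    return [(seq, bits[i:i + size])
            for seq, i in enumerate(range(0, len(bits), size))]

def reassemble(packets):
    """Sort packets by sequence number and join their payloads."""
    return "".join(payload for _, payload in sorted(packets))

message = "data"
bits = to_bits(message)
packets = to_packets(bits)
random.shuffle(packets)            # simulate piecemeal, out-of-order arrival
restored = from_bits(reassemble(packets))

print(bits)      # 01100100011000010111010001100001
print(restored)  # data
```

Because each packet carries its own sequence number, the receiver does not need a dedicated continuous connection to rebuild the original message.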
Some authorities and
publishers, cognizant of the word's Latin origin and as the plural form of
"datum," use plural verb forms with "data". Others take the
view that since "datum" is rarely used, it is more natural to treat
"data" as a singular form.
Many potential researchers
are unsure where they can obtain data to begin their research and analysis. We
should first divide the different types of data into two major classifications.
Primary Data
When someone refers to
"primary data" they are referring to data collected by the researcher
himself/herself. This is data that has never been gathered before, whether in a
particular way, or at a certain period of time. Researchers tend to gather this
type of data when what they want cannot be found from outside sources. You can
tailor your data questions and collection to fit the needs of your research
questions. This can be an extremely costly task and, if associated with a
college or institute, requires permission and authorization to collect such
data. Issues of consent and confidentiality are of extreme importance. In
practice, primary data collection follows a review of secondary data: you should
consult existing information and data before collecting more, so that you are
informed about what has already been discovered on a particular research topic.
Secondary Data
If the time or hassle of
collecting your own data is too much, or the data collection has already been
done, secondary data may be more appropriate for your research. This type of
data typically comes from other studies done by other institutions or
organizations. Secondary data is no less valid, but you should be
well informed about how it was collected. There are a number of free services
online as well as many others made available through your current status as BYU
students.
Data Sources
Below is a list of general
data sources.
Inter-University
Consortium for Political and Social Research (ICPSR)
This online data archive
provided by the Institute for Social Research at the University of Michigan is
free to all current BYU students. With the help of over 700 academic
institutions and research organizations, ICPSR has over 500,000 data files
relating to social science fields including education, aging, criminal justice,
substance abuse, and terrorism.
To register to download
data from ICPSR, you first need to be at an on-campus computer because the IP
address is what allows you access. Go
to https://www.icpsr.umich.edu/ticketlogin and select Create Account.
Fill out the information and then you can search for data that is relevant for
your research.
The Institute for
Quantitative Social Science (IQSS
Dataverse Network)
An open-source service,
this "dataverse" network is provided by The Institute for
Quantitative Social Science (IQSS) at Harvard University with over 300
"dataverses" and nearly 650,000 data files available for download.
U.S. Census Bureau (DataFerrett)
Provided on behalf of the
United States Census Bureau, patrons can download data from dozens of
government surveys including the American Community Survey (ACS), the Decennial
Census of Population and Housing (1990 and 2000 available), the National Health
and Nutrition Examination Survey (NHANES), and the Survey of Income and Program
Participation (SIPP). Pop-up blockers must be turned off to run the DataFerrett
application. It also requires a download to your computer.
General Social Survey (GSS)
From their website: "The
GSS contains a standard 'core' of demographic, behavioral, and attitudinal
questions, plus topics of special interest. Many of the core questions have
remained unchanged since 1972 to facilitate time-trend studies as well as
replication of earlier findings. The GSS takes the pulse of America, and is a
unique and valuable resource. It has tracked the opinions of Americans over the
last four decades."
Integrated Public Use
Microdata Series (IPUMS)
From Wikipedia:
"Integrated Public Use Microdata Series (IPUMS) is the world's largest
individual-level population database. IPUMS consists of microdata samples from
United States (IPUMS-USA) and international (IPUMS-International) census
records. The records are converted into a consistent format and made available
to researchers through a web-based data dissemination system. Additional
databases in the IPUMS family include: the North Atlantic Population Project,
the National Historical Geographic Information System, the Integrated Health
Interview Series (IHIS), and the Integrated Public Use Microdata Series-Current
Population Survey (IPUMS-CPS)."
Simple Online Data
Archive for Population Studies (SodaPop)
From their website:
"SodaPop makes as much data as possible available to the research
community. While we are unable to distribute some data to anyone except
PRI-affiliated researchers, most of our data collections are accessible to the
entire Penn State community, and many are also publicly available. Anyone can
use our data description pages, resource links, and variable search."
The Association of
Religion Data Archives (ARDA)
From their website: "The
Association of Religion Data Archives (ARDA) strives to democratize access to
the best data on religion. Founded as the American Religion Data Archive in
1997 and going online in 1998, the initial archive was targeted at researchers
interested in American religion. The targeted audience and the data collection
have both greatly expanded since 1998, now including American and international
collections and developing features for educators, journalists, religious
congregations, and researchers. Data included in the ARDA are submitted by the
foremost religion scholars and research centers in the world."
Geocommons (Geocommons)
This unique website houses
many independently posted data sets. The main feature of this website, however,
is the ability to map geospatial data. Data sets with addresses, county, state,
or country variables can be mapped and color-coded based on quantitative
features.
Common Core of Data (CCD)
- what types of data the proposed research will generate
- which data will have value to other research users and could be shared
- what data formats and quality standards will be applied to enable the data to be shared effectively.
- People and skills - is there sufficient expertise and resource in the research team to manage, preserve and share the data effectively? Is additional specialist expertise (or training for existing staff) required? If so, how will this be sourced?
- Infrastructure - are there appropriate computational facilities to manage, store and analyse the data generated by the research?
- Tools - will additional computational facilities and resources need to be accessed, and what will be the costs associated with this?
- Sources of Primary Data
- The methods of generating primary data are -
- Observation Method
- Survey Method
- Experimental Method
- Experimental Method
- There are a number of experimental designs that can be used in carrying out an experiment. However, market researchers have used four experimental designs most frequently. These are -
- CRD - Completely Randomized Design
- RBD - Randomized Block Design - The
term Randomized Block Design originated in agricultural research. In
this design several treatments are applied to different
blocks of land to ascertain their effect on the yield of the crop. Blocks
are formed in such a manner that each block contains as many plots as there
are treatments, and within each block the treatments are assigned to plots at
random, one plot per treatment. The production of each plot is measured after the
treatment is given. These data are then interpreted and inferences are
drawn by using the analysis of variance (ANOVA) technique so as to know the effect
of various treatments such as different doses of fertilizer, different types
of irrigation etc.
- LSD - Latin Square Design - A
Latin square is an experimental design with a balanced two-way
classification scheme, for example a 4 x 4 arrangement. In this
scheme each letter from A to D occurs only once in each row and also only
once in each column. Note that the balanced arrangement is
not disturbed if any row is interchanged with another.
- The balanced arrangement achieved in a Latin square is its main strength. In this design, the comparisons among treatments are free from differences between both rows and columns. Thus the experimental error will generally be smaller than in a design that does not control for these two sources of variation.
- FD - Factorial Designs - This
design allows the experimenter to test two or more variables
simultaneously. It also measures interaction effects of the variables and
analyzes the impacts of each of the variables.
- In a true experiment, randomization is essential so that the experimenter can infer cause and effect without any bias.
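The four designs described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard routine: the function names, the block labels and the factor names (`fertilizer_dose`, `irrigation`) are hypothetical, and the Latin square is built by starting from a cyclic square and then shuffling its rows and columns, which keeps every letter exactly once in each row and column.

```python
import itertools
import random

def completely_randomized(units, treatments, rng):
    """CRD: shuffle the units, then deal the treatments out in rotation."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """RBD: every block receives each treatment once, in its own random order."""
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)          # position i is plot i within the block
        layout[block] = order
    return layout

def latin_square(treatments, rng):
    """LSD: cyclic square with rows and columns randomly permuted."""
    n = len(treatments)
    square = [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]
    rng.shuffle(square)             # interchange rows
    cols = list(range(n))
    rng.shuffle(cols)               # interchange columns
    return [[row[c] for c in cols] for row in square]

def full_factorial(factors):
    """FD: every combination of the levels of two or more factors."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]

rng = random.Random(42)             # fixed seed so the layout is repeatable
treatments = ["A", "B", "C", "D"]

print(completely_randomized(range(8), treatments, rng))
print(randomized_block(["block 1", "block 2", "block 3"], treatments, rng))
for row in latin_square(treatments, rng):
    print(" ".join(row))
for run in full_factorial({"fertilizer_dose": ["low", "high"],
                           "irrigation": ["drip", "flood"]}):
    print(run)
```

In each case the random number generator, rather than the experimenter, decides which unit receives which treatment, which is what allows cause and effect to be inferred without bias.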
- Sources of Secondary Data
- While primary data can be collected through questionnaires, depth interviews, focus group interviews, case studies, experimentation and observation, secondary data can be obtained through
- Internal Sources - These are within the organization
- External Sources - These are outside the organization
- Internal Sources of Data
- If available, internal secondary data may be obtained with less time, effort and money than the external secondary data. In addition, they may also be more pertinent to the situation at hand since they are from within the organization. The internal sources include
- Accounting resources- These
provide a great deal of information that can be used by the marketing researcher.
They give information about internal factors.
- Sales Force Report- It
gives information about the sale of a product. The information provided
relates to conditions outside the organization.
- Internal Experts- These
are the people who head the various departments. They can give an idea
of how a particular thing is working.
- Miscellaneous Reports- These
cover the information obtained from operational reports.
- If the data available within the organization are unsuitable or inadequate, the marketer should extend the search to external secondary data sources.
- External Sources of Data
- External Sources are sources which are outside the company in a larger environment. Collection of external data is more difficult because the data have much greater variety and the sources are much more numerous.
- External data can be divided into the following classes.
- Government Publications- Government
sources provide an extremely rich pool of data for researchers. In
addition, many of these data are available free of cost on internet
websites. There are a number of government agencies generating data. These
are:
- Registrar General of India- It is
an office which generates demographic data. It includes details of gender,
age, occupation etc.
- Central Statistical Organization- This
organization publishes the national accounts statistics. It contains
estimates of national income for several years, growth rate, and rate of
major economic activities. The Annual Survey of Industries is also published
by the CSO. It gives information about the total number of workers
employed, production units, material used and value added by the
manufacturer.
- Director General of Commercial
Intelligence- This
office operates from Kolkata. It gives information about foreign trade
i.e. import and export. These figures are provided region-wise and
country-wise.
- Ministry of Commerce and
Industries- This
ministry, through the office of the economic advisor, provides information on
the wholesale price index. These indices may be related to a number of sectors
like food, fuel, power, food grains etc. It also generates All India
Consumer Price Index numbers for industrial workers, urban non-manual
employees and agricultural labourers.
- Planning Commission- It
provides the basic statistics of Indian Economy.
- Reserve Bank of India- This
provides information on banking, savings and investment. RBI also prepares
currency and finance reports.
- Labour Bureau- It
provides information on skilled, unskilled, white-collar jobs etc.
- National Sample Survey- This
is done by the Ministry of Planning and it provides social, economic,
demographic, industrial and agricultural statistics.
- Department of Economic Affairs- It
conducts economic survey and it also generates information on income,
consumption, expenditure, investment, savings and foreign trade.
- State Statistical Abstract- This
gives information on various types of activities related to the state, such as
commercial activities, education, occupation etc.
- Non Government Publications- These
include publications of various industrial and trade associations, such
as
- The Indian Cotton Mill Association
- Various chambers of commerce
- The Bombay Stock Exchange (it publishes a directory containing financial accounts, key profitability and other relevant matter)
- Various Associations of Press Media.
- Export Promotion Council.
- Confederation of Indian Industries (CII)
- Small Industries Development Board of India
- Different Mills like - Woolen mills, Textile mills etc
- The main disadvantage of the above sources is that the data may be biased. These bodies are likely to gloss over their negative points.
- Syndicate Services- These
services are provided by certain organizations which collect and tabulate
marketing information on a regular basis for a number of clients who
subscribe to these services. The services are designed in
such a way that the information suits the subscriber. These services are
useful for tracking television viewing, the movement of consumer goods etc. These
syndicate services provide data from both households and
institutions.
- In collecting data from households they use three approaches
- Survey- They
conduct surveys on lifestyle, sociographic and general topics.
- Mail Diary Panel- It
may be related to 2 fields - Purchase and Media.
- Electronic Scanner Services- These
are used to generate data on volume.
- They collect institutional data from
- Wholesalers
- Retailers, and
- Industrial Firms
- Examples of syndicate services are Operations Research Group (ORG) and The Indian Marketing Research Bureau (IMRB).
- Importance of Syndicate Services
- Syndicate services are becoming popular because the constraints on decision making are changing and more specific decisions are needed in the light of a changing environment. Syndicate services are also able to provide information to industries at a low unit cost.
- Disadvantages of Syndicate Services
- The information provided is not exclusive. By contrast, a number of research agencies provide customized services which suit the requirements of each individual organization.
- International Organizations- These
include
- The International Labour
Organization (ILO)- It publishes data on the total and active
population, employment, unemployment, wages and consumer prices.
- The Organization for Economic
Co-operation and Development (OECD)- It publishes data on
foreign trade, industry, food, transport, and science and technology.
- The International Monetary Fund
(IMF)- It
publishes reports on national and international foreign exchange
regulations.
From their website:
"The Common Core of Data (CCD) is a program of the U.S. Department of
Education's National Center for Education Statistics that annually collects
fiscal and non-fiscal data about all public schools, public school districts
and state education agencies in the United States. The data are supplied by
state education agency officials and include information that describes schools
and school districts, including name, address, and phone number; descriptive
information about students and staff, including demographics; and fiscal data,
including revenues and current expenditures."
EconData.net (EconData)
From their website:
"We have 1,000 links to socioeconomic data sources, arranged by subject
and provider, pointers to the Web's premiere data collections, and our own list
of the ten best sites for finding regional economic data."
World Bank Data (WorldBank)
From their website:
"At the World Bank, the Development Data Group coordinates statistical and
data work and maintains a number of macro, financial and sector databases.
These databases are used by teams to prepare Country Assistance Strategies, poverty
assessments, research studies and other forms of economic and sector
work. This site is meant to provide all users with improved access to
World Bank data and to make that data easy to find and use."
Panel Study of Income
Dynamics (PSID)
From their website:
"The study began in 1968 with a nationally representative sample of over
18,000 individuals living in 5,000 families in the United States. Information
on these individuals and their descendants has been collected continuously,
including data covering employment, income, wealth, expenditures, health,
marriage, childbearing, child development, philanthropy, education, and
numerous other topics."
Statistics in Sports:
Sports Data Resources (AMSTAT)
This site hosted by the
American Statistical Association lists sports data resources around the
web.
Researchers should
maximise access to research datasets of value to the wider research community
in a timely and responsible manner. Any data that is shared should be of a
sufficiently high quality that it will have value to other researchers and
should be provided in a format that enables it to be used effectively.
15. We recognise that
in some cases it may not be appropriate for researchers to share their data.
However, if your research meets the criteria for requiring a data management
and sharing plan but you are intending not to share your data, the reasons for
this must be clearly justified.
16. Data should be
shared in accordance with recognised data standards where these exist, and in a
way that maximises opportunities for data linkage and interoperability.
Sufficient metadata must be provided to enable the dataset to be used by
others. Agreed best practice standards for metadata provision should be adopted
where these are in place.
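As a rough illustration of what "sufficient metadata" might mean in practice, the sketch below checks a dataset description for a handful of descriptive fields before it is shared. The field list and the sample record are hypothetical placeholders, not a recognised standard; a real project should adopt the metadata standard agreed in its own discipline.

```python
import json

# Hypothetical minimal field list, for illustration only.
REQUIRED_FIELDS = ("title", "creator", "description",
                   "date_collected", "file_format", "license")

def missing_metadata(record):
    """Return the required fields that are absent or empty in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing

# Placeholder record; every value here is invented for the example.
record = {
    "title": "Example household survey",
    "creator": "Example Research Team",
    "description": "Illustrative record only",
    "date_collected": "2020-01-01",
    "file_format": "CSV",
    "license": "",
}

print(json.dumps(record, indent=2))
print("Missing fields:", missing_metadata(record))  # Missing fields: ['license']
```

A check like this could be run before a dataset is deposited, so that gaps in the description are found while the research team can still fill them.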
17. When developing
data management and sharing plans, researchers should therefore consider and
briefly describe:
In preparing data
management and sharing plans, researchers should consider carefully any
resources they may need to deliver their plan. Where dedicated resources are
required, these should be outlined and justified as part of the plan.
41. Issues to consider
include: