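[Document description] Hypothesis-Testing lecture notes (Chinese and English edition) (PPT, 34 pages), courseware .ppt, 32 pages in total, 272.500 KB, uploaded by 小橙橙
When reposting, please keep the link: https://www.ichengzhen.cn/view-2959.html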
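Slide 1: Hypothesis Testing

Slide 2: Hypothesis Testing

Slide 5: Populations and Samples
Population: the universe. The data or information that defines the entire set. Its parameters (μ, σ) may or may not be known.
Sample: a subset of the data or information that possesses the same characteristics as the population. We can calculate statistics (X-bar, s) from it.
We make decisions about the population based on the sample.
- How many samples should be taken?
- Why should we take a sample?
- Should the sample be random?
- Is it possible to have sampling error?

Slide 6: Populations and Samples
Population: the statistical population. The data or information that defines the whole set, with parameters (μ, σ) that may or may not be known.
Sample: a subset of the population that shares its characteristics. Its statistics (X-bar) can be calculated.
We make decisions about the population based on the sample.
- How many samples should be taken?
- Why take a sample?
- Does the sample need to be drawn at random?
- Can sampling error occur?

Slide 7: Samples? Why Use Them?
Why use a sample instead of a population?
- Using a sample reduces time and cost.
- Capturing data on the entire population may be very difficult, if not impossible.
When to use a sample
- We use samples to baseline a process.
- We use samples to evaluate the results of a controlled change to a process.
How should the sample be taken?
- See Section 5.

Slide 8: Samples? Why Use Samples?
Why use a sample rather than the population?
- Using a sample reduces the time and cost consumed.
- Even where it is possible, obtaining data on the whole population is very difficult.
When should a sample be used?
- We use samples to establish a process baseline.
- We use samples to evaluate the results of a controlled change to a process.
How is a sample obtained?
- See Section 5.

Slide 9
[Figure: Sample A and Sample B]
All processes have variation. Samples from a given process may vary.
How can we differentiate between sample-based "chance" variation and a true process difference?
How can we depend on a sample?

Slide 10
[Figure: Sample A and Sample B]
All processes have variation. Samples drawn from a given process may also vary.
How do we distinguish random variation in a sample from a true difference in the population?
How should we use a sample?

Slide 11: Confidence Intervals and Point Estimates
Confidence intervals identify a range of plausible values for a population parameter, based on a sample statistic. They can be either one-sided or two-sided.
[Figure: confidence interval diagrams around X-bar with the risk regions marked]
Sample means, sample standard deviations, sample variances and other sample statistics are known as point estimators because they are single values used to represent population parameters.

Slide 12: Confidence Intervals and Point Estimates
A confidence interval identifies the range of plausible values for a population parameter, based on a sample statistic. It can be one-sided or two-sided.
[Figure: confidence interval diagrams around X-bar with the risk regions marked]
Sample means, sample standard deviations, sample variances and other sample statistics are called point estimators because they are single values used to represent population parameters.

Slide 13: Hypothesis Tests
Point estimates of parameters and confidence interval interpretation are both means of making inferences about sampled data.
Hypothesis tests are designed to help us make an inference about the true population value at the desired level of confidence.
We will use confidence intervals and tests of sample means, variances and sample standard deviations to investigate differences and cause/effect relationships using data.
Hypothesis tests help determine whether an apparent difference is real or could be due to chance. By using data and hypothesis testing, we can quantify our level of confidence that the difference is real.

Slide 14: Hypothesis Tests
Point estimates of parameters and the interpretation of confidence intervals are both ways of drawing inferences from sample data.
Hypothesis tests help us make an inference about the true population value at the required level of confidence.
We will use confidence intervals and tests of sample means, sample variances and sample standard deviations to investigate differences and cause-and-effect relationships in data.
Hypothesis testing helps determine whether an apparent difference is real or due to chance, and lets us quantify our confidence that the difference is real.

Slide 15: A Statistical Hypothesis
An assertion or conjecture about one or more parameters of the population.
To determine with certainty whether it is true or false, we would have to examine the entire population. This is usually impossible!
Instead, use a random sample to provide evidence that either supports or does not support the hypothesis.
The conclusion is then based upon statistical significance.
It is important to remember that this conclusion is an inference about the population determined from the sample data.

Slide 16: A Statistical Hypothesis
An assertion or conjecture about one or more parameters of a population.
To judge whether it is right or wrong, we would have to examine the whole population. This is usually impossible!
We should instead use a random sample and see whether it supports the hypothesis.
The conclusion therefore rests on statistical significance.
Remember that this conclusion about the population is inferred from the sample.

Slide 17: Why Do Hypothesis Testing?
1. To improve processes, we need to identify factors which impact the mean or standard deviation.
2. Once we have identified these factors and made adjustments for improvement, we need to validate actual improvements in our processes.
3. Sometimes we cannot decide graphically or by using calculated statistics (sample mean and standard deviation) whether there is a statistically significant difference between processes.
4. In such cases the decision will be subjective.
5. We perform a formal statistical hypothesis test to decide objectively whether there is a difference.
Data helps everyone make the same decision.

Slide 18: Why Do Hypothesis Testing?
1. To improve a process, we need to identify the factors that affect the mean and standard deviation.
2. Once these factors have been identified and the improvement adjustments made, we need to verify their actual effect on the process.
3. Sometimes we cannot tell from graphs or calculated statistics (sample mean and sample standard deviation) whether there is a statistically significant difference between processes.
4. In such cases the decision may be subjective.
5. We use a formal hypothesis test to judge objectively whether a difference exists.
Data helps everyone make the same decision.

Slide 19: Nature of Hypotheses
Null Hypothesis (Ho):
- Usually describes a status quo
- The one you assume unless otherwise shown
- Signs used in Minitab: =
Alternative Hypothesis (Ha):
- Usually describes a difference
- The one you accept or reject based upon evidence
- Signs used in Minitab: not =, or <, or >
It is either Null (same) or Alternative (different).

Slide 20: Types of Hypotheses
Null hypothesis:
- Usually describes the status quo
- The one assumed unless shown otherwise
- Written with "=" in Minitab
Alternative hypothesis (Ha):
- Usually describes a difference
- The one accepted or rejected on the basis of evidence
- Written with "not =", "<" or ">" in Minitab
It is either the null hypothesis (same) or the alternative hypothesis (different).

Slide 21: Hypothesis Testing: Guilty vs. Innocent Example
The American justice system can be used to illustrate the concept of hypothesis testing. In America we assume innocence until proven guilty. Innocence corresponds to the null hypothesis.
It requires strong evidence, "beyond a reasonable doubt," to convict the defendant. Returning a guilty verdict corresponds to rejecting the null hypothesis and accepting the alternative hypothesis. More specifically, we have significant evidence to support that a difference exists.
Ho: Person is innocent.
Ha: Person is guilty.
What are the possible outcomes when the truth is known?

Slide 22: Hypothesis Testing: Guilty vs. Innocent Example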
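The American justice system can be used to illustrate the concept of hypothesis testing. In America a defendant is presumed innocent until found guilty. Innocence corresponds to the null hypothesis.
Convicting the defendant requires strong evidence, "beyond all reasonable doubt." A guilty verdict from the jury corresponds to rejecting the null hypothesis and accepting the alternative hypothesis. More specifically, we have significant evidence that a difference exists.
Ho: The defendant is innocent.
Ha: The defendant is guilty.
What are the possible outcomes once the truth is known?

Slide 23: Risk Decision Making in our Courts and in Business
Courts (verdict vs. truth):
Verdict \ Truth      | Innocent (Ho, =)    | Guilty (Ha, not =)
Set Free (Ho, =)     | Innocent, set free  | Guilty, set free
Jail (Ha, not =)     | Innocent, jailed    | Guilty, jailed
Business (decision vs. truth):
Decision \ Truth     | Ho, =               | Ha, not =
Ho, =                | Correct decision    | Type 2 error (β)
Ha, not =            | Type 1 error (α)    | Correct decision
The Type I error (α error) is rejecting Ho when it is true, sometimes called the producer's risk.
The Type II error (β error) is failing to reject Ho when it is false, sometimes called the consumer's risk.

Slide 24: Decision Risk in the Courts and in Business
Courts (verdict vs. truth):
Verdict \ Truth      | Innocent (Ho, =)    | Guilty (Ha, not =)
Set Free (Ho, =)     | Innocent, set free  | Guilty, set free
Jail (Ha, not =)     | Innocent, jailed    | Guilty, jailed
Business (decision vs. truth):
Decision \ Truth     | Ho, =               | Ha, not =
Ho, =                | Correct decision    | Type 2 error (β)
Ha, not =            | Type 1 error (α)    | Correct decision
A Type I error (α error) is rejecting Ho when it is true, sometimes called the producer's risk.
A Type II error (β error) is failing to reject Ho when it is false, sometimes called the consumer's risk.

Slide 25: The p Value
Another way to measure the risk in the decision is through the p value.
The p-value is known as the Observed Level of Significance for a factor. It is the chance of observing at least this amount of difference if the sample is consistent with the population (that is, if the null hypothesis is true).
The p-value is often treated as the risk of being wrong if we reject the null hypothesis (Type I error). Strictly, it is calculated assuming the null hypothesis is true, so it is not the probability that the null hypothesis itself is true.
Unless there is an exception based on engineering judgment, we will set the acceptance level for a Type I error at α = 0.05.
Thus, any p-value less than 0.05 means we reject the null hypothesis.

Slide 26: The p Value
Another way to measure the risk in a decision is through the p value.
The p-value is the observed level of significance for a factor. It is the chance of observing at least this much difference when the sample is consistent with the population.
The p-value is often treated as the probability of being wrong if we reject the null hypothesis (Type I error). Strictly, it is calculated assuming the null hypothesis is true, so it is not the probability that the null hypothesis itself is true.
Unless there is an exception based on engineering judgment, we set the acceptable level for a Type I error at α = 0.05.
Thus, any p-value less than 0.05 means the null hypothesis is rejected.

Slide 27: Defining Hypotheses
Null Hypotheses
HO: X1 = Target
HO: X1 = μ
HO: X1 - X2 = 0
HO: μ1 - μ2 = 0
HO: X1 = X2 = X3 = … = Xn
HO: σ1 = σ2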
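HO: S1 = S2
HO: S1 = S2 = S3 = … = Sn
Alternative Hypotheses
Inequalities (≠) are two-sided tests.
HA: μ1 ≠ μ2
HA: X1 ≠ X2
One-sided tests are used for < or > hypotheses.
HA: μ1 < μ2
HA: X1 < X2
HA: μ1 > μ2
HA: X1 > X2
HA: X1 - X2 ≠ 0 ___________________
HA: X1 - X2 < 0 ___________________
HA: X1 - X2 > 0 ___________________
HA: σ1 ≠ σ2 _______________________
HA: σ1 < σ2 _______________________
HA: S1 > S2 _______________________
Scripting Hypotheses as equations is useful when stating.

Slide 28: Defining Hypotheses
Null Hypotheses
HO: X1 = Target
HO: X1 = μ
HO: X1 - X2 = 0
HO: μ1 - μ2 = 0
HO: X1 = X2 = X3 = … = Xn
HO: σ1 = σ2
HO: S1 = S2
HO: S1 = S2 = S3 = … = Sn
Alternative Hypotheses
Inequalities (≠) are two-sided tests.
HA: μ1 ≠ μ2
HA: X1 ≠ X2
One-sided tests are used for < or > hypotheses.
HA: μ1 < μ2
HA: X1 < X2
HA: μ1 > μ2
HA: X1 > X2
HA: X1 - X2 ≠ 0 ___________________
HA: X1 - X2 < 0 ___________________
HA: X1 - X2 > 0 ___________________
HA: σ1 ≠ σ2 _______________________
HA: σ1 < σ2 _______________________
HA: S1 > S2 _______________________
Stating hypotheses as equations is useful.

Slide 29: Hypothesis Testing Protocol
The hypotheses are always statements about the population parameters.
1. State your null hypothesis (Ho).
   HO: The height of citizens in country A is equal to the height of citizens in country B (μA = μB).
   State your alternative hypothesis (Ha).
   HA: The height of citizens in country A is less than the height of citizens in country B (μA < μB).
2. Determine the appropriate statistical test based on the hypothesis being tested.
3. Determine the level of acceptable risk.
   α Risk: usually 5% (default)
   β Risk: usually 10-20% (default)

Slide 30: Hypothesis Testing Protocol
Hypotheses are always statements about population parameters.
1. State the null hypothesis (Ho).
   HO: The heights of citizens in country A and country B are equal (μA = μB).
   State the alternative hypothesis (Ha).
   HA: The height of citizens in country A is less than the height of citizens in country B (μA < μB).
2. Choose the appropriate statistical test based on the hypothesis being tested.
3. Decide the acceptable level of risk.
   α Risk: usually 5% (default)
   β Risk: usually 10-20% (default)

Slide 31: Hypothesis Testing Protocol (Cont)
4. Determine the proper sample size for the test (Section 5).
5. Collect a sample of observations from the population.
6. Calculate statistics based on the sample.
7. Use a statistical test to test the alternative hypothesis.
8. Based on the test result, we reject or fail to reject Ho according to the previously determined criterion.
9. Translate the results. Translate the statistical conclusion to a practical one.
   Statistical conclusion: Can we prove a difference statistically?
   Practical conclusion: Do we care about the difference?

Slide 32: Hypothesis Testing Protocol (Cont)
4. Determine the sample size appropriate for the test (Section 5).
5. Collect a sample of observations from the population.
6. Calculate the sample statistics.
7. Use a statistical test to test the alternative hypothesis.
8. Based on the test result and the previously determined criterion, decide whether to reject or fail to reject Ho.
9. Translate the results. Convert the statistical conclusion into a practical one.
   Statistical conclusion: Can we prove a difference statistically?
   Practical conclusion: Do we need to pay attention to the difference?

Slide 33
We'll be using hypothesis testing as a method to prove change.
Target or desired value: One Sample t
Are multiple samples the same?
- Mean: two sample t or ANOVA
- Variation: F test or Test for Equal Variance
- Proportion of Occurrence: Chi Square (χ²)

Slide 34
We use hypothesis testing as a method to prove a difference.
Target value: one-sample t
Are multiple samples the same?
- Mean: two-sample t or ANOVA
- Variation: F test or test for equal variances
- Proportion of occurrence: Chi Square (χ²)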
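
The protocol steps above (state Ho and Ha, fix the α risk, collect a sample, compute a p-value, decide) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the height values are invented for the example, the function name `permutation_p_value` is hypothetical, and a simple permutation test is used in place of the two-sample t test named on the slides so that the sketch needs nothing beyond the standard library.

```python
import random
from statistics import mean


def permutation_p_value(sample_a, sample_b, n_resamples=10_000, seed=0):
    """One-sided p-value for Ha: mean(A) < mean(B).

    Estimates how often a random relabelling of the pooled data gives a
    difference mean(A) - mean(B) at least as negative as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(sample_a) - mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Step 3: acceptable alpha risk (the 5% default from the slides).
ALPHA = 0.05

# Step 5: hypothetical height samples in cm (made up for illustration).
heights_a = [168.1, 170.4, 165.9, 171.2, 167.3, 169.0, 166.5, 170.8]
heights_b = [172.6, 175.1, 171.9, 176.3, 173.4, 174.8, 172.0, 175.5]

# Steps 6-8: compute the p-value and compare it with alpha.
p_value = permutation_p_value(heights_a, heights_b)
decision = "reject Ho" if p_value < ALPHA else "fail to reject Ho"
print(f"p-value = {p_value:.4f}: {decision}")
```

Step 9 still applies afterwards: a statistically significant difference in height is only worth acting on if its size matters in practice.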
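
The Type I and Type II errors described earlier can also be explored by simulation. The sketch below is an assumption-laden illustration, not the deck's own method: it draws normally distributed samples, treats the population standard deviation as known (a z test rather than the t tests listed on the slides), and uses invented values for the mean, standard deviation and sample size. When Ho is true, the share of rejections should land near the chosen α. When Ho is false, the share of non-rejections estimates the β risk for that particular shift.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for Ho: population mean = mu0, sigma assumed known."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mu, mu0=50.0, sigma=2.0, n=25, alpha=0.05,
                   trials=2000, seed=1):
    """Fraction of simulated samples for which Ho: mean = mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials


# Ho is true here, so every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=50.0)
# Ho is false here (true mean shifted by 1.0), so every non-rejection
# is a Type II error.
type_2_rate = 1 - rejection_rate(true_mu=51.0)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

Increasing `n` in the second call lowers the estimated β risk, which is the reason the protocol asks for a sample-size calculation before data are collected.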